Bi-partitioned feature weighted k-means clustering for detecting insurance fraud claim patterns

Fraudulent insurance claims are a growing concern in Canada and worldwide, with significant financial consequences for both insurers and honest policyholders. In 2022 alone, Canadians lost over $500 million to insurance fraud. Detecting such fraud is challenging due to the complexity and high dimensionality of insurance data. Traditional algorithms like K-means clustering struggle to handle such complex data effectively, especially when different features contribute unequally to fraudulent patterns.

The study introduces a novel algorithm called Bi-Partitioned Feature-Weighted K-Means (BPW K-means). Unlike traditional K-means, which treats every feature equally, this method splits the dataset's features into two groups based on feature importance and applies a different weight to each group. This improves clustering accuracy and helps uncover meaningful fraud-related patterns. Key findings include:

The BPW K-means algorithm achieved up to 91% clustering accuracy, a 38% improvement over classical K-means, on the datasets considered in the study.
It works well on real-world insurance datasets and performs consistently across other benchmark datasets.
Ranking features by importance, for example with a Random Forest, helps the model identify which variables matter most in detecting fraud.
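
The core idea, weighting two groups of features differently inside K-means, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the two parts are an "important" and an "other" group of features, that each group gets one fixed weight, and that the weights enter through a weighted squared Euclidean distance. The function names, the default weight values, and the `important` argument (indices of the higher-ranked features, which could come from a ranking such as Random Forest importance) are all hypothetical.

```python
import random


def feature_weights(n_features, important, w_important=1.0, w_other=0.1):
    """Return one weight per feature: w_important for the 'important'
    group, w_other for the rest (both values are illustrative assumptions)."""
    important = set(important)
    return [w_important if j in important else w_other for j in range(n_features)]


def weighted_sq_dist(x, c, w):
    """Weighted squared Euclidean distance between point x and centroid c."""
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))


def bpw_kmeans_sketch(X, k, important, w_important=1.0, w_other=0.1,
                      n_iter=20, seed=0):
    """Lloyd-style K-means with a two-group feature weighting.

    X is a list of equal-length numeric lists. Returns (labels, centroids).
    """
    w = feature_weights(len(X[0]), important, w_important, w_other)
    rng = random.Random(seed)
    # Initialise centroids from k distinct rows of X.
    centroids = [list(p) for p in rng.sample(X, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid under the weighted distance.
        new_labels = [
            min(range(k), key=lambda c: weighted_sq_dist(x, centroids[c], w))
            for x in X
        ]
        # Update step: with fixed weights the best centroid is still the mean.
        for c in range(k):
            members = [x for x, lab in zip(X, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return new_labels, centroids
```

With fixed weights this is equivalent to rescaling each feature by the square root of its weight before running ordinary K-means. How the published method chooses the partition and the weights is not described in this summary, so those details are left as inputs here.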

The proposed method provides insurance companies and fraud investigators with a powerful unsupervised tool to group and identify suspicious claims, even without knowing in advance which ones are fraudulent. This can lead to earlier detection, more targeted investigations, and potentially reduced costs and fairer premiums for honest policyholders. Beyond insurance, the method can be adapted to other industries facing similar data challenges.

This research offers a significant step forward in fighting insurance fraud by improving how machine learning identifies hidden patterns in complex data. By prioritizing important features and allowing flexibility in clustering, the BPW K-means method offers more accurate, interpretable, and practical results than traditional approaches. Future research may explore combining this approach with deep learning or applying it to other types of fraud beyond insurance.

Combert, F., Xie, S., Lawniczak, A. (2025). Bi-Partitioned Feature Weighted K-Means Clustering for Detecting Insurance Fraud Claim Patterns. Mathematics, 13(3), 434. DOI: 10.3390/math13030434
