Handling Highly Imbalanced Data for Classifying Fatality of Auto Collisions Using Machine Learning Techniques

Overview: Car accidents are a significant public health issue, ranking among the leading causes of death and disability globally. Predicting fatal car accidents is crucial for effective health resource management and road safety. This research focuses on developing and refining predictive models to forecast fatal accidents, aiming to improve resource allocation, reduce fatalities, and enhance overall road safety.

Importance of Prediction: Health organizations can allocate resources more effectively by predicting where and when fatal accidents are likely to occur. For example, if a model predicts a high fatality rate in a specific region, authorities can target that area with road safety campaigns, infrastructure improvements, and enhanced emergency medical services. Accurate predictions also help in planning for medical and rehabilitation services for accident victims, ensuring the necessary resources are available.

Challenges in Predictive Modeling: A key challenge in developing these predictive models is data imbalance. In accident data, fatal cases are rare compared with non-fatal ones, which can bias traditional machine learning models toward the more common non-fatal outcome. A model can then score well on overall accuracy while overlooking the minority class (fatal cases), which is precisely the class it is meant to predict.
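To make the imbalance problem concrete, here is a minimal sketch in Python. The 1% fatality rate and the toy labels are assumptions chosen for illustration, not figures from the study. It shows how a "model" that always predicts the majority class can look accurate while never identifying a fatal case.

```python
# Hypothetical toy labels: 1 = fatal, 0 = non-fatal.
# The 1% fatal share is an assumed rate for illustration only.
n_total = 1000
n_fatal = 10
y_true = [1] * n_fatal + [0] * (n_total - n_fatal)

# A trivial "model" that ignores its input and always predicts non-fatal.
y_pred = [0] * n_total


def accuracy(y_true, y_pred):
    """Share of all cases that were predicted correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Share of actual positive (fatal) cases that were predicted as positive."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy     = {acc:.2f}")
print(f"fatal recall = {rec:.2f}")
```

Because accuracy rewards the majority class, metrics that focus on the minority class, such as recall for fatal cases, are usually more informative on data like this.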

Innovative Approach: This research introduces novel methodologies to address the issue of data imbalance in predictive modeling. It systematically explores various machine learning algorithms and imbalanced data handling techniques to improve the accuracy of predictions. The study leverages data from the National Collision Database of Canada and uses experimental datasets to test different approaches, including logistic regression, support vector machines, decision trees, Random Forest, and XGBoost.
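This summary does not name the specific imbalance-handling techniques the study evaluates. As a rough illustration only, the sketch below shows two widely used options: weighting classes inversely to their frequency, and randomly oversampling the minority class. The function names, the toy data, and the 5% fatal share are all hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count),
    so that rarer classes receive proportionally larger weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 5 fatal (1) and 95 non-fatal (0) records.
labels = [1] * 5 + [0] * 95
rows = [{"record_id": i} for i in range(len(labels))]

weights = balanced_class_weights(labels)
new_rows, new_labels = random_oversample(rows, labels)

print(weights)
print(Counter(new_labels))
```

In practice, resampling of this kind is normally applied to the training split only, so that the evaluation data keeps the real-world class proportions.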

Impact and Applications: The findings from this study have significant implications for road safety and health management. By improving the accuracy of fatality predictions, the research supports better decision-making in resource allocation and accident prevention strategies. Furthermore, the methodologies developed in this study can be applied to other fields where data imbalance is a challenge, such as fraud detection, medical diagnosis, and customer churn prediction. This research contributes to the broader field of management analytics by integrating advanced data handling techniques with machine learning, offering valuable insights for enhancing predictive capabilities across various industries.

Xie, S., & Zhang, J. (2024). Handling highly imbalanced data for classifying fatality of auto collisions using machine learning techniques. Journal of Management Analytics, 1–41.