
Improving Interpretability and Reliability of Predictive Modelling: A TOPSIS-Based Comprehensive Measure of Variable Importance


Transportation and insurance systems constantly generate vast amounts of complex data. To understand and predict outcomes in such systems, modellers often include a large number of potentially useful factors in the underlying predictive models. As model complexity increases, the models become harder to interpret and may predict outcomes less accurately, especially when machine learning-based approaches are used. Striking a balance between interpretability and prediction accuracy is crucial for producing reliable predictions that support the management and control of these complex systems. The challenge lies in selecting the most important variables effectively, given the variety of available data sources and the many machine-learning methods at our disposal.

This research paper proposes a novel approach to address this challenge: a comprehensive variable importance measure that helps select the most relevant variables for constructing predictive models. The method frames variable selection as a multi-criteria decision analysis (MCDA) problem and solves it with the TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution) method. This gives a systematic way of selecting variables and leads to more interpretable predictive models.
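To make the idea concrete, the following is a minimal sketch of standard TOPSIS applied to variable ranking. It is not the authors' implementation. It assumes that each row is a candidate variable, that each column is an importance score produced by a different method, that all criteria are weighted equally, and that higher scores are better. The function name `topsis_scores`, the variable names, and the numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def topsis_scores(decision_matrix, weights=None, benefit=None):
    """Return the TOPSIS closeness coefficient (0 to 1) for each row."""
    X = np.asarray(decision_matrix, dtype=float)
    n_alt, n_crit = X.shape

    # Assumption: equal criterion weights unless the caller supplies some.
    if weights is None:
        weights = np.full(n_crit, 1.0 / n_crit)
    else:
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()

    # Assumption: every criterion is a benefit ("higher is better") by default.
    if benefit is None:
        benefit = np.ones(n_crit, dtype=bool)
    else:
        benefit = np.asarray(benefit, dtype=bool)

    # Step 1: vector-normalize each column, then apply the weights.
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero columns
    V = X / norms * weights

    # Step 2: ideal and anti-ideal points, criterion by criterion.
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti_ideal = np.where(benefit, V.min(axis=0), V.max(axis=0))

    # Step 3: Euclidean distance of each alternative to both points.
    d_plus = np.linalg.norm(V - ideal, axis=1)
    d_minus = np.linalg.norm(V - anti_ideal, axis=1)

    # Step 4: relative closeness to the ideal point.
    denom = d_plus + d_minus
    return np.divide(d_minus, denom, out=np.zeros_like(d_minus), where=denom > 0)

# Hypothetical importance scores: 4 variables (rows) x 3 methods (columns).
var_names = ["speed", "weather", "road_type", "day_of_week"]
importance = np.array([
    [0.9, 0.8, 0.7],
    [0.5, 0.6, 0.4],
    [0.3, 0.2, 0.5],
    [0.1, 0.1, 0.1],
])

scores = topsis_scores(importance)
ranking = np.argsort(-scores)  # indices from most to least important
for idx in ranking:
    print(f"{var_names[idx]}: {scores[idx]:.3f}")
```

In this toy input the first variable is best on every criterion, so it coincides with the ideal point and receives a closeness of 1, while the last variable coincides with the anti-ideal point and receives 0. The two variables in between are ordered by how they trade off across the three criteria, which is the situation where a multi-criteria method is useful.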

The proposed approach addresses the problem of variable selection in high-dimensional data. Identifying the most important variables efficiently makes it possible to build predictive models that are both more accurate and easier to understand. This improved interpretability helps decision-makers make informed choices and take appropriate actions when managing complex systems. To validate the effectiveness and robustness of the approach, we ran extensive simulations with known model characteristics and tested it on different datasets under varying noise scenarios, checking that the method copes with the kinds of challenges found in real-world data and gives reliable results. As a practical demonstration, we also applied it to the variable selection problem in the national collision database. By identifying the most significant variables that influence fatal accident rates, the study contributes to improving road safety measures and reducing accidents.

In conclusion, this paper introduces an approach to variable selection that helps strike a balance between model interpretability and prediction accuracy. Its applications in transportation and insurance systems promise to improve decision-making and the overall control and management of complex systems, leading to safer and more efficient operations. To learn more, see the full article:

Shengkun Xie & Jin Zhang (2023). TOPSIS-based comprehensive measure of variable importance in predictive modelling. Expert Systems with Applications, Volume 232, 120682. DOI: 10.1016/j.eswa.2023.120682