1. Introduction
Road safety is one of the most critical challenges facing modern society, significantly impacting public health, the economy, and the overall quality of life [1]. Each year in Poland, thousands of traffic incidents result in fatalities and severe injuries among road users. Reports from the Polish National Police Headquarters and other safety analysis institutions indicate that despite preventive measures, the number of such incidents remains alarmingly high [2,3]. In this context, implementing advanced solutions to enhance the road safety analysis and management process is essential [4]. One of the most rapidly evolving technological tools in this domain is machine learning (ML), which can play a pivotal role. ML enables the analysis of large and complex datasets that traditional methods often struggle to process efficiently [5,6]. In the context of road safety, ML can identify patterns and correlations between factors contributing to traffic incidents and even predict potential risks and their consequences, such as injuries or fatalities. Globally, ML applications in road safety include developing intelligent traffic management systems, identifying high-risk road segments, analyzing driver behavior, and forecasting infrastructure failures [7,8,9].
In Poland, the use of ML in road safety is particularly relevant due to the rapid expansion of the transportation network and ongoing urbanization [10,11]. Additionally, the structure of Poland’s road network, dominated by local roads where the risk of incidents is significantly higher than on expressways, underscores the importance of innovative approaches [12]. Currently, most road safety research in Poland focuses on the causes of road incidents and their consequences for participants [13,14]. The authors of these publications analyze the numbers of injuries and fatalities by year, compare them with other EU Member States, and point out changing trends [15,16,17]. The causes of road collisions and accidents are addressed just as often [18]. The relationship between weather conditions and the number of incidents is the most frequently analyzed [19,20]. Other studies emphasize that, in addition to the weather, driver behavior and traffic volume, which is driven by, among other things, the growing total number of vehicles on the road, contribute to the increase in incidents [21]. In addition, the number of accidents has been shown to depend on the time of day [22]. Because many factors determine the occurrence of accidents in Poland, a methodology for optimizing them has been proposed [23]. Publications often point out potential solutions and areas for improvement to enhance the safety of road users [24,25]. Advanced Driver Assistance Systems (ADAS) and the development of autonomous technologies are frequently highlighted as promising approaches to reducing collisions and accidents [26,27]. A widely discussed topic in recent years has been the “Vision Zero” strategy, which uses all available methods and means to reduce injuries and casualties to near zero [28]. Road infrastructure development, legislative action by the authorities, international cooperation, and the implementation of innovative technologies in vehicles, among others, are identified as complementary means of achieving this goal [29,30].
Mathematical modeling is a widely used tool in road safety research, particularly for evaluating current safety levels and developing effective solutions [31,32]. It is also applied to assess the reliability of transportation systems [33,34,35]. Recently, the significance of ML tools in this area has grown substantially, as evidenced by numerous applications [36]. However, fully harnessing the potential of ML algorithms requires a unified system for collecting and sharing data on road incidents and traffic volumes [37]. Establishing such a system should be a priority for policymakers aiming to improve road safety. ML can be used to estimate the probability of traffic incidents, as well as to assess risks in specific locations and conditions [38,39]. Predictions are typically based on historical data, such as collision or accident counts, participant injury levels, weather conditions, and traffic volumes [40]. These analyses are crucial for identifying high-risk areas, including hazardous road segments [41,42]. This insight is invaluable for guiding preventive measures, such as redesigning infrastructure to improve safety. Another ML application in road safety involves systems equipped with cameras and sensors that automatically detect incidents, such as vehicle breakdowns, congestion, collisions, or infrastructure damage [43,44]. By integrating ML algorithms, these systems can process large datasets efficiently, enabling faster responses to road incidents and, ultimately, improving safety outcomes [45]. ML is also employed to monitor and analyze driver behaviors, including acceleration, braking, and reactions to obstacles [46]. Algorithms can identify risky behavior patterns, such as driving while fatigued or distracted [47], which supports the continuous development of ADAS [48]. ML also plays a vital role in optimizing traffic flow, reducing the risk of collisions and accidents caused by congestion, sudden stops, or uneven traffic distribution [49]. Studies highlight the importance of ML in alleviating congestion, dynamically managing traffic signals, and even planning detours when routes become impassable [50,51]. Research also demonstrates the potential of ML to predict the need for transport infrastructure repairs by analyzing data on its condition and environmental factors [9,52]. This approach not only aids in making informed decisions about road and structural maintenance investments but also reduces the risk of traffic incidents caused by poor road conditions or excessive wear to infrastructure, such as bridge structures [53,54].
However, less attention has been paid in the literature to developing mathematical models based on ML algorithms that identify traffic incidents in Poland and predict their consequences for participants. This gap has become the focus of this article. This publication aims to develop mathematical models based on ML algorithms to describe road safety in Poland. This study is based on two key research questions:
To address these questions, this study first selected the most influential variables to reduce the data dimensionality and model size. This was followed by mathematical modeling of road safety in Poland using selected machine learning algorithms. The results not only identified factors that significantly affect the severity of injuries or the number of fatalities in accidents but, above all, also demonstrated the ability of ML-based mathematical models to predict threats and their consequences.
This article is structured into several sections. The introduction presents a general outline of this study and provides a literature review, highlighting publications that examine the causes and consequences of road incidents and propose solutions to mitigate them. This section also outlines potential applications of ML in road safety research. The second section describes the materials used and the methods applied in this study. In the third section, the variables with the strongest impact on road safety are extracted and analyzed in detail. The fourth section focuses on mathematical modeling using selected ML algorithms and evaluates the quality of the proposed models. Then, the results obtained using all the considered models are discussed. Finally, this paper presents its conclusions, taking into account the limitations and directions of future research.
4. Mathematical Modeling of Road Safety in Poland
This section introduces three machine learning models: k-Nearest Neighbors (KNN), Random Forest (RF) classification with cross-validation, and Recursive Partitioning and Regression Trees (RPart) classification with cross-validation. The dataset was split into a training set (85% of the data) and a test set (the remaining 15%), and a cross-validation controller was prepared. To ensure consistency and repeatability during cross-validation, the data were divided into 10 folds (10-fold cross-validation). Each fold was used once as the validation set, with the remaining nine folds used for training.
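The split and the fold rotation described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the tooling used in the study; the sample count, the seed, and the function names are assumptions made for the example, and the split is a plain random one (no stratification by injury class is assumed).

```python
import random

def train_test_split_indices(n_samples, test_fraction=0.15, seed=42):
    """Shuffle row indices and split them into training and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

def k_fold_splits(train_indices, n_folds=10):
    """Yield (fit, validation) index lists; each fold validates exactly once."""
    folds = [train_indices[i::n_folds] for i in range(n_folds)]
    for i, validation in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validation

# Hypothetical dataset of 1000 records: 85% for training, 15% for testing.
train_idx, test_idx = train_test_split_indices(n_samples=1000)
splits = list(k_fold_splits(train_idx))
print(len(train_idx), len(test_idx), len(splits))  # 850 150 10
```

Fixing the seed is what makes the partition repeatable, so every candidate model is scored on the same folds.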
4.1. K-Nearest Neighbors (KNN)
K-Nearest Neighbors (KNN) is one of the simplest machine learning algorithms, primarily used for classification tasks. The parameter k (the number of neighbors) significantly influences the model’s performance. The KNN model was evaluated for different values of the parameter k. The results are presented in Table 5.
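To make the role of k concrete, the following minimal sketch classifies a query point by majority vote among its k nearest training points. The data are made up, the two numeric features are hypothetical, and Euclidean distance is an assumption; categorical predictors would first need a numeric encoding, which is not shown.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k):
    """Predict the majority class among the k training points nearest to query."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy, made-up data: two numeric features per record (hypothetical).
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["no injuries", "no injuries", "no injuries",
           "minor injuries", "minor injuries", "minor injuries"]

# Evaluate the same query for several candidate values of k.
for k in (1, 3, 5):
    print(k, knn_predict(train_X, train_y, (0.8, 0.8), k))
```

In a tuning run, each candidate k would be scored on the validation folds and the value with the best score retained.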
Accuracy, the proportion of correct predictions among all samples, was used to select the optimal model; it reached its highest value, 64.82%, for k = 9. The Kappa value, a metric that accounts for chance agreement, was also the highest. The value Kappa = 0.4261 indicates moderate agreement between the predicted and actual classes. Thus, for the KNN model, k = 9 was assumed. The confusion matrix for this model is presented in Table 6.
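Both selection metrics can be computed directly from a confusion matrix. The sketch below uses Cohen's Kappa, (observed agreement − expected agreement) / (1 − expected agreement); the toy matrix is invented for illustration, and the row/column orientation (rows are predictions, columns are actual classes) is an assumption.

```python
def accuracy_and_kappa(matrix):
    """matrix[i][j] = number of cases with predicted class i and actual class j."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance if predictions and labels were independent.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return observed, (observed - expected) / (1 - expected)

toy = [[40, 10],
       [10, 40]]          # made-up two-class confusion matrix
acc, kappa = accuracy_and_kappa(toy)
print(f"{acc:.2f} {kappa:.2f}")  # 0.80 0.60
```

Because Kappa subtracts the agreement expected by chance, it is a stricter score than accuracy when one class dominates the data.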
For the KNN model, a very low p-value for the McNemar test was obtained (p < 2.2 × 10⁻¹⁶). This result shows that the difference between the number of misclassifications for one class and the number of misclassifications for the other class is statistically significant. This means that the model makes classification errors in an asymmetric manner. The model demonstrates strong performance in identifying observations for the “no injuries” class, achieving a high sensitivity of 88.9% and a relatively high specificity of 73.62%. The model performs well with this class and is less likely to assign other classes to “no injuries”; the balanced accuracy is 81.26%. For the “minor injuries” class, the sensitivity is 62.72%, and the specificity is 74.49%. The positive predictive value (PPV) of 57.38% indicates that the model frequently misclassifies other classes as “minor injuries”, leading to a balanced accuracy of 68.61%. The model struggles significantly with the “severe injuries” class. The sensitivity is notably low at 14.83%, although the specificity is high at 95.25%, meaning the model rarely misclassifies other classes as “severe injuries”. The balanced accuracy of 55.04% indicates poor performance in this class. The most critical issue lies in the “deaths” class, which the model essentially fails to detect. The sensitivity is extremely low at 6.23%, although the specificity is very high at 99.34%, indicating that the model almost never misclassifies other classes as “deaths”. The balanced accuracy of 52.79% further underscores the model’s challenges in accurately identifying this class.
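The per-class figures above follow from treating each class one-vs-rest. Balanced accuracy is the mean of sensitivity and specificity; for “no injuries”, (88.90 + 73.62) / 2 = 81.26. A minimal sketch with an invented three-class matrix follows; the orientation (rows are predictions, columns are actual classes) is an assumption.

```python
def one_vs_rest_metrics(matrix, labels):
    """Per-class sensitivity, specificity and balanced accuracy.

    matrix[i][j] = number of cases with predicted class i and actual class j.
    """
    total = sum(sum(row) for row in matrix)
    results = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fp = sum(matrix[k]) - tp                 # predicted k, actually another class
        fn = sum(row[k] for row in matrix) - tp  # actually k, predicted another class
        tn = total - tp - fp - fn
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        results[label] = (sensitivity, specificity, (sensitivity + specificity) / 2)
    return results

labels = ["no injuries", "minor injuries", "severe injuries"]
toy = [
    [80, 15, 5],   # predicted "no injuries"
    [15, 30, 10],  # predicted "minor injuries"
    [5, 5, 5],     # predicted "severe injuries"
]
metrics = one_vs_rest_metrics(toy, labels)
for label, (sens, spec, bal) in metrics.items():
    print(f"{label}: sensitivity={sens:.3f} specificity={spec:.3f} balanced={bal:.3f}")
```

In the toy matrix the rare class shows the same pattern as in the reported results: high specificity combined with low sensitivity.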
4.2. RF Classification with Cross-Validation
The second model is Random Forest with cross-validation. First, the mtry parameter, which defines the number of randomly selected predictors considered at each node split in the random forest trees, was tuned. Various mtry values were tested, and the results are shown in Table 7.
The highest accuracy and Kappa values were achieved for the parameter mtry = 44, and this value was subsequently used in the model. The confusion matrix for Random Forest with cross-validation is presented in Table 8.
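The effect of mtry can be illustrated without building a full forest: at every node, only a random subset of mtry predictors is examined when searching for the best split. The sketch below shows just that sampling step. The predictor names are illustrative labels loosely based on the variables discussed in this article, and mtry = 4 is chosen only to fit the short toy list; since mtry = 44 was selected, the actual model presumably had at least that many (possibly encoded) predictor columns.

```python
import random

def candidate_features(feature_names, mtry, rng):
    """Draw the mtry predictors that one node split is allowed to consider."""
    return rng.sample(feature_names, mtry)

# Illustrative predictor names (hypothetical column labels).
features = ["vehicle_type", "driver_behavior", "voivodeship", "incident_type",
            "participant_type", "day_of_week", "location", "speed",
            "lighting", "pedestrian_behavior", "sex"]

rng = random.Random(0)
for node in range(3):
    # Each node draws its own subset, which decorrelates the trees in the forest.
    print(node, candidate_features(features, mtry=4, rng=rng))
```

A small mtry makes the trees more diverse; a large mtry lets each split pick from nearly all predictors and makes the forest behave more like bagged decision trees.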
For this model, as for KNN, the p-value of McNemar’s test was very low (p < 2.2 × 10⁻¹⁶). Random Forest therefore also makes classification errors in an asymmetric manner: the number of cases in which the model incorrectly classified an example as one of the classes differs significantly from the number of cases in which the model incorrectly classified an example as another class. The accuracy of this model is 63.02%, indicating moderate performance, similar to the previous model, KNN. Similarly, the model performs well for the “no injuries” and “minor injuries” classes but struggles significantly with the less frequent “severe injuries” and “deaths” classes.
The results for each class are as follows:
1. “No injuries”:
   Sensitivity: 84.95% (the model effectively identifies samples belonging to this class);
   Specificity: 79.02% (the model less frequently misclassifies other classes as “no injuries”);
   Balanced accuracy: 81.98% (demonstrates strong performance for this class).
2. “Minor injuries”:
   Sensitivity: 59.03% (moderate ability to detect this class);
   Specificity: 75.03% (the model performs better at excluding other classes);
   Balanced accuracy: 67.03%.
3. “Severe injuries”:
   Sensitivity: 24.90% (the model struggles to correctly identify this class);
   Specificity: 90.87% (the majority of other classes are not misclassified as “severe injuries”);
   Balanced accuracy: 57.88%.
4. “Deaths”:
   Sensitivity: 12.17% (very low sensitivity indicates that the model hardly detects this class);
   Specificity: 98.44% (the model rarely misclassifies other classes as “deaths”);
   Balanced accuracy: 55.30%.
4.3. RPart Classification with Cross-Validation
The final proposed model is RPart classification with cross-validation. For this model, tuning began with the complexity parameter cp, which controls the level of pruning in the decision tree. The higher the value of cp, the simpler the tree, which reduces the risk of overfitting but can lead to underfitting. Various cp values were tested, with the results summarized in Table 9.
The optimal cp value, providing the highest accuracy and Kappa, was 0.0396, and therefore, this value was adopted for the model. The confusion matrix for the RPart classification with cross-validation is presented in Table 10.
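How cp limits tree growth can be shown with a simplified rule: a split is kept only if it reduces the error, measured relative to the error at the root node, by at least cp. The sketch below is a stylized version of that criterion under this assumption, not the exact procedure of any particular library; the function name and the error counts are hypothetical.

```python
def keep_split(parent_error, children_error, root_error, cp):
    """Return True if the split improves the relative error by at least cp."""
    improvement = (parent_error - children_error) / root_error
    return improvement >= cp

# Hypothetical misclassification counts; the root node misclassifies 1000 cases.
print(keep_split(400, 370, 1000, cp=0.0396))  # False: improvement 0.03 < cp
print(keep_split(400, 350, 1000, cp=0.0396))  # True:  improvement 0.05 >= cp
```

Under this rule, a threshold near 0.04 accepts only splits that remove a sizable share of the root error, so splits that would isolate small classes are unlikely to pass.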
For RPart, the p-value of McNemar’s test was very low (p < 2.2 × 10⁻¹⁶), which indicates that, similar to KNN and RF, this model makes classification errors in an asymmetric manner. The accuracy of this model is 62.86%, indicating moderate performance. Similar to the KNN and RF models, RPart performs well for certain classes but does not identify all of them equally well.
The results for each class are as follows:
1. “No injuries”:
   Sensitivity: 97.13% (the model excels in identifying this class, likely due to its high representation in the dataset);
   Specificity: 56.18% (the model struggles to distinguish this class from others, particularly “minor injuries”);
   Balanced accuracy: 76.66% (while the majority of predictions for this class are correct, the model frequently misclassifies other classes as “no injuries”).
2. “Minor injuries”:
   Sensitivity: 56.11% (the model correctly identifies slightly more than half of the cases in this class);
   Specificity: 80.31% (the model is effective at recognizing when observations do not belong to the “minor injuries” class);
   Balanced accuracy: 68.21% (most of the instances assigned to this class are correct).
3. “Severe injuries”:
   Sensitivity: 0% (the model completely fails to identify this class, with no cases correctly classified as “severe injuries”);
   Specificity: 100% (no observations are misclassified into this class, but at the cost of completely ignoring this class);
   Balanced accuracy: 50.00%.
4. “Deaths”:
   Sensitivity: 0% (similar to the “severe injuries” class, the model does not identify any instances of this class);
   Specificity: 100% (the model never misclassifies other observations as “deaths”);
   Balanced accuracy: 50.00%.
5. Discussion of Results
Research on road safety in Poland reveals that despite preventive measures, the number of accidents and their consequences remain high. To address this issue, this study applied a machine learning (ML) approach, which enabled the analysis of large datasets to identify key factors influencing road safety and develop mathematical models for predicting traffic incidents and their consequences for participants. The key variables identified as having the strongest impact on road safety in Poland include the type of vehicle, driver behavior, voivodeship, type of incident, type of participant, and day of the week. Other significant factors include the location of the incident, speed, lighting conditions, pedestrian behavior, and sex. All these variables were incorporated into the mathematical modeling process.
This study’s goal of developing mathematical models based on ML algorithms to describe road safety in Poland was achieved. The three proposed models, k-Nearest Neighbors (KNN), Random Forest (RF), and Recursive Partitioning and Regression Trees (RPart), produced similar results:
1. k-Nearest Neighbors (KNN):
   Accuracy: 64.2%;
   Kappa: 0.4162;
   Sensitivity “no injuries”: 88.90%;
   Sensitivity “minor injuries”: 62.72%;
   Sensitivity “severe injuries”: 14.83%;
   Sensitivity “deaths”: 6.23%.
2. RF classification with cross-validation:
   Accuracy: 63.02%;
   Kappa: 0.414;
   Sensitivity “no injuries”: 84.95%;
   Sensitivity “minor injuries”: 59.03%;
   Sensitivity “severe injuries”: 24.90%;
   Sensitivity “deaths”: 12.17%.
3. RPart classification with cross-validation:
   Accuracy: 62.86%;
   Kappa: 0.3664;
   Sensitivity “no injuries”: 97.13%;
   Sensitivity “minor injuries”: 56.11%;
   Sensitivity “severe injuries”: 0%;
   Sensitivity “deaths”: 0%.
Moreover, the p-value of the McNemar test was very low for each of the models (p < 2.2 × 10⁻¹⁶). This result shows that the misclassifications are not symmetric: the number of cases in which a model incorrectly assigned an example to one class differs significantly from the number of cases in which it made the opposite error. The models therefore make classification errors in an asymmetric manner. This is probably because the models are biased toward some classes and more often confuse the others with them, which suggests a problem with the data balance and is consistent with the uneven distribution of classes in the dataset. Among the models, k-Nearest Neighbors achieved the highest accuracy, although the difference compared to Random Forest and RPart was small. Compared with Random Forest, KNN also achieved higher sensitivity for the “no injuries” class and higher specificity for the “deaths” class. The Kappa coefficient for KNN and RF indicates moderate agreement (Kappa ≈ 0.41), while RPart achieved a slightly lower score (Kappa = 0.383), suggesting its classifications are somewhat closer to chance. Overall, RPart delivered the weakest results, particularly for rare classes like “severe injuries” and “deaths,” where it failed to recognize any cases. However, all the models faced challenges in classifying rare classes, highlighting the need for alternative methods to analyze such datasets. This challenge will be the focus of further research by the authors.
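For more than two classes, the asymmetry check is commonly carried out with the generalized McNemar (Bowker) test of symmetry, which compares each pair of off-diagonal cells of the confusion matrix. Whether this exact variant was used here is an assumption; the sketch below computes only the test statistic and its degrees of freedom for an invented matrix, leaving the p-value to a chi-squared table or a statistics library.

```python
def bowker_statistic(matrix):
    """Bowker's symmetry statistic: sum over i<j of (n_ij - n_ji)^2 / (n_ij + n_ji)."""
    statistic, dof = 0.0, 0
    size = len(matrix)
    for i in range(size):
        for j in range(i + 1, size):
            pair_total = matrix[i][j] + matrix[j][i]
            if pair_total > 0:  # pairs with no errors carry no information
                statistic += (matrix[i][j] - matrix[j][i]) ** 2 / pair_total
                dof += 1
    return statistic, dof

toy = [
    [80, 15, 5],
    [5, 30, 10],
    [5, 5, 5],
]  # made-up confusion matrix with unequal off-diagonal pairs
stat, dof = bowker_statistic(toy)
print(f"chi-squared = {stat:.3f}, degrees of freedom = {dof}")
```

A large statistic relative to the chi-squared distribution with the given degrees of freedom indicates that errors flow preferentially in one direction, which is typical when a few classes dominate the data.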
6. Conclusions
In summary, among the three models tested, Random Forest (RF) is the best choice for handling this imbalanced dataset when identifying rare classes like “severe injuries” and “deaths” is the primary goal. For situations where achieving the highest overall accuracy is critical, k-Nearest Neighbors (KNN) provides a reasonable compromise. Meanwhile, RPart can serve as a fast baseline model but requires improvements to effectively manage rare classes.
This research demonstrates that ML models have the potential to predict traffic hazards and their consequences. In the long term, these models could become valuable tools for managing road safety. However, limitations remain, mainly related to the availability, collection, and sharing of data, which significantly affect the performance of the proposed models. First, many minor collisions are not reported to the police, so the statistics may not include all events. Moreover, the data currently provided do not contain details on traffic volume, weather conditions, quality of infrastructure, or the technical condition of vehicles, although factors such as precipitation, road width, lane width, or road surface condition may affect the number of road traffic incidents and their severity for participants. Another issue is that the available statistics are based mainly on reports and notifications, which not only prevents full consideration of all aspects but may also introduce inconsistencies in data collection and human errors, e.g., an incorrect description of the circumstances of the incident. This points to the need for a transition toward advanced tools for monitoring road traffic incidents. In addition, the available police statistics are not integrated with other sources. Fully harnessing the potential of ML algorithms requires a unified system for collecting and sharing data on road incidents, traffic volumes, and even weather conditions. Establishing such a system should be a priority for Polish policymakers aiming to improve road safety.
Taking into account the obtained results, future research on the use of machine learning tools in the context of road safety in Poland should focus on developing models that, thanks to detailed and consistent data, will be able to predict road incidents and their consequences depending on the weather conditions, the traffic intensity or the condition and quality of the road infrastructure. Such ML models could not only help identify specific weather conditions conducive to accidents but also assess the importance of the condition of the road surface or geometry in the context of participant safety. Another reasonable direction for future research may be the development of spatial and temporal models based on ML that allow prediction of the place and time of increased risk of accidents. Integration with geographic data (GIS) could identify spatial patterns regarding the probability of road traffic events, along with their potential consequences. Ultimately, from a broader perspective, ML models could positively impact the overall road traffic safety in Poland by finding applications in designing safer road infrastructure, optimizing traffic management, and implementing changes to regulations, such as speed limits.