Can Historical Accident Data Improve Sustainable Urban Traffic Safety? A Predictive Modeling Study
Abstract
1. Introduction
| References | Model | Features | 
|---|---|---|
| [25] | Hurdle Beta Model, Bivariate Probit Model, XGBoost | Visual surroundings (sky, buildings, road surface), Weather conditions, Driver heterogeneity, Traffic density. | 
| [35] | Sparse Spatio-Temporal Dynamic Hypergraph Learning (SST-DHL) | Multi-view spatio-temporal convolution, Cross-regional dynamic hypergraph learning, Two-supervised self-learning paradigm. | 
| [43] | Multigroup Structural Equation Modeling (SEM) | Traffic conflicts propensity, Utilization of CV alerts, Weather conditions, Psychological factors (aggressiveness and unawareness), Driving behaviors. | 
| [26] | Bayesian Random Slope Model | Electric vehicle variable, Restraint system use, Air bag deployed, Ejection and rollover. | 
| [27] | Structural Equation Modeling (SEM) | Safety knowledge, descriptive norms, injunctive norms, perceived behavioral control, and risk perception. | 
| [28] | Latent Class Discrete Outcomes Model (LCDOM) | Environmental characteristics. | 
| [29] | Multiple linear regression model | Socio-demographic characteristics, substance use, driving behavior, peer norms, and psychological traits. | 
| [36] | Automated Machine Learning (AutoML) and SHAP values | Visual risk factors, traffic, and land use factors. | 
| [37] | The mixed logit model | Driver characteristics (such as age, turn signals, backup signals, etc.), vehicle characteristics (such as lack of airbags, vehicle malfunctions, etc.), accident characteristics (such as single-vehicle accidents, multi-vehicle accidents, frontal collisions, etc.), road conditions (such as full control access, lane width, etc.), and environmental and temporal features (such as snowfall weather, morning hours, etc.). | 
| [38] | Geographically weighted Poisson regression (GWPR) | Daily bicycle traffic, general road network length, percentage of commercial and industrial areas, percentage of recreational areas, number of apartments, and lengths of four types of bicycle lanes. | 
| [39] | XGBoost, SHAP | Traffic volume, commercial land use, roadways, pedestrians, and vehicular elements. | 
| [40] | Decision tree | Rider type, gender, age, and traffic environment factors (such as police presence, number of lanes, weather conditions). | 
| [41] | Structural Equation Modeling (SEM) | Driving violations, driving errors, and attentional failures. | 
| [42] | Linear regression and path analysis models | Spatial anxiety, self-regulatory capacity, and risk driving behaviors (including errors, violations, and lapses). | 
- Does the incorporation of historical accident information have a positive impact on the predictive performance of the model?
- What are the interactions between historical accident information and other features, and how do these interactions influence model decisions?
- How does historical accident information influence future accident risk?
2. Data
- Matching accident information with personnel information based on the “Accident ID”.
- Extracting data pertaining to passenger vehicles and removing records with missing values or outliers.
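The two preprocessing steps above can be sketched in plain Python. This is a minimal illustration rather than the authors' pipeline: the field names (`accident_id`, `vehicle_type`, `vehicle_age`, `driver_age`), the sample records, and the outlier rule (`max_vehicle_age`) are all assumptions, since the text does not state how outliers were defined.

```python
# Hypothetical records; field names and values are illustrative only.
accidents = [
    {"accident_id": "A001", "vehicle_type": "passenger", "vehicle_age": 4},
    {"accident_id": "A002", "vehicle_type": "truck", "vehicle_age": 9},
    {"accident_id": "A003", "vehicle_type": "passenger", "vehicle_age": None},
    {"accident_id": "A004", "vehicle_type": "passenger", "vehicle_age": 120},
]
personnel = [
    {"accident_id": "A001", "driver_age": 35},
    {"accident_id": "A002", "driver_age": 51},
    {"accident_id": "A003", "driver_age": 28},
    {"accident_id": "A004", "driver_age": 44},
]


def preprocess(accidents, personnel, max_vehicle_age=60):
    """Join on the accident ID, keep passenger vehicles, drop bad rows."""
    people_by_id = {p["accident_id"]: p for p in personnel}
    cleaned = []
    for acc in accidents:
        person = people_by_id.get(acc["accident_id"])
        if person is None:
            continue  # no matching personnel record
        row = {**acc, **person}
        if row["vehicle_type"] != "passenger":
            continue  # keep passenger vehicles only
        if any(value is None for value in row.values()):
            continue  # drop records with missing values
        if not 0 <= row["vehicle_age"] <= max_vehicle_age:
            continue  # assumed outlier rule, not taken from the paper
        cleaned.append(row)
    return cleaned


cleaned = preprocess(accidents, personnel)
```

With the sample data, only record `A001` survives: `A002` is not a passenger vehicle, `A003` has a missing value, and `A004` fails the assumed outlier rule.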
3. Methods
3.1. Research Objective
3.2. Random Forest
- Sample Set Selection: The Random Forest algorithm creates multiple new training sets through bootstrap sampling from the original dataset. For each decision tree, samples are drawn randomly and with replacement from the original training set, such that a sample may be selected multiple times, while some samples may not be selected at all. This process is repeated K times, resulting in K new sample sets {D1, D2, …, DK}.
- Decision Tree Generation: For each decision tree, a subset of features is randomly selected to split nodes. With $M$ features available, during each round of decision tree generation, $m$ features (where $m < M$) are randomly chosen to form a new feature set $F_k$, and a decision tree is generated based on this set. This process is repeated $K$ times, resulting in $K$ decision trees. For the k-th tree, the generation process can be represented as:

  $$h_k(x) = T(x; D_k, F_k)$$

  where $h_k(x)$ is the classification outcome of the k-th tree for the sample $x$, and $T(\cdot\,; D_k, F_k)$ denotes a decision tree grown on the sample set $D_k$ with the feature set $F_k$.
- Decision Tree Model Ensemble: Since the generated decision trees are independent of each other, each tree is given equal weight. For a new sample, each decision tree produces a classification, and the final category is the one chosen by the majority of trees. The final classifier is therefore the mode of the individual tree predictions:

  $$H(x) = \arg\max_{c} \sum_{k=1}^{K} I\left(h_k(x) = c\right)$$

  where $I(\cdot)$ is the indicator function. If the classification results are in the form of probabilities, then for category $c$ the predicted probability of the Random Forest can be represented as:

  $$P(c \mid x) = \frac{1}{K} \sum_{k=1}^{K} P_k(c \mid x)$$

  where $P_k(c \mid x)$ is the k-th tree's predicted probability that the sample $x$ belongs to category $c$.
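The sampling and ensemble steps above can be sketched as follows. The code is a simplified illustration using only the standard library: it shows bootstrap sampling, majority voting, and equal-weight probability averaging, and it takes the per-tree outputs as given instead of training real decision trees. All function names and the example values are assumptions made for this sketch.

```python
import random
from collections import Counter


def bootstrap_sample(dataset, rng):
    """Draw len(dataset) samples with replacement (one new set D_k)."""
    return [rng.choice(dataset) for _ in dataset]


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


def average_probability(tree_probabilities):
    """Average per-tree class probabilities with equal weight 1/K."""
    k = len(tree_probabilities)
    classes = tree_probabilities[0].keys()
    return {c: sum(p[c] for p in tree_probabilities) / k for c in classes}


rng = random.Random(0)
data = list(range(10))
samples = [bootstrap_sample(data, rng) for _ in range(3)]  # K = 3 sample sets

# Hypothetical outputs of K = 3 trees for one new sample.
votes = [1, 0, 1]
probs = [{0: 0.2, 1: 0.8}, {0: 0.6, 1: 0.4}, {0: 0.1, 1: 0.9}]

forest_class = majority_vote(votes)
forest_probs = average_probability(probs)
```

Here two of the three trees vote for class 1, so the forest predicts class 1, and the averaged probability for class 1 is (0.8 + 0.4 + 0.9) / 3 = 0.7.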
3.3. XGBoost
3.4. LightGBM
3.4.1. Gradient-Based One-Side Sampling (GOSS)
- Sort the data instances based on the absolute value of their gradients.
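- Retain the top $a \times 100\%$ of instances with the largest gradients (these instances contribute more to information gain).
- Randomly sample $b \times 100\%$ of instances from the remaining $(1 - a) \times 100\%$ with smaller gradients.
- Multiply the sampled small-gradient instances by the constant compensation factor $\frac{1 - a}{b}$ to adjust their impact. The estimation of information gain can be represented as:

  $$\tilde{V}_j(d) = \frac{1}{n}\left(\frac{\left(\sum_{x_i \in A_l} g_i + \frac{1 - a}{b}\sum_{x_i \in B_l} g_i\right)^2}{n_l^j(d)} + \frac{\left(\sum_{x_i \in A_r} g_i + \frac{1 - a}{b}\sum_{x_i \in B_r} g_i\right)^2}{n_r^j(d)}\right)$$

  where $g_i$ is the gradient of instance $x_i$, $d$ is the split point of feature $j$, $A_l$ and $A_r$ are the sets of instances with larger gradients on the left and right sides of the split point, respectively, and $B_l$ and $B_r$ are the sets of instances with smaller gradients on the left and right sides of the split point, respectively. $n_l^j(d)$ and $n_r^j(d)$ are the number of instances in the left and right nodes, respectively.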
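The GOSS steps above can be sketched as a small sampling routine. This is a minimal illustration rather than LightGBM's implementation; it assumes, as in the LightGBM paper [50], that `a` and `b` are fractions of the full dataset, so the sampled small-gradient instances are reweighted by (1 − a)/b. The function name and the example gradients are hypothetical.

```python
import random


def goss_sample(gradients, a, b, rng):
    """Return (index, weight) pairs selected by GOSS.

    a: fraction of instances kept for having the largest |gradient|.
    b: fraction of all instances drawn at random from the remainder.
    """
    n = len(gradients)
    # Step 1: sort instance indices by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_k = round(a * n)
    rand_k = round(b * n)
    # Step 2: keep every large-gradient instance with weight 1.
    large = order[:top_k]
    # Step 3: randomly sample from the small-gradient remainder.
    small = rng.sample(order[top_k:], rand_k)
    # Step 4: amplify the sampled small-gradient instances.
    amplify = (1 - a) / b
    return [(i, 1.0) for i in large] + [(i, amplify) for i in small]


rng = random.Random(42)
grads = [0.9, -0.1, 0.05, -0.8, 0.2, 0.02, -0.3, 0.01, 0.6, -0.04]
selected = goss_sample(grads, a=0.2, b=0.3, rng=rng)
```

With ten instances, `a = 0.2` keeps the two largest-gradient instances (indices 0 and 3) at weight 1, and `b = 0.3` draws three of the remaining eight, each weighted by 0.8 / 0.3.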
3.4.2. Exclusive Feature Bundling (EFB)
- View the feature space as a graph, with features as vertices and edges added between them if two features are not exclusive.
- Use a greedy algorithm to color the graph, bundling features with the same color together, meaning they belong to the same feature bundle.
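- Merge the values of bundled features by assigning a different bin range to each feature within a bundle. The construction of a feature bundle can be represented as:

  $$\tilde{f}_j = f_j + o_j$$

  where $f_j$ is the j-th feature in the feature bundle, $o_j$ is the offset of that feature within the bundle, and $\tilde{f}_j$ is the value stored in the bundle when $f_j$ is nonzero.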
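The bundling steps above can be sketched as follows. This is a simplified first-fit version written for illustration, not LightGBM's implementation: features are visited in index order instead of being ordered by graph degree, and the helper names, the `max_conflicts` parameter, and the toy columns are assumptions.

```python
def conflicts(col_a, col_b):
    """Count rows in which both features are nonzero (non-exclusive rows)."""
    return sum(1 for x, y in zip(col_a, col_b) if x != 0 and y != 0)


def greedy_bundle(columns, max_conflicts=0):
    """Place each feature in the first bundle it does not conflict with."""
    bundles = []
    for j, col in enumerate(columns):
        for bundle in bundles:
            if all(conflicts(col, columns[k]) <= max_conflicts for k in bundle):
                bundle.append(j)
                break
        else:
            bundles.append([j])  # no compatible bundle: start a new one
    return bundles


def merge_bundle(columns, bundle, n_bins):
    """Merge bundled features into one column using per-feature offsets."""
    offsets, total = [], 0
    for j in bundle:
        offsets.append(total)
        total += n_bins[j]
    merged = []
    for row in range(len(columns[0])):
        value = 0
        for j, offset in zip(bundle, offsets):
            if columns[j][row] != 0:
                value = columns[j][row] + offset  # shift into its bin range
        merged.append(value)
    return merged


columns = [
    [1, 0, 2, 0],  # feature 0, bins 1..2
    [0, 3, 0, 0],  # feature 1, bins 1..3, never nonzero with feature 0
    [2, 1, 0, 0],  # feature 2, conflicts with features 0 and 1
]
n_bins = [2, 3, 2]
bundles = greedy_bundle(columns)
merged = merge_bundle(columns, bundles[0], n_bins)
```

Features 0 and 1 are mutually exclusive and end up in one bundle, while feature 2 gets its own. In the merged column, feature 0 keeps values 1 to 2 and feature 1 is shifted by an offset of 2 into the range 3 to 5, so the original feature can still be identified from the bundled value.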
3.5. Binary Logistic Regression
3.6. SHAP
- Initialization: For a dataset with $n$ features, initialize the SHAP value vector $\phi = (\phi_1, \dots, \phi_n)$, where the length of the vector is the same as the number of features.
- Computation Order: The model selects a computation order for the features (usually random or sorted by feature importance).
- Iterative Computation: For each feature $i$, in the determined computation order, the feature is sequentially added to the model, and the contribution to the SHAP values is iteratively calculated until all features have been processed. The mathematical representation of the SHAP value for a feature $i$ is given by:

  $$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,\left(|N| - |S| - 1\right)!}{|N|!}\left[f\left(S \cup \{i\}\right) - f(S)\right]$$

  where $N$ is the set of all features, $S$ is a subset of features not containing feature $i$, $f(S)$ is the model's prediction output on the subset $S$, and $\phi_i$ is the SHAP value for feature $i$.
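To make the Shapley formula concrete, the sketch below computes exact values by enumerating every subset of the other features. It is an illustration under stated assumptions: `toy_model` is a hypothetical value function with three features and one interaction, and exhaustive enumeration is only practical for a handful of features (SHAP libraries use faster, model-specific algorithms).

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating all subsets S of N without i."""
    features = range(n_features)
    phi = [0.0] * n_features
    for i in features:
        others = [j for j in features if j != i]
        for size in range(len(others) + 1):
            # Weight |S|! (|N| - |S| - 1)! / |N|! for subsets of this size.
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to subset S.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_model(subset):
    """Hypothetical f(S): additive effects plus a 0-1 interaction term."""
    base = {0: 1.0, 1: 2.0, 2: 0.5}
    total = sum(base[j] for j in subset)
    if 0 in subset and 1 in subset:
        total += 1.0  # interaction shared between features 0 and 1
    return total


phi = shapley_values(toy_model, 3)
```

The interaction term is split evenly between features 0 and 1, giving values of 1.5, 2.5, and 0.5, and the values sum to f(N) − f(∅) = 4.5, which is the efficiency property of Shapley values.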
3.7. Metrics
3.8. Model Selection
4. Results
4.1. Results of Tree-Based Models
4.2. Results of Logistic Regression Models
5. Discussion
5.1. The Impact of Training Set Proportion on Model Performance
5.2. Feature Importance and SHAP Values
5.2.1. Feature Importance Based on the Classifier
5.2.2. Feature Contribution Based on SHAP Values
5.2.3. Model Decisions
5.2.4. Feature Interaction Effects
5.3. The Impact of Accidents on Future Risk
5.4. Limitations and Future Work
6. Conclusions
- Predictive Power of Historical Accident Information: Model 2, which includes historical accident information, outperforms Model 1 across multiple evaluation metrics, particularly in terms of AUC and AP scores. This demonstrates the significant predictive value of historical accident data for future accident risks.
- Impact of Feature Interaction Effects: The analysis of feature importance and SHAP values reveals that the interaction effects between historical accident information and other features significantly influence model decisions, thereby enhancing predictive accuracy.
- Differential Impact of Accident Risk Mitigation: The findings indicate that, for drivers with 1 to 2 accidents in the past two years, historical accident records have a notable impact on reducing future accident risk. However, for drivers with frequent accidents (more than two accidents in the past two years), the mitigating effect is relatively limited.
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. World Health Organization. Global Status Report on Road Safety 2023. Available online: https://www.who.int/teams/social-determinants-of-health/safety-and-mobility/global-status-report-on-road-safety-2023 (accessed on 22 August 2024).
2. Mannering, F.; Bhat, C.R.; Shankar, V.; Abdel-Aty, M. Big Data, Traditional Data and the Tradeoffs between Prediction and Causality in Highway-Safety Analysis. Anal. Methods Accid. Res. 2020, 25, 100113.
3. Fountas, G.; Sarwar, M.T.; Anastasopoulos, P.C.; Blatt, A.; Majka, K. Analysis of Stationary and Dynamic Factors Affecting Highway Accident Occurrence: A Dynamic Correlated Grouped Random Parameters Binary Logit Approach. Accid. Anal. Prev. 2018, 113, 330–340.
4. Zeng, Q.; Gu, W.; Zhang, X.; Wen, H.; Lee, J.; Hao, W. Analyzing Freeway Crash Severity Using a Bayesian Spatial Generalized Ordered Logit Model with Conditional Autoregressive Priors. Accid. Anal. Prev. 2019, 127, 87–95.
5. Boggs, A.M.; Wali, B.; Khattak, A.J. Exploratory Analysis of Automated Vehicle Crashes in California: A Text Analytics & Hierarchical Bayesian Heterogeneity-Based Approach. Accid. Anal. Prev. 2020, 135, 105354.
6. Tamakloe, R.; Das, S.; Nimako Aidoo, E.; Park, D. Factors Affecting Motorcycle Crash Casualty Severity at Signalized and Non-Signalized Intersections in Ghana: Insights from a Data Mining and Binary Logit Regression Approach. Accid. Anal. Prev. 2022, 165, 106517.
7. Waseem, M.; Ahmed, A.; Saeed, T.U. Factors Affecting Motorcyclists’ Injury Severities: An Empirical Assessment Using Random Parameters Logit Model with Heterogeneity in Means and Variances. Accid. Anal. Prev. 2019, 123, 12–19.
8. Shi, X.; Wong, Y.D.; Li, M.Z.-F.; Palanisamy, C.; Chai, C. A Feature Learning Approach Based on XGBoost for Driving Assessment and Risk Prediction. Accid. Anal. Prev. 2019, 129, 170–179.
9. Parsa, A.B.; Movahedi, A.; Taghipour, H.; Derrible, S.; Mohammadian, A. Toward Safer Highways, Application of XGBoost and SHAP for Real-Time Accident Detection and Feature Analysis. Accid. Anal. Prev. 2020, 136, 105405.
10. Zhang, Y.; Chen, Y.; Gu, X.; Sze, N.N.; Huang, J. A Proactive Crash Risk Prediction Framework for Lane-Changing Behavior Incorporating Individual Driving Styles. Accid. Anal. Prev. 2023, 188, 107072.
11. Guo, Y.; Li, Z.; Wu, Y.; Xu, C. Exploring Unobserved Heterogeneity in Bicyclists’ Red-Light Running Behaviors at Different Crossing Facilities. Accid. Anal. Prev. 2018, 115, 118–127.
12. Guo, Y.; Li, Z.; Liu, P.; Wu, Y. Modeling Correlation and Heterogeneity in Crash Rates by Collision Types Using Full Bayesian Random Parameters Multivariate Tobit Model. Accid. Anal. Prev. 2019, 128, 164–174.
13. Fu, C.; Sayed, T. Bayesian Dynamic Extreme Value Modeling for Conflict-Based Real-Time Safety Analysis. Anal. Methods Accid. Res. 2022, 34, 100204.
14. Rahim, M.A.; Hassan, H.M. A Deep Learning Based Traffic Crash Severity Prediction Framework. Accid. Anal. Prev. 2021, 154, 106090.
15. Tang, J.; Liang, J.; Han, C.; Li, Z.; Huang, H. Crash Injury Severity Analysis Using a Two-Layer Stacking Framework. Accid. Anal. Prev. 2019, 122, 226–238.
16. Bao, J.; Liu, P.; Ukkusuri, S.V. A Spatiotemporal Deep Learning Approach for Citywide Short-Term Crash Risk Prediction with Multi-Source Data. Accid. Anal. Prev. 2019, 122, 239–254.
17. Ali, F.; Ali, A.; Imran, M.; Naqvi, R.A.; Siddiqi, M.H.; Kwak, K.-S. Traffic Accident Detection and Condition Analysis Based on Social Networking Data. Accid. Anal. Prev. 2021, 151, 105973.
18. Hu, Z.; Zhou, J.; Huang, K.; Zhang, E. A Data-Driven Approach for Traffic Crash Prediction: A Case Study in Ningbo, China. Int. J. Intell. Transp. Syst. Res. 2022, 20, 508–518.
19. Ye, J.; Zhao, J.; Ye, K.; Xu, C. How to Build a Graph-Based Deep Learning Architecture in Traffic Domain: A Survey. IEEE Trans. Intell. Transp. Syst. 2022, 23, 3904–3924.
20. Xia, N.; Xie, Q.; Hu, X.; Wang, X.; Meng, H. A Dual Perspective on Risk Perception and Its Effect on Safety Behavior: A Moderated Mediation Model of Safety Motivation, and Supervisor’s and Coworkers’ Safety Climate. Accid. Anal. Prev. 2020, 134, 105350.
21. Malin, F.; Norros, I.; Innamaa, S. Accident Risk of Road and Weather Conditions on Different Road Types. Accid. Anal. Prev. 2019, 122, 181–188.
22. Singh, H.; Kathuria, A. Analyzing Driver Behavior under Naturalistic Driving Conditions: A Review. Accid. Anal. Prev. 2021, 150, 105908.
23. Wu, Y.; Abdel-Aty, M.; Lee, J. Crash Risk Analysis during Fog Conditions Using Real-Time Traffic Data. Accid. Anal. Prev. 2018, 114, 4–11.
24. Rod, J.E.; Oviedo-Trespalacios, O.; Senserrick, T.; King, M. Older Adult Pedestrian Trauma: A Systematic Review, Meta-Analysis, and GRADE Assessment of Injury Health Outcomes from an Aggregate Study Sample of 1 Million Pedestrians. Accid. Anal. Prev. 2021, 152, 105970.
25. Abdel-Aty, M.; Ugan, J.; Islam, Z. Exploring the Influence of Drivers’ Visual Surroundings on Speeding Behavior. Accid. Anal. Prev. 2024, 198, 107479.
26. Yu, Q.; Ma, L.; Yan, X. Modeling Occupant Injury Severities for Electric-Vehicle-Involved Crashes Using a Vehicle-Accident Bi-Layered Correlative Framework with Matched-Pair Sampling. Accid. Anal. Prev. 2024, 199, 107499.
27. Qian, Q.; Shi, J. Accustomed or Regulated: Influencing Factors of Two-Wheeler Riders’ Illegal Lane-Transgressing Behavior When Overtaking. Accid. Anal. Prev. 2024, 204, 107648.
28. Costa, M.; Lima Azevedo, C.; Siebert, F.W.; Marques, M.; Moura, F. Unraveling the Relation between Cycling Accidents and Built Environment Typologies: Capturing Spatial Heterogeneity Through a Latent Class Discrete Outcome Model. Accid. Anal. Prev. 2024, 200, 107533.
29. Huỳnh, C.; Beaulieu-Thibodeau, A.; Fallu, J.-S.; Bergeron, J.; Jacques, A.; Brochu, S. Factors Related to the Low-Risk Perception of Driving After Cannabis Use. Accid. Anal. Prev. 2024, 202, 107584.
30. Wang, J.; Xu, W.; Fu, T.; Gong, H.; Shangguan, Q.; Sobhani, A. Modeling Aggressive Driving Behavior Based on Graph Construction. Transp. Res. Part C Emerg. Technol. 2022, 138, 103654.
31. Yang, T.; Zhang, Y.; Tan, J.; Qiu, T.Z. Research on Forward Collision Warning System Based on Connected Vehicle V2V Communication. In Proceedings of the 2019 5th International Conference on Transportation Information and Safety (ICTIS), Liverpool, UK, 14–17 July 2019; pp. 1174–1181.
32. Yang, M.; Wang, X.; Quddus, M. Examining Lane Change Gap Acceptance, Duration and Impact Using Naturalistic Driving Data. Transp. Res. Part C Emerg. Technol. 2019, 104, 317–331.
33. Wang, C.; Xu, C.; Dai, Y. A Crash Prediction Method Based on Bivariate Extreme Value Theory and Video-Based Vehicle Trajectory Data. Accid. Anal. Prev. 2019, 123, 365–373.
34. Han, L.; Yu, R.; Wang, C.; Abdel-Aty, M. Transformer-Based Modeling of Abnormal Driving Events for Freeway Crash Risk Evaluation. Transp. Res. Part C Emerg. Technol. 2024, 165, 104727.
35. Cui, P.; Yang, X.; Abdel-Aty, M.; Zhang, J.; Yan, X. Advancing Urban Traffic Accident Forecasting through Sparse Spatio-Temporal Dynamic Learning. Accid. Anal. Prev. 2024, 200, 107564.
36. Xue, H.; Guo, P.; Li, Y.; Ma, J. Integrating Visual Factors in Crash Rate Analysis at Intersections: An AutoML and SHAP Approach towards Cycling Safety. Accid. Anal. Prev. 2024, 200, 107544.
37. Faisal Habib, M.; Motuba, D.; Huang, Y. Beyond the Surface: Exploring the Temporally Stable Factors Influencing Injury Severities in Large-Truck Crashes Using Mixed Logit Models. Accid. Anal. Prev. 2024, 205, 107650.
38. Abbasi, S.; Ko, J. Cycling Safely: Examining the Factors Associated with Bicycle Accidents in Seoul, South Korea. Accid. Anal. Prev. 2024, 206, 107691.
39. Yue, H. Investigating the Influence of Streetscape Environmental Characteristics on Pedestrian Crashes at Intersections Using Street View Images and Explainable Machine Learning. Accid. Anal. Prev. 2024, 205, 107693.
40. Nguyen-Phuoc, D.Q.; Xuan Mai, N.; Oviedo-Trespalacios, O. Not the Same: How Delivery, Ride-Hailing, and Private Riders’ Roles Influence Safety Behavior. Accid. Anal. Prev. 2024, 208, 107762.
41. Austine Taiwo, O.; Asmah Hassan, S.; Bin Mohsin, R.; Mahmud, N. Road Traffic Accidents Involvement among Commercial Taxi Drivers in Nigeria: Structural Equation Modelling Approach. Accid. Anal. Prev. 2024, 208, 107788.
42. Traficante, S.; Tinella, L.; Lopez, A.; Koppel, S.; Ricciardi, E.; Napoletano, R.; Spano, G.; Bosco, A.; Caffò, A.O. “Regulating My Anxiety Worsens the Safety of My Driving”: The Synergistic Influence of Spatial Anxiety and Self-Regulation on Driving Behavior. Accid. Anal. Prev. 2024, 208, 107768.
43. Alruwaili, A.; Xie, K. Modeling the Influence of Connected Vehicles on Driving Behaviors and Safety Outcomes in Highway Crash Scenarios across Varied Weather Conditions: A Multigroup Structural Equation Modeling Analysis Using a Driving Simulator Experiment. Accid. Anal. Prev. 2024, 199, 107514.
44. Ryder, B.; Dahlinger, A.; Gahr, B.; Zundritsch, P.; Wortmann, F.; Fleisch, E. Spatial Prediction of Traffic Accidents with Critical Driving Events—Insights from a Nationwide Field Study. Transp. Res. Part A Policy Pract. 2019, 124, 611–626.
45. Fa, H.; Shuai, B.; Yang, Z.; Niu, Y.; Huang, W. Mining the Accident Causes of Railway Dangerous Goods Transportation: A Logistics-DT-TFP Based Approach. Accid. Anal. Prev. 2024, 195, 107421.
46. Liao, H.; Li, Y.; Li, Z.; Bian, Z.; Lee, J.; Cui, Z.; Zhang, G.; Xu, C. Real-Time Accident Anticipation for Autonomous Driving through Monocular Depth-Enhanced 3D Modeling. Accid. Anal. Prev. 2024, 207, 107760.
47. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
48. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13 August 2016; pp. 785–794.
49. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Advances in Neural Information Processing Systems 30 (NIPS 2017); Curran Associates Inc.: Red Hook, NY, USA, 2017.
50. Niu, D.; Sayed, T.; Fu, C.; Mannering, F. A Cross-Comparison of Different Extreme Value Modeling Techniques for Traffic Conflict-Based Crash Risk Estimation. Anal. Methods Accid. Res. 2024, 44, 100352.
51. Tahir, H.B.; Yasmin, S.; Haque, M.M. A Poisson Lognormal-Lindley Model for Simultaneous Estimation of Multiple Crash-Types: Application of Multivariate and Pooled Univariate Models. Anal. Methods Accid. Res. 2024, 41, 100315.
52. Jeong, H.; Jang, Y.; Bowman, P.J.; Masoud, N. Classification of Motor Vehicle Crash Injury Severity: A Hybrid Approach for Imbalanced Data. Accid. Anal. Prev. 2018, 120, 250–261.
53. Prati, G.; Pietrantoni, L.; Fraboni, F. Using Data Mining Techniques to Predict the Severity of Bicycle Crashes. Accid. Anal. Prev. 2017, 101, 44–54.
54. Gopinath, V. Traffic Accidents Analysis with Respect to Road Users Using Data Mining Techniques. Int. J. Emerg. Trends Technol. Comput. Sci. 2017, 6, 15–20.
55. Santos, K.; Dias, J.P.; Amado, C. A Literature Review of Machine Learning Algorithms for Crash Injury Severity Prediction. J. Saf. Res. 2022, 80, 254–269.

| Features | Description | Code | Mean | S.D. | Min | 25% | 50% | 75% | Max |
|---|---|---|---|---|---|---|---|---|---|
| Vehicle Age | Vehicle usage lifespan | VA | 5.987 | 6.019 | 0 | 3 | 4 | 6 | 41 |
| Number of Accidents | Number of accidents in 2020 | A0 | 0.254 | 0.614 | 0 | 0 | 0 | 0 | 8 |
| | Number of accidents in 2021 | A1 | 0.356 | 0.764 | 0 | 0 | 0 | 0 | 13 |
| Illegal Behavior | Unlawful use or failure to use lights | B1 | 1.413 | 2.720 | 0 | 0 | 0 | 2 | 48 |
| | Use of vehicles not complying with standards | B2 | 7.388 | 13.650 | 0 | 0 | 0 | 8 | 252 |
| | Aggressive overtaking | B3 | 2.216 | 4.019 | 0 | 0 | 0 | 3 | 68 |
| | Driving without a license | B4 | 2.171 | 4.053 | 0 | 0 | 0 | 3 | 66 |
| | Failure to pay fines or accept other penalties | B5 | 0.080 | 0.542 | 0 | 0 | 0 | 0 | 36 |
| | Illegal use of horn | B6 | 0.124 | 0.770 | 0 | 0 | 0 | 0 | 22 |
| | Overloading | B7 | 10.231 | 18.099 | 0 | 0 | 2 | 12 | 305 |
| | Violation of warning sign instructions | B8 | 5.844 | 10.171 | 0 | 0 | 1 | 7 | 154 |
| | Failure to yield to vehicle | B9 | 0.163 | 0.817 | 0 | 0 | 0 | 0 | 36 |
| | Speeding | B10 | 2.830 | 5.113 | 0 | 0 | 0 | 3 | 88 |
| | Driving under the influence of intoxicants | B11 | 7.004 | 12.427 | 0 | 0 | 1 | 8 | 200 |
| | Fatigue driving | B12 | 2.513 | 4.486 | 0 | 0 | 0 | 3 | 88 |
| | Failure to yield to pedestrian on sidewalk | B13 | 7.865 | 13.706 | 0 | 0 | 1 | 10 | 208 |

| Features | Description | Code | Counts | Proportion |
|---|---|---|---|---|
| Vehicle Status | Normal | 0 | 21,887 | 0.410 |
| | Unprocessed Violations | 1 | 12,237 | 0.229 |
| | Unprocessed Accidents | 2 | 19,282 | 0.361 |
| | Overdue for Inspection or Meeting Scrap Standards | 3 | 12 | 0.000 |
| Operational Vehicle | No | 1 | 36,112 | 0.676 |
| | Yes | 2 | 17,305 | 0.324 |
| | Others | 3 | 1 | 0.000 |
| Ownership | Organization | 1 | 40,599 | 0.760 |
| | Individual | 2 | 12,812 | 0.240 |
| | Others | 3 | 7 | 0.000 |
| New Energy Vehicles | Unknown | 0 | 1 | 0.000 |
| | Yes | 1 | 25,919 | 0.485 |
| | No | 2 | 27,498 | 0.515 |
| Engine Displacement (mL) | 0~1000 | 0 | 26,701 | 0.500 |
| | 1000~1600 | 1 | 14,747 | 0.276 |
| | 1600~2000 | 2 | 11,910 | 0.223 |
| | >2000 | 3 | 60 | 0.001 |
| Vehicle Wheelbase (mm) | 2600~2700 | 1 | 37,836 | 0.708 |
| | 2700~2800 | 2 | 6610 | 0.124 |
| | 2400~2500 | 3 | 2454 | 0.046 |
| | 2500~2600 | 4 | 1102 | 0.021 |
| | Others | 5 | 5416 | 0.101 |
| Accident Frequency * | No Accidents Occurred | 0 | 25,006 | 0.468 |
| | <0.2 | 1 | 2273 | 0.043 |
| | 0.2~0.5 | 2 | 10,112 | 0.189 |
| | >0.5 | 3 | 16,027 | 0.300 |
| Accident Occurrence in 2022 | Yes | 0 | 40,297 | 0.754 |
| | No | 1 | 13,121 | 0.246 |

| Model | Classifier | Accuracy | Recall | Precision | F1 Score | AUC | AP |
|---|---|---|---|---|---|---|---|
| Model 1 | Random Forest | 0.8154 | 0.5292 | 0.6529 | 0.8085 | 0.8712 | 0.6571 |
| | XGBoost | 0.8126 | 0.5114 | 0.6505 | 0.8046 | 0.8688 | 0.6503 |
| | LightGBM | 0.8171 | 0.5333 | 0.6571 | 0.8103 | 0.8739 | 0.6621 |
| Model 2 | Random Forest | 0.8683 | 0.7872 | 0.7085 | 0.8705 | 0.9378 | 0.777 |
| | XGBoost | 0.862 | 0.7987 | 0.6889 | 0.8653 | 0.933 | 0.7737 |
| | LightGBM | 0.8745 | 0.8205 | 0.7121 | 0.8774 | 0.9443 | 0.8026 |

| Features | Model 1 Coef. | Model 1 Std.Err. | Model 1 p-Value | Model 1 OR | Model 2 Coef. | Model 2 Std.Err. | Model 2 p-Value | Model 2 OR |
|---|---|---|---|---|---|---|---|---|
| Vehicle Age | −0.142 *** | 0.011 | <0.0001 | 0.868 | −0.057 *** | 0.011 | <0.0001 | 0.945 |
| Vehicle Status | −0.005 | 0.014 | 0.733 | 0.995 | −0.007 | 0.015 | 0.639 | 0.993 |
| Operational Vehicle | −0.454 *** | 0.064 | <0.0001 | 0.635 | −0.418 *** | 0.068 | <0.0001 | 0.659 |
| Ownership | 0.064 | 0.044 | 0.145 | 1.066 | 0.111 * | 0.046 | 0.015 | 1.117 |
| New Energy Vehicles | −2.112 *** | 0.146 | <0.0001 | 0.121 | −2.972 *** | 0.149 | <0.0001 | 0.051 |
| Vehicle Wheelbase | −0.045 * | 0.018 | 0.010 | 0.956 | −0.073 *** | 0.019 | <0.0001 | 0.930 |
| Engine Displacement | 0.815 *** | 0.073 | <0.0001 | 2.260 | 1.116 *** | 0.074 | <0.0001 | 3.052 |
| B1 | 0.006 | 0.017 | 0.740 | 1.006 | 0.062 ** | 0.019 | 0.001 | 1.064 |
| B2 | −0.013 * | 0.006 | 0.028 | 0.987 | −0.033 *** | 0.007 | <0.0001 | 0.967 |
| B3 | 0.015 | 0.013 | 0.247 | 1.015 | 0.053 *** | 0.014 | <0.0001 | 1.054 |
| B4 | −0.032 | 0.016 | 0.053 | 0.969 | −0.118 *** | 0.018 | <0.0001 | 0.888 |
| B5 | 0.106 * | 0.047 | 0.025 | 1.112 | 0.219 *** | 0.051 | <0.0001 | 1.245 |
| B6 | 0.009 | 0.017 | 0.608 | 1.009 | 0.005 | 0.019 | 0.799 | 1.005 |
| B7 | 0.006 | 0.005 | 0.248 | 1.006 | 0.014 ** | 0.005 | 0.009 | 1.014 |
| B8 | 0.009 | 0.011 | 0.375 | 1.009 | 0.027 * | 0.011 | 0.019 | 1.027 |
| B9 | −0.042 | 0.024 | 0.076 | 0.959 | −0.060 * | 0.025 | 0.015 | 0.941 |
| B10 | 0.014 | 0.009 | 0.148 | 1.014 | 0.040 *** | 0.010 | <0.0001 | 1.040 |
| B11 | 0.010 | 0.009 | 0.271 | 1.010 | 0.022 * | 0.010 | 0.020 | 1.023 |
| B12 | 0.027 | 0.020 | 0.169 | 1.027 | 0.044 * | 0.021 | 0.036 | 1.045 |
| B13 | −0.006 | 0.010 | 0.572 | 0.994 | −0.003 | 0.011 | 0.813 | 0.997 |
| Accident Frequency | 1.464 *** | 0.021 | <0.0001 | 4.322 | 1.745 ** | 0.024 | <0.001 | 5.727 |
| A0 (model 2) | | | | | −0.688 *** | 0.021 | <0.0001 | 0.503 |
| A1 (model 2) | | | | | −0.654 *** | 0.019 | <0.0001 | 0.520 |
| constant | −0.868 ** | 0.169 | <0.001 | 0.420 | −0.607 ** | 0.172 | <0.001 | 0.545 |

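The OR columns in the logistic regression table follow from the coefficients through OR = exp(β). The short check below reproduces a few of the reported odds ratios from their coefficients; the dictionary keys are labels chosen for this sketch, while the coefficient values are copied from the table.

```python
from math import exp

# Coefficients copied from the logistic regression table.
coefficients = {
    "Vehicle Age (Model 1)": -0.142,
    "New Energy Vehicles (Model 1)": -2.112,
    "A0 (Model 2)": -0.688,
    "A1 (Model 2)": -0.654,
}

# Odds ratio = exp(coefficient), rounded to three decimals as in the table.
odds_ratios = {name: round(exp(beta), 3) for name, beta in coefficients.items()}
```

An odds ratio below 1 means the odds of the modeled outcome fall as the feature increases. For example, the A0 coefficient of −0.688 gives an odds ratio of about 0.503, so each additional accident recorded in 2020 roughly halves the modeled odds, holding the other features fixed.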
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, J.; Zhao, C.; Liu, Z. Can Historical Accident Data Improve Sustainable Urban Traffic Safety? A Predictive Modeling Study. Sustainability 2024, 16, 9642. https://doi.org/10.3390/su16229642