Article

Factors Affecting Single and Multivehicle Motorcycle Crashes: Insights from Day and Night Analysis Using XGBoost-SHAP Algorithm

by Panuwat Wisutwattanasak¹, Chamroeun Se¹, Thanapong Champahom², Rattanaporn Kasemsri³, Sajjakaj Jomnonkwao⁴ and Vatanavongs Ratanavaraha⁴,*

¹ Institute of Research and Development, Suranaree University of Technology, Nakhon Ratchasima 30000, Thailand
² Department of Management, Faculty of Business Administration, Rajamangala University of Technology Isan, Nakhon Ratchasima 30000, Thailand
³ School of Civil Engineering, Institute of Engineering, Suranaree University of Technology, Nakhon Ratchasima 30000, Thailand
⁴ School of Transportation Engineering, Institute of Engineering, Suranaree University of Technology, Nakhon Ratchasima 30000, Thailand
* Author to whom correspondence should be addressed.
Big Data Cogn. Comput. 2024, 8(10), 128; https://doi.org/10.3390/bdcc8100128
Submission received: 30 August 2024 / Revised: 27 September 2024 / Accepted: 1 October 2024 / Published: 3 October 2024
(This article belongs to the Special Issue Machine Learning and AI Technology for Sustainable Development)

Abstract

This study aimed to identify and compare the risk factors associated with motorcycle crash severity during both daytime and nighttime, for single and multivehicle incidents in Thailand using 2021–2024 data. The research employed the XGBoost (Extreme Gradient Boosting) method for statistical analysis and extensively examined the temporal instability of risk factors. The results highlight the importance of features impacting the injury severity of roadway collisions across various conditions. For single motorcycle crashes, the key risk factors included speeding, early morning incidents, off-road events, and long holidays. In multivehicle crashes, rear-end collisions, interactions with large vehicles, and collisions involving other motorcycles or passenger cars were linked to increased injury severity. The findings indicate that the important factors associated with motorcyclist injury severity in roadway crashes vary depending on the type of crash and time of day. These insights are valuable for policymakers and relevant authorities in developing targeted interventions to enhance road safety and mitigate the incidence of severe and fatal motorcycle crashes.

1. Introduction

Motorcycles are a crucial mode of transportation globally, especially in developing countries, where they often serve as the primary means of mobility. However, motorcyclists face a disproportionately high risk of fatal and severe injuries, creating a significant public health concern, as highlighted by the World Health Organization [1]. This problem is particularly severe in countries with insufficient road safety regulations and infrastructure. Data from the National Highway Traffic Safety Administration (NHTSA) [2] reveal that in 2020, motorcyclist fatalities occurred 28 times more frequently per vehicle mile traveled than passenger car occupant fatalities. In Thailand, the situation is particularly alarming: motorcycle registrations exceeded 22 million in 2023 and are increasing by 10% annually. This rapid growth poses substantial challenges for road safety authorities, as motorcycle-related crashes, along with their high rates of fatalities and severe injuries, significantly outpace those in other regions [1]. Thailand ranks among the countries with the highest road traffic fatality rates globally, with motorcycles involved in over 70% of these incidents [1]. The consequences of these crashes are far-reaching, affecting physical health, emotional well-being, and the economy. Compared to other types of crashes in Thailand, motorcycle accidents consistently account for more than 40% of fatalities and over 50% of combined severe and fatal injuries, underscoring a growing and critical safety concern. This issue aligns closely with the United Nations’ Sustainable Development Goals (SDGs), particularly SDG 3 (Good Health and Well-Being) and SDG 11 (Sustainable Cities and Communities). Road safety management is integral to achieving multiple SDGs by promoting health, safety, sustainability, and inclusivity in transport systems.
By reducing road traffic accidents, improving infrastructure, and educating the public, road safety management supports the broader agenda of sustainable development, ensuring that cities and communities are safer and more resilient [3].
Temporal factors have been found to significantly influence variations in motorcyclist injury severity. Previous studies have shown a strong association between the time of day and the severity of injuries sustained by motorcycle riders [4,5,6]. Behnood and Mannering [7] offer two key explanations for this variation. First, human factors such as decision-making, responsiveness, and alertness may fluctuate throughout the day due to fatigue, circadian rhythms, and sleep deprivation. Second, unobserved variables related to visibility and lighting conditions can vary depending on whether a crash occurs during daylight or nighttime hours. Given these factors, many studies on collision severity have examined the effect of the time of day on injury outcomes, covering topics such as the severity of injuries in heavy-truck crashes [7], pedestrian-involved accidents [8,9], work zone collisions [10], and bicycle–vehicle crashes [11]. This body of literature consistently suggests that the time of day plays a crucial role in determining the severity of injuries resulting from collisions. Moreover, this influence may not be fully captured by simply including indicator variables for different time intervals in a statistical model [7].
In addition to temporal factors, several previous studies have also highlighted significant differences in the factors influencing rider injury severity in single motorcycle versus multivehicle motorcycle crashes. The literature indicates that the severity of injuries and the factors affecting them vary notably between these two crash types. This variation can be linked to differences in factors such as lighting conditions, time of day, collision types, use of safety equipment, speeding, and other contributing causes [12,13,14]. Because single and multivehicle crashes are usually analyzed separately to avoid the complexities arising from vehicle occupancies and interactions [15], this study developed statistical models that account for potential variations across these crash types. The goal was to understand how these factors influence injury severity in motorcycle crashes over different periods, providing a more nuanced analysis of the variables at play. As noted earlier, existing research has shown that whether a crash occurs during the daytime or nighttime affects motorcycle crash severity in distinct ways [4,10,16]. Moreover, scholars have found that single motorcycle and multivehicle motorcycle crashes exhibit different characteristics and severities [17]. However, no studies have yet combined the day–night classification with the single versus multivehicle classification, particularly in developing nations where the severity of motorcycle crashes is more pronounced.
In terms of analytical approaches, machine learning techniques such as Gradient Boosting [18,19], Random Forest (RF) [20], Support Vector Machines (SVM) [21], and Extreme Gradient Boosting (XGBoost) [22] have gained widespread popularity and are now extensively used in road safety and other fields. Among these, XGBoost is particularly effective for analyzing traffic crash injury severity because of its capability to handle large datasets, model complex interactions, and deliver high predictive accuracy [22,23]. One significant challenge in motorcycle crash data is class imbalance, in which severe injuries or fatalities occur much less frequently than minor injuries. XGBoost can address this imbalance through techniques such as class weighting or adjusting decision thresholds, making it an appropriate method for injury severity prediction [24,25]. Additionally, XGBoost provides insights into the importance of various features, such as speed, helmet use, and road conditions, in predicting injury severity in motorcycle crashes. This feature analysis helps identify key factors contributing to severe injuries, thereby informing policy and intervention strategies. XGBoost’s ability to model nonlinear relationships between variables is crucial for understanding the complex factors influencing motorcycle crash severity, such as the interaction between road conditions and weather, or between driver behavior and vehicle type. Previous studies conducted in developed countries have successfully applied XGBoost models to analyze road crash injury severity, demonstrating the model’s high analytical potential and forecasting accuracy [22,26]. These studies illustrate the robustness, accuracy, and interpretability of XGBoost as a method for understanding and predicting the factors that contribute to injury severity in motorcycle crashes.
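As a minimal sketch of the two imbalance-handling techniques mentioned above, the snippet below assumes a binary target (1 = severe or fatal, 0 = minor) and computes XGBoost's `scale_pos_weight` parameter as the ratio of negative to positive cases, a typical starting value suggested in the XGBoost parameter documentation. It also shows decision-threshold adjustment on predicted probabilities. The function names, labels, and the 0.35 threshold are hypothetical illustrations; the text does not state which technique or values were used in this study.

```python
from collections import Counter

def class_weight_params(y):
    """Return an XGBoost `scale_pos_weight` setting for a binary target.

    Assumes 1 = severe/fatal (minority class) and 0 = minor injury.
    """
    counts = Counter(y)
    n_neg, n_pos = counts.get(0, 0), counts.get(1, 0)
    if n_pos == 0:
        raise ValueError("No positive (severe/fatal) cases to weight.")
    # Ratio of negatives to positives up-weights errors on the rare class.
    return {"scale_pos_weight": n_neg / n_pos}

def apply_threshold(probabilities, threshold=0.5):
    """Label a case as severe/fatal (1) when its predicted probability >= threshold.

    Lowering the threshold below 0.5 trades precision for recall on the rare class.
    """
    return [1 if p >= threshold else 0 for p in probabilities]

# Hypothetical labels: 6 minor cases, 2 severe/fatal cases
params = class_weight_params([0, 0, 0, 0, 0, 0, 1, 1])
print(params)  # {'scale_pos_weight': 3.0}
print(apply_threshold([0.2, 0.4, 0.7], threshold=0.35))  # [0, 1, 1]
```

The returned dictionary could be passed as `xgboost.XGBClassifier(**params)`, and the threshold would be applied to the positive-class column of `predict_proba`.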
Accordingly, this study had two primary objectives: (1) to examine the factors contributing to the injury severity of motorcyclists involved in single and multivehicle crashes, and (2) to identify the risk factors associated with motorcyclist injury severity in daytime and nighttime crashes in Thailand using the XGBoost-SHAP approach. This study does not present a novel methodological contribution, as numerous papers have already compared XGBoost with other models and demonstrated its superiority [21,25]; instead, it applied one of the most advanced models in crash severity analysis in alignment with its primary objectives. The study’s novelty lies in integrating single and multivehicle crashes within a temporal instability framework (daytime vs. nighttime), an approach that had not previously been explored, to offer practical recommendations for real-world applications. Both concepts, vehicle involvement and time of day, are known to significantly influence the severity of motorcycle accidents [10,16,27,28], and combining them offers a more nuanced understanding of their interactions. By separating the analysis of motorcycle crashes by time of day and number of vehicles involved, the study aimed to uncover specific significant factors that can inform safety policies. The findings are intended to provide relevant safety agencies with actionable insights for developing targeted road safety measures for motorcycle users that are contextually appropriate for Thailand.

2. Data Collection

2.1. Obtaining the Crash Report Data

This research used the most recent and comprehensive accident data obtained from the Department of Highways, Ministry of Transport (MOT). The data preparation comprised two key components.
First, it incorporated data on roadway motorcycle crashes across Thailand, covering the period from early 2021 to June 2024, totaling 16,175 cases. This dataset, derived from official reports filed by MOT authorities, includes a wide range of critical information, such as the causes of accidents, involved vehicles, collision types, vehicle characteristics, road conditions, weather conditions, and temporal factors (time), among others.
Second, within this dataset, collisions are categorized into three primary severity levels: fatal injury (resulting in loss of life), severe injury (requiring hospitalization for at least 48 h), and minor injury (involving medical treatment without hospital admission, or property damage only) [29]. These severity classifications were used to capture the varying degrees of impact and consequences associated with each recorded collision.
Additionally, the study further separated the motorcycle crash data into four distinct datasets: daytime single motorcycle, nighttime single motorcycle, daytime multivehicle motorcycle, and nighttime multivehicle motorcycle incidents. This segmentation aligns with the research objectives, enabling the identification of influencing factors within specific vehicle-involvement and temporal contexts. The variables and statistics used in this study were coded as described in Table 1, which provides a comprehensive breakdown. Although some indicators account for only a small proportion of cases (e.g., driving closely behind and illegal overtaking), XGBoost is well equipped to analyze such data. Through its handling of imbalanced data, iterative boosting process, regularization, support for custom loss functions, and compatibility with SHAP-based interpretation, XGBoost helps ensure that even these less frequent indicators are considered in the analysis, leading to more comprehensive and accurate insights [30].
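The four-way segmentation described above can be sketched with pandas. This is a minimal illustration, assuming hypothetical column names `is_night` (boolean) and `n_vehicles` (number of vehicles involved); the actual coding of the MOT dataset is given in Table 1 and may differ.

```python
import pandas as pd

def split_by_context(crashes: pd.DataFrame) -> dict:
    """Split crash records into the four day/night x single/multivehicle subsets.

    Assumes hypothetical columns `is_night` (bool) and `n_vehicles` (int).
    """
    period = crashes["is_night"].map({True: "nighttime", False: "daytime"})
    crash_type = crashes["n_vehicles"].gt(1).map({True: "multivehicle", False: "single"})
    subset_key = period + "_" + crash_type
    # The two grouping columns are constant within each subset, so drop them.
    return {
        name: group.drop(columns=["is_night", "n_vehicles"]).reset_index(drop=True)
        for name, group in crashes.groupby(subset_key)
    }

# Usage with a toy frame (hypothetical records)
toy = pd.DataFrame({
    "is_night": [True, False, True, False, False],
    "n_vehicles": [1, 1, 2, 3, 2],
    "severity": ["fatal", "minor", "severe", "minor", "fatal"],
})
for name, subset in split_by_context(toy).items():
    print(name, len(subset))
```

Each returned subset can then be modeled independently, which mirrors the study's choice of fitting separate models per context.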

2.2. Empirical Setting

Table 2 and Figure 1 present a detailed proportional distribution of the injury severity levels across the different classification groups. The sample statistics revealed that daytime multivehicle motorcycle crashes accounted for the largest proportion of overall crashes (47.34%) within the study period (2021–2024), followed by nighttime multivehicle crashes (26.27%) and then by nighttime and daytime single motorcycle crashes (14.29% and 12.08%, respectively). The table highlights the distinct characteristics of single and multivehicle motorcycle crashes. Overall, daytime crashes were more frequent than nighttime crashes, with a 21.05% higher occurrence during the day. However, among single motorcycle crashes, nighttime incidents accounted for a higher proportion than daytime incidents. Furthermore, when the data were separated into daytime and nighttime collisions, the statistics indicated that nighttime motorcycle crashes, whether single or multivehicle, tended to result in more severe injuries than daytime crashes. Specifically, nighttime single motorcycle crashes showed a fatality rate exceeding 35% (5.03% out of 14.29%), whereas daytime single motorcycle crashes had a lower fatality rate of 23.6% (2.85% out of 12.08%). Similarly, nighttime multivehicle crashes exhibited the most severe injury outcomes, with a fatality rate of 37.6%, compared with 25.7% (12.17% out of 47.34%) for daytime multivehicle crashes.
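The within-group fatality rates quoted above follow from dividing a group's fatal crashes (as a share of all crashes) by that group's share of all crashes. A short check using only the figures stated in the text (the nighttime multivehicle fatal share is not reported in this paragraph, so it is omitted):

```python
# (fatal crashes as % of all crashes, group as % of all crashes), from the text
shares = {
    "nighttime single": (5.03, 14.29),
    "daytime single": (2.85, 12.08),
    "daytime multivehicle": (12.17, 47.34),
}

def within_group_rate(fatal_share, group_share):
    """Fatality rate within a group, in percent, rounded to one decimal."""
    return round(100 * fatal_share / group_share, 1)

rates = {group: within_group_rate(*pair) for group, pair in shares.items()}
print(rates)  # {'nighttime single': 35.2, 'daytime single': 23.6, 'daytime multivehicle': 25.7}
```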
For this dataset, we also conducted a Chi-square test of independence to examine differences in fatality proportions among single motorcycle crashes (daytime and nighttime) and multivehicle motorcycle crashes (daytime and nighttime). The results revealed a statistically significant difference in the proportions at the 0.01 significance level. This finding indicates that the severity of motorcycle crashes (categorized as fatal, severe, or minor) varies significantly across these classifications. Consequently, separating the data into these distinct groups enables more targeted statistical analyses, which in turn allows us to generate context-specific results and recommendations tailored to each type of crash.
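A chi-square test of independence of this kind can be sketched with SciPy's `chi2_contingency`, which takes a contingency table with crash groups as rows and severity levels as columns. The counts below are hypothetical placeholders, not the study's data; the 0.01 level matches the one used above.

```python
import numpy as np
from scipy.stats import chi2_contingency

def severity_independence_test(counts, alpha=0.01):
    """Chi-square test of independence between crash group and injury severity.

    counts: rows = crash groups, columns = severity levels (minor, severe, fatal).
    """
    table = np.asarray(counts)
    chi2, p_value, dof, _expected = chi2_contingency(table)
    return {
        "chi2": float(chi2),
        "p_value": float(p_value),
        "dof": int(dof),
        "reject_independence": bool(p_value < alpha),
    }

# Hypothetical counts (rows: day single, night single, day multi, night multi)
toy_counts = [
    [80, 10, 10],
    [40, 20, 40],
    [150, 30, 20],
    [60, 25, 45],
]
print(severity_independence_test(toy_counts))
```

With four groups and three severity levels, the test has (4 - 1) x (3 - 1) = 6 degrees of freedom.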

3. Methodology

3.1. Overall Research Framework

This research commenced with a comprehensive literature review to identify gaps in prior studies and to explore potential methodologies applicable to the current investigation. Following this, motorcycle crash data from 2021 to 2024 were obtained from the Ministry of Transport (MOT). Four models, Logistic Regression (LR), RF, SVM, and XGBoost, were employed to analyze the data, and their fit was evaluated to compare performance. The results from the best-performing model, including the SHAP values and corresponding visualizations, were then presented. Finally, the research discussion and conclusions are provided. The overall framework is illustrated in Figure 2.

3.2. Model Specification

3.2.1. Extreme Gradient Boosting (XGBoost) Model

XGBoost [31] was chosen for its robustness and efficiency in handling structured data, making it suitable for predicting factors affecting motorcycle crashes. XGBoost is an implementation of gradient-boosted decision trees designed for speed and performance, and it incorporates several advanced features that make it a powerful tool for predictive modeling. Gradient boosting converts weak learners into a strong learner by sequentially building models that correct the errors of previous models. The tree ensemble model uses N additive functions to predict the output as follows [31]:
$$\hat{Y}_i = \phi(X_i) = \sum_{n=1}^{N} f_n(X_i), \quad f_n \in \mathcal{F},$$
where $\mathcal{F} = \{ f(x) = \omega_{q(x)} \}\ (q: \mathbb{R}^m \to T,\ \omega \in \mathbb{R}^T)$ denotes the space of regression trees (specifically, CART), with $m$ the number of input features; $N$ is the number of additive decision trees; $q$ is a tree structure that maps an input $x$ to one of the tree's leaves, so that $q(x)$ is the leaf index of $x$; $T$ is the number of leaves in the tree; and each $f_n$ corresponds to an independent tree structure $q$ with leaf weights $\omega$. The objective function is defined as follows:
$$\Lambda(\theta) = \sum_{i} l\left(Y_i, \hat{Y}_i\right) + \sum_{n} \Omega(f_n)$$
The regularization term is defined as
$$\Omega(f_n) = \gamma T + \frac{1}{2} \lambda \lVert \omega \rVert^2$$
where $l$ is a differentiable convex loss function that measures the difference between the prediction $\hat{Y}_i$ and the target $Y_i$. The second term, $\Omega$, penalizes the complexity of the model, that is, of the regression tree functions. $T$ is the number of leaves in the tree, $\gamma$ and $\lambda$ are regularization coefficients, and $\omega$ is the vector of leaf weights. The regularization term smooths the learned weights, helping to mitigate overfitting. The model is then trained in an additive manner by minimizing, at each iteration $t$, the following objective:
$$\Lambda^{(t)} = \sum_{i=1}^{k} l\left(Y_i, \hat{Y}_i^{(t-1)} + f_t(X_i)\right) + \Omega(f_t)$$
where $k$ is the total number of training instances (observations) in the dataset and $\hat{Y}_i^{(t-1)}$ is the prediction for instance $i$ after the first $t-1$ trees. At each iteration $t$, the model adds the function $f_t$ that most improves this objective.
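To make the additive update concrete, the sketch below implements plain gradient boosting with squared loss using scikit-learn regression trees: each new tree is fitted to the current residuals and added, scaled by a learning rate, to the previous prediction. It also computes the regularization term $\Omega(f) = \gamma T + \frac{1}{2}\lambda\lVert\omega\rVert^2$ for each fitted tree. This is a simplified stand-in, not XGBoost itself: XGBoost uses second-order gradients and optimizes $\Omega$ while growing each tree, whereas here the penalty is only reported. The learning rate `eta` and the $\gamma$ and $\lambda$ values are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def omega(leaf_weights, gamma=0.1, lam=2.0):
    """Regularization term Omega(f) = gamma * T + 0.5 * lambda * ||w||^2 for one tree."""
    w = np.asarray(leaf_weights, dtype=float)
    return gamma * w.size + 0.5 * lam * float(np.sum(w ** 2))

def boost_squared_loss(X, y, n_trees=20, max_depth=2, eta=0.3, gamma=0.1, lam=2.0):
    """Additive training: Y_hat^(t) = Y_hat^(t-1) + eta * f_t(X).

    Each tree f_t is fit to the residuals y - Y_hat^(t-1), which are the
    negative gradient of the squared loss l(y, y_hat) = 0.5 * (y - y_hat)^2.
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.zeros_like(y)  # Y_hat^(0) = 0
    trees, train_mse, penalties = [], [], []
    for _ in range(n_trees):
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(X, y - y_hat)
        y_hat = y_hat + eta * tree.predict(X)
        # Leaf weights of the scaled tree eta * f_t (leaves have no children).
        is_leaf = tree.tree_.children_left == -1
        leaf_weights = eta * tree.tree_.value[is_leaf].ravel()
        trees.append(tree)
        train_mse.append(float(np.mean((y - y_hat) ** 2)))
        penalties.append(omega(leaf_weights, gamma, lam))
    return trees, y_hat, train_mse, penalties

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 2))
    y = np.sin(3 * X[:, 0]) + X[:, 1]
    _, _, mse, penalties = boost_squared_loss(X, y)
    # Training MSE never increases as trees are added; penalties are per-tree Omega values.
    print(f"MSE after 1 tree: {mse[0]:.3f}, after {len(mse)} trees: {mse[-1]:.3f}")
```

Because each least-squares tree predicts the mean residual in each leaf, adding it with a learning rate between 0 and 2 cannot increase the training squared error, which is the "correct the errors of previous models" behavior described above.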
XGBoost operates under several key assumptions that influence how it processes data and fits the model. First, it assumes additivity: the relationship between the predictor variables and the target can be captured through an additive combination of weak learners (decision trees), with each tree built sequentially to correct the residual errors of the previous trees. Second, the model does not require linear relationships between features, such as speed, weather, and road conditions, and the outcome, which in this case is crash severity; its tree-based structure allows it to capture nonlinear effects and complex interactions among these variables. Third, XGBoost assumes that observations, here individual motorcycle crashes, are independent of one another, which is important for avoiding biases arising from correlated or dependent data points. Additionally, imbalanced data, in which crash severity classes (e.g., minor, severe, and fatal) are unevenly distributed, can be handled through techniques such as class weighting or adjusting decision thresholds. Finally, although the model can accommodate correlated features, the interpretation of feature importance (such as through SHAP values) is most reliable when features are not strongly correlated, because correlated features can share or shift credit for the same underlying effect.

3.2.2. SHapley Additive exPlanations (SHAP)

SHAP (SHapley Additive exPlanations) is a unified approach for interpreting the output of machine learning models. SHAP values quantify the contribution of each feature to a particular prediction by considering all possible combinations of features. The resulting values sum to the difference between the model’s prediction for a particular instance and the average prediction across all instances, ensuring a fair distribution of the contribution. The SHAP value of a variable is calculated as follows [32]:
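$$\phi_i = \sum_{S \subseteq X \setminus \{i\}} \frac{|S|!\,\left(|X| - |S| - 1\right)!}{|X|!} \left[ f_{S \cup \{i\}}\left(x_{S \cup \{i\}}\right) - f_S\left(x_S\right) \right]$$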
where $\phi_i$ is the SHAP value of feature $i$, $X$ denotes the set of all features, $S$ is a subset of $X$ that excludes feature $i$, and $x_S$ denotes the values of the features in $S$. To assess the impact of a particular feature, one model, $f_{S \cup \{i\}}$, is trained with the feature in question, while another model, $f_S$, is trained without it. The difference between their outputs, $f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_S(x_S)$, measures the marginal contribution of feature $i$. Because the effect of the feature of interest may depend on the other variables in the model, this difference is computed for all possible subsets $S$ and averaged using the weights in the equation above.
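For a handful of features, the weighted sum above can be evaluated exactly by enumerating every subset. The sketch below uses a hypothetical toy value function `toy_value(S)` in place of the model output $f_S(x_S)$. Its cost grows exponentially with the number of features, which is why tree-specific algorithms such as TreeSHAP are used in practice for models like XGBoost.

```python
from itertools import combinations
from math import factorial

def exact_shapley(value_fn, n_features):
    """Exact Shapley values by subset enumeration:
    phi_i = sum over subsets S of the other features of
            |S|! (|X| - |S| - 1)! / |X|! * [v(S with i) - v(S)]."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

# Hypothetical toy "model output" for a feature subset S:
# additive effects plus an interaction between features 0 and 1.
EFFECTS = {0: 2.0, 1: 1.0, 2: -0.5}

def toy_value(subset):
    interaction = 1.0 if {0, 1} <= subset else 0.0
    return sum(EFFECTS[j] for j in subset) + interaction

phi = exact_shapley(toy_value, 3)
print([round(v, 6) for v in phi])  # [2.5, 1.5, -0.5]: the interaction is split equally
```

The values sum to `toy_value` of the full set minus `toy_value` of the empty set (3.5), illustrating the additivity property described above.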
In this study, mean absolute SHAP values were used to rank feature importance. SHAP values quantify each feature’s contribution to the model’s predictions by showing how much each feature increases or decreases the predicted outcome relative to the average prediction [23,24,33]. To rank features, the mean of the absolute SHAP values was calculated across all instances, with higher values indicating greater feature importance. No predefined thresholds were applied; instead, the natural ranking from the SHAP values was used, with the most important features being those with the highest mean absolute SHAP values. This approach ensures a transparent and interpretable method for assessing feature importance.
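A minimal sketch of this ranking step, assuming the SHAP values are already available as an (instances x features) array, for example from `shap.TreeExplainer(model).shap_values(X)` for a binary XGBoost model. The feature names and values below are hypothetical.

```python
import numpy as np

def rank_by_mean_abs_shap(shap_values, feature_names):
    """Rank features by mean |SHAP| across instances (highest first)."""
    values = np.asarray(shap_values, dtype=float)
    importance = np.abs(values).mean(axis=0)
    order = np.argsort(-importance, kind="stable")  # ties keep original order
    return [(feature_names[k], float(importance[k])) for k in order]

# Hypothetical SHAP matrix: 2 crashes x 3 features
toy_shap = [[0.5, -0.1, 0.0],
            [-0.3, 0.2, 0.05]]
for name, score in rank_by_mean_abs_shap(toy_shap, ["speeding", "rain", "curve"]):
    print(f"{name}: {score:.3f}")
```

Taking absolute values before averaging prevents positive and negative contributions of the same feature from cancelling out, so the ranking reflects magnitude of influence rather than direction.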

3.3. Modeling Process and Evaluation

In the analysis of all motorcycle crash models, three injury classes were initially tested: minor, severe, and fatal. However, during the training phase, all models had difficulty accurately classifying severe injuries (Table A1 in Appendix A). This difficulty arises because the factors contributing to severe and fatal injuries are largely similar, making it hard for the algorithms to distinguish between the two classes. Previous studies have reported similar findings, highlighting the overlap in the characteristics and contributing factors of severe and fatal injuries [34,35,36]. Consequently, to improve model performance and clarity, severe and fatal injuries were combined into a single category. This adjustment allows the algorithm to better differentiate between minor injuries and the more serious combined category, thereby improving overall classification accuracy.
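The class merge described above amounts to a relabeling step before training. A minimal sketch, assuming the raw labels are the strings `minor`, `severe`, and `fatal` (hypothetical; the dataset's actual coding is given in Table 1):

```python
# Severe and fatal injuries are merged into one class (1); minor injuries stay 0.
SEVERITY_TO_BINARY = {"minor": 0, "severe": 1, "fatal": 1}

def to_binary_severity(labels):
    """Map three-level severity labels to the binary target used for modeling."""
    binary = []
    for label in labels:
        key = label.strip().lower()
        if key not in SEVERITY_TO_BINARY:
            raise ValueError(f"Unexpected severity label: {label!r}")
        binary.append(SEVERITY_TO_BINARY[key])
    return binary

print(to_binary_severity(["minor", "Fatal", "severe", "minor"]))  # [0, 1, 1, 0]
```

Raising an error on unknown labels, rather than silently defaulting, avoids miscoding records whose severity field is missing or misspelled.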
In the analysis, the data were divided into training and testing sets using an 80:20 split to enable robust model evaluation and to detect overfitting. The training set was used to develop the model, while the testing set was reserved for performance validation. Additionally, cross-validation was applied during hyperparameter tuning to further assess the model’s ability to generalize across different subsets of the data. To optimize model performance, this study employed GridSearchCV [37,38] for hyperparameter tuning. GridSearchCV systematically explores a predefined grid of hyperparameter combinations and evaluates model performance for each configuration. This method was chosen because it exhaustively evaluates every combination in the grid, ensuring that the best-performing set within that grid is selected for each crash scenario. The key hyperparameters tuned in this study included Alpha (range: 0.1 to 0.9), Colsample_bytree (range: 0.1 to 0.9), Gamma (range: 0.1 to 0.9), Reg_lambda (range: 2 to 10), and Max_depth (range: 2 to 15). These hyperparameters were selected because of their significant influence on model performance and interpretability, as they control regularization, tree depth, and feature sampling during training. The final model was evaluated on the test set using precision, recall, F1-score, and accuracy to assess its performance across the crash severity classes. All experiments were conducted in Python using the Scikit-learn and XGBoost libraries on an MSI system with an Intel(R) Core(TM) i7-10875H CPU (10th Gen) @ 2.30 GHz and 16 GB of RAM.
In this study, LR was selected as the benchmark model for comparison with XGBoost because of its simplicity, interpretability, and extensive use in previous traffic safety studies [39,40,41]. LR’s ability to provide clear insights into the relationships between predictor variables and crash severity outcomes makes it a suitable model for comparison. To ensure a comprehensive analysis, two other machine learning models widely used in recent studies, RF and SVM, were also included in the evaluation [21].
Table 3 presents the optimized hyperparameter values for each machine learning model, while Table 4 summarizes their performance in terms of accuracy, precision, recall, and F1-score. The performance of the four models (XGBoost, Logistic Regression, Random Forest, and SVM) was compared across four crash types: nighttime multivehicle, daytime multivehicle, nighttime single-vehicle, and daytime single-vehicle crashes. The F1-score, which balances precision and recall, was chosen as the primary evaluation metric because it provides a more balanced view of a model’s ability to predict both the positive and negative classes, especially under class imbalance.
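The four metrics can be computed with scikit-learn as in the sketch below, which also selects the model with the highest F1-score. Macro averaging (giving both classes equal weight) is an assumption, since the averaging scheme is not stated; the model names and predictions are hypothetical.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(y_true, y_pred, average="macro"):
    """Accuracy plus averaged precision, recall, and F1 for one model's predictions."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average=average, zero_division=0
    )
    return {"accuracy": accuracy_score(y_true, y_pred),
            "precision": precision, "recall": recall, "f1": f1}

def best_by_f1(predictions, y_true):
    """Return the name of the model with the highest F1 and all F1 scores."""
    scores = {name: evaluate(y_true, y_pred)["f1"] for name, y_pred in predictions.items()}
    return max(scores, key=scores.get), scores

# Hypothetical test labels and predictions from two models
y_test = [1, 0, 1, 1, 0, 0]
preds = {"model_a": [1, 0, 0, 1, 1, 0], "model_b": [0, 0, 0, 0, 0, 0]}
best, scores = best_by_f1(preds, y_test)
print(best, {k: round(v, 3) for k, v in scores.items()})
```

In this toy case `model_b` predicts only the majority class; macro F1 penalizes it for never identifying the minority class, which is the behavior that makes F1 preferable to accuracy under imbalance.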
For nighttime multivehicle crashes, the XGBoost model achieved the highest F1-score (0.66) on the test data, making it the best-performing model for this scenario. For daytime multivehicle crashes, XGBoost again performed best with an F1-score of 0.63, closely followed by the other three models, each of which scored 0.62. For nighttime single-vehicle crashes, XGBoost and Random Forest both achieved an F1-score of 0.60 on the test data, although XGBoost demonstrated slightly higher precision. Finally, for daytime single-vehicle crashes, XGBoost led with an F1-score of 0.53, outperforming the SVM model (0.51). Although performance for daytime single-vehicle crashes was lower than for the other crash types, XGBoost still outperformed the competing models in this category.
Overall, XGBoost achieved the highest or tied-highest F1-score for every crash type, making it the preferred model for analyzing crash severity in this study. SVM and RF also proved to be viable alternatives in certain cases, but XGBoost’s ability to handle complex, nonlinear relationships and imbalanced data more effectively likely contributed to its better predictions. This advantage can be attributed to XGBoost’s boosting approach, which iteratively improves model accuracy while capturing intricate interactions between features. The comparison highlights the advantages of advanced machine learning models while offering a clear baseline for assessing XGBoost’s added value. This finding is consistent with those of recent studies [24,42,43].
Given that XGBoost outperformed the other models, SHAP (SHapley Additive exPlanations) was applied to the XGBoost model for detailed interpretation of feature importance, further enhancing the insights gained from the analysis.

4. Results and Discussion

4.1. Factors Influencing Crash Severity of Nighttime Single Motorcycle Crashes

Based on the results from the XGBoost model and SHAP values (Figure 3), 20 factors were identified as key predictors of crash severity for single motorcycle crashes occurring at night. Among these, exceeding the speed limit emerged as the most critical factor, followed by alcohol consumption and falling asleep at the wheel. The influence of speeding violations on crash severity in nighttime single motorcycle crashes aligns with the findings of Kardar and Davoodi [44] and Goswamy et al. [25], who also highlighted that speeding significantly increases the severity of motorcycle traffic crashes. Furthermore, nighttime crashes were found to be more severe than daytime crashes, likely due to reduced rider visibility and decreased stopping sight distance, as noted by Abdul Manan et al. [45], Champahom et al. [40], and Santos et al. [46]. In addition, riding under the influence of alcohol was identified as a secondary factor affecting motorcycle injury severity, although it was associated with a slightly lower probability of fatal injury. This finding is consistent with that of Adanu et al. [47]. Meanwhile, falling asleep at the wheel significantly increased injury severity in nighttime crashes. The high severity of these crashes is likely due to the rider’s loss of consciousness, loss of vehicle control, and inability to take protective action, leading to severe consequences. This finding corroborates the results of Jafari Anarkooli and Hadji Hosseinlou [48]. Interestingly, major holidays such as New Year and Songkran, which are celebrated as national festivals in Thailand, are characterized by heavy traffic congestion because most of the population travels by private vehicle. Because vehicles generally move at lower speeds in such congestion, motorcycle crashes during these periods tended to be less severe than those during periods of lighter traffic.
The results of this study align with those of Se et al. [49], who reported that although the crash rate is higher during holidays, the behavioral characteristics and higher traffic density during these times result in less severe injuries for riders compared to weekdays. This observation is also consistent with the earlier finding that speeding significantly influences the severity of motorcycle crashes on highways. Recent literature, such as the work by Fountas and Anastasopoulos [50], supports the idea that heavy traffic reduces driving speeds, which in turn lowers the severity of road accidents. Additionally, the analysis revealed that the severity of nighttime single motorcycle crashes was notably related to road curvature. Crashes occurring on curves tended to result in significantly higher injury severity than those on other road types, consistent with the findings reported by Islam [51]. Curves make it challenging for riders to maintain control of their vehicles, particularly at night when visibility is reduced. Consequently, well-designed curves and the installation of appropriate protective devices can significantly mitigate the severity of motorcycle accidents [52]. Moreover, rainy or wet road conditions were associated with higher severity in nighttime single motorcycle crashes, likely due to limited visibility and the difficulty of controlling the vehicle under such conditions, resulting in a greater risk of severe injuries when crashes occur, as noted by Se et al. [4].

4.2. Factors Influencing Crash Severity of Daytime Single Motorcycle Crashes

The analysis results presented in Figure 4 reveal that single motorcycle crashes during the daytime shared similarities with those occurring at night, particularly in the significance of certain factors related to the time of day and the causes of the crash. However, the importance of these factors varied between daytime and nighttime crashes. For instance, exceeding the speed limit was identified as the most critical factor influencing injury severity at night. In contrast, the daytime model showed that time-related factors, such as Songkran, New Year, and rush hour (which represent periods of high traffic volume or AADT), had the greatest impact. These factors were associated with lower crash severity compared with other times. This finding aligns with that of Fountas and Anastasopoulos [50], who reported that crashes during periods of high AADT are less likely to result in fatalities than those during periods of lower traffic volume. Additionally, falling asleep at the wheel emerged as a major cause of single motorcycle crashes during the daytime, significantly influencing injury severity. This situation is particularly dangerous because the rider is unconscious and unable to control the vehicle’s direction or speed [48]. Conversely, driving under the influence of alcohol has been found to result in less severe injuries during the daytime than other causes [11]. Although drunk driving impairs vehicle control, intoxicated motorcyclists may tend to travel short distances and at lower speeds, leading to crashes that typically involve loss of control but result in fewer injuries. The data showed 497 single motorcycle crashes attributed to drunk driving, of which 314 (63.18%) resulted in minor injuries and 183 (36.82%) led to serious injuries or fatalities. These findings align with the SHAP value analysis results.
In contrast, multi-vehicle crashes involving drunk driving were significantly fewer, with only 184 occurrences. Of these, 73 cases (39.67%) resulted in minor injuries, while 111 cases (60.33%) led to serious injuries or fatalities. Similar to the nighttime model, sharp curves and road curvature were also identified as significant factors that increased injury severity in single motorcycle crashes during the daytime [53]. Given this evidence, enhancing curve protection features and equipment is critically important to prevent single motorcycle crashes.
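The percentage shares quoted above can be reproduced from the reported counts with a few lines of Python. This is plain descriptive arithmetic on the counts given in the text, not a statistical test; the helper name `severity_shares` is illustrative.

```python
def severity_shares(minor, severe_or_fatal):
    """Return the total and the percentage share (2 d.p.) of each outcome group."""
    total = minor + severe_or_fatal
    return (
        total,
        round(100 * minor / total, 2),
        round(100 * severe_or_fatal / total, 2),
    )


# Counts of drunk-driving motorcycle crashes reported in this section
single_vehicle = severity_shares(minor=314, severe_or_fatal=183)
multivehicle = severity_shares(minor=73, severe_or_fatal=111)

print("single-vehicle:", single_vehicle)
print("multivehicle:", multivehicle)
```

The severe-or-fatal share rises from roughly 37% in single-vehicle crashes to roughly 60% in multivehicle crashes, which is the contrast the paragraph draws.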

4.3. Factors Influencing Crash Severity of Nighttime Multivehicle Motorcycle Crashes

In the nighttime multivehicle crash model, rear-end collisions were identified as the most important factor influencing the severity of motorcycle-involved crashes (Figure 5). These results are consistent with the findings of Kanitpong et al. [54], who emphasized that although rear-end crashes may occur more frequently than other types (such as crossing or head-on collisions), their severity varies with the difference in speed between the two vehicles. Similarly, the data from this study show that of the 2743 nighttime rear-end collisions, 54% resulted in minor injuries, while the remaining 46% led to serious injuries or fatalities. In contrast, head-on collisions, the second most important factor, markedly increased the likelihood of severe injuries, in line with the results of Naqvi and Tiwari [55]. Recent studies have found that head-on collisions tend to result in more severe injuries than other crash types because of the intense impact involved [11]. In low- and middle-income countries, these collisions often involve high-occupancy public transportation vehicles, which contribute to higher fatality rates per crash [47]. Additionally, among the types of involved vehicles, private cars were the most influential factor in determining injury severity in nighttime motorcycle crashes. This is likely because private cars dominate the roadways, accounting for a larger share of traffic than buses or trucks. Private cars are also more accessible, faster, and subject to fewer restrictions, leading to more frequent interactions with motorcycles; this greater exposure likely explains the prominence of private-car involvement in the model. Adanu et al. [47] similarly found that highway motorcycle crashes involving private cars tend to result in less severe injuries than other types of collisions.
In contrast, trucks with ten or more wheels (including trailers) were significantly associated with higher severity of motorcycle-involved highway crashes, consistent with previous studies [56,57]. Motorcycles lack the protective structure of enclosed vehicles, leaving riders more vulnerable in collisions, and the large disparity in size and weight between motorcycles and trucks produces more forceful impacts, substantially increasing the likelihood of serious or fatal injuries. The model results also revealed a significant association between temporal indicators and injury severity in motorcycle-related crashes. During peak hours and festival periods such as Songkran, the likelihood of severe injuries was lower than at other times. This reduction is likely due to the heavy traffic during these periods, which forces vehicles to travel at lower speeds. Since speed is a major determinant of injury severity in road traffic crashes, lower speeds during high-traffic periods contribute to less severe outcomes. This is consistent with the findings of Fountas and Anastasopoulos [50] and of Se et al. [49], who found that higher traffic density during holidays results in less severe rider injuries than on weekdays.

4.4. Factors Influencing Crash Severity of Daytime Multivehicle Motorcycle Crashes

The results of the daytime multivehicle model, illustrated in Figure 6, align with the nighttime results: rear-end collisions were associated with lower severity of motorcycle-involved crashes [54]. Likewise, collisions between motorcycles and other motorcycles or passenger cars were less likely to result in fatal injuries than collisions with larger vehicles such as trucks and trailers, as discussed in Section 4.3 and consistent with the relevant literature [47,58]. Among the types of involved vehicles, large trailers were the most important factor influencing crash severity because of their substantial size, weight, and inertia; this disparity with motorcycles leads to more severe outcomes in collisions. Crashes involving 6-wheel and 10-wheel trucks also increased the severity of motorcycle collisions [56]. As in the nighttime model, head-on collisions were a significant factor affecting daytime multivehicle motorcycle crash severity [55,59]. Furthermore, exceeding the speed limit was a major factor associated with increased crash severity during the daytime, consistent with established research on the impact of speeding on crash severity [4,25]. Beyond the factors found for nighttime multivehicle crashes, the daytime model also revealed a significant association between road curvature and motorcycle crash severity. This aligns with the findings of Xin et al. [60], who reported that riding on curved roads significantly affects motorcyclists’ injury severity because riders are highly sensitive to vehicle control and strongly affected by changes in road conditions. The sample statistics showed that daytime crashes on curved roads were more prevalent than nighttime ones.
Consistent with these observations, the SHAP values indicated that curved roads had a greater impact on rider injury severity in daytime crashes than in nighttime crashes. This finding aligns with previous research by Abdul Manan et al. [45], which also highlighted the increased severity of crashes on curved roads.
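The SHAP values that underpin the factor rankings in this section are Shapley values from cooperative game theory [32]: each feature receives its average marginal contribution to the model output over all orderings of the features. As a minimal sketch of what that quantity is, the Python below computes exact Shapley values by brute-force enumeration of feature coalitions for a hypothetical three-feature toy model of the log-odds of a severe outcome. The feature names, coefficients, and the speeding-on-curve interaction are invented for illustration (signs loosely echo the directions discussed above) and are not estimates from this study. Enumeration is exponential in the number of features; practical SHAP implementations for tree ensembles use faster algorithms such as TreeSHAP instead.

```python
from itertools import combinations
from math import factorial

FEATURES = ["speeding", "peak_hour", "curve"]


def toy_log_odds(present):
    """Hypothetical model output (log-odds of a severe outcome) when only the
    features in the set `present` are switched on; all coefficients are invented."""
    score = -0.5                      # baseline with no features present
    if "speeding" in present:
        score += 0.8
    if "peak_hour" in present:
        score -= 0.4
    if "curve" in present:
        score += 0.6
    if {"speeding", "curve"} <= present:
        score += 0.2                  # interaction: speeding on a curve
    return score


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        contribution = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for a coalition of size k: k! (n - k - 1)! / n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                contribution += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = contribution
    return phi


phi = shapley_values(toy_log_odds, FEATURES)
for name, value in phi.items():
    print(f"{name:>9}: {value:+.2f}")
print("sum of SHAP values:", round(sum(phi.values()), 2))
print("prediction - baseline:", round(toy_log_odds(set(FEATURES)) - toy_log_odds(set()), 2))
```

The interaction bonus is split evenly between `speeding` and `curve`, and the three values add up exactly to the gap between the full prediction and the baseline, which is the additivity property that lets SHAP plots be read as a decomposition of each prediction.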

4.5. Measure and Policy Implications

The analysis of differences between single and multivehicle motorcycle crashes on highways revealed distinct factors influencing crash severity for each type. For multivehicle motorcycle crashes, the primary variables affecting severity were the collision type and the types of involved vehicles. For single motorcycle crashes, by contrast, the key factors were exceeding the speed limit and temporal elements such as long holidays, peak hours, and early morning conditions. Based on these findings, the following targeted interventions are proposed to reduce the severity of highway motorcycle crashes.
(1) Speeding crash management: The risks associated with high-speed collisions need to be addressed, especially during the nighttime and early morning hours, when, as found in this research, single motorcycle crashes are more prevalent. To enhance road safety, strict speed limit enforcement is essential, particularly in areas prone to high-speed crashes. Automated speed cameras, coupled with visible signage, can deter speeding during nighttime and early morning hours. Alongside enforcement, improving lighting in high-risk areas, especially on curves or roads with poor visibility, is crucial for enhancing visibility and reducing the likelihood of crashes. Bright, strategically placed street lighting can help motorcyclists navigate safely at night, particularly at locations with frequent crashes. Improved nighttime visibility may mitigate crash risks even when riders travel at high speeds, which is relevant given the importance of speeding in the nighttime results of this study. Collaboration among policymakers, transportation agencies, and the community is crucial for implementing these measures effectively and ensuring safer roadways for all users.
(2) Prevention of head-on collisions: This study highlighted head-on collisions as a significant factor influencing injury severity. To mitigate this risk, physical barriers and median separations are advised on highways, particularly on routes with frequent head-on crashes or high motorcycle volumes. Enhancing roadway design with clearer lane markings and rumble strips could further reduce the likelihood of head-on collisions. Notably, the study also revealed a significant association between head-on collisions and fatality risk on curved roads: head-on collisions involving motorcycles in such areas were fatal in 57% of cases, while 18% led to serious injuries and 25% to minor injuries. Consequently, in addition to installing curve protection devices, clear and visible road signs indicating sharp curves, no-passing zones, and other potential hazards should be provided to help motorcyclists make safer decisions. Widening lanes can also provide greater maneuverability and reduce the risk of encroachment into oncoming traffic, and enhancing visibility on curves should improve riders’ awareness, in line with the findings of this study.
(3) Enhanced rider awareness: Given the high risk of motorcycle collisions with various vehicle types—such as other motorcycles, passenger cars, and trucks—improving rider awareness is crucial. Most motorcycle injuries and fatalities result from interactions with other vehicles. Therefore, increasing safe driving awareness and educating riders about the importance of maintaining a safe distance and understanding vehicle visibility and blind spots is essential. This can be achieved through road safety education campaigns [61] or by incorporating these topics into driving license testing programs.
(4) Curve crash countermeasures: To proactively improve motorcycle safety, road surfaces should be well maintained, free of debris, and have high skid resistance to minimize the risk of loss of control. This enhances motorcyclists’ ability to maneuver safely and reduces the likelihood of crashes, as noted in previous sections. Implementing and enforcing reduced speed limits on sharp curves is also crucial. Dynamic speed limit signs that adjust to weather or traffic conditions can further enhance safety, particularly because driving speed, like curve characteristics, was found to be a key factor in motorcycle crash severity in this study. As a reactive measure, installing safety devices such as guardrails or safety roller barriers on hazardous curves is recommended to protect road users from serious injuries, with priority given to locations where run-off-road or lane-crossing crashes have occurred.
(5) Nighttime driving and fatigue: To improve visibility at night and in other low-visibility conditions, such as cloudy weather or dust, high-intensity LED street lighting, reflective road signs, and lane markers should be installed. Public awareness campaigns should also educate riders on safe nighttime riding practices and the risks of riding while fatigued. These campaigns could use various media platforms to stress the importance of regular eye exams for maintaining clear vision and to offer practical advice on preventing fatigue, such as taking regular breaks and avoiding heavy meals before riding. The data indicate that over 36% of nighttime crashes involving drivers falling asleep occurred during long holiday seasons, with more than 44% resulting in fatal injuries. In response, measures to support long-distance nighttime travel are warranted. Establishing more rest areas on highways, particularly on high-traffic routes, would give drivers safe places to take breaks during extended trips. In-vehicle technologies that monitor driver alertness, such as lane departure warnings and drowsiness detection systems, could also help mitigate the risks associated with driver fatigue.
(6) Exclusive motorcycle lanes: With motorcycles representing a significant proportion of road users in Thailand and the broader Asian region, it is essential to consider the development of exclusive motorcycle lanes. These dedicated lanes could separate motorcycles from other vehicles, thereby reducing conflicts and interactions that contribute to crash severity. Despite the relatively high cost of implementation, exclusive motorcycle lanes have been shown to significantly enhance motorcyclist safety [62]. Previous studies support this approach, demonstrating its effectiveness in improving road safety for motorcyclists [63,64].

5. Conclusions and Recommendation for Future Works

Motorcycle crash severity remains a critical issue in road traffic safety. This study employed XGBoost models to analyze motorcycle crashes on Thai highways, representing conditions in developing countries. The data were categorized into four main datasets: daytime single motorcycle, nighttime single motorcycle, daytime multivehicle, and nighttime multivehicle motorcycle, encompassing a total of 16,175 crashes. The aim was to provide analysis results and policy recommendations tailored to crash types and time conditions.
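The four-way split described above can be sketched as a simple grouping step. This is a minimal illustration, not the authors’ preprocessing code: the record fields (`is_night`, `vehicles_involved`, `severity`) are hypothetical, and because the exact daytime/nighttime definition is not given in this section, the night flag is assumed to be precomputed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Crash:
    """Hypothetical crash record; field names are illustrative, not the MOT schema."""
    crash_id: int
    is_night: bool          # assumed precomputed from the recorded crash time
    vehicles_involved: int  # 1 = single motorcycle crash, >1 = multivehicle
    severity: str           # "minor", "severe", or "fatal"


def partition(crashes):
    """Group crashes into the four (time of day, crash type) datasets."""
    groups = defaultdict(list)
    for crash in crashes:
        time_of_day = "nighttime" if crash.is_night else "daytime"
        crash_type = "single" if crash.vehicles_involved == 1 else "multivehicle"
        groups[(time_of_day, crash_type)].append(crash)
    return dict(groups)


sample = [
    Crash(1, False, 1, "minor"),
    Crash(2, True, 1, "fatal"),
    Crash(3, False, 2, "severe"),
    Crash(4, True, 3, "minor"),
    Crash(5, True, 1, "minor"),
]
for key, rows in sorted(partition(sample).items()):
    print(key, [c.crash_id for c in rows])
```

Each subset would then be modeled separately, which is what allows the factor importance to differ by crash type and time of day.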
The findings indicate that XGBoost is a promising tool for motorcycle crash analysis. Across all four datasets (daytime and nighttime single motorcycle crashes, as well as daytime and nighttime multivehicle crashes), XGBoost consistently outperformed the LR, RF, and SVM models in predicting the severity of highway motorcycle crashes, achieving higher accuracy, precision, recall, and F1-score in both training and testing. These results highlight XGBoost’s effectiveness in classifying crash severity, although the per-class results in Appendix A (Table A1) show that the Severe class was rarely identified correctly, with recall of 0.08 or lower in every model.
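The metrics named above, and reported per class in Table A1, all follow from a confusion matrix. The sketch below computes accuracy and per-class precision, recall, and F1-score for a three-class severity problem. The confusion-matrix counts are invented for illustration and are not the study’s results; they mimic the Table A1 pattern in which the Severe class is almost never predicted, so precision is set to 0 when no crash is predicted as Severe (a convention similar to scikit-learn’s `zero_division=0` option).

```python
LABELS = ["minor", "severe", "fatal"]

# Hypothetical confusion matrix: rows = true class, columns = predicted class.
CONFUSION = [
    [90, 0, 10],  # true minor
    [30, 0, 10],  # true severe (never predicted as severe)
    [20, 0, 40],  # true fatal
]


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def evaluate(cm, labels):
    """Accuracy plus (precision, recall, F1) for each class."""
    n = len(labels)
    per_class = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column total
        actual_k = sum(cm[k])                          # row total
        precision = safe_div(tp, predicted_k)
        recall = safe_div(tp, actual_k)
        f1 = safe_div(2 * precision * recall, precision + recall)
        per_class[label] = (precision, recall, f1)
    accuracy = safe_div(sum(cm[k][k] for k in range(n)), sum(map(sum, cm)))
    return accuracy, per_class


accuracy, per_class = evaluate(CONFUSION, LABELS)
print(f"accuracy = {accuracy:.2f}")
for label, (p, r, f) in per_class.items():
    print(f"{label:>6}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

In this toy case, accuracy is 0.65 even though no Severe crash is identified, which is why per-class recall and F1-score are more informative than accuracy alone for imbalanced severity data.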
To enhance practical applications, the findings of this study provide several actionable recommendations for relevant authorities based on the four main research objectives:
  • Nighttime single motorcycle crashes: Recommended measures include implementing speed management strategies, enhancing nighttime driving support, and installing curve protection devices.
  • Daytime single motorcycle crashes: Rider awareness and education, along with support for nighttime driving, are crucial to addressing the issues associated with high-traffic periods, such as during Songkran, and nighttime incidents.
  • Nighttime multivehicle motorcycle crashes: This study recommends promoting rider awareness and education, as well as implementing exclusive motorcycle lanes [62,64].
  • Daytime multivehicle motorcycle crashes: The study recommends rider education, the implementation of exclusive motorcycle lanes, and the introduction of physical barriers and median separations on hazardous roadways.
In terms of data reliability, it is important to note that all crash information used in this study was derived from police reports, and the recorded cause of the crash was largely based on the judgment of the reporting officers. This introduces potential biases or inaccuracies, particularly in the categorization of crash causes. Therefore, variables related to crash causality should be interpreted with caution. Additionally, police-reported crash data often suffer from under-reporting, particularly in cases involving property damage only (PDO) or minor injuries. This under-reporting may skew the representation of less severe crashes in the dataset. Future research should explore ways to address these limitations by integrating additional data sources, such as hospital records or insurance claims, and by advocating for more comprehensive crash reporting systems that capture a wider range of incidents.
The study also has further limitations. Due to data restrictions imposed by the Ministry of Transport (MOT), rider, driver, and additional roadway characteristics could not be included in the analysis. Previous research has shown that these factors are significantly associated with motorcycle crash injury severity [65,66]. While this study effectively utilized XGBoost to analyze motorcycle collision severity across single and multivehicle motorcycle crashes, both during the daytime and nighttime, the absence of rider characteristics limits the comprehensiveness of the analysis. Future research that includes rider attributes, vehicle types, and temporal factors would provide a more detailed and nuanced understanding of the factors influencing crash injury severity.
In terms of methodology, future research could explore advanced machine learning methods such as Graph Convolutional Networks (GCNs) to model spatiotemporal correlations in crash data. GCNs are particularly well suited to capturing the complex relationships between spatial features (e.g., road networks) and temporal dynamics (e.g., traffic conditions over time) [67]. Additionally, innovative applications of machine learning, including hybrid approaches that combine various models, could offer new perspectives on crash severity analysis. Leveraging recent developments in interpretable machine learning could also help explore potential causal relationships in the data and provide deeper insights into the factors influencing crash outcomes, while enhancing the transparency and interpretability of model results. These directions offer promising avenues for advancing predictive modeling and improving the effectiveness of road safety interventions.
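For orientation, one widely used GCN formulation is the layer-wise propagation rule of Kipf and Welling, shown below; it is not necessarily the variant intended in [67]. In a crash-severity setting, the graph would hypothetically take road segments as nodes, so that $A$ is the segment adjacency matrix and $H^{(0)}$ holds segment-level features, with temporal dynamics handled by an additional component not shown here.

$$
H^{(l+1)} = \sigma\!\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right),
\qquad \tilde{A} = A + I_N,
\qquad \tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}
$$

Here $I_N$ adds a self-loop to each of the $N$ nodes, $W^{(l)}$ is the trainable weight matrix of layer $l$, and $\sigma$ is a nonlinearity such as ReLU; each layer therefore mixes every segment’s features with those of its neighbors.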

Author Contributions

Conceptualization, P.W. and C.S.; methodology, P.W. and C.S.; software, T.C. and S.J.; validation, C.S. and T.C.; formal analysis, P.W. and C.S.; investigation, R.K., S.J. and V.R.; resources, R.K., S.J. and V.R.; data curation, P.W. and C.S.; writing—original draft preparation, P.W. and C.S.; writing—review and editing, T.C.; visualization, T.C. and R.K.; supervision, R.K. and V.R.; project administration, R.K., S.J. and V.R. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Suranaree University of Technology (SUT), Thailand Science Research and Innovation (TSRI), and National Science, Research, and Innovation Fund (NSRF) (NRIIS number 195628).

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the ethics committee of the Suranaree University of Technology (COE No. 130/2566, 18 November 2023).

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available on request due to privacy restrictions.

Acknowledgments

The authors would like to thank the Suranaree University of Technology Research and Development Fund.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Model evaluation metrics based on three classes of injury severity using XGBoost training model.
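| Model | Injury Severity | Training Precision | Training Recall | Training F1-Score | Test Precision | Test Recall | Test F1-Score |
|---|---|---|---|---|---|---|---|
| Daytime single motorcycle | Minor | 0.60 | 0.99 | 0.75 | 0.63 | 0.99 | 0.77 |
| | Severe | 1.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 |
| | Fatal | 0.71 | 0.11 | 0.20 | 0.62 | 0.11 | 0.19 |
| Nighttime single motorcycle | Minor | 0.56 | 0.78 | 0.65 | 0.51 | 0.79 | 0.62 |
| | Severe | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | Fatal | 0.53 | 0.51 | 0.52 | 0.56 | 0.46 | 0.51 |
| Daytime multivehicle motorcycle | Minor | 0.64 | 0.93 | 0.76 | 0.62 | 0.92 | 0.74 |
| | Severe | 0.42 | 0.02 | 0.03 | 0.38 | 0.01 | 0.02 |
| | Fatal | 0.54 | 0.29 | 0.38 | 0.50 | 0.27 | 0.35 |
| Nighttime multivehicle motorcycle | Minor | 0.62 | 0.80 | 0.70 | 0.59 | 0.76 | 0.66 |
| | Severe | 0.69 | 0.08 | 0.14 | 0.31 | 0.03 | 0.06 |
| | Fatal | 0.59 | 0.58 | 0.58 | 0.58 | 0.57 | 0.57 |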
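As a cross-check on Table A1, the F1-score is the harmonic mean of precision and recall, F1 = 2PR/(P + R). The sketch below recomputes two training-set F1-scores from their reported precision and recall and takes the unweighted (macro) mean of the per-class test F1-scores for each model. Recomputed values can differ from the printed ones by about 0.01 because the table’s precision and recall are rounded to two decimals; class supports are not reported, so a support-weighted average is not attempted.

```python
# Per-class test-set F1-scores transcribed from Table A1, in the order (Minor, Severe, Fatal).
TEST_F1 = {
    "Daytime single motorcycle": (0.77, 0.00, 0.19),
    "Nighttime single motorcycle": (0.62, 0.00, 0.51),
    "Daytime multivehicle motorcycle": (0.74, 0.02, 0.35),
    "Nighttime multivehicle motorcycle": (0.66, 0.06, 0.57),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def macro_f1(per_class_f1):
    """Unweighted mean of per-class F1-scores."""
    return sum(per_class_f1) / len(per_class_f1)


# Recompute two training-set F1-scores from the reported precision and recall.
print(round(f1_score(0.60, 0.99), 2))  # Daytime single motorcycle, Minor
print(round(f1_score(0.64, 0.93), 2))  # Daytime multivehicle motorcycle, Minor

for model, scores in TEST_F1.items():
    print(f"{model}: test macro-F1 = {macro_f1(scores):.2f}")
```

The test macro-F1 values, between roughly 0.32 and 0.43, sit well below the Minor-class F1-scores because the Severe class contributes scores near zero.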

References

  1. World Health Organization. Global Status Report on Road Safety 2023; World Health Organization: Geneva, Switzerland, 2023.
  2. NHTSA. Traffic Safety Facts: Motorcycles 2022; U.S. Department of Transportation: Washington, DC, USA, 2022.
  3. United Nations. Sustainable Development Goals. Available online: https://www.un.org/sustainabledevelopment/ (accessed on 2 July 2024).
  4. Se, C.; Champahom, T.; Jomnonkwao, S.; Wisutwattanasak, P.; Laphrom, W.; Ratanavaraha, V. Temporal Instability and Transferability Analysis of Daytime and Nighttime Motorcyclist-Injury Severities Considering Unobserved Heterogeneity of Data. Sustainability 2023, 15, 4486.
  5. Hua, C.; Fan, W. Injury severity analysis of time-of-day fluctuations and temporal volatility in reverse sideswipe collisions: A random parameter model with heterogeneous means and heteroscedastic variances. J. Saf. Res. 2023, 84, 74–85.
  6. Yuan, R.; Xiang, Q.; Huang, Y.; Gu, X. Investigating the difference in factors influencing the injury severity between daytime and nighttime speeding-related crashes. Can. J. Civ. Eng. 2023, 51, 60–72.
  7. Behnood, A.; Mannering, F. Time-of-day variations and temporal instability of factors affecting injury severities in large-truck crashes. Anal. Methods Accid. Res. 2019, 23, 100102.
  8. Song, L.; Fan, W.; Li, Y. Time-of-day variations and the temporal instability of multi-vehicle crash injury severities under the influence of alcohol or drugs after the Great Recession. Anal. Methods Accid. Res. 2021, 32, 100183.
  9. Alogaili, A.; Mannering, F. Differences between day and night pedestrian-injury severities: Accounting for temporal and unobserved effects in prediction. Anal. Methods Accid. Res. 2022, 33, 100201.
  10. Zhang, K.; Hassan, M. Crash severity analysis of nighttime and daytime highway work zone crashes. PLoS ONE 2019, 14, e0221128.
  11. Liu, S.; Li, Y.; Fan, W. Mixed logit model based diagnostic analysis of bicycle-vehicle crashes at daytime and nighttime. Int. J. Transp. Sci. Technol. 2022, 11, 738–751.
  12. Zou, W.; Wang, X.; Zhang, D. Truck crash severity in New York city: An investigation of the spatial and the time of day effects. Accid. Anal. Prev. 2017, 99, 249–261.
  13. Dzinyela, R.; Adanu, E.K.; Lord, D.; Islam, S. Analysis of factors that influence injury severity of single and multivehicle crashes involving at-fault older drivers: A random parameters logit with heterogeneity in means and variances approach. Transp. Res. Interdiscip. Perspect. 2023, 22, 100974.
  14. Li, J.; Fang, S.; Guo, J.; Fu, T.; Qiu, M. A Motorcyclist-Injury Severity Analysis: A Comparison of Single-, Two-, and Multi-Vehicle Crashes Using Latent Class Ordered Probit Model. Accid. Anal. Prev. 2021, 151, 105953.
  15. Hou, Q.; Huo, X.; Leng, J.; Mannering, F. A note on out-of-sample prediction, marginal effects computations, and temporal testing with random parameters crash-injury severity models. Anal. Methods Accid. Res. 2022, 33, 100191.
  16. Peng, Z.; Wang, Y.; Wang, L. A comparative analysis of factors influencing the injury severity of daytime and nighttime crashes on a mountainous expressway in China. Int. J. Inj. Control Saf. Promot. 2021, 28, 503–512.
  17. Wang, M.-H. Investigating the Difference in Factors Contributing to the Likelihood of Motorcyclist Fatalities in Single Motorcycle and Multiple Vehicle Crashes. Int. J. Environ. Res. Public Health 2022, 19, 8411.
  18. Zheng, Z.; Lu, P.; Lantz, B. Commercial truck crash injury severity analysis using gradient boosting data mining model. J. Saf. Res. 2018, 65, 115–124.
  19. Dong, S.; Khattak, A.; Ullah, I.; Zhou, J.; Hussain, A. Predicting and Analyzing Road Traffic Injury Severity Using Boosting-Based Ensemble Learning Models with SHAPley Additive exPlanations. Int. J. Environ. Res. Public Health 2022, 19, 2925.
  20. Scarano, A.; Rella Riccardi, M.; Mauriello, F.; D’Agostino, C.; Pasquino, N.; Montella, A. Injury severity prediction of cyclist crashes using random forests and random parameters logit models. Accid. Anal. Prev. 2023, 192, 107275.
  21. Santos, K.; Dias, J.P.; Amado, C. A literature review of machine learning algorithms for crash injury severity prediction. J. Saf. Res. 2022, 80, 254–269.
  22. Wu, S.; Yuan, Q.; Yan, Z.; Xu, Q. Analyzing Accident Injury Severity via an Extreme Gradient Boosting (XGBoost) Model. J. Adv. Transp. 2021, 2021, 3771640.
  23. Guo, M.; Yuan, Z.; Janson, B.; Peng, Y.; Yang, Y.; Wang, W. Older Pedestrian Traffic Crashes Severity Analysis Based on an Emerging Machine Learning XGBoost. Sustainability 2021, 13, 926.
  24. Laphrom, W.; Se, C.; Champahom, T.; Jomnonkwao, S.; Wipulanusatd, W.; Satiennam, T.; Ratanavaraha, V. XGBoost-SHAP and Unobserved Heterogeneity Modelling of Temporal Multivehicle Truck-Involved Crash Severity Patterns. Civ. Eng. J. 2024, 10, 1890–1908.
  25. Goswamy, A.; Abdel-Aty, M.; Islam, Z. Factors affecting injury severity at pedestrian crossing locations with Rectangular RAPID Flashing Beacons (RRFB) using XGBoost and random parameters discrete outcome models. Accid. Anal. Prev. 2023, 181, 106937.
  26. Shi, X.; Wong, Y.D.; Li, M.Z.-F.; Palanisamy, C.; Chai, C. A feature learning approach based on XGBoost for driving assessment and risk prediction. Accid. Anal. Prev. 2019, 129, 170–179.
  27. Savolainen, P.; Mannering, F. Probabilistic models of motorcyclists’ injury severities in single- and multi-vehicle crashes. Accid. Anal. Prev. 2007, 39, 955–963.
  28. Geedipally, S.R.; Lord, D. Investigating the effect of modeling single-vehicle and multi-vehicle crashes separately on confidence intervals of Poisson–gamma models. Accid. Anal. Prev. 2010, 42, 1273–1282.
  29. Ministry of Transport. Road Accident Report 2024; Ministry of Transport: Bangkok, Thailand, 2024.
  30. Zhang, P.; Jia, Y.; Shang, Y. Research and application of XGBoost in imbalanced data. Int. J. Distrib. Sens. Netw. 2022, 18, 15501329221106935.
  31. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2016; pp. 785–794.
  32. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. arXiv 2017, arXiv:1705.07874.
  33. Mansoor, U.; Jamal, A.; Su, J.; Sze, N.N.; Chen, A. Investigating the risk factors of motorcycle crash injury severity in Pakistan: Insights and policy recommendations. Transp. Policy 2023, 139, 21–38.
  34. Se, C.; Champahom, T.; Jomnonkwao, S.; Karoonsoontawong, A.; Ratanavaraha, V. Temporal stability of factors influencing driver-injury severities in single-vehicle crashes: A correlated random parameters with heterogeneity in means and variances approach. Anal. Methods Accid. Res. 2021, 32, 100179.
  35. Ahmed, M.M.; Franke, R.; Ksaibati, K.; Shinstine, D.S. Effects of truck traffic on crash injury severity on rural highways in Wyoming using Bayesian binary logit models. Accid. Anal. Prev. 2018, 117, 106–113.
  36. Zubaidi, H.A.; Obaid, I.A.; Alnedawi, A.; Das, S. Motor vehicle driver injury severity analysis utilizing a random parameter binary probit model considering different types of driving licenses in 4-legs roundabouts in South Australia. Saf. Sci. 2021, 134, 105083.
  37. Yan, X.; He, J.; Zhang, C.; Liu, Z.; Qiao, B.; Zhang, H. Single-vehicle crash severity outcome prediction and determinant extraction using tree-based and other non-parametric models. Accid. Anal. Prev. 2021, 153, 106034.
  38. Arteaga, C.; Paz, A.; Park, J. Injury severity on traffic crashes: A text mining with an interpretable machine-learning approach. Saf. Sci. 2020, 132, 104988.
  39. Rahman, M.H.; Zafri, N.M.; Akter, T.; Pervaz, S. Identification of factors influencing severity of motorcycle crashes in Dhaka, Bangladesh using binary logistic regression model. Int. J. Inj. Control Saf. Promot. 2021, 28, 141–152.
  40. Champahom, T.; Wisutwattanasak, P.; Chanpariyavatevong, K.; Laddawan, N.; Jomnonkwao, S.; Ratanavaraha, V. Factors affecting severity of motorcycle accidents on Thailand’s arterial roads: Multiple correspondence analysis and ordered logistics regression approaches. IATSS Res. 2022, 46, 101–111.
  41. Tamakloe, R.; Das, S.; Nimako Aidoo, E.; Park, D. Factors affecting motorcycle crash casualty severity at signalized and non-signalized intersections in Ghana: Insights from a data mining and binary logit regression approach. Accid. Anal. Prev. 2022, 165, 106517.
  42. Jamal, A.; Zahid, M.; Tauhidur Rahman, M.; Al-Ahmadi, H.M.; Almoshaogeh, M.; Farooq, D.; Ahmad, M. Injury severity prediction of traffic crashes with ensemble machine learning techniques: A comparative study. Int. J. Inj. Control Saf. Promot. 2021, 28, 408–427.
  43. Wang, C.; Liu, L.; Xu, C.; Lv, W. Predicting Future Driving Risk of Crash-Involved Drivers Based on a Systematic Machine Learning Framework. Int. J. Environ. Res. Public Health 2019, 16, 334.
  44. Kardar, A.; Davoodi, S.R. A generalized ordered probit model for analyzing driver injury severity of head-on crashes on two-lane rural highways in Malaysia. J. Transp. Saf. Secur. 2020, 12, 1067–1082.
  45. Abdul Manan, M.M.; Várhelyi, A.; Çelik, A.K.; Hashim, H.H. Road characteristics and environment factors associated with motorcycle fatal crashes in Malaysia. IATSS Res. 2018, 42, 207–220.
  46. Santos, K.; Firme, B.; Dias, J.P.; Amado, C. Analysis of Motorcycle Accident Injury Severity and Performance Comparison of Machine Learning Algorithms. Transp. Res. Rec. 2023, 2678, 736–748.
  47. Adanu, E.K.; Agyemang, W.; Lidbe, A.; Adarkwa, O.; Jones, S. An in-depth analysis of head-on crash severity and fatalities in Ghana. Heliyon 2023, 9, e18937.
  48. Jafari Anarkooli, A.; Hadji Hosseinlou, M. Analysis of the injury severity of crashes by considering different lighting conditions on two-lane rural roads. J. Saf. Res. 2016, 56, 57–65.
  49. Se, C.; Champahom, T.; Jomnonkwao, S.; Kronprasert, N.; Ratanavaraha, V. The impact of weekday, weekend, and holiday crashes on motorcyclist injury severities: Accounting for temporal influence with unobserved effect and insights from out-of-sample prediction. Anal. Methods Accid. Res. 2022, 36, 100240.
  50. Fountas, G.; Anastasopoulos, P.C. A random thresholds random parameters hierarchical ordered probit analysis of highway accident injury-severities. Anal. Methods Accid. Res. 2017, 15, 1–16.
  51. Islam, M. An analysis of motorcyclists’ injury severities in work-zone crashes with unobserved heterogeneity. IATSS Res. 2022, 46, 281–289.
  52. Gabauer, D.J.; Li, X. Influence of horizontally curved roadway section characteristics on motorcycle-to-barrier crash frequency. Accid. Anal. Prev. 2015, 77, 105–112.
  53. Farid, A.; Ksaibati, K. Modeling severities of motorcycle crashes using random parameters. J. Traffic Transp. Eng. (Engl. Ed.) 2021, 8, 225–236.
  54. Kanitpong, K.; Jensupakarn, A.; Dabsomsri, P.; Issalakul, K. Characteristics of motorcycle crashes in Thailand and factors affecting crash severity: Evidence from in-depth crash investigation. Transp. Eng. 2024, 16, 100227.
  55. Naqvi, H.M.; Tiwari, G. Factors Contributing to Motorcycle Fatal Crashes on National Highways in India. Transp. Res. Procedia 2017, 25, 2084–2097.
  56. Rifaat, S.M.; Tay, R.; de Barros, A. Severity of motorcycle crashes in Calgary. Accid. Anal. Prev. 2012, 49, 44–49.
  57. Se, C.; Champahom, T.; Jomnonkwao, S.; Chaimuang, P.; Ratanavaraha, V. Empirical comparison of the effects of urban and rural crashes on motorcyclist injury severities: A correlated random parameters ordered probit approach with heterogeneity in means. Accid. Anal. Prev. 2021, 161, 106352.
  58. Waseem, M.; Ahmed, A.; Saeed, T.U. Factors affecting motorcyclists’ injury severities: An empirical assessment using random parameters logit model with heterogeneity in means and variances. Accid. Anal. Prev. 2019, 123, 12–19.
  59. Se, C.; Champahom, T.; Wisutwattanasak, P.; Jomnonkwao, S.; Chanpariyavatevong, K.; Ratanavaraha, V. Modeling of motorcyclist injury severities: A comparison between crashes on main-, frontage-, and standard-lane of roadway. IATSS Res. 2024, 48, 288–298.
  60. Xin, C.; Wang, Z.; Lee, C.; Lin, P.-S. Modeling Safety Effects of Horizontal Curve Design on Injury Severity of Single-Motorcycle Crashes with Mixed-Effects Logistic Model. Transp. Res. Rec. 2017, 2637, 38–46. [Google Scholar] [CrossRef]
  61. Alonso, F.; Useche, S.A.; Valle, E.; Esteban, C.; Gene-Morales, J. Could Road Safety Education (RSE) Help Parents Protect Children? Examining Their Driving Crashes with Children on Board. Int. J. Environ. Res. Public Health 2021, 18, 3611. [Google Scholar] [CrossRef] [PubMed]
  62. Radin Sohadi, R.U.; Mackay, M.; Hills, B. Multivariate Analysis of Motorcycle Accidents and the Effects of Exclusive Motorcycle Lanes in Malaysia. J. Crash Prev. Inj. Control 2000, 2, 11–17. [Google Scholar] [CrossRef]
  63. Davoodi, S.R.; Hamid, H.; Pazhouhanfar, M.; Muttart, J.W. Motorcyclist perception response time in stopping sight distance situations. Saf. Sci. 2012, 50, 371–377. [Google Scholar] [CrossRef]
  64. Saini, H.K.; Chouhan, S.S.; Kathuria, A. Exclusive motorcycle lanes: A systematic review. IATSS Res. 2022, 46, 411–426. [Google Scholar] [CrossRef]
  65. Yu, M.; Ma, C.; Shen, J. Temporal stability of driver injury severity in single-vehicle roadway departure crashes: A random thresholds random parameters hierarchical ordered probit approach. Anal. Methods Accid. Res. 2021, 29, 100144. [Google Scholar] [CrossRef]
  66. Abrari Vajari, M.; Aghabayk, K.; Sadeghian, M.; Shiwakoti, N. A multinomial logit model of motorcycle crash severity at Australian intersections. J. Saf. Res. 2020, 73, 17–24. [Google Scholar] [CrossRef] [PubMed]
  67. Wen, X.; Xie, Y.; Jiang, L.; Pu, Z.; Ge, T. Applications of machine learning methods in traffic crash severity modelling: Current status and future directions. Transp. Rev. 2021, 41, 855–879. [Google Scholar] [CrossRef]
Figure 1. Distribution of the proportion of different degrees of motorcycle crash injury severity across groups.
Figure 2. Research framework.
Figure 3. Nighttime single motorcycle crash results: (a) SHAP summary plot and (b) importance ranking of variables based on mean absolute SHAP value.
Figure 4. Daytime single motorcycle crash results: (a) SHAP summary plot and (b) importance ranking of variables based on mean absolute SHAP value.
Figure 5. Nighttime multivehicle motorcycle crash results: (a) SHAP summary plot and (b) importance ranking of variables based on mean absolute SHAP value.
Figure 6. Daytime multivehicle motorcycle crash results: (a) SHAP summary plot and (b) importance ranking of variables based on mean absolute SHAP value.
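Panel (b) of Figures 3–6 ranks variables by their mean absolute SHAP value. A minimal sketch of that ranking step, assuming the SHAP values for one model output are already available as a records × features matrix; the helper `mean_abs_shap_ranking` and the toy numbers are hypothetical, not values from the study. For a three-class severity model, one such matrix exists per class, and how the study combined classes into a single ranking is not stated in this section.

```python
from typing import List, Sequence, Tuple


def mean_abs_shap_ranking(
    shap_values: Sequence[Sequence[float]], feature_names: Sequence[str]
) -> List[Tuple[str, float]]:
    """Rank features by mean absolute SHAP value, largest first.

    shap_values holds one row per crash record and one column per feature,
    for a single model output (e.g. one severity class).
    """
    if not shap_values:
        raise ValueError("need at least one row of SHAP values")
    n_rows = len(shap_values)
    scores = []
    for j, name in enumerate(feature_names):
        # Average size of the feature's contribution, ignoring its direction.
        scores.append((name, sum(abs(row[j]) for row in shap_values) / n_rows))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values for three records and three features.
toy_shap = [
    [0.30, -0.05, 0.10],
    [-0.20, 0.02, 0.01],
    [0.40, -0.08, -0.05],
]
for name, score in mean_abs_shap_ranking(toy_shap, ["Exceed_speed", "Rain", "Curve"]):
    print(f"{name}: {score:.3f}")
```

Taking absolute values before averaging is what makes this an importance measure: a feature that raises severity for some records and lowers it for others still ranks high, while the direction of its effect is read from the summary plot in panel (a).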
Table 1. Descriptive statistics of the variables related to single and multivehicle motorcycle crashes in daytime and nighttime.
Variable | Multivehicle Daytime Mean | Multivehicle Daytime SD | Multivehicle Nighttime Mean | Multivehicle Nighttime SD | Single Daytime Mean | Single Daytime SD | Single Nighttime Mean | Single Nighttime SD
New_Year (1 if incident occurred during New Year, 0 otherwise) | 0.105 | 0.306 | 0.104 | 0.306 | 0.196 | 0.397 | 0.200 | 0.400
SongKran (1 if incident occurred during SongKran, 0 otherwise) | 0.090 | 0.286 | 0.097 | 0.296 | 0.202 | 0.401 | 0.156 | 0.363
Evening_peak (1 if incident occurred during evening peak hours, 0 otherwise) | 0.222 | 0.416 | 0.241 | 0.428 | 0.238 | 0.426 | 0.138 | 0.344
Morning_peak (1 if incident occurred during morning peak hours, 0 otherwise)0.1990.3990.1890.391
Early_morning (1 if incident occurred during early morning, 0 otherwise)0.1980.3980.3890.488
Driving_closely_behind (1 if rider was driving closely behind another vehicle, 0 otherwise) | 0.006 | 0.079 | 0.004 | 0.063 | 0.007 | 0.085 | 0.002 | 0.047
Driving_against_traffic (1 if rider was driving against traffic, 0 otherwise) | 0.015 | 0.121 | 0.024 | 0.153 | 0.002 | 0.039 | 0.004 | 0.065
Exceed_speed (1 if rider exceeded the speed limit, 0 otherwise) | 0.563 | 0.496 | 0.654 | 0.476 | 0.692 | 0.462 | 0.716 | 0.451
Immediately_pass_in_front (1 if rider immediately passed in front of another vehicle, 0 otherwise) | 0.314 | 0.464 | 0.181 | 0.385 | 0.066 | 0.247 | 0.030 | 0.170
Distracting_phone (1 if rider was distracted by a phone, 0 otherwise)0.0010.0280.0020.0450.0010.036
Illegal_overtaking (1 if rider was involved in illegal overtaking, 0 otherwise) | 0.013 | 0.111 | 0.009 | 0.096 | 0.004 | 0.060 | 0.001 | 0.036
Slipping (1 if vehicle slipped, 0 otherwise) | 0.001 | 0.036 | 0.002 | 0.040 | 0.006 | 0.075 | 0.005 | 0.069
Sudden_lane_change (1 if sudden lane change occurred, 0 otherwise) | 0.006 | 0.074 | 0.004 | 0.065 | 0.011 | 0.106 | 0.004 | 0.059
Violating_signals_signs (1 if traffic signals/signs were violated, 0 otherwise) | 0.025 | 0.156 | 0.041 | 0.197 | 0.005 | 0.071 | 0.006 | 0.075
Alcohol (1 if alcohol was involved, 0 otherwise) | 0.009 | 0.092 | 0.028 | 0.164 | 0.088 | 0.283 | 0.141 | 0.348
Unfamiliar (1 if rider was unfamiliar with the area, 0 otherwise) | 0.007 | 0.082 | 0.004 | 0.065 | 0.014 | 0.119 | 0.014 | 0.119
No_turn_signal (1 if no turn signal was used, 0 otherwise)0.0080.0900.0040.0590.0020.039
Tire_deterioration_Flat (1 if tire deterioration or flat tire occurred, 0 otherwise) | 0.001 | 0.030 | 0.002 | 0.044 | 0.003 | 0.051 | 0.001 | 0.030
Brake_system (1 if brake system was faulty, 0 otherwise) | 0.001 | 0.035 | 0.001 | 0.022 | 0.002 | 0.039 | 0.003 | 0.051
Disease (1 if rider was affected by a disease, 0 otherwise) | 0.001 | 0.028 | 0.001 | 0.026 | 0.005 | 0.071 | 0.003 | 0.051
Loss_of_control (1 if there was a loss of control, 0 otherwise) | 0.003 | 0.056 | 0.003 | 0.053 | 0.007 | 0.082 | 0.005 | 0.069
Insufficient_lighting (1 if lighting was insufficient, 0 otherwise)0.0080.0890.0070.083
Fall_asleep (1 if rider fell asleep, 0 otherwise) | 0.008 | 0.087 | 0.007 | 0.085 | 0.039 | 0.193 | 0.024 | 0.152
Device_defect (1 if there was a device defect, 0 otherwise) | 0.002 | 0.046 | 0.004 | 0.059 | 0.009 | 0.093 | 0.006 | 0.075
Pedestrian (1 if a pedestrian was involved, 0 otherwise) | 0.001 | 0.036 | 0.002 | 0.044 | 0.032 | 0.177 | 0.030 | 0.170
Side_swipe (1 if side swipe occurred, 0 otherwise) | 0.005 | 0.071 | 0.003 | 0.053 | - | - | - | -
Rear_end (1 if rear-end collision occurred, 0 otherwise) | 0.685 | 0.464 | 0.645 | 0.478 | - | - | - | -
Head_on (1 if head-on collision occurred, 0 otherwise) | 0.108 | 0.310 | 0.112 | 0.315 | - | - | - | -
Crossing (1 if crossing was involved, 0 otherwise) | 0.025 | 0.157 | 0.019 | 0.136 | - | - | - | -
Obstacle (1 if an obstacle was involved, 0 otherwise) | 0.054 | 0.226 | 0.090 | 0.287 | 0.035 | 0.183 | 0.044 | 0.204
Off_road_STR (1 if off-road straight occurred, 0 otherwise) | 0.005 | 0.071 | 0.006 | 0.077 | 0.431 | 0.495 | 0.523 | 0.499
Off_road_Curve (1 if off-road curve occurred, 0 otherwise) | 0.001 | 0.030 | 0.001 | 0.022 | 0.108 | 0.310 | 0.094 | 0.292
Curve (1 if curve was involved, 0 otherwise) | 0.062 | 0.242 | 0.060 | 0.238 | 0.188 | 0.390 | 0.169 | 0.375
Sharp_curve (1 if sharp curve was involved, 0 otherwise) | 0.003 | 0.050 | 0.002 | 0.049 | 0.015 | 0.121 | 0.006 | 0.078
Private_area_access (1 if private area access was involved, 0 otherwise)0.0210.1430.0130.111
Public_area_access (1 if public area access was involved, 0 otherwise) | 0.040 | 0.195 | 0.023 | 0.149 | 0.001 | 0.032 | 0.002 | 0.041
Straight (1 if straight road was involved, 0 otherwise) | 0.685 | 0.465 | 0.741 | 0.438 | 0.661 | 0.473 | 0.695 | 0.460
Slope (1 if slope was involved, 0 otherwise) | 0.020 | 0.139 | 0.030 | 0.169 | 0.043 | 0.202 | 0.033 | 0.178
4_legs_intersection (1 if 4-leg intersection was involved, 0 otherwise) | 0.009 | 0.094 | 0.005 | 0.070 | 0.014 | 0.119 | 0.007 | 0.080
T_Intersection (1 if T-intersection was involved, 0 otherwise) | 0.010 | 0.099 | 0.008 | 0.088 | 0.033 | 0.178 | 0.011 | 0.106
Y_Intersection (1 if Y-intersection was involved, 0 otherwise) | 0.003 | 0.054 | 0.003 | 0.053 | 0.011 | 0.103 | 0.014 | 0.119
Clear (1 if weather was clear, 0 otherwise) | 0.963 | 0.190 | 0.900 | 0.301 | 0.960 | 0.196 | 0.888 | 0.316
Rain (1 if rain was involved, 0 otherwise) | 0.033 | 0.178 | 0.057 | 0.233 | 0.031 | 0.173 | 0.047 | 0.211
Fog_smoke_dust (1 if fog, smoke, or dust was involved, 0 otherwise) | 0.001 | 0.037 | 0.010 | 0.099 | 0.002 | 0.039 | 0.015 | 0.122
Cloudy (1 if cloudy conditions were involved, 0 otherwise) | 0.000 | 0.020 | 0.012 | 0.107 | 0.005 | 0.071 | 0.027 | 0.163
Other_Motorcycle (1 if another motorcycle was involved, 0 otherwise) | 0.143 | 0.350 | 0.152 | 0.359 | - | - | - | -
3_Wheeler (1 if a 3-wheeler was involved, 0 otherwise) | 0.010 | 0.100 | 0.011 | 0.103 | - | - | - | -
Passenger_car (1 if a passenger car was involved, 0 otherwise) | 0.369 | 0.483 | 0.328 | 0.469 | - | - | - | -
Van (1 if a van was involved, 0 otherwise) | 0.015 | 0.121 | 0.017 | 0.130 | - | - | - | -
Pick_up_car (1 if a pickup car was involved, 0 otherwise) | 0.009 | 0.092 | 0.013 | 0.113 | - | - | - | -
Pick_up_truck (1 if a pickup truck was involved, 0 otherwise) | 0.369 | 0.482 | 0.337 | 0.473 | - | - | - | -
6_wheels_truck (1 if a 6-wheels truck was involved, 0 otherwise) | 0.044 | 0.206 | 0.031 | 0.173 | - | - | - | -
10_wheels_truck (1 if a 10-wheels truck was involved, 0 otherwise) | 0.023 | 0.149 | 0.038 | 0.191 | - | - | - | -
10up_wheels_trailer (1 if a 10-up wheels trailer was involved, 0 otherwise) | 0.058 | 0.234 | 0.084 | 0.277 | - | - | - | -
Pedestrian (1 if a pedestrian was involved, 0 otherwise) | 0.001 | 0.036 | 0.003 | 0.056 | - | - | - | -
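Every variable in Table 1 is a 0/1 indicator, so its mean is the share of crashes with that attribute and its standard deviation follows from that share: SD = sqrt(p(1 − p)) in population form (the sample form multiplies the variance by n/(n − 1), which is negligible at these sample sizes). A quick consistency check on a few rows of the multivehicle daytime column, with values copied from the table; the helper name `binary_sd` is illustrative. Small mismatches in the third decimal come from the reported means being rounded, and the check becomes loose for very rare indicators, where rounding p to three decimals changes the implied SD substantially.

```python
import math

# (variable, reported mean, reported SD) from the multivehicle daytime column.
rows = [
    ("New_Year", 0.105, 0.306),
    ("Exceed_speed", 0.563, 0.496),
    ("Rear_end", 0.685, 0.464),
    ("Clear", 0.963, 0.190),
]


def binary_sd(p: float) -> float:
    """Population standard deviation of a 0/1 indicator whose mean is p."""
    return math.sqrt(p * (1.0 - p))


for name, mean, reported_sd in rows:
    implied = binary_sd(mean)
    print(f"{name}: implied SD {implied:.3f} vs reported {reported_sd:.3f}")
```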
Table 2. Injury severity distribution differences between crash types and time of day.
Sub-Model | Minor n (%) | Severe n (%) | Fatal n (%) | Total n (%)
a Daytime–Single | 1147 (7.09) | 346 (2.14) | 462 (2.85) | 1955 (12.08)
b Nighttime–Single | 1090 (6.74) | 408 (2.52) | 813 (5.03) | 2311 (14.29)
c Daytime–Multi | 4521 (27.95) | 1168 (7.22) | 1968 (12.17) | 7657 (47.34)
d Nighttime–Multi | 2000 (12.36) | 655 (4.05) | 1597 (9.87) | 4252 (26.29)
Total | 8758 (54.15) | 2577 (15.93) | 4840 (29.92) | 16,175 (100.00)
Note: a Daytime single motorcycle crashes; b nighttime single motorcycle crashes; c daytime multivehicle motorcycle crashes; d nighttime multivehicle motorcycle crashes.
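The percentages in Table 2 are shares of all 16,175 crashes (e.g. 1147/16,175 ≈ 7.09%), so they mix group size with severity. Comparing sub-models is easier with within-group shares, which follow directly from the same counts. A short sketch, with counts copied from Table 2 (the dictionary keys are just labels):

```python
# Counts from Table 2: (minor, severe, fatal) for each sub-model.
counts = {
    "Daytime-Single": (1147, 346, 462),
    "Nighttime-Single": (1090, 408, 813),
    "Daytime-Multi": (4521, 1168, 1968),
    "Nighttime-Multi": (2000, 655, 1597),
}

grand_total = sum(sum(c) for c in counts.values())

within = {}
for group, (minor, severe, fatal) in counts.items():
    n = minor + severe + fatal
    # Share of each severity level inside the sub-model (sums to ~100%).
    within[group] = tuple(round(100.0 * x / n, 1) for x in (minor, severe, fatal))
    print(
        f"{group}: n={n} ({100.0 * n / grand_total:.2f}% of all crashes), "
        f"minor/severe/fatal = {within[group]}"
    )
```

Computed this way, the fatal share is higher at night than during the day for both crash types: about 35.2% versus 23.6% for single motorcycle crashes and 37.6% versus 25.7% for multivehicle crashes.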
Table 3. Hyperparameter tuning for XGBoost and competing machine learning models.
Model | XGBoost | RF | SVM
Nighttime Multi | α = 0.8; colsample_bytree = 0.75; γ = 0.6; reg_lambda = 9; reg_lambda = 9 | max_depth = 6; max_features = 1.0; min_samples_leaf = 1; min_samples_split = 2; n_estimators = 50 | C = 5; Degree = 1; gamma = 0.005; kernel = 'rbf'
Daytime Multi | α = 0.8; colsample_bytree = 0.75; γ = 0.6; reg_lambda = 9; max_depth = 4 | max_depth = 11; max_features = 0.65; min_samples_leaf = 10; min_samples_split = 10; n_estimators = 76 | C = 5; Degree = 5; gamma = 0.004; kernel = 'rbf'
Nighttime Single | α = 0.4; colsample_bytree = 0.75; γ = 0.6; reg_lambda = 8; max_depth = 10 | max_depth = 15; max_features = 0.51; min_samples_leaf = 10; min_samples_split = 7; n_estimators = 344 | C = 1; Degree = 5; gamma = 0.0175; kernel = 'rbf'
Daytime Single | α = 0.5; colsample_bytree = 0.75; γ = 0.1; reg_lambda = 3; max_depth = 6 | max_depth = 4; max_features = 1.0; min_samples_leaf = 10; min_samples_split = 7; n_estimators = 460 | C = 15; Degree = 5; gamma = 0.005; kernel = 'rbf'
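Table 3 mixes symbols (α, γ) with library parameter names. A minimal sketch of turning the XGBoost settings into keyword arguments for the `xgboost` scikit-learn wrapper `XGBClassifier`, assuming α means `reg_alpha` (L1 penalty) and γ means `gamma` (minimum loss reduction required to split). The table does not define the symbols, and α could instead denote a learning rate, so the mapping and the helper names (`TABLE3_XGB`, `to_xgb_kwargs`) are hypothetical. The Nighttime Multi column is skipped rather than guessed, because it lists `reg_lambda = 9` twice and gives no `max_depth`.

```python
from typing import Dict, Union

Param = Union[int, float]

# XGBoost settings copied from Table 3. ASSUMPTION: "α" is the L1 penalty and
# "γ" the minimum split-loss reduction; the table does not define the symbols.
# "Nighttime Multi" is left out: its column repeats reg_lambda and lacks max_depth.
TABLE3_XGB: Dict[str, Dict[str, Param]] = {
    "Daytime Multi": {"α": 0.8, "colsample_bytree": 0.75, "γ": 0.6,
                      "reg_lambda": 9, "max_depth": 4},
    "Nighttime Single": {"α": 0.4, "colsample_bytree": 0.75, "γ": 0.6,
                         "reg_lambda": 8, "max_depth": 10},
    "Daytime Single": {"α": 0.5, "colsample_bytree": 0.75, "γ": 0.1,
                       "reg_lambda": 3, "max_depth": 6},
}

# Assumed mapping from the table's symbols to XGBClassifier keyword names.
SYMBOL_TO_KWARG = {"α": "reg_alpha", "γ": "gamma"}


def to_xgb_kwargs(row: Dict[str, Param]) -> Dict[str, Param]:
    """Rename Table 3 symbols to XGBClassifier keyword arguments."""
    return {SYMBOL_TO_KWARG.get(name, name): value for name, value in row.items()}


kwargs = to_xgb_kwargs(TABLE3_XGB["Nighttime Single"])
print(kwargs)
# With xgboost installed, the dict could then be used as, e.g.:
#   from xgboost import XGBClassifier
#   model = XGBClassifier(objective="multi:softprob", **kwargs)
```

Keeping the table's own symbols as dictionary keys and renaming them in one place makes the assumed mapping easy to inspect and change if the study's definitions turn out to differ.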
Table 4. Model fit evaluation metrics.
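Method | Model | Training Accuracy | Training Precision | Training Recall | Training F1-Score | Test Accuracy | Test Precision | Test Recall | Test F1-Score
XGBoost | Nighttime Multi | 0.65 | 0.65 | 0.65 | 0.65 | 0.67 | 0.67 | 0.66 | 0.66
XGBoost | Daytime Multi | 0.66 | 0.66 | 0.63 | 0.63 | 0.66 | 0.65 | 0.63 | 0.63
XGBoost | Nighttime Single | 0.64 | 0.64 | 0.63 | 0.63 | 0.61 | 0.61 | 0.60 | 0.60
XGBoost | Daytime Single | 0.64 | 0.65 | 0.59 | 0.56 | 0.62 | 0.60 | 0.56 | 0.53
LR Model | Nighttime Multi | 0.65 | 0.65 | 0.64 | 0.64 | 0.65 | 0.65 | 0.64 | 0.64
LR Model | Daytime Multi | 0.66 | 0.65 | 0.62 | 0.62 | 0.66 | 0.65 | 0.62 | 0.62
LR Model | Nighttime Single | 0.61 | 0.62 | 0.61 | 0.60 | 0.60 | 0.60 | 0.60 | 0.59
LR Model | Daytime Single | 0.61 | 0.61 | 0.56 | 0.53 | 0.61 | 0.59 | 0.54 | 0.50
RF Model | Nighttime Multi | 0.68 | 0.68 | 0.68 | 0.68 | 0.65 | 0.64 | 0.64 | 0.64
RF Model | Daytime Multi | 0.69 | 0.68 | 0.65 | 0.65 | 0.64 | 0.63 | 0.61 | 0.62
RF Model | Nighttime Single | 0.63 | 0.63 | 0.63 | 0.63 | 0.61 | 0.60 | 0.60 | 0.60
RF Model | Daytime Single | 0.64 | 0.67 | 0.55 | 0.49 | 0.61 | 0.65 | 0.54 | 0.46
SVM Model | Nighttime Multi | 0.69 | 0.69 | 0.69 | 0.69 | 0.64 | 0.64 | 0.64 | 0.64
SVM Model | Daytime Multi | 0.69 | 0.69 | 0.65 | 0.66 | 0.65 | 0.64 | 0.62 | 0.62
SVM Model | Nighttime Single | 0.63 | 0.64 | 0.62 | 0.62 | 0.61 | 0.60 | 0.59 | 0.59
SVM Model | Daytime Single | 0.64 | 0.69 | 0.58 | 0.55 | 0.63 | 0.63 | 0.55 | 0.51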
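In Table 4, recall often differs from accuracy. With support-weighted averaging, multiclass recall equals accuracy by construction (the class weights cancel the per-class denominators), so the reported values are consistent with macro averaging over the three severity classes; this is an inference, since the table does not state the averaging. A dependency-free sketch of macro-averaged metrics with hypothetical labels, showing how missing a minority class pulls macro recall below accuracy, as in the Daytime Single rows. `sklearn.metrics.precision_recall_fscore_support(..., average="macro")` computes the same per-class means.

```python
from typing import Dict, List, Sequence

LABELS = ("minor", "severe", "fatal")


def macro_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1 (unweighted class means)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    precisions: List[float] = []
    recalls: List[float] = []
    f1s: List[float] = []
    for label in LABELS:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        # A class that is never predicted (or never present) scores 0 here.
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(LABELS)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


# Hypothetical labels: the two "severe" cases are both predicted as "minor".
y_true = ["minor"] * 6 + ["severe"] * 2 + ["fatal"] * 2
y_pred = ["minor"] * 8 + ["fatal"] * 2
print(macro_metrics(y_true, y_pred))
```

Here accuracy is 0.8, but macro recall is only 2/3 because the severe class contributes a recall of zero with the same weight as the larger classes.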
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wisutwattanasak, P.; Se, C.; Champahom, T.; Kasemsri, R.; Jomnonkwao, S.; Ratanavaraha, V. Factors Affecting Single and Multivehicle Motorcycle Crashes: Insights from Day and Night Analysis Using XGBoost-SHAP Algorithm. Big Data Cogn. Comput. 2024, 8, 128. https://doi.org/10.3390/bdcc8100128
