1. Introduction
Globally, road traffic accidents kill upwards of 1.2 million people each year and injure more than 50 million, resulting in economic losses of 3% of the global average GDP [1]. To alleviate these huge losses, efforts have been made to reduce the frequency and severity of accidents [2]. The rapid development of China’s highway system has provided a strong transportation base for rapid economic growth. Unfortunately, the highways are experiencing considerable road safety problems. Among the different types of roads in China, freeways typically have the highest mortality rate [3]. The traffic fatality rate on freeways, defined as the ratio of fatalities to injuries, is as high as 35%, significantly higher than on other types of roads [4]. The design, construction, and maintenance standards for freeway infrastructure are higher than those for other types of roads; the traffic flow is simpler, and crash rates may be lower [5]. However, due to the high proportion of heavy and fast-moving vehicles, freeway accidents frequently have more severe consequences [3]. According to the Traffic Administration Bureau of the Ministry of Public Security of China, freeway traffic accidents account for only 5% of road traffic accidents but about 10% of deaths [6]. Therefore, an in-depth study of freeway safety in China is crucial and urgently needed.
Traffic safety has long been a concern for freeway management agencies and experts [7,8,9,10]. Numerous studies have examined freeway crash frequency, which is important for developing countermeasures to reduce accidents [10,11,12,13,14,15], but not enough emphasis has been placed on the severity of the crashes. It is necessary to explore the mechanism of crash injuries and to propose countermeasures that address their root causes by considering all relevant factors. Several intricate factors may contribute to the severity of injuries sustained in collisions [16,17]. Quantifying the impact of multiple factors on crash severity and understanding how these factors interact are prerequisites for proposing efficient countermeasures. In recent years, there has been a gradual increase in research on crash severity in other road environments, where both discrete choice models and emerging data mining techniques have been introduced to solve crash injury severity problems [5,6,18,19]. However, the studies based on discrete choice models have used only ordered or unordered response models [20,21]. To bridge the gap between ordered and unordered response models, the partial proportional odds model allows some independent variables to violate the parallel-lines assumption [22,23]. The primary motivation of this study is to fully investigate the effects of key driver, vehicle, road, and environmental factors on freeway crash injury severity using a partial proportional odds model together with an ordered logit model and a generalized ordered logit model.
To accurately and efficiently identify the key variables that influence the severity of freeway accidents, data on 1443 historical accidents on 290 km of the Hang-Jin-Qu Freeway were collected and supplemented with accident-related road and environmental attributes. The main contribution of this research is the development of three popular and promising discrete choice models based on historical accident data to analyze accident severity, especially the partial proportional odds model, which bridges the gap between ordered and unordered models. Additionally, specific managerial recommendations are made based on the differing impacts of the various factors. Finally, the potential application areas of the modeling results and the limitations of this study are discussed.
To describe the flow of the current research in detail, this article is divided into the following sections. Section 1 presents the research background and main contributions of the study. Section 2 reviews the literature, focusing on the methods for analyzing the factors influencing the severity of accidents. Section 3 describes the data used in this study and conducts a descriptive exploratory analysis. Section 4 presents the methods used in the study and the reasons for their selection. The findings and discussion are detailed in Section 5. Lastly, Section 6 summarizes the main findings and limitations of this study.
2. Literature Review
In previous studies, a wide range of factors have been found to potentially influence the severity of road traffic crashes, including attributes of human [24,25], vehicle, road, and environment conditions [26,27,28].
In terms of the methodology used, the discrete choice model approach is a new trend in the literature for analyzing accident injury severity. The logit or probit models are appropriate and frequently used to solve this kind of problem [29]. Ye et al. (2013) investigated the crash frequencies by severity level for freeway sections using a joint Poisson regression model [12]. They discovered that the model could improve the efficiency of most coefficient estimators. Ratanavaraha and Suangka (2014) formulated a multiple logistic regression model to examine the probability of injury and fatal accidents compared with property-damage-only (PDO) accidents [27]. However, the model requires that the variables be independent and that the independence of irrelevant alternatives (IIA) property hold strictly. Based on a random effects negative binomial (RENB) model, researchers investigated the potential factors contributing to freeway crashes [13,14,15]. Studies that additionally applied a random parameters negative binomial (RPNB) model found that both the RENB and RPNB models significantly outperform the negative binomial (NB) model [15,30]. By relaxing the IIA property, mixed logit models were also developed to explore the contribution of predictors of crash injury severity [31,32]. Ye et al. (2021) investigated expressway crash severity using a random parameter logit (RPL) model that accounts for potential unobserved heterogeneity [6]. Building on their investigation of these factors’ effects on safety using the RENB model, Hou, Tarko, et al. (2018) created the uncorrelated random parameter negative binomial (URPNB) model and the correlated random parameter negative binomial (CRPNB) model, both of which had better goodness-of-fit [11].
To forecast accident injury severity, several researchers have developed artificial neural network (ANN) [19], support vector machine (SVM) [18], and Markov blanket (MB) [33] models. These models usually provide superior fits, but they target prediction accuracy and are less interpretable with respect to the factors influencing accidents. By relaxing the IIA property, several researchers employed the nested logit model [34] and the latent class (LC) logit model [35] to analyze the influencing factors of traffic accidents. However, because all of these models are unordered response models, they cannot capture the inherent ordering of injury severity levels or its relationship with the influencing factors.
Some researchers have suggested ordered response models to fit the ordered multi-category nature of accident severity. Chu (2014) used ordered logit (OL) and latent class models to examine critical factors in the severity of injuries in crashes involving high-deck buses on freeways [21]. However, the OL model requires that the independent variables strictly adhere to the parallel-lines assumption (PLA); that is, the regression coefficients of the independent variables do not change with the accident’s severity. Mergia et al. (2013) and Ma et al. (2016) applied a generalized ordered logit (GOL) model to quantitatively analyze the influence of the significant factors on the likelihood of crash injury severity in selected freeway areas by relaxing the PLA, which allows all independent variables to violate the assumption [28,36].
To account for the ordered nature of discrete crash severity levels and spatial association, Zeng et al. (2019) developed a Bayesian spatial GOL model with conditional autoregressive priors to assess the severity of freeway crashes. Bayesian inference shows that the spatial model outperforms the conventional GOL model because of a better model fit [20]. A partial proportional odds (PPO) model was developed by Wang et al. (2009) to evaluate the impacts of the factors and predict the injury severity in areas where freeways diverge. The results indicated that the PPO model is more adaptable and produces significantly better results [37]. The common methods used in the literature related to road traffic accident severity analysis and their advantages are shown in Table 1.
In conclusion, although the ordered response models can capture the ordered nature of categorical data, they impose a strict PLA on all independent variables. In practice, some independent variables fail to meet the PLA when an ordered response model is built [39]. In the GOL model, by contrast, no independent variable is constrained by the PLA. Both the OL and GOL models therefore lack flexibility. As a comprehensive improvement on the OL and GOL models, the PPO model captures the ordered properties of the categorical outcome and allows only some variables to violate the PLA. These properties support the use of PPO models in investigating the factors influencing freeway crash injury severity.
In terms of research content, some studies have developed statistical models that consider several variables that can affect the severity of freeway accidents, including the driver, the vehicle, the road, and the environmental conditions [27,28]. However, few studies have been conducted in the context of an entire freeway. Most of the research addresses a specific stretch of the freeway, such as tunnel sections [11,36,40] or freeway merging and diverging locations [28]. Moreover, findings still differ regarding the magnitude of the impact of various factors on accident severity. Although these problems have received more attention in the modeling process in recent years, how to improve the accuracy of model prediction by enhancing the traditional discrete choice model remains to be solved.
4. Methodology
The injury severity level is a discrete dependent variable, so a discrete choice model is an appropriate method for modeling it. Three models were built: the OL, GOL, and PPO models. Among them, the PPO model reflects the ordered nature of the outcome and allows the coefficients of some independent variables to vary across severity levels, which gives it considerable flexibility. The PPO model is thus an improvement on both the OL and GOL models.
When the severity category is $j$ ($j = 1, 2, \ldots, J$), the probability that a crash severity category $j$ occurs in an observed crash $i$ can be expressed as Equation (1).

$$P_i(j) = P\left(S_{ij} \geq S_{ij'}\right), \quad \forall j' \neq j \tag{1}$$

where $P_i(j)$ is the probability that a crash severity category $j$ occurs in an observed crash $i$, and $S_{ij}$ is a linear function that determines the severity of the crash $i$. Usually, $S_{ij}$ can be linearly expressed by Equation (2).

$$S_{ij} = \beta_j X_{ij} + \varepsilon_{ij} \tag{2}$$

where $X_{ij}$ is a vector of measurable characteristics (risk factors) that determine severity, $\beta_j$ is a vector of coefficients to be estimated, and $\varepsilon_{ij}$ is a disturbance term that accounts for unobserved effects.
Here, we define $j = 1$ as the lowest value of the injury severity variable, i.e., PDO. An ordered discrete choice model is an appropriate method for modeling it [20]. The OL model probability can be expressed as Equation (3).

$$P(Y_i > j) = \frac{\exp(\alpha_j + X_i\beta)}{1 + \exp(\alpha_j + X_i\beta)}, \quad j = 1, 2, \ldots, J - 1 \tag{3}$$

where $P(Y_i > j)$ represents the probability that the crash severity $Y_i$ of a given accident $i$ exceeds level $j$. $J - 1$ is the number of cut points. $\alpha_j$ represents the regression intercept of each cut point. $\beta$ is the regression coefficient vector that does not change across different logit models. $X_i$ is the explanatory variables vector.
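As an illustration of how the OL model turns cumulative logits into severity-level probabilities, the following minimal Python sketch assumes the common parameterization P(Y > j) = logistic(alpha_j + x·beta) with a single shared coefficient vector. The function name `ol_probabilities` and all numeric values are hypothetical and are not estimates from this study.

```python
import math

def logistic(z):
    """Standard logistic function exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))

def ol_probabilities(x, beta, alphas):
    """Severity-level probabilities under an ordered logit model.

    Assumes P(Y > j) = logistic(alpha_j + x.beta) with one shared beta
    (the parallel-lines assumption) and intercepts that decrease with j.
    """
    xb = sum(xi * bi for xi, bi in zip(x, beta))
    exceed = [logistic(a + xb) for a in alphas]   # P(Y > j), j = 1..J-1
    bounds = [1.0] + exceed + [0.0]               # P(Y > 0) = 1, P(Y > J) = 0
    # P(Y = j) = P(Y > j-1) - P(Y > j)
    return [bounds[j] - bounds[j + 1] for j in range(len(bounds) - 1)]

# Hypothetical inputs: two dummy risk factors and three severity levels
# (PDO, injury, fatal); the coefficients are made up for illustration.
probs = ol_probabilities(x=[1, 0], beta=[0.8, -0.5], alphas=[-1.0, -3.0])
print([round(p, 3) for p in probs])
```

Because the same `beta` enters every cut point, a variable shifts all cumulative logits by the same amount, which is exactly the restriction the PLA imposes.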
However, a strict limitation of the OL model is the PLA [42]. Therefore, some scholars put forward the GOL model as an alternative method that relaxes the PLA. The only difference between the two models is that the regression coefficients $\beta_j$ may differ across severity levels.
The probability expression of the GOL model is given in Equation (4).

$$P(Y_i > j) = \frac{\exp(\alpha_j + X_i\beta_j)}{1 + \exp(\alpha_j + X_i\beta_j)}, \quad j = 1, 2, \ldots, J - 1 \tag{4}$$

where $j$ represents an injury severity category, $\alpha_j$ is the cutoff point for the $j$th cumulative logit, $\beta_j$ is a vector of model parameters, and $X_i$ is a vector of observed explanatory variables [43].
In practice, only one or several variables may violate the PLA [42]. In such a situation, some of the parameters estimated for the variables that satisfy the PLA may be redundant [43]. Hence, a gamma-parameterized form of the GOL model proposed by Peterson and Harrell (1990) is commonly used [23]. The gamma-parameterized GOL model is commonly known as the partially constrained GOL or PPO model. The PPO model is the intermediate method between the OL and GOL models [22,23]. In the PPO model, the PLA is only relaxed for some variables. In other words, the regression coefficients of explanatory variables that violate the PLA vary across the cut points, while those of the other variables remain unchanged. The PPO model can be described as Equation (5).

$$P(Y_i > j) = \frac{\exp(\alpha_j + X_i\beta + T_i\gamma_j)}{1 + \exp(\alpha_j + X_i\beta + T_i\gamma_j)}, \quad j = 1, 2, \ldots, J - 1 \tag{5}$$

where $T_i$ represents a subset of explanatory variables for which the PLA is violated and $\gamma_j$ is a vector of parameters associated with $T_i$ [42]. The $\gamma_j$ represent deviations from proportionality. If all gammas are equal to zero, the model reduces to the traditional OL model. The model’s parameters are estimated using the maximum likelihood procedure [43]. The probability formula of the PPO model can be expressed as Equations (6)–(8).

$$P(Y_i = 1) = 1 - g(\alpha_1 + X_i\beta + T_i\gamma_1) \tag{6}$$

$$P(Y_i = j) = g(\alpha_{j-1} + X_i\beta + T_i\gamma_{j-1}) - g(\alpha_j + X_i\beta + T_i\gamma_j), \quad j = 2, \ldots, J - 1 \tag{7}$$

$$P(Y_i = J) = g(\alpha_{J-1} + X_i\beta + T_i\gamma_{J-1}) \tag{8}$$

where $g(z) = \exp(z) / (1 + \exp(z))$ is the logistic function.
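The following minimal Python sketch shows how the PPO probabilities can be computed, assuming the gamma parameterization in which cut point j uses the linear predictor alpha_j + x·beta + t·gamma_j. The function name `ppo_probabilities`, the choice of which variable violates the PLA, and all numbers are hypothetical.

```python
import math

def logistic(z):
    """Standard logistic function exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ppo_probabilities(x, beta, alphas, t, gammas):
    """Severity-level probabilities under a partial proportional odds model.

    x, beta : all explanatory variables and their shared coefficients
    t       : subset of variables assumed to violate the PLA
    gammas  : one deviation vector per cut point (all zeros recovers OL)
    """
    xb = dot(x, beta)
    exceed = [logistic(a + xb + dot(t, g)) for a, g in zip(alphas, gammas)]
    bounds = [1.0] + exceed + [0.0]
    return [bounds[j] - bounds[j + 1] for j in range(len(bounds) - 1)]

# Hypothetical example: the first variable violates the PLA, so it also
# appears in t and receives an extra effect of 0.6 at the second cut point.
x, beta, alphas = [1, 0], [0.8, -0.5], [-1.0, -3.0]
ol_like = ppo_probabilities(x, beta, alphas, t=[1], gammas=[[0.0], [0.0]])
ppo = ppo_probabilities(x, beta, alphas, t=[1], gammas=[[0.0], [0.6]])
print([round(p, 3) for p in ol_like])
print([round(p, 3) for p in ppo])
```

Setting every gamma to zero gives the OL probabilities, while letting every variable enter `t` gives a GOL-type model, which is the sense in which the PPO model sits between the two.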
The PPO model can be fitted in Stata 16.0 with the user-written command “gologit2” using the autofit and lrf options [44]. The PPO model results are interpreted similarly to those of a binary logistic regression [43]. For an outcome variable with three categories, the three outcome levels can be grouped into two comparison groups. Consequently, this results in two sets of outcome groups for each model developed. For $j = 1$, outcome level 1 is compared with outcome levels 2 and 3; for $j = 2$, outcome levels 1 and 2 are compared with outcome level 3. A positive coefficient indicates that higher values on the predictor variable increase the likelihood of an injury being at a more severe level than the current one. Likewise, a negative coefficient indicates that higher values on the predictor variable increase the likelihood of an injury being in the present or a lower level; that is, it decreases the likelihood of being in higher injury groups.
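To make the binary-logit style of interpretation concrete, the sketch below converts a coefficient into one odds ratio per comparison group. It assumes the gamma parameterization, in which the effect of a variable at split j is beta + gamma_j; the helper name `split_odds_ratios` and the numbers are hypothetical.

```python
import math

def split_odds_ratios(beta, gammas):
    """Odds ratio of 'more severe than level j' for each split j.

    beta   : shared coefficient of one predictor
    gammas : deviation from proportionality at each split
             (all zeros if the predictor satisfies the PLA)
    """
    return [math.exp(beta + g) for g in gammas]

# Hypothetical predictor that violates the PLA: its effect is larger at
# the second split (levels 1-2 vs. 3) than at the first (level 1 vs. 2-3).
for j, ratio in enumerate(split_odds_ratios(0.4, [0.0, 0.5]), start=1):
    print(f"split {j}: odds ratio = {ratio:.3f}")
```

An odds ratio above 1 corresponds to a positive coefficient and a shift toward more severe outcomes; a predictor that satisfies the PLA has the same odds ratio at every split.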
The marginal effects provide the effect of a one-unit increase in an explanatory variable on the injury-outcome probability. The marginal effects averaged over all crash observations are computed and reported to assess the influence of the explanatory variables on injury severity outcome probabilities. For the crash $i$ and injury severity level $j$, the marginal effects of all variables can be expressed as Equation (9).

$$E_{x_{ijk}}^{P_{ij}} = P_{ij}(x_{ijk} = 1) - P_{ij}(x_{ijk} = 0) \tag{9}$$

where $E_{x_{ijk}}^{P_{ij}}$ represents the marginal effects of the $k$th dummy variable $x_{ijk}$, and $P_{ij}(x_{ijk} = 1)$ and $P_{ij}(x_{ijk} = 0)$ denote the probability of severity level $j$ for crash $i$ when the dummy variable $x_{ijk}$ equals 1 and 0, respectively.
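The averaging step can be sketched as follows. For simplicity, this minimal Python example assumes ordered-logit probabilities of the form P(Y > j) = logistic(alpha_j + x·beta); the same differencing applies to any model that returns severity-level probabilities. The function names, data rows, and coefficients are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def severity_probs(x, beta, alphas):
    """Severity-level probabilities from cumulative logits (assumed OL form)."""
    xb = sum(xi * bi for xi, bi in zip(x, beta))
    bounds = [1.0] + [logistic(a + xb) for a in alphas] + [0.0]
    return [bounds[j] - bounds[j + 1] for j in range(len(bounds) - 1)]

def average_marginal_effect(rows, k, beta, alphas):
    """Average over crashes of P_ij(x_k = 1) - P_ij(x_k = 0) for each level j."""
    n_levels = len(alphas) + 1
    totals = [0.0] * n_levels
    for x in rows:
        x1 = list(x)
        x1[k] = 1                      # crash with the dummy switched on
        x0 = list(x)
        x0[k] = 0                      # same crash with the dummy switched off
        p1 = severity_probs(x1, beta, alphas)
        p0 = severity_probs(x0, beta, alphas)
        for j in range(n_levels):
            totals[j] += p1[j] - p0[j]
    return [total / len(rows) for total in totals]

# Hypothetical data: four crashes described by two dummy variables.
rows = [[1, 0], [0, 1], [0, 0], [1, 1]]
effects = average_marginal_effect(rows, k=0, beta=[0.8, -0.5], alphas=[-1.0, -3.0])
print([round(e, 3) for e in effects])
```

Since the probabilities of all severity levels sum to one for each crash, the average marginal effects of a variable across the levels sum to zero.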
It is usually necessary to test the model’s validity in a regression analysis. The Akaike information criterion ($AIC$), Bayesian information criterion ($BIC$), and McFadden’s pseudo $R^2$ are usually used to evaluate the fit of a model. The evaluation indices are defined in Equations (10)–(12).

$$AIC = 2k - 2LL(\beta) \tag{10}$$

$$BIC = k\ln(n) - 2LL(\beta) \tag{11}$$

$$\text{pseudo } R^2 = 1 - \frac{LL(\beta)}{LL(0)} \tag{12}$$

where $LL(0)$ is the initial value of the log-likelihood at zero; that is, the value of the log-likelihood when no independent variable is included in the model. $LL(\beta)$ is the convergence value of the log-likelihood function; that is, the value of the log-likelihood function when all significant independent variables and constant terms are included in the model, $k$ is the number of parameters, and $n$ is the number of observations. The best-fitted models have a lower $AIC$ and $BIC$ and a higher pseudo $R^2$.
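A short Python sketch of the three fit indices, using their standard definitions (AIC = 2k − 2LL, BIC = k ln n − 2LL, McFadden’s pseudo R² = 1 − LL/LL0). The log-likelihood values and parameter count below are hypothetical; only the sample size of 1443 crashes comes from the data described in this study.

```python
import math

def fit_indices(ll_null, ll_model, k, n):
    """Return (AIC, BIC, McFadden pseudo R^2) for a fitted model.

    ll_null  : log-likelihood with no independent variables
    ll_model : log-likelihood at convergence
    k        : number of estimated parameters
    n        : number of observations
    """
    aic = 2 * k - 2 * ll_model
    bic = k * math.log(n) - 2 * ll_model
    pseudo_r2 = 1 - ll_model / ll_null
    return aic, bic, pseudo_r2

# Hypothetical log-likelihoods and parameter count; n = 1443 crashes.
aic, bic, r2 = fit_indices(ll_null=-1000.0, ll_model=-900.0, k=12, n=1443)
print(f"AIC = {aic:.1f}, BIC = {bic:.1f}, pseudo R2 = {r2:.3f}")
```

When comparing the OL, GOL, and PPO models on the same data, BIC penalizes additional parameters more heavily than AIC whenever ln(n) exceeds 2, so it tends to favor the more parsimonious specification.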
6. Conclusions
This study aimed to analyze the influence of driver, vehicle, road, and environmental factors on freeway crash injury severity using three different discrete choice models. Based on the data from this study, the fit of the PPO model was compared with the fit of the OL and GOL models. The variables that violated the PLA were identified through the Brant test. The PPO model was finally applied for detailed analysis because of its ability to handle a mixture of variables that met or violated the PLA. The results showed that collisions with a guardrail and other objects, female drivers, and drivers aged 55+ years were associated with a higher likelihood of injury and fatal outcomes. PDO outcomes were more likely, and injury and fatal accidents less likely, when the driving experience was 2- years, a large vehicle was responsible, and the vehicle was not going straight. Wet, slippery road conditions increased the risk of injury accidents. The severity of an accident typically increased when it occurred at night on a road without lighting.
This study provides a new methodological reference for the field of freeway traffic safety research. The research results can provide a decision basis for freeway management agencies to take corresponding safety management measures to reduce the severity of freeway crashes. When rigid obstacles such as piers exist in the median or within the lateral clearance range of a freeway, safety protection or energy absorption facilities should be added to reduce the severity of accidents. Dynamic variable speed limit signs and warnings can be installed to limit speed, improve driver vigilance, and reduce accident severity when visibility declines or road surfaces are wet and slippery. The lighting conditions of accident-prone sections without lighting should be improved to mitigate severe accidents. When selecting the type of median guardrail, a semi-rigid guardrail should be chosen as far as possible, provided it meets the required protection grade. Passive safety protection facilities can be added to the accident-prone sections with concrete guardrails. Strengthening drivers’ safety education and training is necessary, especially for female drivers who have 3–10 years’ experience. For senior drivers, it is vital to improve the assessment of their physical health status. Stronger traffic enforcement is also required to reduce lane-change violations.
The research has some shortcomings. A variety of factors influence the likelihood of injury severity outcomes. This study only analyzed the factors for which data could be collected, but there are still many factors for which data are not easy to assemble or that have a significant impact on accidents but are unknown, such as instantaneous vehicle speed, real-time weather, use of seat belts, light conditions, etc. In addition, some influencing factors related to human behavior, such as fatigue, low performance, and driver vigilance and aggression, were not considered in this study, although their impact on the severity of accidents is very important. In follow-up studies, more types of data can be collected to obtain more valuable research results. In addition to the PPO model, more models, such as the nested logit and RPL models, can be used to analyze accident severity. A comprehensive comparative analysis of the practicability of each model is also a direction for subsequent research. This study took only the traffic accidents on a particular freeway as a sample. Data from a wider area may provide further insight into the findings documented in the present study. Different sample sizes will affect the parameter estimation of the model. This paper does not group the samples in depth, and statistics of the sample sizes applicable to other models may also lead to deviations in the calibration results. In follow-up studies, the influence of different sample sizes on parameter estimation can be examined, and the samples can be grouped to obtain better-fitting results.