1. Introduction
Road traffic crashes constitute a major public health concern worldwide. Approximately 1.3 million people die each year as a result of traffic accidents, and between 20 and 30 million people suffer non-fatal injuries [1]. To design effective strategies to limit the number of victims of such events, we need a better understanding of the risk factors affecting the likelihood of being involved in a road crash and of its severity. This study seeks to take a step in that direction.
Our aim here, therefore, is to analyze the road crash risk factors that affect the expected proportion of bodily injury (BI) victims by level of severity. To do so, we analyze the injuries suffered by vehicle occupants in traffic crashes on Spain’s roads, classified by BI severity level. We use official police data recording crashes involving victims in 2016 and examine a series of risk factors associated with the vehicle, the driver, and the crash itself that may have a significant impact on the expected proportion of vehicle occupants suffering non-serious (slight), serious, and fatal injuries. Identifying the factors that affect road safety and understanding the impact of different vehicle attributes should help in the development of new safety features and improved transportation safety programs. In our analysis of the factors that affect the expected number of injured occupants by level of BI severity, we pay particular attention to driver age, vehicle age, and the interaction between the two.
The contribution of this study is threefold. First, the methodology we employ is able to capture the heterogeneity attributable to the involvement of more than one vehicle in the same crash. Second, the form of the association between driver age and vehicle age, on the one hand, and the expected severity of the motor crash, on the other, is not assumed a priori but is determined by the data. Third, the potential interaction between driver age and vehicle age, which might amplify their impact on expected BI severity, is fully investigated.
A number of studies indicate that the effect of driver age is nonlinear, both with respect to crash severity, with young and old drivers constituting the riskiest groups [2,3,4], and with respect to the probability of causing a crash [5]. Recent research efforts have focused on older drivers, given their increasing longevity and the impact of aging on road traffic crash injury rates [6,7,8]. Researchers have also been interested in the shape of the relationship between crash severity and vehicle age. Here, the accelerated incorporation of technological safety improvements into newer vehicles in recent decades means that the latest generations of cars are associated with lower probabilities of injury and fatality in road crashes [9,10,11,12,13,14]. Narváez-Villa et al. [15] report that drivers of all ages reduce their mileage as their vehicles age, suggesting that, at equal exposure, the probabilities of crashes and, therefore, of potential injury are even smaller. Against a backdrop of increasing longevity, with drivers continuing to drive until later ages, a certain association might also be expected between the aging of drivers and the aging of their vehicles, reflecting a lower tendency to change vehicles beyond a certain age. As such, an association between these two factors and the severity of injuries suffered in a traffic accident can be expected. Whereas Ayuso et al. [12] reported an increase in the probability of fatal and serious injuries for drivers over the age of 75 and for vehicles older than the average age of the car fleet, here we aim to capture this interaction in more general terms: first, by testing whether driver longevity and vehicle age are statistically correlated; and, second, by measuring whether the simultaneous inclusion of the two variables as an interaction term has a significant effect on the probability that the injuries resulting from a crash present a given level of BI severity.
A vast number of studies, conducted from a range of different perspectives, have analyzed the risk factors that affect BI severity following a road traffic crash. Some focus on the type of vehicle involved and the resulting injuries [16,17,18,19]. For instance, two-wheeled motor vehicles are associated with a greater risk of serious injury or fatality [20,21,22,23], whereas heavier vehicles cause more damage to other vehicles but provide better protection for their occupants [24]. Other studies have examined the relationship between crash type and BI severity, the latter increasing, for example, when the accident involves a frontal rather than a rear impact [25]. In rollover crashes and drops, passengers are more likely to suffer serious head and cervical spine injuries [26,27]. Similarly, analyses show that driving under non-optimal lighting conditions and on non-optimal road surfaces plays an important role in crash severity [28,29,30]. However, as Eluru et al. [31] indicate, there is strong evidence of correlated unobserved factors affecting BI severity levels among vehicle occupants.
Some authors have used random parameter models to account for the heterogeneity attributable to unobserved factors related to road geometrics, vehicle types, and spatial areas [32,33,34]. Anastasopoulos and Mannering [35] suggest that ignoring the possibility of random parameters when estimating count data models can change the estimated magnitude of the effects of factors affecting crash frequency. Anastasopoulos and Mannering [36] draw a similar conclusion, demonstrating that random parameter models estimated with less detailed crash-specific data can still provide a reasonable level of accuracy. Osman et al. [37] argue that injury severity conditional on crash occurrence can depend on numerous factors, none of which are included in crash databases. They go on to stress that the unobserved heterogeneity arising from these factors can moderate the influence of other observed covariates in the model, leading to variation in parameter effects across observations. Finally, Hosseinpour et al. [38] estimate crash counts for four multi-vehicle collision types and report dependencies between collision types as well as spatial correlation between adjacent sites.
In this paper, we further analyze the dependencies between driver and vehicle ages and BI severity in road traffic accidents. In so doing, we also account for the effect of unobserved factors that might influence the correlation between the two variables. Previous studies show that older drivers more frequently drive older vehicles [6,12]; here, we test whether this correlation is statistically significant in explaining differences in crash severity. We apply generalized linear models (GLMs) with random effects, that is, generalized linear mixed models (GLMMs), to capture the dependence between vehicles involved in the same crash. Specifically, we apply a binomial regression model with random effects, which is a particular case of random parameter models. By adding random effects to the linear predictor of a GLM, we are able to analyze multilevel data that have more than one source of random variability. As Mannering et al. [32] point out, multivariate issues are likely to arise in crashes in which multiple occupants are injured in the same accident. In such instances, unobserved factors that influence injury severity, such as the structural characteristics of the vehicles involved, would be correlated [31,32,33]. Indeed, the structural characteristics of new- and older-generation vehicles can vary considerably. The GLMM framework assumes a linear relationship between the covariates and the transformed (link-scale) mean of the dependent variable. We therefore also fit a semiparametric GLMM to the data to determine the actual form of the dependence between driver and vehicle ages and the severity of the motor traffic crash.
The rest of this paper is structured as follows. Section 2 defines the GLMM used to model the proportion of injured victims in a crash by level of BI severity when including random effects. Section 3 describes the dataset and presents the key descriptive statistics. Results related to model selection and the estimated binomial GLMM are reported in Section 4, where a detailed analysis of the impact of driver age and vehicle age on BI severity is carried out. Discussion is provided in Section 5, and Section 6 concludes.
2. Generalized Linear Mixed Models
Our analysis focuses on the relationship between a set of risk factors and the number of victims in a vehicle involved in a crash, according to the severity of their injuries. We deal with three discrete variables: the number of non-seriously injured occupants in the vehicle, $y_{ns}$; the number of seriously injured occupants, $y_{s}$; and the number of fatally injured occupants, $y_{f}$. Injuries are considered non-serious if the victim suffered only minor personal injuries and either did not require hospitalization or was hospitalized for less than 24 h; serious if the victim required hospitalization for more than 24 h; and fatal if the victim died as a result of the crash within 30 days of the accident. The unit of observation in our analysis is the vehicle involved in the crash.
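The severity definitions above translate directly into a classification rule that turns occupant records into the per-vehicle counts being modeled. The following is a minimal sketch, not the DGT coding procedure: the `Occupant` record, its fields, and the helper names are hypothetical, and a stay of exactly 24 h is assumed to count as non-serious because the definitions leave that boundary unspecified.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Occupant:
    # Hypothetical per-occupant record; not the actual DGT schema.
    injured: bool
    hospital_hours: float = 0.0            # 0 if the occupant was not hospitalized
    days_to_death: Optional[float] = None  # None if the occupant did not die


def severity(o: Occupant) -> str:
    """Return 'f', 's', 'ns' or 'none' following the definitions in the text."""
    if o.days_to_death is not None and o.days_to_death <= 30:
        return "f"     # death as a result of the crash within 30 days
    if not o.injured:
        return "none"
    if o.hospital_hours > 24:
        return "s"     # hospitalized for more than 24 h
    return "ns"        # not hospitalized, or for at most 24 h (boundary assumed)


def vehicle_counts(occupants: Iterable[Occupant]) -> dict:
    """Aggregate one vehicle's occupants into occupancy and (y_ns, y_s, y_f)."""
    occ = list(occupants)
    counts = Counter(severity(o) for o in occ)
    return {"occupants": len(occ), "y_ns": counts["ns"],
            "y_s": counts["s"], "y_f": counts["f"]}
```

For a vehicle carrying a driver treated without admission, a passenger admitted for 48 h, and an unhurt passenger, `vehicle_counts` yields one non-serious and one serious injury out of three occupants, which is the (count, occupancy) pair that a binomial model works with.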
The number of injured victims is a function of vehicle occupancy. The vehicles included in the analysis have different passenger capacities and, even if they had the same capacity, the number of occupants at the time of the crash is likely to differ. The number of injured occupants per level of BI severity is therefore modeled in relative terms, i.e., as the proportion of injured victims relative to total vehicle occupancy. A GLM with a binomial error distribution is the appropriate regression when the dependent variable is expressed in relative terms. The GLM relates the conditional mean of the distribution, $\mu_i$, to the linear predictor through the link function $g(\cdot)$ as follows:

$$g(\mu_i) = \eta_i = \mathbf{x}_i^{\top}\boldsymbol{\beta},$$

for the $i$th vehicle, $i = 1,\ldots,I$, where $\eta_i$ is the linear predictor, $\boldsymbol{\beta}$ is the vector of regression coefficients, and $\mathbf{x}_i$ is the vector of regressors. The number of injured victims in vehicle $i$ with severity level $j \in \{ns, s, f\}$, $y_{ij}$, follows a binomial distribution, $y_{ij} \sim \mathrm{Bin}(s_i, \pi_{ij})$, where $s_i$ is the number of occupants in the vehicle and $\pi_{ij}$ is the proportion of victims injured with severity level $j$. A separate model is fitted for each $j$, so that the mean of the relative number of injured victims is $\mu_i = E(y_{ij}/s_i \mid \mathbf{x}_i) = \pi_{ij}$. If the canonical link function $g(\pi_{ij}) = \log\left(\pi_{ij}/(1-\pi_{ij})\right)$ is selected, the binomial specification is equivalent to the logit regression model [38].
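To make the binomial logit specification concrete, here is a minimal, dependency-free sketch of fitting it to grouped data $(y_i, s_i)$ by Newton-Raphson, which for the canonical logit link coincides with iteratively reweighted least squares. It is an illustration only: the paper does not state which software was used, and `fit_binomial_logit`, `solve`, and `expit` are our own hypothetical helpers.

```python
import math


def expit(eta: float) -> float:
    """Numerically stable inverse logit."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[piv] = m[piv], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (m[k][n] - sum(m[k][c] * x[c] for c in range(k + 1, n))) / m[k][k]
    return x


def fit_binomial_logit(X, y, s, max_iter=50, tol=1e-10):
    """Maximum likelihood for y_i ~ Bin(s_i, expit(x_i . beta)).

    X: covariate rows (include a 1.0 column for the intercept);
    y: injured occupants at one severity level; s: occupants per vehicle.
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        score = [0.0] * p                     # X^T (y - s * pi)
        info = [[0.0] * p for _ in range(p)]  # X^T W X, W = s * pi * (1 - pi)
        for xi, yi, si in zip(X, y, s):
            pi = expit(sum(b * v for b, v in zip(beta, xi)))
            w = si * pi * (1.0 - pi)
            for a in range(p):
                score[a] += xi[a] * (yi - si * pi)
                for c in range(p):
                    info[a][c] += w * xi[a] * xi[c]
        step = solve(info, score)
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < tol:
            break
    return beta
```

With an intercept only, the estimate reduces to the logit of the pooled proportion $\sum_i y_i / \sum_i s_i$; more generally, `expit(x_i . beta)` gives the expected proportion for vehicle $i$ and multiplying by $s_i$ gives its expected number of injured occupants at that severity level.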
When multiple vehicles are involved in a crash, the numbers of victims presenting the same level of BI in each vehicle are assumed to be correlated [31]. When a dataset presents correlated clusters, GLMMs are a more appropriate specification. GLMMs extend GLMs by incorporating random effects for the analysis of multilevel data. We now introduce a $Q$-dimensional vector of cluster-specific parameters $\mathbf{u}_n$ and a vector $\mathbf{z}_{in}$ of predictors corresponding to the random effects, for $n = 1,\ldots,N$. In our case, $n$ indexes the crash and only one cluster-specific parameter is considered ($Q = 1$), so $u_n$ and $z_{in}$ are scalars. In the GLMM with a cluster-specific variable, the conditional mean $\mu_{in}$ of the proportion of injured occupants in vehicle $i$ involved in crash $n$, given $u_n$, is regressed on the predictors as follows:

$$g(\mu_{in}) = \eta_{in} = \mathbf{x}_{in}^{\top}\boldsymbol{\beta} + z_{in}u_n, \qquad z_{in} = 1.$$

The constant term of the linear predictor is therefore no longer the same for all observations but varies across groups of vehicles involved in the same crash. Thus, unobserved crash-specific heterogeneity associated with the crash in which the vehicle was involved is introduced into the regression model.
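Why a crash-level intercept shared by all vehicles in a crash induces dependence between them can be seen by simulation. The sketch below assumes two vehicles per crash, four occupants per vehicle, no covariates, and a normally distributed random intercept $u_n \sim N(0, \sigma_u^2)$; the normality assumption, the parameter values, and all function names are illustrative choices, not taken from the paper.

```python
import math
import random


def expit(eta: float) -> float:
    return 1.0 / (1.0 + math.exp(-eta))


def simulate_crashes(n_crashes, sigma_u, beta0=-0.5, occupants=4, seed=1):
    """Simulate injured proportions for two vehicles per crash.

    Each crash n draws one random intercept u_n ~ N(0, sigma_u^2) (assumed),
    shared by both vehicles: eta = beta0 + u_n, with no other covariates.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_crashes):
        u_n = rng.gauss(0.0, sigma_u)
        pi = expit(beta0 + u_n)  # same expected proportion for both vehicles
        props = []
        for _vehicle in range(2):
            # Binomial draw as a sum of independent Bernoulli occupants.
            injured = sum(rng.random() < pi for _ in range(occupants))
            props.append(injured / occupants)
        pairs.append((props[0], props[1]))
    return pairs


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

With `sigma_u = 1.5`, the injured proportions of the two vehicles in the same crash come out clearly positively correlated, whereas with `sigma_u = 0.0` (an ordinary GLM with a common intercept) they are essentially uncorrelated. Treating such clustered vehicles as independent is what the GLMM is designed to avoid.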
3. Data
The dataset of road crashes involving victims was provided by the Spanish Traffic Authority (DGT). It records the evolution of victims during the thirty days following each accident, as reported by traffic agents. The complete database contains information on 100,494 police-reported motor vehicle crashes with victims for the period from January 2016 to December 2016. A total of 179,295 vehicles were involved; there were no victims in 73,611 of these vehicles and at least one victim in the remaining 105,684. Only vehicles with complete records for the variables required by our research were selected. Thus, we analyzed 96,472 vehicles involved in 59,040 crashes (Table 1). Of these crashes, 46.67% involved one vehicle, 45.88% involved two, and the remaining 7.45% involved more than two vehicles. In 42.27% of the vehicles, none of the occupants were injured as a result of the crash, whereas in 57.73%, at least one occupant was injured.
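The counts and shares reported above are internally consistent, as a quick check shows (the last line is a derived ratio, not a figure reported with the data):

```latex
\begin{aligned}
73{,}611 + 105{,}684 &= 179{,}295 && \text{(vehicles in the complete database)}\\
46.67\% + 45.88\% + 7.45\% &= 100\% && \text{(analyzed crashes by number of vehicles)}\\
42.27\% + 57.73\% &= 100\% && \text{(vehicles without / with injured occupants)}\\
96{,}472 / 59{,}040 &\approx 1.63 && \text{(vehicles per analyzed crash)}
\end{aligned}
```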
Table 2 presents the variables used in our analysis. The dataset contains information on the number of victims in each vehicle by BI severity level, differentiating between (i) non-injury, (ii) non-serious or slight injury, (iii) serious injury, and (iv) fatality. Driver information includes age and gender. Vehicle information includes type, age, and number of occupants (including the driver). Other variables related to the accident, namely crash type, road type, road conditions, and visibility, are also included.
The mean age of the drivers involved in the crashes was 41.4 years, and the mean vehicle age was 10.35 years. The mean number of occupants per vehicle was 1.41. Most occupants suffered non-serious injuries (an average of 0.69 per vehicle), followed by occupants who were not injured (0.65 per vehicle), seriously injured occupants (0.06 per vehicle), and fatalities (0.01 per vehicle).
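Because every occupant falls into exactly one of the four categories, the per-vehicle averages should add up to the mean occupancy, and they do:

```latex
\underbrace{0.65}_{\text{not injured}} + \underbrace{0.69}_{\text{non-serious}} + \underbrace{0.06}_{\text{serious}} + \underbrace{0.01}_{\text{fatal}} = 1.41 \ \text{occupants per vehicle}
```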
We then evaluated the association between driver age and vehicle age. Pearson’s correlation coefficient and rank-based measures of association between the two variables were computed, and no relevant association was found (Pearson’s correlation: 0.043; Kendall’s τ: 0.008; Spearman’s ρ: 0.012). We next examined the relationship between driver age and the mean age of the vehicles driven by drivers of that age (Figure 1). Although no association was detected, Figure 1 suggests that mean vehicle age increases for drivers over the age of 65. The association measures were therefore recomputed for drivers over 65 only, but the values increased only slightly (Pearson’s correlation: 0.152; Kendall’s τ: 0.101; Spearman’s ρ: 0.143). We thus conclude that no relevant association between the age of the driver and the age of the vehicle was detected. As the confidence intervals shown in Figure 1 (dashed lines) make evident, dispersion around the mean vehicle age increases with driver age, which could affect the results obtained. Figure 1 also shows that both younger drivers (probably reflecting the fact that drivers in the first few years after obtaining their license drive non-new vehicles) and older drivers tend to drive older vehicles.
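For readers who want to reproduce this kind of check, the three measures can be computed as below. This is a plain-Python sketch with our own function names; the paper does not describe its implementation. Spearman's ρ is Pearson's correlation on average ranks, and Kendall's τ is computed in its tie-corrected τ-b form, which we assume is the appropriate variant because ages are typically recorded in whole years and ties are therefore frequent. The pairwise Kendall loop is O(n²) and intended for illustration, not for the full sample.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(v):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as Pearson's correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) / sqrt((n0 - tx)(n0 - ty))."""
    conc = disc = tie_x = tie_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            tie_x += 1
            tie_y += 1
        elif dx == 0:
            tie_x += 1
        elif dy == 0:
            tie_y += 1
        elif (dx > 0) == (dy > 0):
            conc += 1
        else:
            disc += 1
    n0 = len(x) * (len(x) - 1) // 2
    return (conc - disc) / math.sqrt((n0 - tie_x) * (n0 - tie_y))
```

The rank-based measures equal 1 for any strictly increasing relationship (for example, vehicle age growing with the square of driver age), while Pearson's coefficient falls below 1 for such a nonlinear pattern, which is why reporting both kinds of measure is informative.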
5. Discussion
This study analyzed several road crash risk factors that affect the expected proportion of BI victims by level of severity, taking into account the dependence between vehicles involved in the same crash. The observation unit is the vehicle, and this dependence is captured by including crash-level random effects in the regression. Model performance improves when this dependence is taken into consideration. Thus, the inclusion of random effects captures, at least partially, the heterogeneity due to the involvement of more than one vehicle in the same crash.
When two or more vehicles are involved in the same crash, we might expect the damage suffered by each vehicle to be related to the injury severity of the victims. Several studies report the incidence and severity of injuries when different types of vehicle are involved, including passenger vehicles and trucks [39,40] and motorbikes and non-motorbike vehicles [41], as well as according to the position of occupants inside the vehicle [42,43]. Dependence between the BI severity levels of those involved in the same crash can be especially relevant if we seek to predict the expected number of victims and their injury severity, for example, to assess the consequences of a safety policy or, in the insurance context, to calculate provisions for the coverage of automobile claims. Methodologically, this objective is in line with previous studies suggesting that ignoring the possibility of random parameters when estimating count-data models may affect the magnitude of the coefficients [35,36,37,38].
Our analysis pays special attention to driver age and vehicle age as factors explaining the proportion of occupants presenting different levels of BI severity in a crash. We demonstrate that the relationship between these factors and the (transformed) dependent variable is nonlinear. Both factors were therefore redefined to reflect their association with the expected proportion of injured occupants.
In the case of driver age, we found a quadratic relationship with the injury severity of vehicle occupants. Indeed, in line with previous studies [4,44,45], young and old drivers constituted the riskiest groups. Young drivers were associated with a high risk of accidents with non-serious injuries, whereas old drivers presented the highest risk in accidents with serious and fatal injuries. This does not, however, mean that older drivers are necessarily more dangerous; rather, it seems to reflect the fact that older drivers (and their older passengers) are inherently more likely to be seriously injured in crashes due to physical frailty [4,46]. Previous studies have suggested that elderly road users should be an increasing focus of road safety policies [6,47,48], especially because, in many countries, the number of such drivers is rising as a result of general population aging. As the number of older drivers grows, researchers will have access to increasing amounts of data about this group, opening up an important line of future research.
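A quadratic age term is what allows a single model to make both young and old drivers the riskiest groups. As a worked sketch, assume driver age $a_{in}$ enters the linear predictor of one severity model as $\beta_1 a_{in} + \beta_2 a_{in}^2$, with $u_n$ the crash-level random intercept (the coefficient names are hypothetical; the estimates are reported in Section 4 and are not reproduced here):

```latex
\eta_{in} = \beta_0 + \beta_1 a_{in} + \beta_2 a_{in}^2 + \cdots + u_n,
\qquad
\frac{\partial \eta_{in}}{\partial a_{in}} = \beta_1 + 2\beta_2 a_{in} = 0
\;\Longrightarrow\;
a^{*} = -\frac{\beta_1}{2\beta_2}.
```

If $\beta_2 > 0$, the linear predictor is smallest at $a^{*}$ and rises for both younger and older drivers; because the logit link is strictly increasing, the expected proportion of injured occupants has its minimum at the same age. Separate severity models can have different turning points, which is one way the non-serious and the serious/fatal models can single out different age groups.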
Vehicle age is also gaining attention in road safety research, with previous studies suggesting that it is positively associated with driver age [12,49,50]. At the same time, vehicles are becoming increasingly safe as a result of the technological and safety advances implemented in new generations of automobiles [51]. Here, we found that the expected proportion of occupants injured at each level of severity increases with vehicle age up to 18 years and then remains constant at its highest level. This finding is especially relevant in countries with old automobile fleets, such as Spain, where the average age of automobiles rose from 7.65 years in 2002 to 13.49 years in 2021 [52]. In the EU, 2020 data indicate that passenger cars are on average 11.5 years old [53].
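One simple way to encode an effect that rises with vehicle age up to 18 years and is flat thereafter is a linear term capped at 18 (a linear spline with a single knot). This is an assumed form for illustration only; the recoding actually used in Section 4 may differ, and `gamma`, `intercept`, and the function names are hypothetical.

```python
import math


def capped_vehicle_age(age_years: float, cap: float = 18.0) -> float:
    """Hypothetical recoding: vehicle age counts fully up to `cap`, then stays flat."""
    if age_years < 0:
        raise ValueError("vehicle age cannot be negative")
    return min(age_years, cap)


def vehicle_age_effect(age_years: float, gamma: float, cap: float = 18.0) -> float:
    """Contribution gamma * min(age, cap) to the linear predictor (gamma hypothetical)."""
    return gamma * capped_vehicle_age(age_years, cap)


def expected_proportion(age_years: float, intercept: float, gamma: float,
                        cap: float = 18.0) -> float:
    """Expected injured proportion under a logit link, other covariates held fixed."""
    eta = intercept + vehicle_age_effect(age_years, gamma, cap)
    return 1.0 / (1.0 + math.exp(-eta))
```

With a positive `gamma`, a 30-year-old car receives exactly the same expected proportion as an 18-year-old one, while a 10-year-old car receives a lower one, mirroring the plateau described above.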
However, although we have demonstrated the individual statistical significance of driver age and vehicle age when analyzing the proportion of victims in the vehicle by level of BI severity, the joint effect of the two variables (i.e., the interaction between driver age and vehicle age) was not statistically significant. As a result, our data support neither the hypothesis that driver longevity and vehicle age are statistically correlated nor the hypothesis that including the two variables as an interaction term has a significant effect on the probability that the injuries resulting from the crash present a given level of severity. Nevertheless, monitoring both variables and their effects in the coming years constitutes an important line of research, given the increasing longevity of drivers in countries such as Spain (where marked growth in the number of people aged 65 and over is also expected) and the continuous aging of the vehicle fleet [52].
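The interaction hypothesis amounts to a test on a single coefficient. As a sketch, assume driver age $a_{in}$ and vehicle age $v_{in}$ enter the linear predictor together with their product, where $\mathbf{x}_{in}$ collects the remaining covariates and $u_n$ is the crash-level random intercept (the symbols are hypothetical, and the nonlinear recoding of both ages is ignored for simplicity):

```latex
\eta_{in} = \mathbf{x}_{in}^{\top}\boldsymbol{\beta} + \beta_a a_{in} + \beta_v v_{in} + \beta_{av}\, a_{in} v_{in} + u_n,
\qquad
\frac{\partial \eta_{in}}{\partial a_{in}} = \beta_a + \beta_{av} v_{in},
\qquad
H_0\colon \beta_{av} = 0.
```

Under $H_0$, the effect of driver age on the log-odds of injury does not depend on vehicle age. Note that $\beta_{av}$ measures effect modification, which is distinct from the correlation between $a_{in}$ and $v_{in}$ examined in Section 3.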
The rest of our results confirm conclusions previously presented in the literature. Male drivers are associated with accidents involving more serious injuries (for a review, see [4]). Two-wheeled motor vehicles are more likely to be associated with serious or fatal injuries than four-wheeled or heavier vehicles [21,23], which is expected given that they offer riders less protection. Previous studies also suggest that heavy vehicles (pickup trucks, minivans, and sport utility vehicles or SUVs) are safer for their own occupants but cause more damage to the other vehicles involved in a crash [18,24]. A number of studies have found that driving in dark conditions increases expected accident severity [28,29,30]. Sullivan and Flannagan [28] concluded that the risk of fatal injury for pedestrians involved in crashes is 3 to 6.75 times higher in the dark than in daylight. Wanvik [29] found that the risk of injury from accidents in darkness increases on average by 17% on lit rural roadways and by 145% on unlit rural roadways. Uddin and Huynh [30] also confirm the importance of examining lighting conditions on rural and urban roadways as risk factors. Here, we found that the expected proportions of slight, serious, and fatal injuries following an accident increase when visibility is less than optimal.
We found that non-optimal road surface conditions increase the expected proportion of slightly injured occupants, but under the same conditions the expected proportion of seriously and fatally injured victims falls. Although crash BI severity might be expected to increase as road conditions worsen (bad weather, poor road surfaces, etc.), unobserved factors, such as increased attention to driving, higher traffic density, and higher signaling rates, seem to have the opposite effect. Various studies have shown that the influence of good road conditions on traffic accidents and injury severity is unclear, with mixed results being reported (see, for example, [54]).
The expected proportion of injured victims is higher on principal and minor roads than on local roads. Although the number of crashes in local areas is usually higher than on arterials and collectors, such accidents are associated with lower BI severity [55]. The type of crash analyzed has a direct influence on the expected proportion of injured occupants. When a vehicle is involved in a multi-vehicle collision, the expected proportion of injured occupants falls compared to that for two-vehicle collisions. Collisions involving multiple vehicles (pile-ups) are more frequently rear-impact crashes, which are associated with less severe BI outcomes [25,55]. Abu-Zidan and Eid [25] report that injury severity among those involved in front and side impacts was double that of those involved in rear impacts. Likewise, the expected proportion of injured occupants falls when the crash is a run-over (a vehicle striking a pedestrian) compared to that for two-vehicle collisions. Note that in a traffic accident involving a pedestrian, the pedestrian is expected to sustain the most severe BI [56,57], whereas the occupants of the vehicle (the focus of this study) are much better protected. Finally, when the vehicle is involved in a rollover, drop, or collision with an object, an increased proportion of injured occupants is expected at all levels of severity. In the literature, when crashes with victims are analyzed, single-vehicle crashes are frequently associated with more severe BI than collisions involving two or more vehicles [25,55,58].
The high level of significance of most of our parameter estimates provides a good understanding of the effect of automobile and crash characteristics on the expected number of occupants injured by level of severity. However, our study is not without its limitations. The crash data used in the study are from 2016, so analyzing subsequent years would help in understanding the dynamics of elderly drivers and aging vehicles in relation to crash severity. Although we are able to control for the heterogeneity attributable to multiple vehicles being involved in the same crash, other sources of unobserved heterogeneity are not controlled for here. For example, we estimated binomial BI severity models separately for the different levels of BI severity experienced by occupants, but some unobserved factors are likely to affect all levels of severity simultaneously. Additionally, relevant information for explaining crash severity was not always available in the dataset. For instance, the age and seating position of passengers in the vehicle, the use of safety measures, and the place where the crash occurred have been extensively studied as factors influencing crash severity [33,59,60]. Here, these factors, as well as the lack of information about driving behavior, contribute to unobserved heterogeneity. Indeed, telematics research points to a close relationship between driving behavior and crash severity [51,60,61,62,63,64]. Incorporating driving behavior information into the model could further our understanding of the influence of traditional risk factors. For example, a better understanding of the driving behavior of old drivers might help to distinguish the share of the higher crash severity risk attributable to declining skills from the share associated with increased physical frailty.