Data Mining-Based Collision Scenarios of Vehicles and Two Wheelers for the Safety Assessment of Intelligent Driving Functions

Wang, Rong; Qian, Yubin; Dong, Honglei; Yu, Wangpengfei

doi:10.3390/wevj14100284


by Rong Wang ¹, Yubin Qian ¹,*, Honglei Dong ² and Wangpengfei Yu ¹

¹ School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
² Key Laboratory of Product Defect and Safety for State Market Regulation, Beijing 100101, China
* Author to whom correspondence should be addressed.

World Electr. Veh. J. 2023, 14(10), 284; https://doi.org/10.3390/wevj14100284

Submission received: 15 September 2023 / Revised: 3 October 2023 / Accepted: 7 October 2023 / Published: 9 October 2023


Abstract

Safety performance testing of intelligent driving vehicles needs to rely on collision scenarios from a real road traffic environment. In order to study the collision scenarios and accident characteristics of vehicles and two wheelers (TWs) in line with the complex traffic conditions in China, this paper proposes using clustering analysis to initially cluster traffic accident data to obtain the base scenarios and then applying the association rule algorithm to each base scenario to obtain the potential connections among its accident attributes and describe the collision scenarios in more detail. This study is based on data from 335 vehicle and two-wheeler crashes in the National Automobile Accident In-Depth Investigation System (NAIS). Clustering analysis was used to partition the crash data into eight clusters of vehicle and two-wheeler base scenarios, and association rules were applied to the remaining accident attributes, revealing common crash characteristics that describe the base scenarios in more detail. In the end, eleven types of detailed vehicle and two-wheeler collision scenarios were constructed, covering straight roads, intersections, and T-junctions. The results provide richer and more suitable crash scenarios of vehicles and two wheelers in China’s complex traffic and are an important reference for the future development of intelligent driving testing scenarios.

Keywords: traffic accident; collision scenarios; accident characteristics; association rules; clustering analysis

1. Introduction

In recent years, road traffic safety has been a major concern, with the World Health Organization’s Global Status Report on Road Safety 2018 showing that around 1.35 million people die in road accidents each year. Of particular concern is the fact that about 23 percent of all road traffic deaths are related to two-wheeler (TW) accidents [1]. In China, the number of accidents involving TWs has increased year by year in the past five years. According to data from the National Bureau of Statistics [2], there were 86,881 accidents involving TWs in 2021, and the number of casualties was as high as 115,940. Consequently, the traffic safety of TW riders as vulnerable road users (VRUs) needs to be addressed urgently. Advanced Driver Assistance Systems (ADAS) can effectively reduce the risk of road traffic accidents [3]. With the continuous development and application of ADAS, constructing collision scenarios from real traffic accident data has become a vital link in research on the functional safety of intelligent driving vehicles [4]. However, intelligent driving still faces significant challenges, and its further development must be evaluated by testing in more complex driving environments [5]. There are significant differences between China’s road traffic environment and that of other countries; the accident scenarios involving TWs are more complex and diverse. Therefore, it is necessary to conduct an in-depth study of the collision scenarios of TWs in China in order to promote the application and development of the safety functions of intelligent driving systems in China and improve safety in traffic involving vehicles and TWs.

Clustering algorithms, as a kind of data mining technique [6], can reduce the influence of researchers’ subjective opinions on results when performing big data analysis with high reproducibility, and have been widely used to extract collision scenarios from traffic accident data. Li et al. [7] initially employed systematic clustering and chi-square tests to identify seven hazardous scenarios involving TWs and used PreScan to build relevant testing scenarios. Hu et al. [8] considered factors such as the degree of accident casualties, motion states, and vehicle speeds for different scenarios. They derived motion state characteristics for 11 categories of car-to-TW collision scenarios in a detailed manner. Zhou et al. [9] developed kinematic models based on the fundamental scenarios obtained from clustering accident data of car-to-TWs at intersections and derived five sets of hazardous scenarios. Xu et al. [10] utilized multivariate logistic regression to investigate the influencing factors of accident severity and identify feature elements for testing scenarios, and subsequently used these feature elements as clustering parameters to extract eight types of intersection testing scenarios. Building on the China In-Depth Accident Study (CIDAS) dataset, Sui et al. [11] applied the K-medoid clustering algorithm to cluster 672 accident cases involving cars and TWs, resulting in 6 common collision scenarios; these were then compared to the 4 typical scenarios obtained by Cao et al. [12]. Wang et al. [13] employed 239 crash cases from the China In-Depth Mobility Safety Study–Traffic Accident (CIMSS-TA). They summarized six functional scenarios using the K-medoids clustering based on seven collision characteristics; additionally, they established dynamic parameters for collision trajectory analysis during hazardous moments and generated testing scenarios suitable for autonomous driving. Pan et al. [14] conducted an in-depth analysis of traffic accident data with monitored videos. They classified TWs into three typical types and used clustering analysis and accident characteristics to identify collision scenarios involving various types of TWs and cars.

Currently, research mainly focuses on extracting testing scenarios from accident data but lacks the exploration of accident attribute relationships and potential features. The obtained scenarios are relatively simple and lack the consideration of complex traffic environmental factors, such as vehicle lanes, road speed limits, and road infrastructure. Traditional clustering algorithms are commonly used as research methods, but they face challenges in achieving clear data classification when dealing with a large number of variables [15], which may result in the loss of important variable association scenarios [6]. Moreover, analyzing road accident data requires consideration of data heterogeneity; otherwise, certain relationships between the data may remain hidden [16]. Association rules, as a data mining technique, can extract the hidden relationships between various attributes in a large amount of data and are widely used in traffic accident data analysis [17]. Xu et al. [18] used association rules for serious casualty traffic accidents (accidents with more than 10 fatalities) in China to reveal the accident factors that often occur together and their interdependence to determine the characteristics of serious casualty accidents. Das et al. [19] identified factors and hidden features affecting fatal pedestrian crashes at intersections in the United States by applying association rule mining to detailed pedestrian accident data to help understand the collision scenarios of pedestrian accidents at intersections. However, computing association rules directly on the entire dataset leads to an enormous number of rules that are difficult to interpret. Kumar [20] demonstrated that conducting clustering analysis on the dataset before applying association rules can partially eliminate the heterogeneity in traffic accident data and mitigate the issue of having a high number of difficult-to-interpret rules. Nitsche et al. [21] investigated the key scenarios and collision characteristics of traffic accidents at UK junctions by means of K-medoids and association rules, obtained clusters of collisions under thirteen types of T-junctions and six types of intersections, and identified twelve pre-crash scenarios at junctions, taking into account clusters of high-injury outcomes of the accidents.

To the best of our knowledge, there are no studies that use association rules to mine accident characteristics of vehicle-to-TW collisions while extracting vehicle-to-TW collision scenarios through clustering analysis. Consequently, in this paper, the combination of clustering analysis and association rules is used for the first time to analyze vehicle-to-TW accident data. It aims to gain insights into the accident characteristics and key patterns of vehicle-to-TW collision scenarios in China and to provide scenario references for the assessment of intelligent driving safety performance in China based on accident data in the Songjiang district of Shanghai from NAIS. First, the base scenario is obtained by initially dividing the traffic accident data through clustering analysis. Then, the association rules algorithm is applied to the base scenarios to generate the rest of the more detailed accident attributes, ultimately constructing vehicle-to-TW collision scenarios that are suitable for complex traffic conditions in China.

2. Data Sources and Scenarios Feature Element Extraction

2.1. Sources of Accident Data

The accident data in this paper comes from about 800 traffic accidents in Songjiang District, Shanghai, collected by NAIS during 2018–2021. There were more than 400 accidents involving TWs, accounting for about 54%. Considering the complexity of vehicle-to-TW traffic accidents, this paper screens accident cases by the following conditions:

1. The accident is a collision between a vehicle (car, SUV, or MPV) and a TW;
2. The type of road is straight, an intersection, or a T-junction;
3. The motions of the vehicle and TW are limited to traveling straight ahead, turning, and others (the driver was waiting to turn left, reversing, performing a U-turn, or overtaking);
4. Vehicle-to-TW rear-end accidents are excluded.

Consequently, 335 real accident cases were selected to analyze the collision scenarios of vehicle-to-TW accidents.

2.2. Accident Variable Extraction and Coding

Traffic accidents are caused by “human-vehicle-road-environment” interactions [14]. The purpose of this study is to extract vehicle-to-TW collision scenarios in real traffic environment. Therefore, considering the different variables in the four elements and the demand for establishing subsequent testing scenarios, the scenarios should be accurately and adequately described using as few variables as possible [21]. Combined with previous studies [7,8,9,10], eleven variables in the four main elements of “human-vehicle-road-environment” were selected to state the vehicle-to-TW collision scenarios. The different variables corresponding to the four main elements are shown in Figure 1.

The performance of cluster analysis can be affected by high dimensional data [21], and all variables were divided into two groups for clustering analysis and association rules, respectively. The names, attributes, and frequencies of the variables are shown in Table 1 and Table 2. The clustering variables (in Table 1) mainly include the kinematic state of the participants prior to the collision and environmental variables [22] (weather, light, and so on), with a total of five variables and seventeen attributes. The variables used for the association rule mining (in Table 2) include more detailed accident variables related to the road infrastructure and so on (injury severity of the TW rider, speed limit on the road, and so on), with a total of six variables and thirty-three attributes. Due to the small number of accidents in which vehicles were subjected to collision forces in the 3–9 o’clock directions in the accident data of this study, the 3–9 o’clock directions are given together in Table 2.

Injury severity of the TW rider is classified into four levels according to the Maximum Abbreviated Injury Scale (MAIS) [23]: uninjured (MAIS 0), slight (MAIS 1–2), serious (MAIS 3–5), and fatal (MAIS 6). The vehicle’s travel lane is coded as the inside lane if the vehicle is traveling in a lane adjacent to a non-motorized carriageway or curb and as the outside lane otherwise, combined with the number of lanes in the direction that the vehicle is traveling. The motion of the TW relative to the vehicle is divided into left and right about the longitudinal centerline of the vehicle. The direction of the collision force on the vehicle is divided into twelve sectors, called the 1–12 o’clock directions, as shown in Figure 2. Taking the vehicle as the center, the position of the first collision point between the TW and the vehicle is assigned to a sector in 30° steps. The direction directly in front of the vehicle corresponds to 12 o’clock, and the remaining directions follow clockwise from 1 to 11 o’clock.
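The clock-direction coding can be illustrated with a short Python sketch. It assumes that the bearing of the first collision point is measured in degrees clockwise from the vehicle's heading and that each 30° sector is centered on its clock hour (so 12 o'clock spans −15° to +15°); the paper does not state the sector boundaries. The function names and the grouped code "O3-9" are hypothetical, chosen only to mirror the "O1" and "O12" codes used in the rules.

```python
def clock_direction(bearing_deg: float) -> int:
    """Return the 1-12 o'clock sector for a bearing in degrees.

    Assumption: bearing is clockwise from the vehicle's heading and each
    30-degree sector is centred on its clock hour.
    """
    shifted = (bearing_deg % 360.0) + 15.0   # move sector edges to multiples of 30
    hour = int(shifted // 30.0) % 12         # 0..11, where 0 means 12 o'clock
    return 12 if hour == 0 else hour


def direction_code(bearing_deg: float) -> str:
    """Return an attribute code, pooling the 3-9 o'clock directions.

    "O3-9" is a hypothetical label for the pooled group.
    """
    hour = clock_direction(bearing_deg)
    return "O3-9" if 3 <= hour <= 9 else f"O{hour}"


if __name__ == "__main__":
    for bearing in (0, 30, -30, 180):
        print(bearing, clock_direction(bearing), direction_code(bearing))
```

Pooling the 3–9 o'clock directions into one code follows the treatment described for Table 2, where these directions are rare in the data.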

Clustering analysis is an algorithm for categorization based on distance or similarity, and it is important to avoid the effect of unequal distances between different attributes on the similarity between samples. The above variables are discrete categorical variables and need to be coded to ensure that the distances between attributes are measurable during clustering analysis and that the distances between the same attributes in the same clustered variable are zero, while the distances between different attributes are equal. In this paper, the variables are coded using one-hot encoding [24], which is a common form of encoding in machine learning. It can extend the unordered categorical variable taking values into the Euclidean space and indicate the state of the variable with the binary code 0, 1.
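A minimal Python sketch of this coding step is given below. The variable names and attribute lists in `schema` are illustrative assumptions rather than the exact contents of Table 1. The sketch shows the property described above: after one-hot encoding, identical attributes are at distance zero, and any two different attributes of the same variable are at the same distance.

```python
import math


def one_hot(value, categories):
    """Encode one categorical value as a 0/1 list over its categories."""
    return [1 if value == c else 0 for c in categories]


def encode(sample, schema):
    """Concatenate the one-hot codes of every variable in the schema."""
    vector = []
    for variable, categories in schema.items():
        vector.extend(one_hot(sample[variable], categories))
    return vector


def euclidean(a, b):
    """Euclidean distance between two equally long numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical schema: two variables with three attributes each.
schema = {
    "weather": ["sunny", "cloudy", "rainy_snowy"],
    "light": ["day", "night_lit", "night_dark"],
}

if __name__ == "__main__":
    a = encode({"weather": "sunny", "light": "day"}, schema)
    b = encode({"weather": "cloudy", "light": "day"}, schema)
    c = encode({"weather": "rainy_snowy", "light": "day"}, schema)
    print(euclidean(a, a), euclidean(a, b), euclidean(a, c))
```

Two samples that differ in exactly one variable are always sqrt(2) apart, regardless of which two attributes are involved, so no artificial ordering is imposed on the categories.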

3. Data Mining Methods

3.1. Hierarchical Clustering

Clustering analysis is an unsupervised learning method for discovering clustering effects among data. It can greatly reduce the influence of researchers’ subjective opinions on the scenario classification results and is highly reproducible. In this paper, one of the most common clustering algorithms, hierarchical clustering, is used to cluster 335 accident cases in order to obtain the base scenarios. The steps of the hierarchical clustering algorithm are as follows:

1. Each sample is treated as a separate cluster;
2. The distance between different samples is calculated, and the two samples with the closest distance are combined into one cluster;
3. The distance between the different clusters is calculated, and the two closest clusters are combined into one new cluster;
4. Step 3 is repeated until all the samples are clustered into one cluster.

The distance between different samples is calculated using Euclidean distance. Each sample contains $m$ variables, where the $i$th sample can be represented as:

$$X_{i} = (X_{i1}, X_{i2}, \dots, X_{im}), \tag{1}$$

where $X_{im}$ denotes the value (0 or 1) of the $m$th variable in the $i$th sample, and the distance between samples $i$ and $j$ is:

$$d_{ij} = \sqrt{\sum_{k=1}^{m} |X_{ik} - X_{jk}|^{2}}, \tag{2}$$

The distance between the different clusters is calculated using Ward’s method. First, calculate the within-cluster sum of squares of deviations for each cluster separately; then, select two clusters to be merged into one. Since the sum of squares of deviations increases after reducing the number of clusters by one, the two clusters with the smallest increase in the sum of the squares of deviations are chosen for merging.

The within-cluster sum of squares of deviations of the samples is as follows:

$$S_{Q} = \sum_{i=1}^{n_{Q}} (X_{iQ} - \bar{X}_{Q})' (X_{iQ} - \bar{X}_{Q}), \tag{3}$$

where $n_{Q}$ is the number of samples in the cluster $C_{Q}$, $X_{iQ}$ is the $i$th sample in the cluster $C_{Q}$, and $\bar{X}_{Q}$ is the centroid of $C_{Q}$.

The distance between clusters is:

$$D_{LQ} = S_{R} - S_{L} - S_{Q}, \tag{4}$$

where $S_{R}$ and $S_{L}$ are the sums of squares of deviations for clusters $C_{R}$ and $C_{L}$, and $C_{R}$ is the cluster obtained by merging $C_{L}$ and $C_{Q}$.
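The merging criterion in Equations (3) and (4) can be sketched in plain Python as follows. This is a brute-force illustration on a handful of toy one-hot vectors, not the implementation used in the study; the function names are hypothetical. For real data, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` would normally be preferred.

```python
def centroid(cluster):
    """Component-wise mean of the samples in a cluster."""
    n = len(cluster)
    return [sum(column) / n for column in zip(*cluster)]


def within_ss(cluster):
    """Within-cluster sum of squares of deviations, as in Equation (3)."""
    center = centroid(cluster)
    return sum(
        sum((x - m) ** 2 for x, m in zip(sample, center)) for sample in cluster
    )


def ward_distance(cluster_l, cluster_q):
    """Increase in the sum of squares caused by merging, as in Equation (4)."""
    merged = cluster_l + cluster_q
    return within_ss(merged) - within_ss(cluster_l) - within_ss(cluster_q)


def ward_clustering(samples, n_clusters):
    """Agglomerate samples until n_clusters remain (naive O(n^3) search)."""
    clusters = [[s] for s in samples]          # each sample starts alone
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                cost = ward_distance(clusters[a], clusters[b])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return clusters


if __name__ == "__main__":
    toy = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
    for cluster in ward_clustering(toy, 3):
        print(cluster)
```

Stopping the loop at a chosen number of clusters corresponds to cutting the full merge hierarchy at that level.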

3.2. Association Rules Mining

Association rules mining is a popular method of data analysis in road traffic safety research [25,26,27]. Also known as “frequent item mining”, it is widely used to discover associations between incident attributes [28]. Each sample is called a transaction, and the set of transactions is $T = \{t_{1}, t_{2}, \dots, t_{n}\}$; each attribute is an item, and the set of items is $I = \{i_{1}, i_{2}, \dots, i_{m}\}$. A rule is written as $A \to B$, where $A$ is the antecedent and $B$ is the consequent, with $A \subseteq I$ and $B \subseteq I$. It is worth noting that these rules represent associative relationships between attributes and cannot be interpreted as causal relationships between antecedent and consequent [29].

In this paper, the Apriori algorithm is used to mine association rules. Apriori is one of the most commonly used association rule algorithms in the field of traffic accident data analysis. It proceeds in two steps: first, find all the frequent itemsets that satisfy the minimum support; then, from these frequent itemsets, generate the strong association rules that satisfy the minimum confidence. Support is the frequency with which a rule occurs and represents the importance of the rule. A higher support threshold retains only rules that occur frequently, while a lower support threshold yields more rules, some of which may occur too rarely to be representative. Confidence represents how reliable the rule is. A higher confidence threshold produces more strongly correlated rules, while a lower confidence threshold yields more but less reliable rules. Therefore, appropriate support and confidence thresholds need to be chosen to ensure that a moderate and representative number of rules is mined.

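$$\mathrm{Support}(A \to B) = \frac{n(AB)}{n}, \tag{5}$$

$$\mathrm{Confidence}(A \to B) = \frac{\mathrm{Support}(A \to B)}{\mathrm{Support}(A)}, \tag{6}$$

where $n(AB)$ is the number of transactions that contain both $A$ and $B$, and $n$ is the total number of transactions.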

In the field of road traffic safety, different studies have set different support and confidence thresholds according to the research purpose and sample size [18,19,25]. The main objective of the association rules in this paper is similar to that of Nitsche’s study [21], namely to obtain detailed relevant accident attributes for collision scenarios, and the samples used for association rule mining in this paper are small. Therefore, the minimum support was set to 0.1 after experiments with different thresholds, and the minimum confidence was set to 0.75, following the value used in [21].

Lift, also known as “interestingness”, is a metric that measures the degree of correlation between the antecedent and consequent of a rule [28]. It compares the probability of the consequent occurring given the antecedent with the overall probability of the consequent. A rule with Lift > 1 indicates that the antecedent and consequent are positively correlated. Therefore, only strong association rules with Lift > 1 were retained in this study.

$$\mathrm{Lift}(A \to B) = \frac{\mathrm{Confidence}(A \to B)}{\mathrm{Support}(B)} \tag{7}$$
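As a worked illustration of Equations (5)–(7), the following Python sketch computes support, confidence, and lift for a single rule. The five transactions are invented for the example and are not taken from the NAIS data; only the attribute codes (Motr, Injury) follow the style used in this paper. The function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent."""
    s_ab = support(transactions, set(antecedent) | set(consequent))  # Eq. (5)
    s_a = support(transactions, antecedent)
    s_b = support(transactions, consequent)
    if s_a == 0 or s_b == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    confidence = s_ab / s_a                                          # Eq. (6)
    lift = confidence / s_b                                          # Eq. (7)
    return s_ab, confidence, lift


# Invented toy transactions (one set of attribute codes per accident).
toy_transactions = [
    {"Motr=L", "Injury=Sei"},
    {"Motr=L", "Injury=Sei"},
    {"Motr=L", "Injury=Sli"},
    {"Motr=R", "Injury=Sei"},
    {"Motr=R", "Injury=Sli"},
]

if __name__ == "__main__":
    s, c, l = rule_metrics(toy_transactions, {"Motr=L"}, {"Injury=Sei"})
    print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

In this toy data the rule has support 2/5 and confidence 2/3, so it would fail the 0.75 confidence threshold used in this paper even though its lift is above 1.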

4. Base Scenarios and Rules Mining

4.1. Accident Data Clustering

In this paper, silhouette analysis [30] was used to assess the effectiveness of clustering and obtain the optimal clustering result; it helps to analyze the cohesiveness and separation of clusters. Each cluster is expressed by a silhouette coefficient, and silhouette coefficients close to 1 indicate better clustering results for that cluster. The average silhouette width (ASW) is the average of the silhouette coefficients under the current number of clusters and is used to select the most appropriate number of clusters; the larger the ASW, the higher the clustering validity. Although the ASW values gradually become larger as the number of clusters increases, association rules subsequently need to be calculated for each cluster. Therefore, each cluster was required to contain at least 30 samples to ensure that the association rule analysis is supported by sufficient sample data to reveal the relationships between different attributes. In this study, the optimal number of clusters was determined by comparing the ASW and the minimum cluster size for different numbers of clusters. Figure 3a shows the ASW and the minimum cluster size for k = 2 to k = 15; the clustering results with k > 8 are excluded based on the minimum sample size. Among the remaining results, k = 8 gives the highest ASW (0.29) and the best overall clustering result, so the number of clusters was set to k = 8.
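The selection procedure can be sketched in Python as below. The silhouette computation follows the standard definition, with Euclidean distance assumed; the `choose_k` helper and its inputs are hypothetical names. The ASW values and minimum cluster sizes in the usage example are invented placeholders, except that k = 8 with ASW = 0.29 matches the reported result.

```python
import math


def silhouette_values(points, labels):
    """Silhouette value of every sample (requires at least two clusters)."""
    values = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:                 # singleton cluster: value 0 by convention
            values.append(0.0)
            continue
        a = sum(own) / len(own)     # mean distance to the sample's own cluster
        b = min(                    # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        denom = max(a, b)
        values.append(0.0 if denom == 0 else (b - a) / denom)
    return values


def average_silhouette_width(points, labels):
    """ASW: mean of the silhouette values over all samples."""
    values = silhouette_values(points, labels)
    return sum(values) / len(values)


def choose_k(asw_by_k, min_size_by_k, min_size=30):
    """Pick the k with the highest ASW among those whose smallest cluster
    still has at least min_size samples."""
    feasible = {k: v for k, v in asw_by_k.items() if min_size_by_k[k] >= min_size}
    return max(feasible, key=feasible.get)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(average_silhouette_width(pts, [0, 0, 1, 1]))
    # Placeholder numbers for illustration only.
    print(choose_k({7: 0.27, 8: 0.29, 9: 0.31}, {7: 35, 8: 31, 9: 22}))
```

The size constraint is applied before the ASW comparison, which is why a larger k with a slightly higher ASW can still be rejected.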

The silhouette values of all samples in each cluster when k = 8 are shown in Figure 3b. Samples with negative silhouette values may be assigned to the wrong clusters. C2, C4, C6, C7, and C8 all have samples that may be assigned to the wrong clusters, but the overall number of incorrect samples is low, which indicates that the vast majority of the samples were assigned to the correct clusters and better reflects the similarity between samples in the same cluster. In addition, C1, C3, and C5 do not have negative silhouette value samples and have larger overall silhouette values, so their accident characteristics are more obvious.

The inconsistency coefficients of the cluster analysis were further examined in order to increase confidence in the selected number of clusters [6]. A large increase in the inconsistency coefficient from one merge to the next indicates that the partition before the later merge is the better one. As shown in Figure 4, the inconsistency coefficient of the 328th merge is substantially larger than that of the 327th merge. Therefore, the partition after the 327th merge is more effective. Since each merge reduces the number of clusters by one, starting from 335 single-sample clusters, 335 − 327 = 8 clusters remain at that point.

Using the clustering variables in Table 1, the 335 vehicle-to-TW accidents were clustered into eight base scenarios, as shown in Table 3. The grey cells mark the accident attributes that account for more than 80% of a characteristic variable within a cluster; these are taken as the main characteristics of that base scenario. The clustering results in Table 3 also show that the accident attributes of C1 and C3 are the most distinct.
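The 80% criterion for the main characteristics of a base scenario can be sketched in Python as follows. The function name, the record layout (one dictionary per accident), and the toy records are hypothetical; only the threshold comes from the text.

```python
from collections import Counter


def dominant_attributes(records, labels, threshold=0.8):
    """For each cluster, return the attributes whose share within a
    variable is strictly greater than threshold."""
    by_cluster = {}
    for record, label in zip(records, labels):
        by_cluster.setdefault(label, []).append(record)

    result = {}
    for label, members in by_cluster.items():
        result[label] = {}
        for variable in members[0]:
            counts = Counter(m[variable] for m in members)
            value, count = counts.most_common(1)[0]   # most frequent attribute
            if count / len(members) > threshold:
                result[label][variable] = value
    return result


if __name__ == "__main__":
    # Invented toy records: road type is uniform, weather is mixed.
    toy = [
        {"road": "straight", "weather": "sunny"},
        {"road": "straight", "weather": "sunny"},
        {"road": "straight", "weather": "sunny"},
        {"road": "straight", "weather": "cloudy"},
        {"road": "straight", "weather": "cloudy"},
    ]
    print(dominant_attributes(toy, ["C1"] * 5))
```

A variable with no attribute above the threshold, such as the weather in this toy cluster, is simply left out, which corresponds to a variable without a grey cell in Table 3.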

C1 and C2 are accident scenarios on straight roads. C1 is the largest cluster, with a total of 53 crashes (15.8%), and consists mainly of vehicle-to-TW vertical collisions on straight roads in sunny weather during the day. C2 differs from C1 in the weather, and no clear delineation of the light conditions was formed in C2. C3, C4, and C7 are all accident scenarios at intersections. All of the accidents in C3 involved a vehicle and a TW both traveling straight ahead at an intersection in sunny weather during the day. C4 occurred in cloudy weather, when the vehicle was traveling straight ahead or turning left while the TW was traveling straight ahead, resulting in a conflict. C7 is clearly an intersection accident scenario of a “TW crossing the road” during a well-lit night. C6 occurred at a T-junction during the day, when the vehicle was traveling straight ahead or turning left and made contact with a TW traveling straight ahead. C5 is an accident scenario in rainy/snowy weather on a straight road or at an intersection. C8 has no obvious segmentation of road type but is the only cluster that represents an accident scenario in which a vehicle is traveling straight ahead and a TW is making a left turn. The eight base scenarios basically cover the vehicle-to-TW standard conflict scenarios in Euro NCAP [31] and highlight the collision characteristics of vehicle-to-TW accidents that are unique to China: nighttime scenarios and scenarios in which the vehicle or TW turns.

4.2. Collision Scenarios Derived from Association Rules

For each cluster base scenario, association rules were computed using the attributes given in Table 2. Due to the large number of rules obtained by the Apriori algorithm for the eight clusters in this paper, we do not give all of them. The several clusters of base scenarios were selected to calculate the rules based on accident variables (road type and motion of the vehicle and TW) that reflect the key collision scenarios, and the degree of clustering was also considered in order to make the accident characteristics of the clustered variables in the collision scenarios more obvious.

Among clusters C1, C2, C3, C4, C6, and C7, in which straight, intersection, and T-junction are the salient features, we have, respectively, chosen the best divisions cluster in each road type: C1, C3, and C6. Only C8 of the eight clusters is a vehicle that is straight ahead and a TW turning left accident. The last remaining cluster, C5, has a strong similarity of samples within the cluster, with all accident characteristics except road type being more prominent, so the other accident attributes of C8 and C5 were further investigated. Based on the above analyses, five clusters were identified to apply the association rules: C1, C3, C5, C6, and C8.

The association rules aim to obtain the accident attributes, so only two-item and three-item rules were analyzed. Cluster C3 is used as an example to explain the association rule results. C3 generated 29 rules, as shown in Table 4, which gives the antecedent, consequent, support, confidence, and lift of each rule; the rules are sorted by support. Each rule consists of an antecedent and a consequent, which are expressed as “short name of variable = code of attribute”. These rules represent the degree of association and dependency among the Table 2 accident attributes in cluster C3, but they do not represent causal relationships between accident attributes. The motion of the TW relative to the vehicle (an association rule variable), combined with the motions of the vehicle and TW (clustering variables), gives a clear indication of how the collision occurred. Therefore, the analysis of the rules focuses on the other accident attributes associated with the motion of the TW relative to the vehicle.
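The restriction to two-item and three-item rules within one cluster can be sketched in Python as below. For brevity, the sketch enumerates item combinations by brute force instead of using Apriori's candidate pruning, and it assumes single-item consequents, which the paper does not specify. The thresholds default to the values stated in Section 3.2; the function name and the toy transactions are invented.

```python
from itertools import combinations


def mine_rules(transactions, min_support=0.1, min_confidence=0.75, max_items=3):
    """Return (antecedent, consequent, support, confidence, lift) tuples
    for rules with 2..max_items items, sorted by descending support."""
    n = len(transactions)

    def supp(items):
        return sum(items <= t for t in transactions) / n

    all_items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, max_items + 1):
        for combo in combinations(all_items, size):
            itemset = frozenset(combo)
            s = supp(itemset)
            if s == 0 or s < min_support:
                continue
            for item in combo:            # assumption: one-item consequent
                consequent = frozenset([item])
                antecedent = itemset - consequent
                confidence = s / supp(antecedent)
                lift = confidence / supp(consequent)
                if confidence >= min_confidence and lift > 1:
                    rules.append((sorted(antecedent), item, s, confidence, lift))
    return sorted(rules, key=lambda r: r[2], reverse=True)


# Invented transactions standing in for the accidents of one cluster.
cluster_transactions = [
    {"Motr=L", "Injury=Sei", "Lane=Odc"},
    {"Motr=L", "Injury=Sei", "Lane=Odc"},
    {"Motr=L", "Injury=Sei", "Lane=Idc"},
    {"Motr=R", "Injury=Sli", "Lane=Idc"},
    {"Motr=R", "Injury=Sli", "Lane=Odc"},
]

if __name__ == "__main__":
    for antecedent, consequent, s, c, l in mine_rules(cluster_transactions):
        print(antecedent, "->", consequent, round(s, 2), round(c, 2), round(l, 2))
```

Running the same function separately on each selected cluster reflects the cluster-then-rules design: the rules describe attribute associations inside one base scenario and are not mined from the whole dataset.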

As can be seen in Table 4, the rule with the highest support is “Motr = L and Injury = Sei”: the TW emerged from the left side of the vehicle, and the TW rider was seriously injured. This is also related to the vehicle traveling on a road with a speed limit of “40 mph” (Splim = 40 mph, rule No. 18), in the “Outside lane of a dual carriageway” (Lane = Odc, rule Nos. 4 and 5), with a “Central green belt” as the road center separation (Rcensep = Cgb, rule No. 7). Rule No. 3 is “Motr = R and Splim = 60 mph”, which means that a vehicle is traveling on a road with a speed limit of 60 mph and a TW comes out from the right side of the vehicle; this is related to “Injury = Sei” (rule No. 8). In addition, “Motr = R” is related to “Injury = Sli” (rule No. 10) and “Dirt = O1” (rule Nos. 25, 26, and 27). As a result, C3 derives two collision scenarios, C3.1 and C3.2, where the direction of motion of the TW relative to the vehicle (Motr = L or Motr = R) is used as the dividing variable.

The accident characteristics of a final collision scenario combine the accident attributes of its base scenario with the remaining accident attributes obtained from the association rules. Taking C3.1 as an example, the accident attributes of the base scenario C3 (intersection, sunny, day, vehicle straight ahead, TW straight ahead), together with the remaining accident attributes obtained by the association rules (serious injury, outside lane of a dual carriageway, left, 40 mph, central green belt, and 11 o’clock direction), give a detailed description of the final collision scenario. In C3.1, at an intersection on a sunny day, a vehicle is driving straight ahead in the outside lane of a dual carriageway separated by a central green belt, on a road with a speed limit of 40 mph; the TW travels straight ahead from the vehicle’s left side, the TW rider is seriously injured, and the collision force acts on the vehicle from the 11 o’clock direction. In C3.2, at an intersection on a sunny day, a vehicle is driving straight ahead on a road with a speed limit of 60 mph; the TW travels straight ahead from the vehicle’s right side, the TW rider is seriously or slightly injured, and the collision force acts on the vehicle from the 1 o’clock direction.

Because the collision scenarios are subsequently described by road type, and the road type in C5 and C8 did not form an obvious classification, road type was added to the association rule variables for C5 and C8. Using the association rules, C1, C3, C5, C6, and C8 yield a total of 11 collision scenarios, each of which should contain 11 accident attributes. Based on the motion of the vehicle and the TW and the motion of the TW relative to the vehicle, schematic diagrams of all collision scenarios are shown in Figure 5. Figure 5a–c show the collision scenarios on straight roads, at intersections, and at T-junctions, respectively; A is a vehicle and B is a TW. The remaining nine accident variables (such as weather, speed limit, and injury severity of the TW rider) other than the motion of the vehicle and TW are given in Table 5, arranged by road type. Within each scenario, the listed accident attributes have a high probability of occurring together, so they should be set together when establishing testing scenarios. A “/” in the table means that no attribute of that variable appeared in the rules, either because the thresholds for confidence and support are too high or because the dataset has a small sample. These variables can be used as the changing attributes in the testing scenario, while the other attributes constitute the “static” elements of the scenario.

In Table 5, it can be seen that the injury severity of a TW rider is mostly serious, but “Fatal” appears in the derived scenarios C5.1 and C5.2 of the accident dataset C5 under rainy and snowy weather, which is due to the impact of bad weather on the consequences of the accident. C1.1, C1.2, C5.1, and C8.1 are collision scenarios on straight roads, where vehicles are traveling in a single carriageway. C3.1, C3.2, C5.2, and C8.2, are intersection collision scenarios in which vehicles are traveling on a dual carriageway (mostly the outside of a dual carriageway). When a vehicle is traveling in the outside lane of a dual carriageway and a TW is exiting from the vehicle’s left side, there is a collision scenario due to dynamic field of view occlusion caused by vehicles in the inside lane (C3.1, C5.2). C6.1, C6.2, and C8.3 are the three collision scenarios associated with T-junctions, and most of the accident attributes are similar to the intersection collision scenarios. In Table 5, some of the frequently occurring accident attributes are sunny weather, the day, Sei (serious injury), Sc (single carriageway), Odc/Idc (outside/inside of a dual carriageway), 60 mph, and O12 (12 o’clock), which highlight some characteristics of the vehicle-to-TW collision scenarios.

Vertical collision scenarios (where the vehicle is traveling straight ahead and the TW is crossing the road) are the most common vehicle-to-TW collision scenarios, which is consistent with the results of previous studies [11,12]. For vertical collision scenarios, cases with TWs coming from the left and right sides of the vehicle are generally unified as vertically oriented oncoming TWs [11]. However, the real traffic environments corresponding to right and left oncoming traffic are different, and the proportions of TWs appearing from the left and right sides of the vehicle are different [32]. Therefore, the accident attributes related to the motion of the TW relative to the vehicle are extracted by the association rules, and the final vertical collision scenarios (left or right) are obtained separately, such as in C1.1 and C1.2.

The association rules show that certain accident attributes, such as the direction of collision force on the vehicle and the injury severity of the TW rider, differ between the left and right vertical collision scenarios. However, the specific effect of the motion of the TW relative to the vehicle needs further in-depth investigation to determine whether the left and right sides must be separated when establishing testing scenarios in the future.

5. Conclusions

This paper uses clustering analysis and association rule mining to analyze vehicle-to-TW accident data and construct collision scenarios. This approach not only extracts collision scenarios from real accident data but also uncovers hidden relationships between accident attributes. The clustering analysis first partitioned the accident data into eight clusters of typical base scenarios according to road type, weather, light, motion of the vehicle, and motion of the TW. The association rules were then applied to further accident attributes, such as road speed limit, vehicle traveling lane, and the motion of the TW relative to the vehicle, to obtain the strongly related attributes of each base scenario and describe the scenarios in more detail. Key attributes such as serious injury of the TW rider, a road speed limit of 60 mph, and a collision force on the vehicle in the 12 o’clock direction appeared often in the generated rules. These results reveal potential collision characteristics of vehicle-to-TW accidents in China’s complex traffic environment and ultimately yield 11 detailed vehicle-to-TW collision scenarios covering straight roads, intersections, and T-junctions.

The results of this study are consistent with existing findings on vehicle-to-TW accidents. The collision scenarios obtained in this paper help to reduce the number of possible variations in accident attributes, such as vehicle trajectory, road speed limit, and number of lanes, when building intelligent driving testing scenarios. This study thus provides a reference for establishing vehicle-to-TW testing scenarios for the safety assessment of intelligent driving functions.

Author Contributions

Conceptualization, R.W. and Y.Q.; methodology, R.W.; software, R.W.; validation, R.W., Y.Q., H.D. and W.Y.; formal analysis, R.W.; investigation, R.W.; resources, Y.Q.; data curation, R.W.; writing—original draft preparation, R.W.; writing—review and editing, R.W.; visualization, W.Y.; supervision, Y.Q.; project administration, H.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Central Fundamental Scientific Research Operating Expenses Project, grant number (282022Y-9463), the Science and Technology Programme Project of the State Administration for Market Supervision and Administration of China, grant number (2022MK183), and the Applied Research on Vehicle Defect Analysis and Determination Technology Based on the In-depth Investigation of Vehicle Accidents (Songjiang, Shanghai), grant number ((20)JQ-023).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. World Health Organization. Global Status Report on Road Safety: Summary; World Health Organization: Geneva, Switzerland, 2018.
2. National Bureau of Statistics. Annual Traffic Accident Data [2022-09]. Available online: http://www.stats.gov.cn/sj/ndsj/2022/indexch.htm (accessed on 10 February 2022).
3. Han, I. Scenario establishment and characteristic analysis of intersection collision accidents for advanced driver assistance systems. Traffic Inj. Prev. 2020, 21, 354–358.
4. Bing, Z.; PeiXing, Z.; Hong, C.; Xu, Z. Research progress on scene-based virtual test of autonomous driving vehicles. China J. Highw. Transp. 2019, 32, 1–19.
5. Wang, R.; Zhu, Y.; Zhao, X. Research progress on test scenario of autonomous driving. J. Traffic Transp. Eng. 2021, 21, 21–37.
6. Tan, Z.; Che, Y.; Xiao, L.; Hu, W.; Li, P.; Xu, J. Research of fatal car-to-pedestrian precrash scenarios for the testing of the active safety system in China. Accid. Anal. Prev. 2021, 150, 105857.
7. Li, L.; Zhu, X.; Liu, Y.; Ma, Z. Typical traffic danger scenes involving cyclists. J. Tongji Univ. Nat. Sci. Ed. 2014, 42, 1082–1087.
8. Hu, L.; Yi, P.; Huang, J.; Zhang, X.; Lei, B. Research on automatic emergency braking system two-wheeled vehicle test scenario based on real accident cases. Automot. Engine 2018, 40, 1435–1446+1453.
9. Zhou, H.; Zhang, Q.; Mu, Y.; Tan, Z.; Sun, Q.; Zhang, D. Clustering and deduction of typical dangerous scenarios between passenger vehicles and two-wheelers at crossroads. China Saf. Sci. J. 2020, 30, 100–107.
10. Xu, X.; Zhou, Z.; Hu, W.; Xiao, L.; Li, W.; Wang, S. Intersection test scenarios for AEB based on accident data mining. J. Beijing Univ. Aeronaut. Astronaut. 2020, 46, 1817–1825.
11. Sui, B.; Lubbe, N.; Bärgman, J. A clustering approach to developing car-to-two-wheeler test scenarios for the assessment of Automated Emergency Braking in China using in-depth Chinese crash data. Accid. Anal. Prev. 2019, 132, 105242.
12. Cao, Y.; Xiao, L.; Dong, H.; Wang, Y.; Wu, X.; Li, P.; Qiu, Y. Typical pre-crash scenarios reconstruction for two-wheelers and passenger vehicles and its application in parameter optimization of AEB system based on NAIS database. In Proceedings of the International Conference on Enhanced Safety of Vehicles, Eindhoven, The Netherlands, 10–13 June 2019.
13. Wang, X.; Peng, Y.; Xu, T.; Xu, Q.; Wu, X.; Xiang, G.; Yi, S.; Wang, H. Autonomous driving testing scenario generation based on in-depth vehicle-to-powered two-wheeler crash data in China. Accid. Anal. Prev. 2022, 176, 106812.
14. Pan, D.; Han, Y.; Jin, Q.; Wu, H.; Huang, H. Study of typical electric two-wheelers pre-crash scenarios using K-medoids clustering methodology based on video recordings in China. Accid. Anal. Prev. 2021, 160, 106320.
15. Ren, L.; Xia, H.; Jiang, C.; Fan, T.; Zhao, T. Construction of autonomous emergency braking system test scenarios based on traffic accident data. Sci. Technol. Eng. 2022, 22, 10737–10747.
16. Montella, A.; de Oña, R.; Mauriello, F.; Riccardi, M.R.; Silvestro, G. A data mining approach to investigate patterns of powered two-wheeler crashes in Spain. Accid. Anal. Prev. 2020, 134, 105251.
17. Meißner, K.; Rieck, J. Strategic planning support for road safety measures based on accident data mining. IATSS Res. 2022, 46, 427–440.
18. Xu, C.; Bao, J.; Wang, C.; Liu, P. Association rule analysis of factors contributing to extraordinarily severe traffic crashes in China. J. Saf. Res. 2018, 67, 65–75.
19. Das, S.; Tamakloe, R.; Zubaidi, H.; Obaid, I.; Alnedawi, A. Fatal pedestrian crashes at intersections: Trend mining using association rules. Accid. Anal. Prev. 2021, 160, 106306.
20. Kumar, S.; Toshniwal, D. A data mining framework to analyze road accident data. J. Big Data 2015, 2, 26.
21. Nitsche, P.; Thomas, P.; Stuetz, R.; Welsh, R. Pre-crash scenarios at road junctions: A clustering method for car crash data. Accid. Anal. Prev. 2017, 107, 137–151.
22. Distefano, N.; Leonardi, S. A list of accident scenarios for three legs skewed intersections. IATSS Res. 2017, 42, 97–104.
23. Ferreira, S.; Amorim, M.; Couto, A. Risk factors affecting injury severity determined by the MAIS score. Traffic Inj. Prev. 2017, 18, 515–520.
24. Kunanbayev, K.; Temirbek, I.; Zollanvari, A. Complex Encoding. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; pp. 1–6.
25. Montella, A.; Aria, M.; D’Ambrosio, A.; Mauriello, F. Analysis of powered two-wheeler crashes in Italy by classification trees and rules discovery. Accid. Anal. Prev. 2012, 49, 58–72.
26. Das, S.; Dutta, A.; Avelar, R.; Dixon, K.; Sun, X.; Jalayer, M. Supervised association rules mining on pedestrian crashes in urban areas: Identifying patterns for appropriate countermeasures. Int. J. Urban Sci. 2019, 23, 30–48.
27. Chen, L.; Huang, S.; Yang, C.; Chen, Q. Analyzing factors that influence expressway traffic crashes based on association rules: Using the Shaoyang-Xinhuang section of the Shanghai-Kunming expressway as an example. J. Transp. Eng. Part A 2020, 146, 05020007.
28. Guillaume, S.; Guillet, F.; Philippe, J. Improving the discovery of association rules with intensity of implication. In Principles of Data Mining and Knowledge Discovery; Lecture Notes in Computer Science; Żytkow, J.M., Quafafou, M., Eds.; Springer: Berlin/Heidelberg, Germany, 1998; Volume 1510, pp. 318–327.
29. Kong, X.; Das, S.; Jha, K.; Zhang, Y. Understanding speeding behavior from naturalistic driving data: Applying classification based association rule mining. Accid. Anal. Prev. 2020, 144, 105620.
30. Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 20, 53–65.
31. Euro NCAP. European New Car Assessment Programme (Euro NCAP)-Test Protocol AEB/LSS VRU Systems. 2023. Available online: https://cdn.euroncap.com/media/77299/euro-ncap-aeb-lss-vru-test-protocol-v44.pdf (accessed on 12 August 2023).
32. Han, Y.; Li, Q.; He, W.; Wan, F.; Wang, B.; Mizuno, K. Analysis of vulnerable road user kinematics before/during/after vehicle collisions based on video recordings. Proc. IRCOBI Conf. 2017, 13, 15.

Figure 1. Variables of the vehicle-to-TW collision scenario.

Figure 2. Twelve directions of collision force on a vehicle.

Figure 3. Silhouette analysis plot: (a) ASW values and minimum sample sizes for different numbers of clusters; (b) silhouette values for k = 8.

Figure 4. The inconsistency coefficient by clustering.

Figure 5. Vehicle-to-TW collision scenarios: (a) on a straight road; (b) at an intersection; (c) at a T-junction. (A is a vehicle; B is a TW.)

Table 1. Accident variables used for clustering analysis.

Variable	Attribute	Count	Frequency
Road type	Straight	138	41.2%
	Intersection	150	44.8%
	T-junction	47	14.0%
Weather	Sunny	176	52.5%
	Cloudy	84	25.1%
	Rain/snow	75	22.4%
Light	Day	237	70.7%
	Night lighted	71	21.2%
	Night not lighted	27	8.1%
Motion of vehicle	Straight ahead	258	77.0%
	Turn left	30	9.0%
	Turn right	32	9.6%
	Other	15	4.4%
Motion of TW	Straight ahead	264	78.9%
	Turn left	53	15.9%
	Turn right	14	4.2%
	Other	4	1.2%

Table 2. Accident variables used for association rule mining.

Variable (Short Name)	Attribute (Code)	Count	Frequency
Injury severity of TW rider (Injury)	Uninjured (Unj)	10	3.0%
	Slight (Sli)	47	14.0%
	Serious (Sei)	229	68.4%
	Fatal (Fal)	49	14.6%
Vehicle traveling lane (Lane)	Single carriageway (Sc)	86	25.7%
	Inside of dual carriageway (Idc)	65	19.4%
	Outside of dual carriageway (Odc)	120	35.8%
	Inside of three or more carriageways (Itcs)	38	11.3%
	Outside of three or more carriageways (Otcs)	26	7.8%
Motion of TW relative to vehicle (Motr)	Left (L)	183	54.6%
	Right (R)	152	45.4%
Speed limit (Splim)	30 mph (30 mph)	24	7.2%
	40 mph (40 mph)	95	28.3%
	50 mph (50 mph)	41	12.2%
	60 mph (60 mph)	156	46.6%
	Above 60 mph (>60 mph)	19	5.7%
Road center separation (Rcensep)	Unisolated (Unl)	28	8.4%
	Dotted line (Dl)	37	11.0%
	Solid line (Sl)	162	48.4%
	Isolation rail (Ir)	36	10.7%
	Central green belt (Cgb)	72	21.5%
Direction of collision force on vehicle (Dirt)	1 o’clock direction (O1)	70	20.9%
	2 o’clock direction (O2)	25	7.5%
	3–9 o’clock direction (O3–O9)	42	12.5%
	10 o’clock direction (O10)	19	5.7%
	11 o’clock direction (O11)	69	20.6%
	12 o’clock direction (O12)	110	32.8%

Table 3. Cluster results (k = 8, n = 335).

Cluster		C1	C2	C3	C4	C5	C6	C7	C8
Count (Frequency)		53 (15.8%)	37 (11.0%)	43 (12.8%)	50 (14.9%)	40 (11.9%)	31 (9.2%)	40 (11.9%)	41 (12.2%)
Road type	Straight	53	37	0	0	24	1	5	18
	Intersection	0	0	43	49	16	0	32	10
	T-junction	0	0	0	1	0	30	3	13
Weather	Sunny	53	5	43	7	0	14	18	36
	Cloudy	0	24	0	36	0	7	12	5
	Rain/snow	0	8	0	7	40	10	10	0
Light	Day	48	17	43	34	40	21	0	34
	Night lighted	2	11	0	3	0	9	40	6
	Night not lighted	3	9	0	13	0	1	0	1
Motion of vehicle	Straight ahead	39	30	43	29	32	15	35	35
	Turn left	4	2	0	10	3	11	0	0
	Turn right	5	4	0	9	2	3	3	6
	Other	5	1	0	2	3	2	2	0
Motion of TW	Straight ahead	52	31	43	38	36	27	37	0
	Turn left	1	3	0	11	2	1	0	35
	Turn right	0	1	0	1	2	3	1	6
	Other	0	2	0	0	0	0	2	0
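
As a reading aid for Table 3, the following minimal Python sketch picks the dominant road type of each cluster and its share of the cluster. The counts are transcribed from the "Road type" rows of Table 3; the function name `dominant_attribute` and the data layout are illustrative assumptions, not part of the authors' workflow.

```python
# Road-type counts per cluster, transcribed from Table 3 (columns C1..C8).
ROAD_TYPE = {
    "Straight":     [53, 37, 0, 0, 24, 1, 5, 18],
    "Intersection": [0, 0, 43, 49, 16, 0, 32, 10],
    "T-junction":   [0, 0, 0, 1, 0, 30, 3, 13],
}
CLUSTERS = [f"C{i}" for i in range(1, 9)]


def dominant_attribute(counts_by_attribute, clusters):
    """Return {cluster: (most frequent attribute, its share of the cluster)}.

    Hypothetical helper for illustration only.
    """
    result = {}
    for idx, cluster in enumerate(clusters):
        # Cluster size is the column sum over all attributes of the variable.
        size = sum(counts[idx] for counts in counts_by_attribute.values())
        attr = max(counts_by_attribute, key=lambda a: counts_by_attribute[a][idx])
        result[cluster] = (attr, counts_by_attribute[attr][idx] / size)
    return result


dominant = dominant_attribute(ROAD_TYPE, CLUSTERS)
for cluster, (attr, share) in dominant.items():
    print(f"{cluster}: {attr} ({share:.1%})")
```

The same helper can be applied to the other variables in Table 3 (weather, light, motion of vehicle, motion of TW) by swapping in the corresponding rows. Clusters whose dominant share is well below 100%, such as C8 for road type, are the ones that the association rules later split into several detailed scenarios.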

Table 4. Rules obtained for C3 (abbreviated codes for variables and corresponding attributes are in Table 2).

No.	Antecedent	Consequent	Support	Confidence	Lift
1	Motr = L	Injury = Sei	0.512	0.759	1.087
2	Lane = Odc	Injury = Sei	0.279	0.857	1.229
3	Motr = R	Splim = 60 mph	0.256	0.786	1.408
4	Lane = Odc and Injury = Sei	Motr = L	0.209	0.750	1.112
5	Lane = Odc and Motr = L	Injury = Sei	0.209	0.900	1.290
6	Injury = Sei and Rcensep = Cgb	Motr = L	0.186	0.889	1.318
7	Rcensep = Cgb and Motr = L	Injury = Sei	0.186	0.800	1.147
8	Injury = Sei and Motr = R	Splim = 60 mph	0.163	0.875	1.568
9	Rcensep = Sl and Dirt = O12	Splim = 60 mph	0.163	0.778	1.394
10	Injury = Sli	Motr = R	0.140	0.750	2.304
11	Lane = Sc	Injury = Sei	0.140	0.750	1.075
12	Splim = 40 mph	Injury = Sei	0.140	0.857	1.229
13	Splim = 40 mph	Motr = L	0.140	0.857	1.271
14	Dirt = O1	Motr = L	0.140	0.857	1.271
15	Lane = Odc and Rcensep = Sl	Injury = Sei	0.140	0.857	1.229
16	Injury = Sei and Splim = 40 mph	Motr = L	0.140	1.000	1.483
17	Splim = 40 mph and Motr = L	Injury = Sei	0.140	1.000	1.433
18	Splim = 40 mph	Injury = Sei and Motr = L	0.140	0.857	1.675
19	Lane = Odc and Rcensep = Sl	Motr = L	0.140	0.857	1.271
20	Splim = 50 mph	Injury = Sei	0.116	1.000	1.433
21	Dirt = O10	Injury = Sei	0.116	1.000	1.433
22	Lane = Itcs	Splim = 60 mph	0.116	0.833	1.493
23	Lane = Otcs	Motr = L	0.116	1.000	1.483
24	Splim > 60 mph	Motr = L	0.116	1.000	1.483
25	Dirt = O1 and Motr = R	Injury = Sli	0.116	0.833	4.479
26	Dirt = O1 and Injury = Sli	Motr = R	0.116	1.000	3.071
27	Motr = R and Injury = Sli	Dirt = O1	0.116	0.833	3.583
28	Motr = L and Lane = Idc	Injury = Sei	0.116	0.833	1.194
29	Lane = Odc and Splim = 60 mph	Injury = Sei	0.116	0.833	1.194
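
The support, confidence, and lift columns of Table 4 follow the standard association rule definitions. The sketch below computes them from raw counts using only those standard definitions. The cluster size of 43 comes from Table 3; the other three counts (29, 30, 22) are not reported in the paper and are assumed here, back-calculated so that the output matches rule 1. The function name `rule_metrics` is also hypothetical.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Support, confidence, and lift of the rule antecedent -> consequent.

    n_total      : number of accidents in the cluster
    n_antecedent : accidents that satisfy the antecedent
    n_consequent : accidents that satisfy the consequent
    n_both       : accidents that satisfy both
    """
    support = n_both / n_total
    confidence = n_both / n_antecedent
    # Lift compares the confidence with the baseline rate of the consequent.
    lift = confidence / (n_consequent / n_total)
    return support, confidence, lift


# Rule 1 of Table 4: Motr = L -> Injury = Sei in cluster C3 (43 cases).
# The counts 29, 30, and 22 are assumed values, not data from the paper.
support, confidence, lift = rule_metrics(
    n_total=43, n_antecedent=29, n_consequent=30, n_both=22
)
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

A lift above 1 means the consequent occurs more often with the antecedent than in the cluster as a whole. This is why rules 25 to 27, with lift values above 3, point to a much stronger association than rule 1.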

Table 5. Some accident attributes of vehicle-to-TW collision scenarios (abbreviated codes for accident attributes are in Table 2).

Road Type	Collision Scenario	Weather	Light	Injury Severity of TW Rider	Vehicle Traveling Lane	Motion of TW Relative to Vehicle	Speed Limit	Road Center Separation	Direction of Collision Force on Vehicle
Straight	C1.1	Sunny	Day	Sei	Sc	Left	40/50 mph	Unl	O11
	C1.2	Sunny	Day	Sli	Sc	Right	40/50 mph	Unl	/
	C5.1	Rain/snow	Day	Sei/Fal	Sc	Right	60 mph	/	O1
	C8.1	Sunny	Day	Sei	Sc	Right	40 mph	/	/
Intersection	C3.1	Sunny	Day	Sei	Odc	Left	40 mph	Cgb	O11
	C3.2	Sunny	Day	Sei/Sli	/	Right	60 mph	Sl	O12/O1
	C5.2	Rain/snow	Day	Sei/Fal	Odc	Left	60 mph	/	O12
	C8.2	Sunny	Day	Sei	Odc	Left	60 mph	Cgb	/
T-junction	C6.1	/	Day	Sei	Idc	Right	60 mph	/	O12
	C6.2	/	Day	Sei	Idc	Left	60 mph	/	O12
	C8.3	Sunny	Day	/	/	Right	/	/	/
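
To check which attributes recur most often across the 11 scenarios in Table 5, the cells can be tallied directly. The sketch below covers four of the columns (injury severity, traveling lane, speed limit, and direction of collision force). The values are transcribed by hand from Table 5; the data layout and the helper name `attribute_counts` are illustrative assumptions.

```python
from collections import Counter

# Table 5, transcribed: scenario -> (injury, lane, speed limit, force direction).
# Each cell is a tuple of alternatives; an empty tuple stands for "/" (no attribute).
TABLE5 = {
    "C1.1": (("Sei",), ("Sc",), ("40 mph", "50 mph"), ("O11",)),
    "C1.2": (("Sli",), ("Sc",), ("40 mph", "50 mph"), ()),
    "C5.1": (("Sei", "Fal"), ("Sc",), ("60 mph",), ("O1",)),
    "C8.1": (("Sei",), ("Sc",), ("40 mph",), ()),
    "C3.1": (("Sei",), ("Odc",), ("40 mph",), ("O11",)),
    "C3.2": (("Sei", "Sli"), (), ("60 mph",), ("O12", "O1")),
    "C5.2": (("Sei", "Fal"), ("Odc",), ("60 mph",), ("O12",)),
    "C8.2": (("Sei",), ("Odc",), ("60 mph",), ()),
    "C6.1": (("Sei",), ("Idc",), ("60 mph",), ("O12",)),
    "C6.2": (("Sei",), ("Idc",), ("60 mph",), ("O12",)),
    "C8.3": ((), (), (), ()),
}


def attribute_counts(table):
    """Count in how many scenarios each attribute code appears."""
    counts = Counter()
    for cells in table.values():
        for alternatives in cells:
            counts.update(alternatives)
    return counts


counts = attribute_counts(TABLE5)
for attribute, n in counts.most_common(5):
    print(f"{attribute}: {n} of {len(TABLE5)} scenarios")
```

Under this tally, Sei appears in 9 of the 11 scenarios, 60 mph in 6, and O12 in 4, in line with the attributes highlighted in the discussion of Table 5.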

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
