Article

Risk and Complexity Assessment of Autonomous Vehicle Testing Scenarios

1 School of Traffic & Transportation Engineering, Central South University, Changsha 410075, China
2 Department of Industrial and Manufacturing Systems Engineering, The University of Hong Kong, Hong Kong 999077, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(21), 9866; https://doi.org/10.3390/app14219866
Submission received: 9 September 2024 / Revised: 20 October 2024 / Accepted: 24 October 2024 / Published: 28 October 2024
(This article belongs to the Section Transportation and Future Mobility)

Abstract

Autonomous vehicles (AVs) must fulfill adequate safety requirements before formal application, and performing an effective functional evaluation to verify vehicle safety requires extensive testing in different scenarios. However, it is crucial to rationalize the application of different scenarios to support different testing needs; thus, one of the current challenges limiting the development of AVs is the critical evaluation of scenarios, i.e., the lack of quantitative criteria for scenario design. This study introduces a method using the Spherical Fuzzy-Analytical Network Process (SF-ANP) to evaluate these scenarios, addressing their inherent risks and complexities. The method involves constructing a five-layer model to decompose scenario elements and using SF-ANP to calculate weights based on element interactions. The study evaluates 700 scenarios from the China In-depth Traffic Safety Study–Traffic Accident (CIMSS-TA) database, incorporating fuzzy factors and element weights. Virtual simulation of vehicles in the scenarios was performed using Baidu Apollo, and the performance of the scenarios was assessed by collecting the vehicle test results. The correlation between the obtained alternative safety indicators and the quantitative values confirms the validity and scientific soundness of this approach. This will provide valuable guidance for categorizing AV test scenarios and selecting corresponding scenarios to challenge different levels of vehicle functionality. At the same time, it can be used as a design basis to generate a large number of effective scenarios to accelerate the construction of scenario libraries and promote the commercialization of AVs.

1. Introduction

Developed over the past decades, advanced sensors and algorithms have endowed autonomous vehicles (AVs) with the potential to significantly enhance traffic safety by eliminating human error [1,2]. However, to ensure AVs are safe and reliable before deployment on public roads, rigorous and comprehensive safety testing is essential. Numerous testing methodologies have been proposed and extensively used to ensure AVs’ safety. Traditional approaches, such as open road testing and closed field testing, are often compromised by external environmental conditions and human factors, making it challenging to accurately assess AV safety performance. In contrast, scenario-based virtual simulation testing has emerged as a leading option due to its distinct advantages. These include greater control over environmental conditions, the ease of creating and testing complex scenarios, efficient data collection, and the ability to effectively simulate and verify various traffic situations [3,4,5]. Comprehensive scenario-based testing, which simulates a wide range of real-world traffic situations, provides a more precise and holistic assessment of AV performance [6,7,8,9].
To effectively identify the vulnerabilities and limitations in AVs, it is crucial to design testing scenarios that optimally balance risk and complexity [10,11,12]. In order to enable challenging test scenarios to be used for refined and functional testing, we need to quantify different scenarios. By quantifying the scenarios, the scenarios included in the scenario library can be better targeted under different numerical labels, i.e., different scenarios can be filtered according to the different testing needs. At the same time, new scenarios can be quickly and accurately placed under different labels to facilitate the updating and application of the scenario library. However, quantitatively evaluating the risk and complexity of the testing scenarios is challenging [13,14]. The primary difficulties arise from an incomplete understanding of how various scenario elements impact AVs and the lack of an effective representation of the dynamic interactions between AVs and their environments. Traffic environments are composed of multiple factors whose influence is not merely additive but involves complex interactions. Therefore, scientific methods are necessary to accurately determine the complexity and risk levels of the testing scenarios. Most current methods quantify scenarios using the potential field method, which is complicated and not universal, to assess the nature of different scenarios in the scenario library [15,16,17]; such methods cannot keep pace with the continuous, iterative updating of scenario libraries.
To address these research gaps, this study aims to evaluate the risk and complexity of AV testing scenarios across two dimensions. First, we developed a method for deconstructing scenario elements into five distinct layers. Using the Spherical Fuzzy-Analytical Network Process (SF-ANP), we then proposed a quantitative evaluation model to assess the risk and complexity of each scenario. Concurrently, we determined the weights of the risk and complexity for each scenario element. Next, we applied this method to evaluate real-world scenarios from the China In-depth Traffic Safety Study–Traffic Accident (CIMSS-TA) database, using a fuzzy comprehensive evaluation approach. After eliminating accidents with incomplete or highly ambiguous records, we screened 700 accidents as research objects. Finally, the proposed model’s validity was tested through virtual simulations. This study contributes to the research in three significant ways: (1) It introduces a novel five-layer test scenario deconstruction method and applies an innovative Fuzzy-ANP technique to determine the weights of the 17 scenario sub-elements. The magnitude of the weights differs according to the degree of their contribution under different scenario metrics; lighting and weather have the largest weights under both metrics. (2) Based on these weights, a quantitative assessment model is developed for evaluating the risk and complexity of AV safety test scenarios. By virtue of normalization, setting the values of different elemental parameters provides a general framework. This approach provides new insights for selecting and constructing AV safety test scenarios. (3) Simulation results are used to validate the assessment methodology, and correlation analysis demonstrates that in the unavoidable scenarios, the complexity and risk of the scenario are positively correlated with the actual metrics (0.21 and 0.36, respectively) and negatively correlated in the avoided scenarios (−0.31 and −0.35, respectively).
The remainder of this paper is structured as follows: Section 2 provides a literature review. Section 3 outlines the testing scenario deconstruction and the SF-ANP-based comprehensive evaluation approach. Section 4 presents the results of the scenario evaluations and simulation validations. Section 5 discusses the significance of the scenario elements in the testing process. Finally, Section 6 concludes and suggests directions for future research.

2. Literature Review

Researchers have been working to create challenging testing scenarios for AVs. Riedmaier et al. [12] highlighted the need to shift from focusing on individual scenarios to evaluating overall safety. They discussed two methods: one that covers a wide range of scenarios and another that focuses on extreme, rare cases. Feng et al. [10] stressed the importance of defining what makes a scenario critical to build a comprehensive testing library. They proposed a framework that considers both the difficulty of maneuvers and how often they occur, allowing for diverse testing conditions and performance metrics. Comparing broad coverage scenarios with extreme cases, Riedmaier et al. [12] found that broader scenarios are more effective in testing AV safety, leading to a preference for this approach.
The PEGASUS project introduced a five-layer model for testing scenarios [18]. Layer 1 covers road features like lane markings and geometry. Layer 2 includes traffic infrastructure such as traffic lights and barriers. Layer 3 integrates these with time, defining a standard day. Layer 4 focuses on traffic participants and their interactions, while Layer 5 considers environmental factors like weather and lighting. Bagschik et al. [19] refined this model, suggesting that Layer 2 should only include physical objects interacting with the road, considering traffic rules. Layer 4 was revised to detail specific maneuvers, and Layer 5 emphasized the environment’s impact on these interactions. Recognizing the need for digital communication, Sauerbier et al. [20] added a sixth layer for vehicle-to-everything (V2X) communication and digital maps. The characteristics of the different models are shown in Table 1.
The previously discussed models do not fully address the state of the autonomous vehicles (AVs) themselves. It is important to consider how AVs interact with their traffic environments, accounting for the elements surrounding them. To address this gap, Zhang et al. [17] expanded the six-layer model of testing scenarios to include the ego vehicle, which is the AV being tested. In contrast, the National Highway Traffic Safety Administration (NHTSA) proposed a four-element framework for scenario testing that focuses on the perspective of the ego vehicle within the AV testing framework [21]. This framework encompasses tactical maneuvers, which are control-related tasks, and the Operational Design Domain (ODD) elements, which define the specific conditions under which an AV system is designed to operate, including roadway type, speed limits, lighting, weather, and other constraints. It also includes Object and Event Detection and Response (OEDR), which involves the AV’s ability to monitor its environment and respond appropriately to objects and events, as well as failure mode behaviors that consider how the AV handles incomplete or incorrect information. Furthermore, Ulbrich et al. [7] combined static infrastructure information with dynamic environmental data to create a graph-based representation of the local environment, incorporating details about the AV’s own state and enriching the scenario with task-specific information. This approach includes various aspects of the state space, such as the vehicle’s position, speed, and steering; topological information like lane markings, traffic rules, and signals; and the presence of other road participants, including vehicles and obstacles, within the AV’s field of view [22,23,24].
The selection of scenario elements for evaluation in scenario tests is influenced by various factors, including scenario complexity. Wang et al. [25] divided scenario elements into three categories: static, dynamic, and environmental. Static elements cover physical structures like road signs and signals, while environmental elements include conditions such as rain, snow, fog, and dust. Yu et al. [26] argued that scenario complexity consists of two main aspects: road semantic complexity and traffic element complexity. Road semantic complexity considers factors like road type (urban, highway, rural), scenario type (e.g., intersections, tunnels), and challenging conditions (e.g., curves, night driving). Traffic element complexity is measured using a matrix that tracks the distance and angle of the ego vehicle relative to other traffic elements [27].
The potential field method is often used to assess the complexity of traffic scenes by analyzing scenario elements. Cheng et al. [15] used this method combined with hierarchical analysis to quantify the complexity of traffic environments, but they did not account for road conditions. Similarly, Wang et al. [16] applied the potential field method with a hierarchical approach to evaluate road models within a driver–vehicle–road framework, but they also overlooked critical elements like weather conditions. Zhang et al. [17] introduced a method to quantify scenario complexity by using a feasible trajectory that incorporates current vehicle dynamics constraints and entropy as a measure. This approach adjusts the travel domain based on the presence of other traffic participants, defining scenario complexity as the total entropy of all vehicles in the scene. However, the potential field method, when viewed from the perspective of the ego vehicle, primarily focuses on the vehicle’s dynamic characteristics and fails to adequately account for interactions between traffic elements, leading to an incomplete quantification of testing scenarios.
Multi-Criteria Decision Making (MCDM) is a useful method for making decisions that involve multiple criteria and trade-offs [28,29]. The assessment of complexity and risk in testing scenarios for autonomous vehicles can be seen as a multi-criteria decision problem. The Analytical Network Process (ANP) helps manage interdependencies between different criteria and elements, such as those in traffic models [30]. By incorporating fuzzy sets into ANP, the subjectivity in evaluations can be minimized. Fuzzy sets extend Boolean logic to handle uncertainties in real-world decisions [31]. Several types of fuzzy sets exist, including type 2, intuitionistic, Pythagorean, neutrosophic, and hesitant fuzzy sets, with each offering different ways to deal with uncertainty and hesitation in decision-making [32,33,34,35,36]. Spherical Fuzzy sets (SFs) account for hesitation by setting a limit on the sum of squared membership, non-membership, and hesitation values [37]. Erdoğan et al. [38] used SFs, along with DEMATEL, ANP, and VIKOR, to assess risks in automated driving systems, while Zhang et al. [39] applied Fuzzy-ANP to evaluate a water supply network. These studies show that Fuzzy-ANP is effective in evaluating the relationships between elements in testing scenarios.
Differing from the traditional five-layer model, we add vehicle state information and various pieces of interaction information, drawing on multiple modified models, to construct scenario elements that serve the simulation test. Instead of the complex dynamic process of the potential field, we assign weights to scene elements through Fuzzy-ANP. Finally, we propose a normalization method to classify the different elements and quantify the scenarios using a matrix approach.

3. Materials and Methods

3.1. Elements of Testing Scenarios

Previous research has shown that the test scenarios for autonomous vehicles (AVs) should comprehensively cover all relevant elements to accurately simulate real-world conditions. The hierarchical scenario structures presented in prior work are relatively mature, but because they are limited in representing vehicle states and interactions, we propose an updated five-layer structure to represent these elements.
The first layer focuses on the ego vehicle, which is the AV being tested. This includes its motion (such as speed, acceleration, and direction), driving tasks (like turning, lane changes, and stopping), and the vehicle type, which is based on its physical properties and geometry. The second layer covers road characteristics, detailing elements like road section types (such as intersections or straight roads), the number of lanes (which impacts road width), road surface conditions (including how weather affects surface adhesion), road smoothness, and slope. These factors are critical as they influence the vehicle’s dynamics and handling. The third layer includes the traffic infrastructure, such as traffic signals and physical obstacles, which guide and influence the AV’s behavior. These elements contribute to the complexity of the scenario by fostering interactions among different traffic participants. The fourth layer addresses the objects around the AV, including other vehicles, pedestrians, and cyclists. It considers their behaviors, quantities, relative directions, and positions, which are crucial for accurately modeling real-world interactions. The fifth layer represents the environment, encompassing various external conditions such as weather and lighting. These environmental factors significantly affect both the AV’s sensors and its driving behavior. The elements of the scenario are shown in Table 2.
By combining these five layers, the study creates a comprehensive virtual environment that closely replicates real-world scenarios, allowing for a thorough assessment of the AV’s performance under diverse conditions.

3.2. Fuzzy ANP Method

The ANP framework offers a notable advantage compared to the Analytical Hierarchy Process (AHP) method [40,41,42]. ANP allows for the characterization of intricate relationships between decision levels and elements without the need for establishing distinct levels for the elements. Furthermore, ANP enables the representation of interdependence and interaction among elements. The conventional ANP comprises two layers: the control layer, which includes a structure controlling the interaction between criteria, typically composed of objectives and indicators; and the network layer, which consists of elements or individual elements that depend on the objectives or indicators. The unique structure of the ANP allows individual objects to exist both as indicators and as elements. Interdependencies within the model are illustrated through bidirectional arrows, while circular arcs denote interdependencies within the same element, with the direction indicating the influence exerted by one sub-element or element on the other sub-element or element it affects.
In ANP, the influence between elements is quantified through a pairwise comparison matrix, which is based on the subjective evaluations of the decision-maker under certain criteria. Maintaining consistency in this matrix is crucial for accurate decision-making. Traditional ANP treats these evaluations as precise values, but this approach can be limited due to factors like ambiguity in human judgment, uncertainty in decision contexts, and incomplete information. To overcome these challenges, fuzzy methods are employed. Fuzzy sets help capture the inherent vagueness in human preferences by using fuzzy numbers to simulate the decision-making process more realistically. They provide a suitable approach for handling the complex network of relationships in ANP and offer a logical foundation for the comparison process.
SFs enhance this approach by combining the strengths of Pythagorean fuzzy sets and neutrosophic sets while reducing their limitations. SFs extend traditional fuzzy sets by transferring the membership function to a spherical surface, which allows for three levels of representation: membership, non-membership, and hesitation. This representation of fuzzy sets across the entire domain provides a more nuanced way to model uncertainty and indecision, as described by (1) and (2).
$$\tilde{A}_S = \left\{ \left\langle x, \mu_{\tilde{A}_S}(x), \nu_{\tilde{A}_S}(x), \pi_{\tilde{A}_S}(x) \right\rangle \,\middle|\, x \in X \right\} \tag{1}$$

$$0 \le \mu_{\tilde{A}_S}^2(x) + \nu_{\tilde{A}_S}^2(x) + \pi_{\tilde{A}_S}^2(x) \le 1, \quad \forall x \in X \tag{2}$$

where $\mu_{\tilde{A}_S}: X \to [0,1]$, $\nu_{\tilde{A}_S}: X \to [0,1]$, and $\pi_{\tilde{A}_S}: X \to [0,1]$. For each $x \in X$, $\mu_{\tilde{A}_S}(x)$, $\nu_{\tilde{A}_S}(x)$, and $\pi_{\tilde{A}_S}(x)$ are the degrees of membership, non-membership, and hesitancy of $x$ in $\tilde{A}_S$, respectively. The basic operations of SFs are defined as (3)–(8):
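As an illustrative aside (not part of the paper's own tooling), the admissibility condition (2) can be checked numerically; the function name below is our own:

```python
# Minimal sketch (illustrative, not the authors' code): a spherical fuzzy number
# (mu, nu, pi) is admissible iff mu^2 + nu^2 + pi^2 lies in [0, 1], as in Eq. (2).

def is_valid_sf(mu: float, nu: float, pi: float) -> bool:
    """Check the spherical fuzzy constraint 0 <= mu^2 + nu^2 + pi^2 <= 1."""
    s = mu ** 2 + nu ** 2 + pi ** 2
    return 0.0 <= s <= 1.0

print(is_valid_sf(0.6, 0.4, 0.3))   # inside the unit sphere -> True
print(is_valid_sf(0.9, 0.8, 0.5))   # 0.81 + 0.64 + 0.25 > 1 -> False
```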
$$\tilde{A}_S \cup \tilde{B}_S = \left\{ \max\{\mu_{\tilde{A}_S}, \mu_{\tilde{B}_S}\},\ \min\{\nu_{\tilde{A}_S}, \nu_{\tilde{B}_S}\},\ \min\left\{ \left(1 - \max\{\mu_{\tilde{A}_S}, \mu_{\tilde{B}_S}\}^2 - \min\{\nu_{\tilde{A}_S}, \nu_{\tilde{B}_S}\}^2 \right)^{1/2},\ \max\{\pi_{\tilde{A}_S}, \pi_{\tilde{B}_S}\} \right\} \right\} \tag{3}$$

$$\tilde{A}_S \cap \tilde{B}_S = \left\{ \min\{\mu_{\tilde{A}_S}, \mu_{\tilde{B}_S}\},\ \max\{\nu_{\tilde{A}_S}, \nu_{\tilde{B}_S}\},\ \max\left\{ \left(1 - \min\{\mu_{\tilde{A}_S}, \mu_{\tilde{B}_S}\}^2 - \max\{\nu_{\tilde{A}_S}, \nu_{\tilde{B}_S}\}^2 \right)^{1/2},\ \min\{\pi_{\tilde{A}_S}, \pi_{\tilde{B}_S}\} \right\} \right\} \tag{4}$$

$$\tilde{A}_S \oplus \tilde{B}_S = \left\{ \left(\mu_{\tilde{A}_S}^2 + \mu_{\tilde{B}_S}^2 - \mu_{\tilde{A}_S}^2 \mu_{\tilde{B}_S}^2\right)^{1/2},\ \nu_{\tilde{A}_S}\nu_{\tilde{B}_S},\ \left( \left(1 - \mu_{\tilde{B}_S}^2\right)\pi_{\tilde{A}_S}^2 + \left(1 - \mu_{\tilde{A}_S}^2\right)\pi_{\tilde{B}_S}^2 - \pi_{\tilde{A}_S}^2 \pi_{\tilde{B}_S}^2 \right)^{1/2} \right\} \tag{5}$$

$$\tilde{A}_S \otimes \tilde{B}_S = \left\{ \mu_{\tilde{A}_S}\mu_{\tilde{B}_S},\ \left(\nu_{\tilde{A}_S}^2 + \nu_{\tilde{B}_S}^2 - \nu_{\tilde{A}_S}^2 \nu_{\tilde{B}_S}^2\right)^{1/2},\ \left( \left(1 - \nu_{\tilde{B}_S}^2\right)\pi_{\tilde{A}_S}^2 + \left(1 - \nu_{\tilde{A}_S}^2\right)\pi_{\tilde{B}_S}^2 - \pi_{\tilde{A}_S}^2 \pi_{\tilde{B}_S}^2 \right)^{1/2} \right\} \tag{6}$$

$$\lambda \cdot \tilde{A}_S = \left\{ \left(1 - \left(1 - \mu_{\tilde{A}_S}^2\right)^{\lambda}\right)^{1/2},\ \nu_{\tilde{A}_S}^{\lambda},\ \left( \left(1 - \mu_{\tilde{A}_S}^2\right)^{\lambda} - \left(1 - \mu_{\tilde{A}_S}^2 - \pi_{\tilde{A}_S}^2\right)^{\lambda} \right)^{1/2} \right\} \tag{7}$$

$$\tilde{A}_S^{\lambda} = \left\{ \mu_{\tilde{A}_S}^{\lambda},\ \left(1 - \left(1 - \nu_{\tilde{A}_S}^2\right)^{\lambda}\right)^{1/2},\ \left( \left(1 - \nu_{\tilde{A}_S}^2\right)^{\lambda} - \left(1 - \nu_{\tilde{A}_S}^2 - \pi_{\tilde{A}_S}^2\right)^{\lambda} \right)^{1/2} \right\} \tag{8}$$
For the SFs $\tilde{A}_S = (\mu_{\tilde{A}_S}, \nu_{\tilde{A}_S}, \pi_{\tilde{A}_S})$ and $\tilde{B}_S = (\mu_{\tilde{B}_S}, \nu_{\tilde{B}_S}, \pi_{\tilde{B}_S})$, the following properties are valid under the condition $\lambda, \lambda_1, \lambda_2 > 0$:
$$\tilde{A}_S \oplus \tilde{B}_S = \tilde{B}_S \oplus \tilde{A}_S \tag{9}$$

$$\tilde{A}_S \otimes \tilde{B}_S = \tilde{B}_S \otimes \tilde{A}_S \tag{10}$$

$$\lambda\left(\tilde{A}_S \oplus \tilde{B}_S\right) = \lambda\tilde{A}_S \oplus \lambda\tilde{B}_S \tag{11}$$

$$\lambda_1\tilde{A}_S \oplus \lambda_2\tilde{A}_S = \left(\lambda_1 + \lambda_2\right)\tilde{A}_S \tag{12}$$

$$\left(\tilde{A}_S \otimes \tilde{B}_S\right)^{\lambda} = \tilde{A}_S^{\lambda} \otimes \tilde{B}_S^{\lambda} \tag{13}$$

$$\tilde{A}_S^{\lambda_1} \otimes \tilde{A}_S^{\lambda_2} = \tilde{A}_S^{\lambda_1 + \lambda_2} \tag{14}$$
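To make the algebra concrete, the addition and multiplication operators (5) and (6) and the commutativity properties (9) and (10) can be sketched in a few lines; the tuple encoding and names here are our own illustrative choices, not the authors' implementation:

```python
import math

# Illustrative sketch of spherical fuzzy addition (5) and multiplication (6);
# a spherical fuzzy number is encoded as a tuple (mu, nu, pi).

def sf_add(a, b):
    """A (+) B per Eq. (5)."""
    (ma, na, pa), (mb, nb, pb) = a, b
    return (math.sqrt(ma**2 + mb**2 - ma**2 * mb**2),
            na * nb,
            math.sqrt((1 - mb**2) * pa**2 + (1 - ma**2) * pb**2 - pa**2 * pb**2))

def sf_mul(a, b):
    """A (x) B per Eq. (6)."""
    (ma, na, pa), (mb, nb, pb) = a, b
    return (ma * mb,
            math.sqrt(na**2 + nb**2 - na**2 * nb**2),
            math.sqrt((1 - nb**2) * pa**2 + (1 - na**2) * pb**2 - pa**2 * pb**2))

A = (0.6, 0.3, 0.2)
B = (0.5, 0.4, 0.3)
# Commutativity, properties (9) and (10): the operand order does not matter.
```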
Based on SF-ANP, the research problem is first clarified, and the linguistic items of the expert scoring table are then determined. The generic expert scoring table is divided into nine levels using evaluation descriptions drawn from subjective human understanding. The nine levels of linguistic items are quantified by SFs, which can effectively represent the uncertainty of expert evaluations of fuzzy numbers or events, with corresponding label scores portraying the importance level. The details are shown in Table 3. In assessments, it is difficult for experts and researchers to assign a definitive value, and purely numerical expressions are more ambiguous than verbal descriptions. We defined our constant values using nine evaluation levels based on the values proposed to correspond to the different evaluations [38,43]. The upper, middle, and lower bounds of the dynamic values were determined using the probability of a fuzzy number of questionnaires under the condition of satisfying (15) and (16). The score index (SI) was obtained via spherical fuzzy number defuzzification. For absolutely more importance (AMI), very high importance (VHI), high importance (HI), slightly more importance (SMI), and equal importance (EI), the SI was obtained using the following formula:
$$SI = \sqrt{\left|100 \times \left[\left(\mu_{\tilde{A}_S} - \pi_{\tilde{A}_S}\right)^2 - \left(\nu_{\tilde{A}_S} - \pi_{\tilde{A}_S}\right)^2\right]\right|} \tag{15}$$
For slightly low importance (SLI), low importance (LI), very low importance (VLI), and absolutely low importance (ALI), the SI was obtained using the following formula:
$$SI = \frac{1}{\sqrt{\left|100 \times \left[\left(\mu_{\tilde{A}_S} - \pi_{\tilde{A}_S}\right)^2 - \left(\nu_{\tilde{A}_S} - \pi_{\tilde{A}_S}\right)^2\right]\right|}} \tag{16}$$
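Assuming the score index formulas (15) and (16) as reconstructed here, the defuzzification of a linguistic judgement can be sketched as follows; the sample values are illustrative, not entries of Table 3:

```python
import math

def score_index(mu: float, nu: float, pi: float, positive: bool = True) -> float:
    """SI of a linguistic spherical fuzzy number.

    positive=True applies Eq. (15) (EI up to AMI); positive=False applies
    Eq. (16) (SLI down to ALI), which is the reciprocal of the same quantity."""
    si = math.sqrt(abs(100.0 * ((mu - pi) ** 2 - (nu - pi) ** 2)))
    return si if positive else 1.0 / si

# Example: a strong-importance-like judgement (0.9, 0.1, 0.0)
# gives SI = sqrt(100 * |0.81 - 0.01|) = sqrt(80).
print(score_index(0.9, 0.1, 0.0))
```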
To assess the relationships among elements within a network structure in ANP, an expert scoring table was employed to facilitate evaluations, resulting in a comparison matrix for the relevant elements. However, due to the subjective factors inherent in human judgment, the comparison matrix may contain inconsistencies. To mitigate these adverse effects, a consistency test of the comparison matrix should be conducted prior to further calculations; matrices that fail the consistency criterion must be re-evaluated. The consistency index (CI) is calculated as shown in (17).
$$CI = \frac{\lambda_{\max} - n}{n - 1} \tag{17}$$
where $\lambda_{\max}$ is the largest eigenvalue of the comparison matrix, and $n$ is the dimension of the matrix.
The consistency ratio (CR) is defined as the ratio between the consistency of a given evaluation matrix and the consistency of a random matrix:
$$CR = \frac{CI}{RI(n)} \tag{18}$$
where R I ( n ) is a random index [44] that depends on the size of matrix n. The matrix is considered to have satisfactory consistency and its degree of inconsistency is within a manageable threshold when C R < 0.1.
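The consistency test of (17) and (18) is straightforward to automate; the sketch below is our own illustration using Saaty's standard random-index table and a toy comparison matrix:

```python
import numpy as np

# Saaty's random index RI(n) for matrix sizes 1..9 (standard published values).
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def consistency_ratio(M: np.ndarray) -> float:
    """CR = CI / RI(n), with CI = (lambda_max - n) / (n - 1), per Eqs. (17)-(18)."""
    n = M.shape[0]
    lam_max = max(np.linalg.eigvals(M).real)   # principal eigenvalue
    ci = (lam_max - n) / (n - 1)
    return ci / RI[n]

# A perfectly consistent 3x3 pairwise comparison matrix (weights 1 : 2 : 4),
# so lambda_max = n and CR is essentially zero, well below the 0.1 threshold.
M = np.array([[1.0, 0.5, 0.25],
              [2.0, 1.0, 0.5],
              [4.0, 2.0, 1.0]])
print(consistency_ratio(M))
```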
The Spherical Weighted Geometric Mean (SWGM) is computationally light, extends easily to fuzzy cases, and guarantees the uniqueness of the solution. This operator was used to calculate the fuzzy values of the pairwise comparison matrices between the sub-elements in the expert scoring table, and the sub-element weights under each sub-criterion, arranged as in (19), were obtained via defuzzification. The defuzzification process is defined by the formula in (21).
$$W_{ij} = \begin{pmatrix} \omega_{i_1 j_1} & \omega_{i_1 j_2} & \cdots & \omega_{i_1 j_{n_j}} \\ \omega_{i_2 j_1} & \omega_{i_2 j_2} & \cdots & \omega_{i_2 j_{n_j}} \\ \vdots & \vdots & \ddots & \vdots \\ \omega_{i_{n_i} j_1} & \omega_{i_{n_i} j_2} & \cdots & \omega_{i_{n_i} j_{n_j}} \end{pmatrix} \tag{19}$$
$$\mathrm{SWGM}_w\left(\tilde{A}_{S_1}, \tilde{A}_{S_2}, \ldots, \tilde{A}_{S_n}\right) = \tilde{A}_{S_1}^{w_1} \otimes \tilde{A}_{S_2}^{w_2} \otimes \cdots \otimes \tilde{A}_{S_n}^{w_n} = \left\langle \prod_{i=1}^{n} \mu_{\tilde{A}_{S_i}}^{w_i},\ \left[1 - \prod_{i=1}^{n}\left(1 - \nu_{\tilde{A}_{S_i}}^2\right)^{w_i}\right]^{1/2},\ \left[\prod_{i=1}^{n}\left(1 - \nu_{\tilde{A}_{S_i}}^2\right)^{w_i} - \prod_{i=1}^{n}\left(1 - \nu_{\tilde{A}_{S_i}}^2 - \pi_{\tilde{A}_{S_i}}^2\right)^{w_i}\right]^{1/2} \right\rangle \tag{20}$$

where $w = (w_1, w_2, \ldots, w_n)$, $w_i \in [0, 1]$, and $\sum_{i=1}^{n} w_i = 1$.
$$D_{SF} = \sqrt{\left|100 \times \left[\left(3\mu_{\tilde{A}_S} - \frac{\pi_{\tilde{A}_S}}{2}\right)^2 - \left(\frac{\nu_{\tilde{A}_S}}{2} - \pi_{\tilde{A}_S}\right)^2\right]\right|} \tag{21}$$
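Under the SWGM operator and the defuzzification formula (21) as reconstructed here, the aggregation step can be sketched as follows; function names and inputs are illustrative assumptions:

```python
import math

def swgm(sfs, w):
    """Spherical Weighted Geometric Mean of sfs = [(mu, nu, pi), ...] with weights w."""
    mu = math.prod(m ** wi for (m, _, _), wi in zip(sfs, w))
    q = math.prod((1 - n ** 2) ** wi for (_, n, _), wi in zip(sfs, w))
    r = math.prod((1 - n ** 2 - p ** 2) ** wi for (_, n, p), wi in zip(sfs, w))
    return (mu, math.sqrt(1 - q), math.sqrt(q - r))

def defuzzify(sf):
    """D_SF score per Eq. (21)."""
    mu, nu, pi = sf
    return math.sqrt(abs(100.0 * ((3 * mu - pi / 2) ** 2 - (nu / 2 - pi) ** 2)))

# Aggregating identical judgements returns the same spherical fuzzy number.
agg = swgm([(0.6, 0.3, 0.2), (0.6, 0.3, 0.2)], [0.5, 0.5])
```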
The column vectors within this matrix depict the extent of the local interaction between the sub-elements of the i-th element and those of the j-th element. Subsequently, all the sub-elements were integrated to represent the degree of local influence of all sub-elements within the testing scenario, presented as a supermatrix.
To ensure comparability, column normalization is necessary when dealing with the same elements across different sub-criteria. The pairwise comparison matrix between elements was then calculated through consistency testing, SWGM evaluation, defuzzification, and weighting procedures, resulting in the weighting matrix A for the corresponding supermatrix, as outlined below:
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} \end{pmatrix} \tag{22}$$
Each column vector gives the degree to which the elements in that column interact with the rest of the elements. The global weights R, constructed as in (23), were obtained by element-wise multiplication of the weighting matrix with the supermatrix in the following form:
$$R = \begin{pmatrix} a_{11} \times w_{11} & a_{12} \times w_{12} & \cdots & a_{1N} \times w_{1N} \\ a_{21} \times w_{21} & a_{22} \times w_{22} & \cdots & a_{2N} \times w_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} \times w_{N1} & a_{N2} \times w_{N2} & \cdots & a_{NN} \times w_{NN} \end{pmatrix} \tag{23}$$
Finally, the global weight vector $\hat{R}$ over all sub-elements was obtained by taking the power limit of the weighted supermatrix:
$$\hat{R} = \left(r_1, r_2, r_3, \ldots, r_n\right) \tag{24}$$
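The power-limit step can be sketched as repeated multiplication of a column-stochastic supermatrix until its entries stop changing; the 3 × 3 matrix below is a toy stand-in for the paper's weighted supermatrix:

```python
import numpy as np

def limit_supermatrix(W: np.ndarray, tol: float = 1e-12, max_iter: int = 10000) -> np.ndarray:
    """Raise W to successive powers until the entries converge."""
    P = W.copy()
    for _ in range(max_iter):
        P_next = P @ W
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    return P

# Toy column-stochastic weighted supermatrix (each column sums to 1).
W = np.array([[0.2, 0.5, 0.3],
              [0.5, 0.2, 0.4],
              [0.3, 0.3, 0.3]])
L = limit_supermatrix(W)
# In the limit, every column equals the same stationary vector,
# which is read off as the global weight vector.
```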
Fuzzy comprehensive evaluation offers a solution for complex decision-making problems that involve multiple variables. It considers multiple factors by utilizing fuzzy transformations and adhering to the principle of maximum membership, ultimately yielding the evaluation results. Building upon the weight vector determined by SF-ANP, as discussed earlier, an evaluation set was constructed. With consideration of the interrelationship of sub-elements and evaluation objectives, the evaluation set typically adopts a five-level structure, expressed as (25). This structure exhibits a progressive relationship between levels, facilitating a level-by-level qualitative evaluation of the assessment results based on the evaluation objectives.
$$V = \left\{v_1, v_2, v_3, v_4, v_5\right\} \tag{25}$$
To facilitate drawing conclusions, a score-based comparison was employed to provide a more accurate and clear representation of the evaluation opinions for the different alternatives. The score matrix, shown in (26), was derived quantitatively by assigning corresponding values to the evaluation levels.
$$P = \left(p_1, p_2, p_3, p_4, p_5\right) \tag{26}$$
The evaluation matrix U represents the grades of the elements of the different alternatives on the established evaluation set, where each column indicates the evaluation grades to which the corresponding element belongs.
$$U = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ u_{21} & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ u_{51} & u_{52} & \cdots & u_{5n} \end{pmatrix} \tag{27}$$
Fuzzy operators are artificially defined to enable optimal and rational quantitative assessment in fuzzy inference evaluation. The common classifications of fuzzy operators include four categories: Min–Max, Sum–Product, Min–Sum, and Max–Product operators. The Min–Max operator minimizes the distances between antecedents of fuzzy rules, and between antecedents and descendants, and maximizes the distances between rules. The remaining three categories follow similar general principles. The overall evaluation result B was obtained by applying the fuzzy operator to the evaluation matrix U using the set of element weights R, as shown in (28):
$$B = R \circ U \tag{28}$$
where ∘ denotes the fuzzy composition operator. Finally, the total system score is calculated in order to compare the objectives with each other, solving the multi-objective fuzzy comprehensive evaluation. The evaluation result is multiplied by the score matrix to obtain the quantitative score of the evaluated object, as described in (29).
$$\tilde{P} = B \cdot P \tag{29}$$
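As a worked illustration of (28) and (29) with the Min–Max operator (the weights, membership matrix, and level scores below are invented for the example, not taken from the paper):

```python
import numpy as np

def min_max_compose(R, U):
    """b_k = max_j min(r_j, u_kj) for each of the 5 levels, then normalize."""
    B = np.array([max(min(r, u) for r, u in zip(R, row)) for row in U])
    return B / B.sum()

R = np.array([0.30, 0.25, 0.20, 0.15, 0.10])   # sub-element weights (illustrative)
U = np.array([[0.1, 0.2, 0.3, 0.2, 0.4],       # rows: the 5 evaluation levels
              [0.3, 0.3, 0.2, 0.3, 0.2],       # cols: the n sub-elements
              [0.3, 0.2, 0.2, 0.2, 0.1],
              [0.2, 0.2, 0.2, 0.2, 0.2],
              [0.1, 0.1, 0.1, 0.1, 0.1]])
P = np.array([95.0, 80.0, 65.0, 50.0, 35.0])   # level scores (illustrative)

B = min_max_compose(R, U)                      # Eq. (28): B = R o U
score = float(B @ P)                           # Eq. (29): the crisp score
```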

4. Results

4.1. Risk and Complexity of Testing Scenario Assessment Based on SF-ANP

Fuzzy-ANP is a sound quantitative evaluation method because it suppresses the negative impact of subjective thinking and hesitant decisions. Since there is no predefined set of values and no good objective calculation method for evaluating AV testing scenarios, drawing on the judgment of experienced researchers is an effective solution. To realize a comprehensive evaluation of accident scenarios in AV testing, using the hierarchical structure and Fuzzy-ANP established above, we chose risk and complexity, the two most important indicators in scenario testing, as the evaluation objectives. At the same time, the risk and complexity of the test scenarios are affected by the variability of each scenario element. Therefore, we adopted different SF-ANP structures to determine the relationships between sub-elements and elements.
In this study, the control layer comprised complexity and risk; expert evaluations were used to determine whether the elements affect each other. Then, the SF-ANP structure of the testing scenario was constructed, as shown in Figure 1 and Figure 2. A → B means that A is affected by B, A ↔ B means that A and B influence each other, and circular arcs denote interdependencies within the same element.
We generated two types of scoring table by combining and averaging the scores from seven experts. The seven experts include three engineers and four researchers working in the field of road traffic and traffic safety. The weights were set according to the number of years of work and research experience and the level of education. Each scoring table includes a pairwise comparison matrix for elements based on five distinct criteria, as well as a comparison of the importance of sub-elements within each element, where an arbitrary sub-element is used as a sub-criterion. In total, there are 92 comparison matrices. For illustration purposes, Table 4 shows the relationships among the elements when scenario complexity is the objective, with the ego vehicle serving as the criterion. Table 5 illustrates the level of influence between the elements and the ego vehicle, with scenario risk level as the main goal. Table 6 and Table 7 present the pairwise comparison matrices for sub-elements within the road, where the behavior of the ego vehicle is used as a sub-criterion.
Prior to normalizing the matrices, the rationality of the scoring tables needed to be confirmed. The Super Decisions software (v2.10), developed by Thomas L. Saaty and Rozann Whitaker Saaty in Pittsburgh, PA, USA, was employed to conduct consistency testing by inputting the SI values of the scoring tables. The analysis revealed that all the obtained comparison matrices had CR values below 0.1, indicating high consistency. Consequently, it could be concluded that the scoring tables met the fundamental requirements. The individual sub-criterion comparison matrices were defuzzified by passing them through the SWGM, as follows:
s_l^{com} = (18.1755, 11.8179, 8.9890, 15.5154, 11.1818)
s_l^{risk} = (16.1295, 19.8542, 10.2979, 7.7028, 10.2979)
where s_l^{com} and s_l^{risk} are the values obtained after the defuzzification of the five sub-elements with respect to scenario complexity and scenario risk, respectively.
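The consistency screening described above (all matrices below the 0.1 threshold) follows Saaty's consistency ratio, CR = CI/RI with CI = (λ_max − n)/(n − 1). A minimal sketch, where `RI_TABLE` holds Saaty's published random indices and the 3 × 3 matrix is a hypothetical example rather than one of the study's 92 matrices:

```python
import numpy as np

# Saaty's random consistency indices for matrix orders 1..10
RI_TABLE = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
            6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def consistency_ratio(matrix: np.ndarray) -> float:
    """CR = CI / RI, where CI = (lambda_max - n) / (n - 1)."""
    n = matrix.shape[0]
    if n <= 2:  # 1x1 and 2x2 reciprocal matrices are always consistent
        return 0.0
    lam_max = np.max(np.real(np.linalg.eigvals(matrix)))
    ci = (lam_max - n) / (n - 1)
    return ci / RI_TABLE[n]

# Hypothetical, perfectly consistent reciprocal comparison matrix
m = np.array([[1.0, 2.0, 4.0],
              [0.5, 1.0, 2.0],
              [0.25, 0.5, 1.0]])
cr = consistency_ratio(m)
```

A matrix passes the screening when `cr < 0.1`; the example above is perfectly consistent, so its CR is essentially zero.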
The local weights were calculated using the following formulas:
w̄_j = DSF(s_j) / ∑_{j=1}^{n} DSF(s_j)
L_com = (0.2767, 0.1799, 0.1369, 0.2362, 0.1702)^T
L_risk = (0.2509, 0.3089, 0.1602, 0.1198, 0.1602)^T
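The local-weight formula above is a plain normalization of the defuzzified scores. It can be reproduced directly with the complexity values reported for the five sub-elements:

```python
import numpy as np

def local_weights(defuzzified: np.ndarray) -> np.ndarray:
    """Normalize defuzzified scores so the local weights sum to one."""
    return defuzzified / defuzzified.sum()

# Defuzzified sub-element scores for scenario complexity (from the text)
s_com = np.array([18.1755, 11.8179, 8.9890, 15.5154, 11.1818])
L_com = local_weights(s_com)
# -> approximately [0.2767, 0.1799, 0.1369, 0.2362, 0.1702]
```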
The individual criteria comparison matrices were defuzzified by passing them through SWGM, as follows:
s_a^{com} = (5.5536, 15.4053, 14.0927, 17.2751, 12.4745)
s_a^{risk} = (1.5331, 15.0660, 13.2325, 15.0660, 17.2751)
where s_a^{com} and s_a^{risk} represent the values obtained after the defuzzification of the five elements with respect to scenario complexity and scenario risk, respectively. The weights among the elements were obtained using the normalization method, as follows:
A_com = (0.0857, 0.2377, 0.2175, 0.2666, 0.1925)^T
A_risk = (0.0247, 0.2423, 0.2128, 0.2423, 0.2779)^T
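The SWGM aggregation used in the defuzzification steps above can be sketched as follows. The operator definition is the spherical weighted geometric mean of Kutlu Gündoğdu and Kahraman [37,43]; the spherical fuzzy triples and expert weights below are hypothetical illustrations, not the study's data:

```python
import math

def swgm(sfns, weights):
    """Spherical weighted geometric mean of spherical fuzzy numbers.

    Each SFN is a (mu, v, pi) triple: membership, non-membership, and
    hesitancy degrees. The weights are assumed to sum to one.
    """
    mu = math.prod(m ** w for (m, _, _), w in zip(sfns, weights))
    prod_v = math.prod((1 - nv ** 2) ** w for (_, nv, _), w in zip(sfns, weights))
    prod_vpi = math.prod((1 - nv ** 2 - h ** 2) ** w
                         for (_, nv, h), w in zip(sfns, weights))
    v = math.sqrt(1 - prod_v)
    pi = math.sqrt(prod_v - prod_vpi)
    return (mu, v, pi)

# Idempotency check: aggregating identical judgments returns the same SFN
a = (0.7, 0.2, 0.1)
agg = swgm([a, a, a], [0.5, 0.3, 0.2])
```

The idempotency property shown here is a standard sanity check for weighted aggregation operators: unanimous expert judgments should survive aggregation unchanged.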
The weighted supermatrix was obtained by applying the criteria weights to the unweighted supermatrix; meanwhile, the element weight values required for the testing scenarios under the different objectives were computed as shown in (31) and (32):
R_com = A_com^0 × L_com  (31)
R_risk = A_risk^0 × L_risk  (32)
R_com = (0.0606, 0.0428, 0.0325, 0.0562, 0.0405)
R_risk = (0.0608, 0.0748, 0.0388, 0.0290, 0.0388)
Using the same steps, the matrices of all elements and sub-elements were processed by SWGM, defuzzification, weight calculation, and normalized weighting to obtain the global weight matrix of the sub-elements. Finally, the limit power of the supermatrix was calculated using MATLAB (R2020b) to obtain the final weights of all sub-elements in terms of risk and complexity, as shown in Figure 3.
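The limit-power computation performed in MATLAB can be sketched in Python: a column-stochastic weighted supermatrix is raised to successive powers until it stabilizes, and any column of the limit matrix gives the global priority vector. The 3 × 3 supermatrix below is a hypothetical example, not the study's sub-element supermatrix:

```python
import numpy as np

def limit_supermatrix(W: np.ndarray, tol: float = 1e-10,
                      max_iter: int = 10_000) -> np.ndarray:
    """Raise a column-stochastic weighted supermatrix to successive powers
    until convergence; the columns of the limit hold the final weights."""
    prev = W.copy()
    for _ in range(max_iter):
        nxt = prev @ W
        if np.max(np.abs(nxt - prev)) < tol:
            return nxt
        prev = nxt
    raise RuntimeError("supermatrix power iteration did not converge")

# Hypothetical column-stochastic weighted supermatrix
W = np.array([[0.2, 0.5, 0.3],
              [0.5, 0.2, 0.4],
              [0.3, 0.3, 0.3]])
limit = limit_supermatrix(W)
weights = limit[:, 0]  # every column converges to the same priority vector
```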
The global weight is the relative contribution of the scenario sub-elements to scenario risk and scenario complexity; it measures the importance of each scenario component to the different objectives. The evaluation matrices under the different objectives were constructed separately:
V_1 = (v_11, v_12, v_13, v_14, v_15)
V_2 = (v_21, v_22, v_23, v_24, v_25)
where V_1 is complexity and V_2 is risk. Five evaluation levels were used, as follows: v_11—very simple; v_12—simple; v_13—basically simple; v_14—complex; v_15—very complex; v_21—very safe; v_22—safe; v_23—basically safe; v_24—dangerous; and v_25—very dangerous. Based on a large amount of traffic data, the parameter ranges corresponding to the different evaluation levels of each element parameter in terms of risk and complexity were determined, providing reference standards for element evaluation, as shown in Appendix A.
In order to facilitate the accurate expression of comparative evaluation results, the different evaluation levels were quantified and characterized by a score matrix P.
P = (0.2, 0.4, 0.6, 0.8, 1.0)
The values of the scenario element parameters were extracted from each real scene; with seventeen scene sub-elements in total, these values were matched against the evaluation-level parameter ranges to obtain the evaluation matrix U.
U =
[ u_11  u_12  ⋯  u_1,17 ]
[ u_21  u_22  ⋯  u_2,17 ]
[  ⋮     ⋮    ⋱    ⋮    ]
[ u_51  u_52  ⋯  u_5,17 ]
Each column represents the evaluation level of a single element; for example, if the parameter value of element C_ij in an actual scenario falls within the "basically simple" and "basically safe" evaluation levels, its corresponding column vector is (0, 0, 1, 0, 0)^T. Using this method, the evaluation vectors of all elements in a single real scene can be obtained to construct the evaluation matrix. The integrated evaluation results were then calculated with the fuzzy operator M(·, ⊕). Because the evaluation result of an integrated testing scenario is jointly influenced by all elements, and M(·, ⊕) has the advantage of allowing each factor to retain its contribution to the evaluation result, this operator was used to combine the matrices into the integrated evaluation matrix B, as described in (36).
B = R̂ ∘ U,  B_k = ∑_{j=1}^{m} r_j u_{jk},  k = 1, 2, …, n
The evaluation matrix is quantitatively represented by the score matrix, and the risk and complexity values for individual scenarios are calculated as shown in (37).
P̃ = B · P^T = (b_1, b_2, b_3, b_4, b_5) · (0.2, 0.4, 0.6, 0.8, 1.0)^T = 0.2b_1 + 0.4b_2 + 0.6b_3 + 0.8b_4 + 1.0b_5
This step was repeated for the existing testing scenarios to obtain the quantified values of all statistical scenarios.
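The scoring pipeline above (building the one-hot evaluation matrix U, combining it with the weight vector through the M(·, ⊕) operator, and collapsing the result with the score matrix P) can be sketched as follows. The global weights and level assignments are hypothetical; the study uses seventeen sub-elements rather than the four shown here:

```python
import numpy as np

P = np.array([0.2, 0.4, 0.6, 0.8, 1.0])  # score matrix for the five levels

def evaluation_matrix(levels: list, n_levels: int = 5) -> np.ndarray:
    """One column per sub-element; a 1 marks its evaluation level (0-based)."""
    U = np.zeros((n_levels, len(levels)))
    for j, lvl in enumerate(levels):
        U[lvl, j] = 1.0
    return U

def scenario_score(weights: np.ndarray, levels: list) -> float:
    """M(., +) operator: B_k = sum_j r_j * u_jk, then P~ = B . P."""
    U = evaluation_matrix(levels)
    B = U @ weights  # integrated evaluation vector over the five levels
    return float(B @ P)

# Hypothetical example: four sub-elements with equal global weights,
# all rated at the third level ("basically simple" / "basically safe")
w = np.array([0.25, 0.25, 0.25, 0.25])
score = scenario_score(w, [2, 2, 2, 2])  # -> 0.6
```

Because M(·, ⊕) is a weighted sum, every sub-element contributes to the final score in proportion to its global weight, which is the property the text highlights.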

4.2. Scenario Assessment and Testing Results

The assessment results of the 700 scenarios are shown in Figure 4; scenario complexity and scenario risk are roughly linearly and positively correlated. The correlation is stronger where the complexity and risk values are small or moderate, whereas for larger values the relationship weakens and the distribution becomes more discrete. This indicates that as scenarios become more complex, scenario risk does not increase monotonically. The scenario evaluation values were analyzed to obtain descriptive statistics for each indicator, as shown in Table 8.
As shown in Table 9, the normality of the complexity and risk distributions was tested using the Shapiro–Wilk test (W-test). The obtained p-values were less than 0.05, indicating a departure from normality; the Spearman correlation coefficient was therefore used. The calculated p-value was less than 0.01, indicating a statistically significant correlation, and the correlation coefficient of 0.64 suggests a strong positive correlation.
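The test procedure, a Shapiro–Wilk check for normality followed by Spearman's rank correlation when normality is rejected, can be sketched with SciPy. The arrays below are synthetic stand-ins for the 700 evaluation values, not the study's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-ins for the complexity and risk values of 700 scenarios
complexity = rng.exponential(scale=0.15, size=700) + 0.3
risk = 0.8 * complexity + rng.normal(scale=0.05, size=700)

_, p_norm = stats.shapiro(complexity)
if p_norm < 0.05:  # normality rejected -> use a rank-based coefficient
    rho, p_corr = stats.spearmanr(complexity, risk)
else:
    rho, p_corr = stats.pearsonr(complexity, risk)
```

Spearman's coefficient is the appropriate fallback here because it depends only on ranks and therefore makes no normality assumption.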
Among the 700 scenarios, Baidu Apollo safely passed 546, while crashes could not be avoided in the remaining 154. The distribution of test values in Figure 5 illustrates that, overall, the risk values are higher and concentrated in the range of 0.40–0.72, while the complexity values are mostly distributed in the range of 0.30–0.66.
In order to quantify the safety level of the driving process in the avoidable scenarios, this study calculated the Generalized Time-To-Collision (GTTC) at each moment of the driving process (sampling period: 0.1 s). GTTC is defined in (38)–(40):
D_{i,j} = √((P_m − P_n)^T (P_m − P_n))  (38)
Ḋ_{i,j} = (P_m − P_n)^T (V_m − V_n) / D_{i,j}  (39)
GTTC = −D_{i,j} / Ḋ_{i,j}, if Ḋ_{i,j} < 0;  +∞, if Ḋ_{i,j} ≥ 0  (40)
where P_m and P_n denote the positions of the two nearest points of the two vehicles, and V_m and V_n denote the velocities at those points. D_{i,j} is the distance between the two nearest points, and Ḋ_{i,j} is its first-order derivative, i.e., the relative speed along the line connecting these points. When Ḋ_{i,j} ≥ 0, the distance between the two vehicles will increase or remain unchanged and no danger will occur.
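Equations (38)–(40) translate directly into code. The positions and velocities below are hypothetical 2-D examples for the two nearest points of two vehicles:

```python
import numpy as np

def gttc(p_m: np.ndarray, p_n: np.ndarray,
         v_m: np.ndarray, v_n: np.ndarray) -> float:
    """Generalized time-to-collision between the nearest points of two vehicles.

    D is the inter-point distance and D_dot its time derivative; when the
    gap is closing (D_dot < 0), GTTC = -D / D_dot, otherwise no collision
    threat exists and GTTC is +inf.
    """
    dp = p_m - p_n
    d = np.sqrt(dp @ dp)
    d_dot = (dp @ (v_m - v_n)) / d
    return -d / d_dot if d_dot < 0 else float("inf")

# Ego vehicle closing head-on at 10 m/s from 50 m away -> GTTC = 5 s
t1 = gttc(np.array([0.0, 0.0]), np.array([50.0, 0.0]),
          np.array([10.0, 0.0]), np.array([0.0, 0.0]))
# Receding vehicles -> no collision threat
t2 = gttc(np.array([0.0, 0.0]), np.array([50.0, 0.0]),
          np.array([-5.0, 0.0]), np.array([0.0, 0.0]))
```

Evaluating this at every 0.1 s sample and taking the minimum over the run yields the GTTC_min metric used in the next paragraph.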
The GTTC_min metric, defined as the minimum value of GTTC observed throughout the simulation process, was utilized to determine the most perilous time period within a scenario. In avoidable scenarios, GTTC_min was employed to quantify the level of danger during vehicle operation; conversely, Delta V was utilized to characterize the level of danger in unavoidable scenarios.
A comparison of the simulation and evaluation results is shown in Figure 6. Delta V in the unavoidable scenarios is mainly distributed between 2 and 18, with a large proportion around 8; its overall trend increases with risk and complexity, although with some discrete fluctuations. In the avoided scenarios, GTTC_min is mainly concentrated between 0 and 6, mostly around 0; as scenario risk and complexity grow, GTTC_min appears less often at larger values and concentrates at lower values. Compared with GTTC_min, the complexity and risk of the scenarios correlate more strongly with Delta V.
The correlation between the simulation and evaluation results was verified through Pearson and Spearman correlation tests, demonstrating the rationality of the fuzzy comprehensive evaluation based on SF-ANP.
Table 10 shows that, under the same normality test, the Delta V p-value was below 0.05, so the Spearman coefficient was used. For the complexity–Delta V test, the significance p-value is below 0.01 and the correlation coefficient is 0.21, indicating a weak positive correlation. Similarly, the risk–Delta V test yields a p-value below 0.01 and a Spearman coefficient of 0.36, also a weak positive correlation. Delta V is thus consistent with the evaluation results of the testing scenarios.
Table 11 shows that the p-value of GTTC_min in the normality W-test is less than 0.05, as is that of risk; therefore, the Spearman correlation coefficient was used in all cases. The p-values of the significance tests are both below 0.01, and the correlation coefficients are −0.31 and −0.35, both indicating a weak negative correlation. In scenarios with lower risk and complexity, the AV achieves higher operational safety.

5. Discussion

5.1. Scenario Element for Evaluation

Assessing scenario risk involves considering the environment, roads, and objects, all of which play critical roles. The environment, a key component of traffic, influences the road conditions and affects the behavioral choices of traffic participants, thereby impacting the overall risk of the scenarios [45]. Within the road layer, factors like road level and slope contribute to risks related to both the horizontal and vertical movements of vehicles. Additionally, the behavior of objects, including their distance from the ego vehicle, directly impacts the risk associated with the ego vehicle’s movements. This relationship can be modeled using the concept of a risk potential field within the scenario [16,46].
Cheng et al. [15] emphasized humans as the most critical element in their assessment of scenario complexity, including other factors such as motor vehicles, animals, vegetation, ancillary facilities, signs, and line markings. However, they did not consider how driving intentions and different road structures interact. In contrast, we argue that roads and objects significantly influence scenario complexity. The road is fundamental to the scenario because the type of road segment determines the behavior and intent of the ego vehicle, as well as the range of target options. More complex road segments lead to dynamic changes in the available choices for traffic participants, increasing scenario complexity. The number of lanes also affects the driving conditions, as more lanes provide a wider range of locations for traffic participants, which increases scenario complexity [47].
Objects also play a crucial role in determining scenario complexity. The types of objects that are present influence their behaviors, and more complex behaviors contribute to greater scenario complexity. A higher number of objects leads to more task assignments, more information interactions, and an overall increase in scenario complexity, simulating a more realistic traffic environment.

5.2. Scenario Risk and Complexity Assessment

Within a testing scenario, as complexity increases, the ego vehicle faces more demanding perceptual tasks within a certain threshold [48]. This increased complexity introduces additional factors into the decision-making process, making it more challenging to achieve the vehicle’s functional goals. However, if the complexity becomes too high, the vehicle may adopt conservative measures to maintain safety, which can reduce efficiency and potentially lead to a decrease or stabilization in scenario risk. Conversely, an increase in scenario risk does not always correspond to an increase in complexity. Scenario risk is a measure used to determine whether the ego vehicle can achieve safe driving, while scenario complexity evaluates how well the vehicle can perform its intended functions, taking the impact on efficiency into account.
Feng et al. [10] confirmed the accuracy and validity of their scenario risk evaluation method through theoretical analysis, although its qualitative aspects are not immediately clear. Our results reinforce the distinction drawn above between the two measures.
In summary, scenario risk indicates whether the vehicle’s safety goals are met, whereas scenario complexity assesses the extent of functionality realization while considering the impact on overall efficiency.

5.3. Analysis of Testing Results

To ensure the rationality of the evaluation methods, verification of the evaluation results is a crucial step. In the studies conducted in [26,27], a method was employed to quantify scenario complexity based on the location and number of surrounding vehicles, and the evaluation results were verified through on-road tests. However, the vehicles used in those studies were not equipped with intelligent driving systems and differed from AVs, so an accurate validation of the evaluation results was not possible. In this study, one of the state-of-the-art automated driving systems (ADSs), Baidu Apollo, was utilized for validation to ensure the credibility of the simulation results. The 700 scenarios were derived from real-world crashes; the risk values obtained ranged from 0.41 to 0.71, and the scenario complexity values ranged from 0.32 to 0.69.
According to the results, the AV avoided a crash in 546 out of the 700 scenarios, accounting for 78% of the total. Among the 546 avoidable scenarios, both the complexity and risk distributions fell within the lower range: complexity was predominantly concentrated in the range of 0.40–0.50, while risk spanned from 0.45 to 0.53. The 154 unavoidable scenarios exhibited a broader range of complexity and risk than the avoidable scenarios, with complexity mainly concentrated in the range of 0.37–0.48 and risk ranging from 0.50 to 0.58.
Delta V is used to characterize the level of danger in the unavoidable scenarios [49]; complexity and risk are positively correlated with Delta V, meaning that as they increase, Delta V gradually becomes larger. In the avoidable scenarios, GTTC_min is used to quantify the level of danger during vehicle operation; as complexity and risk increase, GTTC_min tends to decrease. As scenario complexity and risk rise, AVs face greater challenges and the likelihood of crashes increases; conversely, AVs manage scenarios with lower complexity and risk more effectively.

6. Conclusions

This study presents a fuzzy comprehensive evaluation method based on the Spherical Fuzzy Analytical Network Process (SF-ANP) to address the challenges of imperfect scenario evaluation and the difficulties in quantifying evaluation metrics in vehicle virtual testing. The method takes into account the interactions among various traffic elements in vehicle scenarios by modeling these relationships as a network. To evaluate scenario data, a five-level numerical interval system is used, which helps to classify new data and improves the selection of scenarios for specific tests. This approach enhances efficiency and allows for the classification of critical scenarios. As a simple and versatile template, all newly collected scenarios can be quantified in a uniform way, and a scenario library serving different testing needs can be built in a reasonable way. In addition, the evaluation metrics derived from this method can be used as a benchmark for designing high-risk and effective test scenarios, and effective critical scenarios can be generated in large quantities by learning the distribution of scenario characteristics over the quantitative metrics. The validation of real-world crash scenarios using Baidu Apollo shows consistency between the proposed evaluation method and the simulation results.
Future research can propose adaptive assessment methods that are tied to the simulation results through feedback mechanisms to continuously improve the accuracy of the assessment. Models with different algorithms will be used to test quantified scenarios and enhance the adaptability of the evaluation system through corrections. We will consider more functional indicators, including comfort and economy, to emphasize the all-round performance of autonomous vehicles on the basis of ensuring safety. Future research will also focus on refining the test scenario evaluation system and applying it to vehicle safety testing. We will use different datasets to improve the coverage of the scenario assessment and further validate the assessment metrics. Using evaluation results as the scenario standard and test results as the assessment index of vehicle function will greatly promote the development of autonomous vehicles, and systematic, standardized testing will help realize intelligent transportation.
However, this study has some limitations. There are no established standards for evaluating testing scenarios, meaning that our scenario elements, based on current viewpoints, may need ongoing updates. The selected sub-elements may not fully capture the characteristics of the overall elements. As AV technology evolves, the evaluation criteria should expand beyond the current metrics. Also, the method used to quantify scenario risk and complexity relies on qualitative judgments, which could lead to errors. The scenarios used in this study are based on real-world crash data, but the number of scenarios is limited. To improve validation, it is recommended to expand the scenario library to include a broader range of real-world scenarios, such as natural driving data, to ensure comprehensive coverage.

Author Contributions

Conceptualization, Z.W. and R.Z.; methodology, Z.W.; software, Z.W.; validation, R.Z. and H.Z.; formal analysis, H.Z.; investigation, H.Z.; resources, R.Z.; data curation, R.Z.; writing—original draft preparation, Z.W.; writing—review and editing, R.Z. and H.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Natural Science Foundation of Hunan Province, China (No. 2024JJ7624).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy and ethical restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Element parameter evaluation level (complexity).
Sub-elements: C1: Ego Vehicle (C11–C13); C2: Road (C21–C25); C3: Traffic Infrastructure (C31–C32); C4: Objects (C41–C45); C5: Environment (C51–C52).

| Sub-element | Value 0.2 | Value 0.4 | Value 0.6 | Value 0.8 | Value 1.0 |
|---|---|---|---|---|---|
| C11 | Velocity > 80%×v or Acceleration > 80%×a | 80%×v > Velocity > 60%×v or 80%×a > Acceleration > 60%×a | 60%×v > Velocity > 40%×v or 60%×a > Acceleration > 40%×a | 40%×v > Velocity > 20%×v or 40%×a > Acceleration > 20%×a | Others |
| C12 | Go Straight | Turn Left | Turn Right | Stop | Others |
| C13 | Others | Sedans, Vehicles | MPVs, SUVs | Light Trucks | Trucks, Bus |
| C21 | Straightway | Road Entrance and Exit | Three-leg Intersection | Four-leg Intersection | Others |
| C22 | Dry | Wet | Slipping | Waterlogged | Freezing or Others |
| C23 | 0%∼3% | 3%∼4% | 5%∼7% | 8%∼9% | Others |
| C24 | 1 | 2 | 4 | 6 | 8 or More |
| C25 |  | Evenness |  | Unevenness |  |
| C31 |  |  | Traffic Lights | Traffic Signs | Road Marks |
| C32 | Number of Obstacles = 0 or Position Relative to Ego Vehicle [30, +∞) | Number of Obstacles = 1 or Position Relative to Ego Vehicle [20, 30] | Number of Obstacles = 1 or Position Relative to Ego Vehicle [10, 20] | Number of Obstacles = 2 or Position Relative to Ego Vehicle [5, 10] | Number of Obstacles ≥ 3 or Position Relative to Ego Vehicle < 5 m |
| C41 | MPVs, SUVs | Sedans, Vehicles | Trucks, Bus | PTWs | Pedestrians or Bicycle |
| C42 | Go Straight | Turn Left | Turn Right | Stop | Others |
| C43 | 0 | 1 | 2 | 3 | 4+ |
| C44 | Longitudinal Same Direction | Longitudinal Opposite Direction | From Right to Left | From Left to Right | Others |
| C45 | Front | Left Front, Right Front | Left, Right | Left Rear, Right Rear | Rear |
| C51 | Sunny | Cloudy | Rain | Foggy | Snow, Sandstorm |
| C52 | Daytime With Good Lighting | Daytime | Evening or Dusk With Poor Lighting | Nighttime With Streetlight | Nighttime Without Streetlight |
Table A2. Element parameter evaluation level (Risk).
Sub-elements: C1: Ego Vehicle (C11–C13); C2: Road (C21–C25); C3: Traffic Infrastructure (C31–C32); C4: Objects (C41–C45); C5: Environment (C51–C52).

| Sub-element | Value 0.2 | Value 0.4 | Value 0.6 | Value 0.8 | Value 1.0 |
|---|---|---|---|---|---|
| C11 | Others | 40%×v > Velocity > 20%×v or 40%×a > Acceleration > 20%×a | 60%×v > Velocity > 40%×v or 60%×a > Acceleration > 40%×a | 80%×v > Velocity > 60%×v or 80%×a > Acceleration > 60%×a | Velocity > 80%×v or Acceleration > 80%×a |
| C12 | Others | Go Straight | Turn Left | Turn Right | Stop |
| C13 | Others | Sedans, Vehicles | MPVs, SUVs | Light Trucks | Trucks, Bus |
| C21 | Others | Straightway | Four-leg Intersection | Three-leg Intersection | Road Entrance and Exit |
| C22 | Dry | Wet | Slipping | Waterlogged | Freezing or Others |
| C23 | 0%∼3% | 3%∼4% | 5%∼7% | 8%∼9% | Others |
| C24 | 1 | 2 | 4 | 6 | 8 or More |
| C25 |  | Evenness |  | Unevenness |  |
| C31 |  | Traffic Lights | Traffic Lights |  | Road Marks |
| C32 | Number of Obstacles = 0 or Position Relative to Ego Vehicle [30, +∞) | Number of Obstacles = 1 or Position Relative to Ego Vehicle [20, 30] | Number of Obstacles = 1 or Position Relative to Ego Vehicle [10, 20] | Number of Obstacles = 2 or Position Relative to Ego Vehicle [5, 10] | Number of Obstacles ≥ 3 or Position Relative to Ego Vehicle < 5 m |
| C41 | Sedans, Vehicles | MPVs, SUVs | Trucks, Bus | PTWs | Pedestrians or Bicycle |
| C42 | Others | Go Straight | Turn Left | Turn Right | Stop |
| C43 | 0 | 1 | 2 | 3 | 4+ |
| C44 | Longitudinal Same Direction | Longitudinal Opposite Direction | From Right to Left | From Left to Right | Others |
| C45 | Front | Left Front, Right Front | Left, Right | Left Rear, Right Rear | Rear |
| C51 | Sunny | Cloudy | Rain | Foggy | Snow, Sandstorm |
| C52 | Daytime With Good Lighting | Daytime | Evening or Dusk With Poor Lighting | Nighttime With Streetlight | Nighttime Without Streetlight |

References

  1. Ben Abdessalem, R.; Nejati, S.; Briand, L.C.; Stifter, T. Testing advanced driver assistance systems using multi-objective search and neural networks. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, Singapore, 3–7 September 2016; pp. 63–74. [Google Scholar]
  2. Leledakis, A.; Lindman, M.; Östh, J.; Wågström, L.; Davidsson, J.; Jakobsson, L. A method for predicting crash configurations using counterfactual simulations and real-world data. Accid. Anal. Prev. 2021, 150, 105932. [Google Scholar] [CrossRef] [PubMed]
  3. Elrofai, H.; Paardekooper, J.P.; de Gelder, E.; Kalisvaart, S.; den Camp, O.O. Scenario-Based Safety Validation of Connected and Automated Driving, TNO, Technical Report; Netherlands Organization for Applied Scientific Research: Haag, The Netherlands, 2018. [Google Scholar]
  4. Kalra, N.; Paddock, S.M. Driving to safety: How many miles of driving would it take to demonstrate autonomous vehicle reliability? Transp. Res. Part A Policy Pract. 2016, 94, 182–193. [Google Scholar] [CrossRef]
  5. Scanlon, J.M.; Kusano, K.D.; Daniel, T.; Alderson, C.; Ogle, A.; Victor, T. Waymo simulated driving behavior in reconstructed fatal crashes within an autonomous vehicle operating domain. Accid. Anal. Prev. 2021, 163, 106454. [Google Scholar] [CrossRef] [PubMed]
  6. Ding, W.; Xu, C.; Arief, M.; Lin, H.; Li, B.; Zhao, D. A survey on safety-critical driving scenario generation—A methodological perspective. IEEE Trans. Intell. Transp. Syst. 2023, 24, 6971–6988. [Google Scholar] [CrossRef]
  7. Ulbrich, S.; Menzel, T.; Reschka, A.; Schuldt, F.; Maurer, M. Defining and substantiating the terms scene, situation, and scenario for automated driving. In Proceedings of the 2015 IEEE 18th International Conference on Intelligent Transportation Systems, Gran Canaria, Spain, 15–18 September 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 982–988. [Google Scholar]
  8. Wei, Z.; Huang, H.; Zhang, G.; Zhou, R.; Luo, X.; Li, S.; Zhou, H. Interactive Critical Scenario Generation for Autonomous Vehicles Testing Based on In-depth Crash Data Using Reinforcement Learning. IEEE Trans. Intell. Veh. 2024, 1–12. [Google Scholar] [CrossRef]
  9. Zhou, R.; Lin, Z.; Zhang, G.; Huang, H.; Zhou, H.; Chen, J. Evaluating Autonomous Vehicle Safety Performance Through Analysis of Pre-Crash Trajectories of Powered Two-Wheelers. IEEE Trans. Intell. Transp. Syst. 2024, 25, 13560–13572. [Google Scholar] [CrossRef]
  10. Feng, S.; Feng, Y.; Yu, C.; Zhang, Y.; Liu, H.X. Testing scenario library generation for connected and automated vehicles, part I: Methodology. IEEE Trans. Intell. Transp. Syst. 2020, 22, 1573–1582. [Google Scholar] [CrossRef]
  11. Nitsche, P.; Thomas, P.; Stuetz, R.; Welsh, R. Pre-crash scenarios at road junctions: A clustering method for car crash data. Accid. Anal. Prev. 2017, 107, 137–151. [Google Scholar] [CrossRef]
  12. Riedmaier, S.; Ponn, T.; Ludwig, D.; Schick, B.; Diermeyer, F. Survey on scenario-based safety assessment of automated vehicles. IEEE Access 2020, 8, 87456–87477. [Google Scholar] [CrossRef]
  13. Yang, S.; Gao, L.; Zhao, Y.; Li, X. Research on the quantitative evaluation of the traffic environment complexity for unmanned vehicles in urban roads. IEEE Access 2021, 9, 23139–23152. [Google Scholar] [CrossRef]
  14. Zhang, S.; Tak, T. Risk analysis of autonomous vehicle test scenarios using a novel analytic hierarchy process method. IET Intell. Transp. Syst. 2024, 18, 794–807. [Google Scholar] [CrossRef]
  15. Cheng, Y.; Liu, Z.; Gao, L.; Zhao, Y.; Gao, T. Traffic risk environment impact analysis and complexity assessment of autonomous vehicles based on the potential field method. Int. J. Environ. Res. Public Health 2022, 19, 10337. [Google Scholar] [CrossRef] [PubMed]
  16. Wang, J.; Wu, J.; Zheng, X.; Ni, D.; Li, K. Driving safety field theory modeling and its application in pre-collision warning system. Transp. Res. Part C Emerg. Technol. 2016, 72, 306–324. [Google Scholar] [CrossRef]
  17. Zhang, L.; Ma, Y.; Xing, X.; Xiong, L.; Chen, J. Research on the complexity quantification method of driving scenarios based on information entropy. In Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA, 19–22 September 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 3476–3481. [Google Scholar]
  18. Schuldt, F. Towards Testing of Automated Driving Functions in Virtual Driving Environments. Ph.D. Thesis, Technical University Brunswick, Braunschweig, Germany, 2017. [Google Scholar]
  19. Bagschik, G.; Menzel, T.; Maurer, M. Ontology based scene creation for the development of automated vehicles. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1813–1820. [Google Scholar]
  20. Sauerbier, J.; Bock, J.; Weber, H.; Eckstein, L. Definition of scenarios for safety validation of automated driving functions. ATZ Worldw. 2019, 121, 42–45. [Google Scholar] [CrossRef]
  21. Thorn, E.; Kimmel, S.C.; Chaka, M.; Hamilton, B.A. A Framework for Automated Driving System Testable Cases and Scenarios; Technical Report; United States, Department of Transportation, National Highway Traffic Safety Administration: Washington, DC, USA, 2018. [Google Scholar]
  22. Aradi, S. Survey of deep reinforcement learning for motion planning of autonomous vehicles. IEEE Trans. Intell. Transp. Syst. 2020, 23, 740–759. [Google Scholar] [CrossRef]
  23. Kiran, B.R.; Sobh, I.; Talpaert, V.; Mannion, P.; Al Sallab, A.A.; Yogamani, S.; Pérez, P. Deep reinforcement learning for autonomous driving: A survey. IEEE Trans. Intell. Transp. Syst. 2021, 23, 4909–4926. [Google Scholar] [CrossRef]
  24. Zhao, D.; Guo, Y.; Jia, Y.J. Trafficnet: An open naturalistic driving scenario library. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–8. [Google Scholar]
  25. Wang, Y.; Li, K.; Hu, Y.; Chen, H. Modeling and quantitative assessment of environment complexity for autonomous vehicles. In Proceedings of the 2020 Chinese Control And Decision Conference (CCDC), Hefei, China, 22–24 August 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 2124–2129. [Google Scholar]
  26. Yu, R.; Zheng, Y.; Qu, X. Dynamic driving environment complexity quantification method and its verification. Transp. Res. Part C Emerg. Technol. 2021, 127, 103051. [Google Scholar] [CrossRef]
  27. Wang, J.; Zhang, C.; Liu, Y.; Zhang, Q. Traffic sensory data classification by quantifying scenario complexity. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1543–1548. [Google Scholar]
  28. Rudnik, K.; Chwastyk, A.; Pisz, I. Approach Based on the Ordered Fuzzy Decision Making System Dedicated to Supplier Evaluation in Supply Chain Management. Entropy 2024, 26, 860. [Google Scholar] [CrossRef]
  29. Bulut, M.S.; Ordu, M.; Der, O.; Basar, G. Sustainable Thermoplastic Material Selection for Hybrid Vehicle Battery Packs in the Automotive Industry: A Comparative Multi-Criteria Decision-Making Approach. Polymers 2024, 16, 2768. [Google Scholar] [CrossRef]
  30. Saaty, T.L. Decision Making with Dependence and Feedback: The Analytic Network Process; RWS Publications: Pittsburgh, PA, USA, 1996; Volume 4922. [Google Scholar]
  31. Zadeh, L.A. Fuzzy sets. Inf. Control 1965, 8, 338–353. [Google Scholar] [CrossRef]
  32. Zadeh, L.A. The concept of a linguistic variable and its application to approximate reasoning—I. Inf. Sci. 1975, 8, 199–249. [Google Scholar] [CrossRef]
  33. Atanassov, K.T. Intuitionistic Fuzzy Sets; Springer: Berlin/Heidelberg, Germany, 1999. [Google Scholar]
  34. Yager, R.R. Pythagorean fuzzy subsets. In Proceedings of the 2013 Joint IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS), Edmonton, AB, Canada, 24–28 June 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 57–61. [Google Scholar]
  35. Smarandache, F. Neutrosophic logic-a generalization of the intuitionistic fuzzy logic. Multispace Multistructure Neutrosophic Transdiscipl. 2010, 4, 396. [Google Scholar] [CrossRef]
  36. Torra, V. Hesitant fuzzy sets. Int. J. Intell. Syst. 2010, 25, 529–539. [Google Scholar] [CrossRef]
  37. Kutlu Gündoğdu, F.; Kahraman, C. Spherical fuzzy sets and spherical fuzzy TOPSIS method. J. Intell. Fuzzy Syst. 2019, 36, 337–352. [Google Scholar] [CrossRef]
  38. Erdoğan, M.; Kaya, İ.; Karaşan, A.; Çolak, M. Evaluation of autonomous vehicle driving systems for risk assessment based on three-dimensional uncertain linguistic variables. Appl. Soft Comput. 2021, 113, 107934. [Google Scholar] [CrossRef]
  39. Zhang, W.; Lai, T.; Li, Y. Risk assessment of water supply network operation based on ANP-fuzzy comprehensive evaluation method. J. Pipeline Syst. Eng. Pract. 2022, 13, 04021068. [Google Scholar] [CrossRef]
  40. Huang, J.J.; Chen, C.Y. Using Markov Random Field and Analytic Hierarchy Process to Account for Interdependent Criteria. Algorithms 2023, 17, 1. [Google Scholar] [CrossRef]
  41. Yazo-Cabuya, E.J.; Ibeas, A.; Herrera-Cuartas, J.A. Integration of Sustainability in Risk Management and Operational Excellence through the VIKOR Method Considering Comparisons between Multi-Criteria Decision-Making Methods. Sustainability 2024, 16, 4585. [Google Scholar] [CrossRef]
  42. Zhang, C.; Huang, Y.; Zhou, D.; Dong, Z.; He, S.; Zhou, Z. A MCDM-Based Analysis Method of Testability Allocation for Multi-Functional Integrated RF System. Electronics 2024, 13, 3618. [Google Scholar] [CrossRef]
  43. Kutlu Gündoğdu, F.; Kahraman, C. A novel spherical fuzzy analytic hierarchy process and its renewable energy application. Soft Comput. 2020, 24, 4607–4621. [Google Scholar] [CrossRef]
  44. Golden, B.L.; Wasil, E.A.; Harker, P.T. The analytic hierarchy process. Appl. Stud. 1989, 2, 1–273. [Google Scholar]
  45. Li, G.; Li, Y.; Craig, B.; Liu, X. Investigating the effect of contextual factors on driving: An experimental study. Transp. Res. Part F Traffic Psychol. Behav. 2022, 88, 69–80. [Google Scholar] [CrossRef]
  46. Wang, J.; Huang, H.; Li, Y.; Zhou, H.; Liu, J.; Xu, Q. Driving risk assessment based on naturalistic driving study and driver attitude questionnaire analysis. Accid. Anal. Prev. 2020, 145, 105680. [Google Scholar] [CrossRef] [PubMed]
  47. Tanshi, F.; Söffker, D. Determination of takeover time budget based on analysis of driver behavior. IEEE Open J. Intell. Transp. Syst. 2022, 3, 813–824. [Google Scholar] [CrossRef]
  48. Wang, J.; Huang, H.; Li, K.; Li, J. Towards the unified principles for level 5 autonomous vehicles. Engineering 2021, 7, 1313–1325. [Google Scholar] [CrossRef]
  49. Ding, C.; Rizzi, M.; Strandroth, J.; Sander, U.; Lubbe, N. Motorcyclist injury risk as a function of real-life crash speed and other contributing factors. Accid. Anal. Prev. 2019, 123, 374–386. [Google Scholar] [CrossRef]
Figure 1. ANP construct (scenario complexity).
Figure 2. ANP construct (scenario risk).
Figure 3. Global weights of sub-elements (risk and complexity).
Figure 4. Evaluation result of 700 scenarios.
Figure 5. Distribution of evaluation results.
Figure 6. Comparison of simulation and evaluation results: (a) Delta V in unavoidable scenarios; (b) GTTC_min in avoidable scenarios.
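The two surrogate safety indicators in Figure 6 are standard measures: Delta V is the velocity change sustained in a collision, and time-to-collision (TTC) is the projected time until impact if current speeds hold, whose minimum over a test run underlies GTTC_min. A minimal sketch of the TTC computation, under the usual car-following assumption (function and variable names are ours, not the paper's):

```python
# Illustrative sketch: time-to-collision (TTC) for an ego vehicle closing on a
# lead vehicle. TTC is only defined while the gap is shrinking; otherwise no
# collision is projected and TTC is taken as infinite.

def time_to_collision(gap_m: float, ego_speed_mps: float, lead_speed_mps: float) -> float:
    """Return TTC in seconds; infinity when the ego vehicle is not closing the gap."""
    closing_speed = ego_speed_mps - lead_speed_mps
    if closing_speed <= 0:
        return float("inf")  # gap constant or growing: no projected collision
    return gap_m / closing_speed

# Ego at 20 m/s, lead at 12 m/s, 40 m apart: TTC = 40 / 8 = 5 s
print(time_to_collision(40.0, 20.0, 12.0))  # -> 5.0
```

In an avoidable scenario, the minimum of this quantity over the simulation run is the surrogate safety value; smaller minima indicate narrower escapes, which is consistent with the negative correlations reported in Table 11.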
Table 1. Characteristics of the models.
| Models | Contributions | Gaps |
| --- | --- | --- |
| Five-layer [18] | Proposes a generalized scenario hierarchy | Excludes the state of the ego vehicle; too little description of interactions in traffic |
| Refined five-layer [19] | Modifies elements based on the original structure | Excludes the state of the ego vehicle, with some scenario-level interactions |
| Six-layer [20] | Adds a new layer for digital communication | Excludes the state of the ego vehicle; content features interactions |
Table 2. Basic elements of testing scenario.
| Layer | Category |
| --- | --- |
| Ego Vehicle | Motion; DDT; Types |
| Road | Road Segment Type; Road Surface Conditions; Road Slope; The Number of Lanes; Road Leveling |
| Traffic Infrastructure | Signal Control; Obstacle |
| Objects | Type; Behavior; Number of Objects; Relative Motion Direction to Ego Vehicle; Position Relative to Ego Vehicle |
| Environment | Weather; Lighting |
Table 3. The linguistic scale for the SF-ANP method.
| Linguistic Term | (μ, υ, π) | SI |
| --- | --- | --- |
| AMI | (0.9, 0.1, 0.0) | 9 |
| VHI | (0.8, 0.2, 0.1) | 7 |
| HI | (0.7, 0.3, 0.2) | 5 |
| SMI | (0.6, 0.4, 0.3) | 3 |
| EI | (0.5, 0.4, 0.4) | 1 |
| SLI | (0.4, 0.6, 0.3) | 1/3 |
| LI | (0.3, 0.7, 0.2) | 1/5 |
| VLI | (0.2, 0.8, 0.1) | 1/7 |
| ALI | (0.1, 0.9, 0.0) | 1/9 |
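Every (μ, υ, π) triple in the scale above must satisfy the defining spherical fuzzy set constraint 0 ≤ μ² + υ² + π² ≤ 1 [37,43]. A minimal check of this property for the Table 3 scale (our illustration, not the authors' code):

```python
# Verify the spherical-fuzzy membership condition for each linguistic term in
# Table 3: membership (mu), non-membership (v), and hesitancy (pi) must satisfy
# mu^2 + v^2 + pi^2 <= 1.

SCALE = {
    "AMI": (0.9, 0.1, 0.0), "VHI": (0.8, 0.2, 0.1), "HI": (0.7, 0.3, 0.2),
    "SMI": (0.6, 0.4, 0.3), "EI": (0.5, 0.4, 0.4), "SLI": (0.4, 0.6, 0.3),
    "LI": (0.3, 0.7, 0.2), "VLI": (0.2, 0.8, 0.1), "ALI": (0.1, 0.9, 0.0),
}

def is_spherical(mu: float, v: float, pi: float) -> bool:
    return 0.0 <= mu**2 + v**2 + pi**2 <= 1.0

assert all(is_spherical(*triple) for triple in SCALE.values())
```

This constraint is what distinguishes spherical fuzzy sets from intuitionistic (μ + υ ≤ 1) and Pythagorean (μ² + υ² ≤ 1) fuzzy sets, and it allows the hesitancy degree π to be assigned independently of μ and υ.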
Table 4. Scenario complexity comparison matrix (ego-vehicle).
| C1 Ego Vehicle | C1 Ego Vehicle | C2 Road | C3 Traffic Infrastructure | C4 Objects | C5 Environment |
| --- | --- | --- | --- | --- | --- |
| C1 Ego Vehicle | [0.5, 0.4, 0.4] 1 | [0.2, 0.8, 0.1] 1/7 | [0.2, 0.8, 0.1] 1/7 | [0.1, 0.9, 0.0] 1/9 | [0.3, 0.7, 0.2] 1/5 |
| C2 Road | [0.8, 0.2, 0.1] 7 | [0.5, 0.4, 0.4] 1 | [0.6, 0.4, 0.3] 3 | [0.4, 0.6, 0.3] 1/3 | [0.6, 0.4, 0.3] 3 |
| C3 Traffic Infrastructure | [0.8, 0.2, 0.1] 7 | [0.4, 0.6, 0.3] 1/3 | [0.5, 0.4, 0.4] 1 | [0.4, 0.6, 0.3] 1/3 | [0.6, 0.4, 0.3] 3 |
| C4 Objects | [0.9, 0.1, 0.0] 9 | [0.6, 0.4, 0.3] 3 | [0.6, 0.4, 0.3] 3 | [0.5, 0.4, 0.4] 1 | [0.6, 0.4, 0.3] 3 |
| C5 Environment | [0.7, 0.3, 0.2] 5 | [0.4, 0.6, 0.3] 1/3 | [0.4, 0.6, 0.3] 1/3 | [0.4, 0.6, 0.3] 1/3 | [0.5, 0.4, 0.4] 1 |
CR 0.0958
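The consistency ratio (CR) reported under each comparison matrix follows Saaty's standard formula CR = (λ_max − n) / ((n − 1) · RI), where λ_max is the principal eigenvalue of the crisp Saaty-index matrix and RI is the random index (1.12 for n = 5); CR below 0.1 indicates acceptable consistency. A sketch reproducing the check for the Table 4 matrix via power iteration (our code, not the authors'):

```python
# Consistency-ratio check for the crisp SI matrix of Table 4.
# Power iteration approximates the principal eigenvalue of a positive
# pairwise-comparison matrix (Perron-Frobenius guarantees convergence).

A = [
    [1,   1/7, 1/7, 1/9, 1/5],
    [7,   1,   3,   1/3, 3  ],
    [7,   1/3, 1,   1/3, 3  ],
    [9,   3,   3,   1,   3  ],
    [5,   1/3, 1/3, 1/3, 1  ],
]

def lambda_max(matrix, iters=200):
    """Estimate the principal eigenvalue by power iteration."""
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = max(abs(x) for x in w)
        v = [x / norm for x in w]
    Av = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(Av[i] / v[i] for i in range(n)) / n  # component-wise Rayleigh estimate

n, RI = 5, 1.12  # Saaty's random index for a 5x5 matrix
CR = (lambda_max(A) - n) / ((n - 1) * RI)
print(round(CR, 4))  # should fall below the 0.1 acceptance threshold
```

A perfectly consistent matrix has λ_max = n and hence CR = 0; the value of roughly 0.1 obtained here is consistent with the CR 0.0958 reported in the table.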
Table 5. Scenario risk comparison matrix (ego-vehicle).
| C1 Ego Vehicle | C1 Ego Vehicle | C2 Road | C3 Traffic Infrastructure | C4 Objects | C5 Environment |
| --- | --- | --- | --- | --- | --- |
| C1 Ego Vehicle | [0.5, 0.4, 0.4] 1 | [0.1, 0.9, 0.0] 1/9 | [0.1, 0.9, 0.0] 1/9 | [0.1, 0.9, 0.0] 1/9 | [0.1, 0.9, 0.0] 1/9 |
| C2 Road | [0.9, 0.1, 0.0] 9 | [0.5, 0.4, 0.4] 1 | [0.6, 0.4, 0.3] 3 | [0.5, 0.4, 0.4] 1 | [0.4, 0.6, 0.3] 1/3 |
| C3 Traffic Infrastructure | [0.9, 0.1, 0.0] 9 | [0.4, 0.6, 0.3] 1/3 | [0.5, 0.4, 0.4] 1 | [0.4, 0.6, 0.3] 1/3 | [0.4, 0.6, 0.3] 1/3 |
| C4 Objects | [0.9, 0.1, 0.0] 9 | [0.5, 0.4, 0.4] 1 | [0.6, 0.4, 0.3] 3 | [0.5, 0.4, 0.4] 1 | [0.4, 0.6, 0.3] 1/3 |
| C5 Environment | [0.9, 0.1, 0.0] 9 | [0.6, 0.4, 0.3] 3 | [0.6, 0.4, 0.3] 3 | [0.6, 0.4, 0.3] 3 | [0.5, 0.4, 0.4] 1 |
CR 0.0780
Table 6. Scenario complexity comparison matrix (ego vehicle behavior–road).
| C11 Ego Vehicle Behavior | C21 Road Segment Type | C22 Road Surface Conditions | C23 Road Slope | C24 The Number of Lanes | C25 Road Leveling |
| --- | --- | --- | --- | --- | --- |
| C21 Road Segment Type | [0.5, 0.4, 0.4] 1 | [0.7, 0.3, 0.2] 5 | [0.8, 0.2, 0.1] 7 | [0.6, 0.4, 0.3] 3 | [0.7, 0.3, 0.2] 5 |
| C22 Road Surface Conditions | [0.3, 0.7, 0.2] 1/5 | [0.5, 0.4, 0.4] 1 | [0.6, 0.4, 0.3] 3 | [0.4, 0.6, 0.3] 1/3 | [0.5, 0.4, 0.4] 1 |
| C23 Road Slope | [0.2, 0.8, 0.1] 1/7 | [0.4, 0.6, 0.3] 1/3 | [0.5, 0.4, 0.4] 1 | [0.3, 0.7, 0.2] 1/5 | [0.4, 0.6, 0.3] 1/3 |
| C24 The Number of Lanes | [0.4, 0.6, 0.3] 1/3 | [0.6, 0.4, 0.3] 3 | [0.7, 0.3, 0.2] 5 | [0.5, 0.4, 0.4] 1 | [0.7, 0.3, 0.2] 5 |
| C25 Road Leveling | [0.3, 0.7, 0.2] 1/5 | [0.5, 0.4, 0.4] 1 | [0.6, 0.4, 0.3] 3 | [0.3, 0.7, 0.2] 1/5 | [0.5, 0.4, 0.4] 1 |
CR 0.0467
Table 7. Scenario risk comparison matrix (ego vehicle behavior–road).
| C11 Ego Vehicle Behavior | C21 Road Segment Type | C22 Road Surface Conditions | C23 Road Slope | C24 The Number of Lanes | C25 Road Leveling |
| --- | --- | --- | --- | --- | --- |
| C21 Road Segment Type | [0.5, 0.4, 0.4] 1 | [0.4, 0.6, 0.3] 1/3 | [0.7, 0.3, 0.2] 5 | [0.7, 0.3, 0.2] 5 | [0.7, 0.3, 0.2] 5 |
| C22 Road Surface Conditions | [0.6, 0.4, 0.3] 3 | [0.5, 0.4, 0.4] 1 | [0.8, 0.2, 0.1] 7 | [0.9, 0.1, 0.0] 9 | [0.8, 0.2, 0.1] 7 |
| C23 Road Slope | [0.3, 0.7, 0.2] 1/5 | [0.2, 0.8, 0.1] 1/7 | [0.5, 0.4, 0.4] 1 | [0.6, 0.4, 0.3] 3 | [0.5, 0.4, 0.4] 1 |
| C24 The Number of Lanes | [0.3, 0.7, 0.2] 1/5 | [0.1, 0.9, 0.0] 1/9 | [0.4, 0.6, 0.3] 1/3 | [0.5, 0.4, 0.4] 1 | [0.4, 0.6, 0.3] 1/3 |
| C25 Road Leveling | [0.3, 0.7, 0.2] 1/5 | [0.2, 0.8, 0.1] 1/7 | [0.5, 0.4, 0.4] 1 | [0.6, 0.4, 0.3] 3 | [0.5, 0.4, 0.4] 1 |
CR 0.0479
Table 8. Descriptive statistics of the scenario evaluation and testing results (N = 700).
| Scenario Evaluation Value | Mean | Standard Deviation | SE of Mean | Variance | Minimum | Median | Maximum |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Risk | 0.5314 | 0.0514 | 0.0019 | 0.0026 | 0.4101 | 0.5283 | 0.7185 |
| Complexity | 0.4780 | 0.0622 | 0.0024 | 0.0039 | 0.3191 | 0.4754 | 0.6910 |
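The columns of Table 8 are internally consistent: the variance is the square of the standard deviation, and the standard error of the mean equals the standard deviation divided by √N (e.g., 0.0514/√700 ≈ 0.0019 for risk). A short sketch of these relationships using Python's standard statistics module, on a small hypothetical sample rather than the 700 scenario scores themselves:

```python
# Sketch of the Table 8 summary statistics and how they relate to each other.
# The sample values below are hypothetical placeholders, not the paper's data.

import math
import statistics

scores = [0.41, 0.48, 0.50, 0.53, 0.55, 0.58, 0.62, 0.72]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)           # sample standard deviation
var = statistics.variance(scores)       # equals sd ** 2
se_mean = sd / math.sqrt(len(scores))   # standard error of the mean

assert abs(var - sd**2) < 1e-12
print(mean, statistics.median(scores), min(scores), max(scores))
```

Applying the SE formula to the reported risk figures reproduces the tabulated value: 0.0514 / √700 ≈ 0.00194, which rounds to the 0.0019 in Table 8.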
Table 9. Shapiro–Wilk test results (N = 700).
| Category | W-Value | p-Value | Correlation Coefficient | p-Value |
| --- | --- | --- | --- | --- |
| Complexity | 0.9900 | 1.14 × 10⁻⁴ | 0.63 | <0.01 |
| Risk | 0.9932 | 2.75 × 10⁻³ | | |
Table 10. Shapiro–Wilk test results (unavoidable scenario, N = 154).
| Category | p-Value | W-Value | Spearman Coefficient (p-Value) | Correlation Coefficient |
| --- | --- | --- | --- | --- |
| Complexity | 0.003 | 0.9722 | <0.01 | 0.21 |
| Delta V | 3.02 × 10⁻⁵ | 0.9508 | | |
| Risk | 0.006 | 0.9746 | <0.01 | 0.36 |
| Delta V | 3.02 × 10⁻⁵ | 0.9508 | | |
Table 11. Shapiro–Wilk test results (avoided scenario, N = 546).
| Category | p-Value | W-Value | Spearman Coefficient (p-Value) | Correlation Coefficient |
| --- | --- | --- | --- | --- |
| Complexity | 0.39 | 0.9970 | <0.01 | −0.31 |
| GTTC_min | 1.48 × 10⁻²⁷ | 0.7575 | | |
| Risk | 0.01 | 0.9928 | <0.01 | −0.35 |
| GTTC_min | 1.48 × 10⁻²⁷ | 0.7575 | | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Wei, Z.; Zhou, H.; Zhou, R. Risk and Complexity Assessment of Autonomous Vehicle Testing Scenarios. Appl. Sci. 2024, 14, 9866. https://doi.org/10.3390/app14219866