Article

An Explainable Data-Driven Optimization Method for Unmanned Autonomous System Performance Assessment

1 Beijing Aerospace Wanyuan Science and Technology Company Ltd., Beijing 102676, China
2 College of Electrical Engineering and Control Science, Nanjing Tech University, Nanjing 211800, China
3 School of Instrument Science and Engineering, Southeast University, Nanjing 210096, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(22), 4469; https://doi.org/10.3390/electronics13224469
Submission received: 12 September 2024 / Revised: 29 October 2024 / Accepted: 1 November 2024 / Published: 14 November 2024
(This article belongs to the Special Issue Advanced Control Strategies and Applications of Multi-Agent Systems)

Abstract

Unmanned autonomous systems (UASs), including drones and robotics, are widely employed across various fields. Despite significant advances in AI-enhanced intelligent systems, there remains a notable deficiency in the interpretability and comprehensive quantitative evaluation of these systems. The existing literature has primarily focused on constructing evaluation frameworks and methods, but has often overlooked the rationality and reliability of these methods. To address these challenges, this paper proposes an innovative optimization evaluation method for data-driven unmanned autonomous systems. By optimizing the weights of existing indicators based on data distribution characteristics, this method enhances the stability and reliability of assessment outcomes. Furthermore, interpretability techniques such as Local Interpretable Model-agnostic Explanations (LIMEs) and Partial Dependence Plots (PDPs) were employed to verify the effectiveness of the designed evaluation indicators, thereby ensuring the robustness of the evaluation system. The experimental results validated the effectiveness of the proposed approach.

1. Introduction

With the development of artificial intelligence technology, unmanned autonomous systems have been utilized in diverse areas such as unmanned aerial vehicles (UAVs) [1,2,3,4,5], intelligent vehicles [6,7,8], and industrial robotics [9,10,11]. Compared with traditional manual control methods, unmanned intelligent systems integrated with artificial intelligence algorithms significantly enhance operational efficiency.
Although intelligent systems across various fields enhanced with artificial intelligence technology can effectively improve efficiency, the majority of AI algorithms lack interpretability and comprehensive quantitative evaluation in relation to their tasks. Therefore, there is an urgent need for research into the corresponding evaluation methods for performance assessment. Numerous scholars have already evaluated intelligent systems across diverse domains. In the domain of unmanned aerial vehicles (UAVs), Han et al. [1] introduced the AHP-FCE method for evaluating UAV swarm systems, which, while effective, leans on subjective expert judgment. This approach may conflict with the LIME framework, which could reveal minimal impact from indicators considered important by experts, questioning the objectivity of weight distribution in AHP-FCE. Zhu et al. [5] developed an expert-dependent model for UAV photography performance. This model, validated through UAV data, might not reflect real-world scenarios due to inconsistent expert knowledge. The PDP framework could challenge this model by showing the average feature impact that contradicts the subjective weights assigned by experts. Alharasees et al. [2] integrated the OODA loop with AHP for the AI-UAV systems, focusing on operational performance. Despite its effectiveness, this method could overlook certain interactions not captured by OODA-AHP integration. The SHAP values framework might unveil these interactions, potentially leading to differing evaluation outcomes.
In the realm of intelligent vehicles, Dong et al.’s AHP-GA [8] method addressed the inconsistencies in traditional AHP matrix parameters but was computationally intensive and sensitive to the initial parameters. The ALE framework could expose feature correlations that AHP-GA might miss, resulting in inconsistent evaluation results. Sun et al. [7] proposed the EAHP for unmanned ground vehicle performance, which, despite its hierarchical structure, does not account for interdependencies between indicators. Global interpretability methods like PDP might reveal overall feature impact, highlighting the limitations of EAHP’s hierarchical approach. Chen et al.’s comprehensive evaluation method [6] for URAT incorporates fuzzy logic and neural networks but might not align with local interpretability methods like LIME due to its complexity.
In industrial robotics, Fei et al.’s enhanced network learning method [9] for TFRM parameters could lead to discrepancies when evaluated through SHAP values due to computational demands and parameter sensitivity. Leitzke et al.’s Robotstone benchmark [11] refines traditional approaches but might not adapt well to technological advancements, potentially leading to inconsistent results when contrasted with PDP. Sheh’s ML-based evaluation system [10], while theoretically sound, might face practical challenges and discrepancies when evaluated through LIME due to its reliance on existing ML frameworks.
In summary, the existing body of research has made significant strides in evaluating the capabilities and overall performance of unmanned autonomous systems across a spectrum of application scenarios. These efforts have resulted in the development of various assessment frameworks and corresponding methodologies. However, there is a conspicuous absence of in-depth analysis on the rationality and effectiveness of these evaluation methods. The current literature also lacks extensive discourse on the reliability of the evaluation methods that have been designed and proposed. This underscores a critical gap in the field, as the need for objective and reliable evaluation methods is more pertinent than ever.
To bridge this gap, this study introduces a pioneering optimization evaluation methodology [12] that is predicated on the stability of data distribution characteristics. This methodology is innovative in its aim to enhance the rationality and objectivity of assessments for unmanned autonomous systems (UASs). It leverages the inherent constancy of the overall data distribution features that emerge during the evaluation process. Furthermore, it employs an adaptive iterative algorithm to refine the weights of existing performance metrics, thereby significantly bolstering the stability of performance evaluation results for UASs. To further augment the reliability of the model’s evaluative outcomes, interpretability methods were seamlessly integrated into the evaluation system. This paper builds upon a method previously outlined in a related publication by the same authors [12], which presented the foundational framework for the optimization strategy. The current research expands on that methodology, providing an exhaustive examination and validation through a comprehensive interpretability analysis. This rigorous approach reaffirms the applicability and effectiveness of the proposed methodology in assessing UAS performance, ensuring that it meets the demands of the evolving landscape of unmanned autonomous systems.

2. Problem Statement

2.1. Existing Evaluation Methods

Existing evaluation methods for autonomous systems customize indicator selection according to task type and scenario, incorporating expert knowledge to build the evaluation framework. The indicators in such a framework are weighted using subjective, objective, and hybrid methodologies:
  • Subjective evaluation methods assigned weights based on expert judgment. A widely used approach was the Delphi method [13], which gathered expert consensus to establish indicator weights. Although simple and leveraging expert insights, it can be prone to bias and inconsistencies due to varying expert opinions. The Analytic Hierarchy Process (AHP) [14] breaks down complex decisions into hierarchical structures, assigning weights through pairwise comparisons. While systematic, this method may be subject to individual biases and become cumbersome with numerous indicators. The network analysis method [15] studies relationships and dependencies between factors through a network structure, offering insights into factor interactions. However, it can be challenging to model and often requires significant data for accurate representation;
  • Objective evaluation methods derive weights from the inherent characteristics of data. Techniques such as principal component analysis (PCA) [16] reduce the dimensionality of data and assign weights based on the variance explained by principal components. This is effective for handling large datasets but may neglect the significance of individual indicators not captured in the components. Information weighting techniques [17] allocate weights based on the information contribution of each indicator, such as through information entropy. While quantitative, these methods might overlook contextual factors affecting indicator significance. The CRITIC method [18] assigns weights based on contrast intensity and criteria conflict, balancing indicator characteristics. Its performance depends on the quality and availability of statistical data. Factor analysis [19] explores hidden relationships between variables and assigns weights based on their contribution to overall variance, simplifying complex datasets but potentially obscuring the importance of individual indicators. Entropy weighting [20] assigns greater weights to indicators with more variability, offering a data-driven approach, but it may not fully capture practical relevance;
  • Hybrid evaluation methods integrate subjective and objective techniques for a more comprehensive assessment. A commonly used approach is fuzzy AHP [21], which combines fuzzy logic with AHP to address uncertainty in multi-criteria decision-making. This method improves adaptability but can be computationally demanding. The CRITIC-G1 method [22], which merges CRITIC and G1 techniques, is another hybrid method that provides a balanced evaluation but may require substantial computational resources and data.

2.2. Comparison and Analysis of Important Algorithms Within the Last Three Years

Optimization algorithms in the evaluation of unmanned autonomous systems have made significant progress in recent years. In order to provide a more comprehensive analysis and validation of our proposed particle swarm optimization (PSO) algorithm based on the stability of data features, this paper compares several representative algorithms over the past three years from various perspectives, including performance, accuracy and adaptability, as summarized in Table 1.
The table provides a comparative analysis of four algorithms for evaluating unmanned autonomous systems based on their computational time, accuracy, and adaptability. The AHP-FCE [1] is fast but less accurate and not suitable for large datasets. AHP-GA [8] has a longer computation time and is sensitive to initial parameters, risking local optima. CRITIC [18] is quick, but its accuracy declines with uneven data, and it is limited in dynamic tasks. PSO-WOA [12] offers improved convergence and performs well across different scenarios, though it is sensitive to noise.

2.3. Existing Model Interpretation Method

To ensure that the decision-making process of the designed evaluation system is transparent, and thus to allow users to trust and verify the evaluation results, it is necessary to conduct interpretability research on the designed evaluation system. However, current research on the interpretability of evaluation systems is limited. In this regard, we can refer to the interpretability studies of black-box models, that is, analyzing the causal relationship between the input features and the output results of the black-box model, thereby determining which features have a greater impact on the output results of the black-box model.
Existing model interpretability methods, based on the scope and granularity of model interpretability, are divided into global interpretability methods and local interpretability methods:
  • Global interpretability methods: Based on the entire dataset, these methods analyze the impact of each feature on the output of the black-box model, identifying which feature or features have the greatest impact on the model’s output. Common global interpretability methods include Partial Dependence Plots (PDPs) [23] and Accumulated Local Effect (ALE) plots [24]. A PDP shows the impact of a feature on the model’s predictions under the assumption that the other features remain unchanged: it calculates the average impact of different values of the feature on the model’s prediction with all other features fixed, thus revealing the relationship between that feature and the prediction and helping to explain how the model uses the feature. PDPs are intuitive but may be misleading in the presence of high feature correlations (a minimal PDP sketch, under stated assumptions, follows this list). ALE plots, by calculating the local effects of a feature at different values and accumulating them across all sample points, show the feature’s impact on the model’s predictions. ALE accounts for correlation between features, avoiding the potentially misleading results of PDPs, which makes it suitable for data with highly correlated features and yields more reliable explanations; however, its calculation and interpretation are more complex than those of PDPs;
  • Local interpretability methods: Based on individual samples or a group of samples, these methods interpret the prediction behavior of the black-box model from the perspective of individual instances, i.e., for a specific input sample, they identify which features contribute most to the model’s prediction. Common local interpretability methods include Local Interpretable Model-agnostic Explanations (LIMEs) [25], Individual Conditional Expectation (ICE) [26], and Shapley Additive Explanations (SHAPs) [27]. LIME perturbs the data around a specific sample and fits simple models, such as linear models or decision trees, to approximate the local behavior of the complex model; it provides local interpretability but may not capture global model behavior. ICE plots show the impact of changes in feature values on the prediction for individual samples, allowing intuitive observation of the local effects of features on individual predictions; they offer insight into local effects but do not address interactions between features. SHAP calculates the marginal contribution of each feature in different feature combinations and averages these contributions, thus quantifying the impact of each feature on a specific prediction; however, it can be computationally expensive and complex.
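To make the PDP idea above concrete, the following is a minimal Python sketch that computes partial dependence for one feature of a generic scoring model. The scoring function, the synthetic data, and all names are illustrative assumptions for demonstration only; they are not the evaluation model used in this paper (the weights shown simply reuse the optimized values reported later, in Section 4.2).

```python
import numpy as np

def score_model(X):
    # Hypothetical weighted-sum scorer over normalized indicators (illustrative only).
    weights = np.array([0.365, 0.448, 0.118, 0.237])
    return X @ weights

def partial_dependence(predict, X, j, grid_points=20):
    """Average prediction as feature j sweeps a grid while the other features keep their observed values."""
    grid = np.linspace(X[:, j].min(), X[:, j].max(), grid_points)
    pd_values = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, j] = value                       # force feature j to the grid value for every sample
        pd_values.append(predict(X_mod).mean())   # average over the data distribution
    return grid, np.array(pd_values)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(200, 4))         # synthetic normalized indicator scores
    grid, pdp = partial_dependence(score_model, X, j=0)
    print(grid[:3], pdp[:3])
```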

2.4. Limitations of the Existing Evaluation Method

While these evaluation methods are widely used across various domains, they still face the following challenges:
  • The evaluation results are often not comprehensive enough for certain targets;
  • The results obtained from different methods within the same evaluation framework can vary significantly;
  • The impact of individual indicators on the evaluation outcomes is not adequately explained.
Addressing these issues requires a transparent, interpretable evaluation system specifically suited for assessing autonomous intelligent systems.

3. Proposed Method

This section introduces a method that enhances the stability and reliability of evaluation results by utilizing the particle swarm optimization (PSO) algorithm in conjunction with the stability of the data distribution to fine-tune weights.

3.1. Stability of Data Distribution Characteristics

The unmanned field exploration robot system, when deployed in complex environments such as outdoor settings, collects data from various perspectives during task execution. These data distribution characteristics objectively reflect the system’s performance, thereby serving as a crucial factor in optimizing the weights of evaluation indicators derived from existing assessment methodologies.
The collected data encompasses a range of parameters that are indicative of the system’s operational efficiency and decision-making capabilities. Specifically, the data includes motion planning decision time, task decision time, task decision accuracy, and environmental complexity. These parameters are measured across multiple trials to ensure a comprehensive understanding of the system’s performance under varying conditions.
These data are then processed to fill in missing values using a random Gaussian distribution, based on the mean and variance of the corresponding evaluation indices. The dataset is subsequently normalized by column to obtain a normalized dataset, where each element’s value range is scaled to [0, 10]. This normalization process ensures that the data are comparable across different scales and units, facilitating a more accurate weight optimization process.
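As a rough illustration of this preprocessing step, the sketch below fills missing values with Gaussian draws based on each column’s mean and standard deviation and then min-max scales each column; the function name, the random seed, and the use of NumPy are assumptions made for the example, not the authors’ implementation.

```python
import numpy as np

def preprocess(D, scale=10.0, seed=0):
    """Fill NaNs column-wise with Gaussian draws, then min-max scale each column to [0, scale]."""
    rng = np.random.default_rng(seed)
    X_hat = D.astype(float).copy()
    for j in range(X_hat.shape[1]):
        col = X_hat[:, j]
        missing = np.isnan(col)
        mu, sigma = np.nanmean(col), np.nanstd(col)
        col[missing] = rng.normal(mu, sigma, missing.sum())   # Gaussian imputation per indicator
    mins, maxs = X_hat.min(axis=0), X_hat.max(axis=0)
    X = scale * (X_hat - mins) / (maxs - mins)                # column-wise normalization
    return X_hat, X
```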

3.2. Data Feature Stability-Based Optimization Algorithm

This methodology employs the particle swarm optimization (PSO) algorithm to iteratively optimize the weights of four evaluation indicators for unmanned field exploration systems. The optimization function is crafted based on the mean and variance of the data distribution characteristics of the system indicators, thereby creating an optimization evaluation model that emphasizes the stability of the data features. Figure 1 illustrates the detailed implementation process.
(1) The completion and normalization of evaluation data: Initially, the evaluation data $D$, consisting of $m$ rows and $n$ columns (where $m$ represents the number of data points and $n$ represents the types of evaluation indices), undergoes preprocessing. The mean and variance of the evaluation indices are computed, and missing values are filled using a random Gaussian distribution, producing the completed dataset $\hat{X}$. The data are then normalized by column to form the normalized dataset $X$. For each evaluation index $j$, the minimum and maximum values of $\hat{X}_j$ are identified, and the normalization is performed using the following formula:
$$X_{i,j} = \frac{10 \times \left(\hat{X}_{i,j} - \min(\hat{X}_j)\right)}{\max(\hat{X}_j) - \min(\hat{X}_j)}$$
where $\hat{X}_{i,j}$ is the $j$-th evaluation index for the $i$-th data point in the dataset $\hat{X}$, and $X_{i,j}$ represents the normalized value. The range of each element in $X$ is [0, 10];
(2) The initialization of evaluation index weights: The weight vector $W = \{w_1, \ldots, w_n\}$ is initialized. For subjective weighting, the weights are manually assigned, while for objective or combined weighting methods, $W$ is determined based on the normalized data $X$, where each weight $w_j$ corresponds to an evaluation index;
(3) Particle swarm optimization initialization: A set of $s$ particles $G = \{G_1, \ldots, G_k, \ldots, G_s\}$ is used to initialize the PSO. Each particle $G_k$ is represented by a position vector $P_k = \{p_{k1}, \ldots, p_{kn}\}$, with its initial positions corresponding to the weight vector $W$. The initial velocities $V_k = \{v_{k1}, \ldots, v_{kn}\}$ are randomly assigned values within the range [−0.002, 0.002], allowing the exploration of the solution space;
(4) The fitness calculation for each particle: The fitness function $F(G_k)$ evaluates each particle based on the stability of the evaluation index weights. The fitness function is defined as follows:
$$F(G_k) = \mathrm{var}(G_k) + \mathrm{Penalty}(G_k)$$
where $\mathrm{var}(G_k)$ represents the variance obtained by multiplying the weight elements of particle $G_k$ ($k \in [1, s]$) with the normalized scores $X$, and $\mathrm{Penalty}(G_k)$ assesses the validity of the weight vector. The variance and penalty functions are expressed as
$$\mathrm{var}(G_k) = \frac{1}{s-1} \sum_{j=1}^{n} \left( \sum_{i=1}^{m} \left( X_{i,j} \times p_{kj} \right) - \frac{1}{s} \sum_{j=1}^{n} \sum_{i=1}^{m} \left( X_{i,j} \times p_{kj} \right) \right)^{2}$$
$$\mathrm{Penalty}(G_k) = 100 \times \left[ \sum_{j=1}^{n} \chi\left( p_{kj} < 0 \ \text{or} \ p_{kj} > 1 \right) + \chi\left( \left( \sum_{j=1}^{n} p_{kj} \right) > 1 \right) \right]$$
where $\chi(\cdot)$ is an indicator function that returns 1 if the condition is met and 0 otherwise, $p_{kj}$ represents the weight of the $j$-th indicator for the $k$-th particle, $n$ represents the number of evaluation indicators, and $m$ represents the amount of scoring data;
(5) Particle iteration and update: In each iteration, the optimal value $P_k^{Best}$ is updated for each particle if its fitness improves. Simultaneously, the global best value $G^{Best}$ is updated if any particle surpasses the current global optimum. The velocity and position of each particle are updated as follows:
$$V_k^{i} = \omega V_k^{i-1} + C_1 \times \mathrm{rand} \times \left( P_k^{Best} - G_k^{i-1} \right) + C_2 \times \mathrm{rand} \times \left( G^{Best} - G_k^{i-1} \right)$$
$$G_k^{i} = G_k^{i-1} + V_k^{i}$$
where $\omega$ represents the inertia weight, $V_k^{i}$ denotes the particle’s velocity in the current iteration, $V_k^{i-1}$ is the velocity in the previous iteration, $G_k^{i-1}$ is the position in the previous iteration, $\mathrm{rand}$ generates a random value from a normal distribution, $C_1$ signifies confidence in the particle’s best-known position, and $C_2$ represents confidence in the global best position.
This iterative process ensures that particles adjust their trajectories to explore and exploit the search space effectively;
(6) Termination condition: The algorithm checks whether the maximum number of iterations $MaxIter$ has been reached or whether $G^{Best}$ meets a predefined threshold. If either condition is met, the optimal particle is output, and its corresponding weights are taken as the optimized weights for the evaluation indices. If not, the algorithm proceeds to the next iteration (Step 5). A minimal implementation sketch of this procedure is given below.
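The following Python sketch illustrates steps (2)–(6) under the fitness function defined above. The swarm size, random seed, and the use of uniform random coefficients in the velocity update (rather than the normally distributed draw described above) are illustrative assumptions, and the sketch omits the per-weight bounds used later in the experiments.

```python
import numpy as np

def fitness(p, X):
    """Variance of the weighted scores plus a penalty on invalid weight vectors (Eq. above)."""
    scores = X @ p                                            # weighted score per data point
    var = scores.var(ddof=1)
    penalty = 100 * (np.sum((p < 0) | (p > 1)) + float(p.sum() > 1))
    return var + penalty

def pso_optimize(X, w_init, iters=40, swarm=30, omega=0.68, c1=1.65, c2=1.65,
                 v_limit=0.002, seed=0):
    """Minimize the fitness over weight vectors starting from the initial weights w_init."""
    rng = np.random.default_rng(seed)
    n = len(w_init)
    P = np.tile(w_init, (swarm, 1))                           # positions start at the initial weights
    V = rng.uniform(-v_limit, v_limit, (swarm, n))            # random initial velocities
    p_best = P.copy()
    p_best_fit = np.array([fitness(p, X) for p in P])
    g_best = p_best[p_best_fit.argmin()].copy()
    for _ in range(iters):
        V = (omega * V
             + c1 * rng.random((swarm, 1)) * (p_best - P)     # pull toward personal best
             + c2 * rng.random((swarm, 1)) * (g_best - P))    # pull toward global best
        P = P + V
        fits = np.array([fitness(p, X) for p in P])
        improved = fits < p_best_fit
        p_best[improved], p_best_fit[improved] = P[improved], fits[improved]
        g_best = p_best[p_best_fit.argmin()].copy()
    return g_best
```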

3.3. Algorithmic Complexity Analysis

Prior to comparative experimental analysis, an examination of the complexity of the proposed weight optimization algorithm is essential. This complexity analysis facilitates an understanding of the computational efficiency and resource consumption of the algorithm, thereby highlighting potential areas for optimization.
Table 2 presents a complexity comparison of various algorithms, utilizing specific notational conventions:
The notations used in the complexity analysis are defined as follows: T represents the total number of iterations, which is a key determinant of the exhaustiveness of the search process. s denotes the swarm size in the PSO algorithm, influencing both the diversity of the search and the computational resources required. m signifies the number of data points, which directly impacts the algorithm’s precision and generalizability. n corresponds to the number of evaluation metrics, reflecting the dimensionality of the assessment framework. Lastly, p indicates the population size in genetic algorithms, which is associated with the algorithm’s exploratory capabilities. These parameters are essential for understanding the computational demands and scalability of the proposed and comparative algorithms.
In summary, the complexity analysis revealed that the proposed method offered a competitive balance between efficiency and adaptability, particularly suitable for complex and large-scale evaluation tasks. While the AHP-FCE and CRITIC methods exhibited lower computational demands, their accuracy and adaptability were compromised. Conversely, the AHP-GA and PSO-WOA methods, despite their higher computational requirements, provided enhanced performance and adapted better to varying conditions. These insights underscore the importance of selecting an algorithm that aligns with the specific requirements and constraints of the evaluation task at hand.

4. Experiments

To verify the effectiveness of the optimization evaluation method, this study took the planning and decision-making capabilities of an unmanned field exploration robot system as an example. First, the CRITIC method was used to determine the initial weights of the evaluation indicators, and then these weights were further refined through a weighted optimization process. Next, model interpretability techniques were used to thoroughly analyze the specific impact of each evaluation indicator on the evaluation results to ensure the effectiveness and reliability of the evaluation system.
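For reference, the sketch below shows a standard CRITIC weighting computation (contrast intensity from the standard deviation, conflict from pairwise correlations); it is a generic illustration of how initial weights of this kind could be obtained, not the authors’ exact implementation.

```python
import numpy as np

def critic_weights(X):
    """Standard CRITIC weighting over X (m samples, n indicators, already on a common scale)."""
    std = X.std(axis=0, ddof=1)                 # contrast intensity of each indicator
    corr = np.corrcoef(X, rowvar=False)         # pairwise correlations between indicators
    conflict = (1.0 - corr).sum(axis=0)         # conflict of each indicator with the others
    info = std * conflict                       # information carried by each indicator
    return info / info.sum()                    # normalized weights
```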

4.1. Intelligent System Architecture

Utilizing the robots depicted in Figure 2, this study evaluated their planning and decision-making capabilities during field exploration tasks. The robots, designed for operation in complex or harsh environments, were equipped with advanced intelligent systems that facilitated task execution with minimal human intervention. The planning and decision-making capabilities refer to the robots’ capacity to effectively make decisions and formulate plans based on real-time environmental interactions.
The assessment framework for planning and decision-making capabilities was divided into three dimensions, as shown in Figure 3. The most basic evaluation metrics included those that can be directly measured when a robotic system performs field exploration tasks, namely:
(1)
Motion planning decision time: the average time taken by a field exploration robot to initiate task execution after receiving a mission during multiple experiments;
(2)
Task decision time: the average time taken from the commencement to the conclusion of the decision-making process in a field exploration robot across multiple experiments;
(3)
Task decision accuracy: the ratio of successful decision-making instances to the total number of decisions made by a field exploration robot in accomplishing specified tasks during multiple experiments;
(4)
Environmental complexity: the degree of complexity of the environment in which a field exploration robot formulates decision plans.
The intermediate evaluation indicators were derived from the basic metrics and were divided into two primary dimensions: temporal efficiency and decision effectiveness. These dimensions were utilized to measure the planning and decision-making capabilities of robotic systems under various environmental conditions.
The highest level of evaluation synthesized these intermediate indicators, providing a comprehensive assessment of the robot system’s planning and decision-making capabilities. This culminated in a quantitative evaluation that offered a nuanced understanding of the system’s performance, which in turn informed targeted enhancements for improved operational efficiency and decision accuracy.

4.2. Performance of the Optimizing Evaluation System

Within the performance evaluation system for the intelligent field exploration robot system described above, an objective evaluation method was initially applied to ascertain the weights of each evaluation indicator at the various hierarchical levels of the system. Following this, the weight optimization approach presented in this paper was employed to further refine these weights. To accomplish this, intelligent field exploration robots were deployed in uncharted terrain, gathering real-time data from an array of sensors: GPS for positional tracking, cameras for visual perception, ultrasonic sensors for proximity detection, and microelectromechanical systems (MEMS) sensors for measuring acceleration and angular velocity. A comprehensive dataset was aggregated from 1200 operational runs of the intelligent field exploration robots, laying the empirical groundwork for the evaluation of the metrics pertaining to planning and decision-making capabilities, as detailed in Table 3.
The initial weights for these evaluation indicators were assigned using the CRITIC method, yielding a starting set of weights: 0.253, 0.323, 0.213, 0.267. Thereafter, the mean and variance of the data for each indicator, as presented in Table 3, were computed. These statistical parameters were then applied within the optimization algorithm to refine the weights of the evaluation indicators, with the goal of enhancing the evaluation system’s accuracy and reliability in assessing the robot’s planning and decision-making capabilities.
The weight optimization method for the evaluation system began by optimizing the initial weights (0.253, 0.323, 0.213, 0.196). The iteration count was set to 40, and the upper and lower bounds for each evaluation weight were established at 1.2 and 0.8 times the initial values, respectively. Additionally, the velocity limits were defined as 0.0009 and −0.0009. The parameter ω was set to 0.68, with both c1 and c2 set to 1.65. The objective function in this method iteratively updated the local and global optimal weight values throughout the process.
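The settings listed above can be collected into a configuration such as the following hypothetical sketch; the numerical values are taken from the text, while the structure and names of the configuration are assumptions made for illustration.

```python
import numpy as np

# Hypothetical configuration mirroring the settings reported above; the optimizer
# interface is assumed, not taken from the paper.
w_init = np.array([0.253, 0.323, 0.213, 0.196])   # initial weights from the CRITIC step
config = dict(
    iters=40,                 # iteration count
    omega=0.68,               # inertia weight
    c1=1.65, c2=1.65,         # cognitive and social coefficients
    v_limit=0.0009,           # velocity bounds [-0.0009, 0.0009]
    w_lower=0.8 * w_init,     # per-weight lower bound (0.8x initial)
    w_upper=1.2 * w_init,     # per-weight upper bound (1.2x initial)
)
print(config)
```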
The updated evaluation weights are shown in Figure 4, which illustrates the optimization process of obtaining the ideal weights for the four indicators through 40 iterations. The optimized weights (0.365, 0.448, 0.118, 0.237) were applied to the original system to recalculate the field exploration robot’s planning and decision-making capability, using the data from Table 3.
Figure 5 presents the recalculated results compared to the standard deviation of the original evaluation. Prior to optimization, the standard deviation of the evaluation scores was 3.435. After optimization, the standard deviation decreased to 3.0228, an improvement in stability of approximately 12%.
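As a quick arithmetic check of the reported improvement, using the two standard deviations quoted above:

```python
before, after = 3.435, 3.0228
print(f"Relative reduction in standard deviation: {(before - after) / before:.1%}")  # about 12%
```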

4.3. Evaluation System Interpretability Verification

To ensure the credibility of the optimized evaluation system, we validated the reliability of the weights corresponding to the underlying evaluation metrics using global interpretable methods such as PDP and ALE, and local interpretable methods including LIME and SHAP.
(1)
PDP
We employed the algorithm from Table 4 to validate the reliability of the optimized evaluation system. Here, the input dataset X corresponds to the data in Table 3, and X j denotes the feature under analysis.
In Figure 6, the interpretability results of the experimental PDP are shown. We can observe that environmental complexity (EC), motion planning decision time (MPDT), and task decision time (TDT) have negative impacts on the field detection capability of outdoor exploration robot systems, whereas task decision accuracy (TDA) has a positive effect. Additionally, by examining the range of values on each graph’s x-axis and y-axis, we find that EC has the greatest vertical span, followed by MPDT, with TDT having the smallest vertical span, consistent with the weight ratios optimized through the method proposed in this paper.
(2)
ALE
We used the algorithm from Table 5 to validate the reliability of the optimized evaluation system.
The experimentally interpretable results are shown in Figure 7. The confidence interval in the figure was set to 95%, which was used to study the model’s confidence level regarding the effect of a given feature. A narrower confidence interval indicated higher reliability of the corresponding evaluation metric in the designed evaluation system, while a wider interval indicated lower reliability. In Figure 7, “eff” represents the ALE value of a specific feature, and “size” indicates the distribution of the corresponding data for that feature. From Figure 7, we can observe that EC, MPDT, and TDT have negative impacts on the field robot detection performance, while TDA has a positive impact. Additionally, all four evaluation metrics have very narrow confidence intervals, indicating the high reliability of the weights assigned to these metrics.
Overall, the obtained results were consistent with the interpretability results of PDP, further proving the reliability of the proposed evaluation system and its corresponding weight optimization method.
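A minimal first-order ALE computation, consistent with the procedure in Table 5, might look like the following sketch; the bin count, the quantile-based bin edges, and the generic `predict` interface (e.g., the illustrative scoring function from the PDP sketch in Section 2.3) are assumptions.

```python
import numpy as np

def ale_1d(predict, X, j, bins=10):
    """First-order ALE for feature j: accumulate mean local prediction differences per bin."""
    edges = np.quantile(X[:, j], np.linspace(0, 1, bins + 1))
    effects = []
    for k in range(bins):
        in_bin = (X[:, j] >= edges[k]) & (X[:, j] <= edges[k + 1])
        if not in_bin.any():
            effects.append(0.0)
            continue
        X_lo, X_hi = X[in_bin].copy(), X[in_bin].copy()
        X_lo[:, j], X_hi[:, j] = edges[k], edges[k + 1]          # move samples to the bin edges
        effects.append((predict(X_hi) - predict(X_lo)).mean())    # local effect within this bin
    ale = np.cumsum(effects)                                      # accumulate local effects
    return edges[1:], ale - ale.mean()                            # center the curve at zero
```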
(3)
LIME
We used the algorithm in Table 6 to validate the reliability of the optimized evaluation system.
We randomly selected a sample from Table 3 with EC, MPDT, TDA, and TDT values of 1.71, 21.35, 0.83, and 0.14, respectively. The LIME interpretability verification results are displayed in Figure 8. In this figure, red indicates a negative impact on the robot’s planning and decision-making capabilities, while green represents a positive impact. The length of each bar reflects the degree of influence on the planning and decision-making output, with longer bars indicating a greater influence. Specifically, MPDT, EC, and TDT negatively impacted the planning and decision-making output, whereas TDA had a positive effect. Among these, EC exerted the most significant influence, followed by MPDT, with TDT having the least impact. In conclusion, the interpretability results aligned well with the optimized weighting of the evaluation indicators designed in the proposed evaluation system.
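A LIME-style local surrogate can be sketched as below, assuming a generic `predict` function, a Gaussian perturbation scheme, an exponential proximity kernel, and a ridge surrogate; these choices are illustrative rather than the exact configuration used in the experiments.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_like_explanation(predict, x0, X, n_samples=500, kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate around x0 (LIME-style local explanation)."""
    rng = np.random.default_rng(seed)
    scale = X.std(axis=0)
    Z = x0 + rng.normal(0.0, scale, size=(n_samples, len(x0)))   # perturb around the instance x0
    y = predict(Z)                                               # black-box predictions on perturbations
    dist = np.linalg.norm((Z - x0) / scale, axis=1)
    w = np.exp(-(dist ** 2) / (kernel_width ** 2))               # proximity kernel weights
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(Z, y, sample_weight=w)
    return surrogate.coef_                                       # local feature contributions
```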
(4)
SHAP
We used the algorithm in Table 7 to verify the reliability of the weights produced by the proposed weight optimization method.
We randomly selected a sample corresponding to Table 3, with EC, MPDT, TDA, and TDT values of 1.66, 20.45, 0.86, and 0.16, respectively. The SHAP interpretability verification results are shown in Figure 9, with the SHAP values output in the figure. A red SHAP value means that the indicator has a positive impact on the planning and decision-making output, while a blue one means that it has a negative impact; the length of the bar for each SHAP value represents the degree of impact on the planning and decision-making output, with longer bars indicating a greater influence. The experiment showed that TDA has a positive impact on the planning and decision-making output, while the other three indicators have negative impacts. Additionally, EC had the largest impact and TDT the smallest. These results are consistent with the aforementioned interpretability analysis results and expectations, further demonstrating the reliability of the optimized evaluation system.
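For a model with only four indicators, exact Shapley values can be computed by enumerating all feature subsets, as in the hedged sketch below; the `predict` interface and the choice of baseline (e.g., the column means of the dataset) are assumptions made for illustration.

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(predict, x0, baseline):
    """Exact Shapley values: features outside the coalition S are set to the baseline value."""
    n = len(x0)
    phi = np.zeros(n)

    def value(S):
        x = baseline.copy()
        x[list(S)] = x0[list(S)]                   # features in the coalition take x0's values
        return predict(x.reshape(1, -1))[0]

    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            for S in combinations(others, size):
                wgt = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[j] += wgt * (value(S + (j,)) - value(S))   # weighted marginal contribution
    return phi
```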

5. Conclusions

To address the issue of inconsistent evaluation results, low reliability, and lack of transparency when using different evaluation methods on the same subject, this paper proposed a method for optimizing evaluation index weights and verifying reliability. Unlike traditional subjective methods that rely on expert experience to assign weights to evaluation indices, our method took into account data distribution characteristics and used the particle swarm optimization method to assign weights to evaluation indices. To ensure the effectiveness and reliability of the proposed method, we evaluated an intelligent field exploration robot using the designed evaluation system. Furthermore, to ensure the reliability of the obtained evaluation results, we employed model interpretability methods to analyze the influence of evaluation indices on the evaluation results, thereby verifying the effectiveness and reliability of the proposed method.
Although the proposed method enhanced the reliability and transparency in the evaluation compared to traditional approaches, it encountered challenges with computational complexity, particularly with large-scale datasets. Future research directions may include optimizing the computational efficiency of the PSO algorithm, integrating advanced computational techniques, or investigating alternative optimization algorithms that offer a better trade-off between speed and accuracy. Expanding the validation scope to various UAS applications is also essential to confirm the method’s versatility and robustness across different operational scenarios.

Author Contributions

Conceptualization, W.W., L.J. and L.F.; formal analysis, H.Z. and H.W.; funding acquisition, L.F.; investigation, H.W.; methodology, H.Y., H.Z. and W.W.; project administration, H.Y. and D.W.; resources, H.Z., L.J., L.F. and D.W.; software, H.W.; supervision, L.J. and L.F.; visualization, L.J. and D.W.; writing—original draft, H.Y., H.W., W.W., L.J. and L.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Natural Science Foundation of China, grant number 62103184.

Data Availability Statement

The data presented in this study are currently not publicly available. Efforts are underway to make the data publicly accessible in the near future. Until then, the data are stored securely and can be made available upon reasonable request to the corresponding author, subject to privacy and ethical restrictions.

Conflicts of Interest

The authors Hang Yi, Wenming Wang, Hao Wang and Haisong Zhang were employed by Beijing Aerospace Wanyuan Science and Technology Company Ltd. The remaining authors (Dong Wang and Lihang Feng) declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Han, Y.; Fang, D.; Li, Y.; Zhang, H. Efficiency Evaluation of Intelligent Swarm Based on AHP Entropy Weight Method. In Proceedings of the Journal of Physics: Conference Series, Hulun Buir, China, 25–27 September 2020.
  2. Alharasees, O.; Abdalla, M.S.; Kale, U. Evaluating AI-UAV Systems: A Combined Approach with Operator Group Comparison. In Proceedings of the 2023 5th International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), Istanbul, Turkey, 8–10 June 2023.
  3. Yunbin, Y.A.N.; Weiwei, P.E.I.; Wanku, S.U.N.; Jianzhou, Y.E. Research on Maintenance Quality Evaluation Method for Unmanned Aerial Vehicle. In Proceedings of the 2019 IEEE 3rd Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), Chongqing, China, 11–13 October 2019.
  4. Xiaohong, W.; Zhang, Y.; Lizhi, W.; Dawei, L.U.; Guoqi, Z. Robustness Evaluation Method for Unmanned Aerial Vehicle Swarms Based on Complex Network Theory. Chin. J. Aeronaut. 2020, 33, 352–364.
  5. Zhu, Z.; Wang, J.; Zhu, Y.; Chen, Q.; Liang, X. Systematic Evaluation and Optimization of Unmanned Aerial Vehicle Tilt Photogrammetry Based on Analytic Hierarchy Process. Appl. Sci. 2022, 12, 7665.
  6. Chen, G.; Zhang, W. Comprehensive Evaluation Method for Performance of Unmanned Robot Applied to Automotive Test Using Fuzzy Logic and Evidence Theory and FNN. Comput. Ind. 2018, 98, 48–55.
  7. Sun, Y.; Yang, H.; Meng, F. Research on an Intelligent Behavior Evaluation System for Unmanned Ground Vehicles. Energies 2018, 11, 1764.
  8. Dong, S.; Yu, F.; Wang, K. Safety Evaluation of Rail Transit Vehicle System Based on Improved AHP-GA. PLoS ONE 2022, 17, e0273418.
  9. Fei, C.-W.; Li, H.; Liu, H.-T.; Lu, C.; An, L.-Q.; Han, L.; Zhao, Y.-J. Enhanced Network Learning Model with Intelligent Operator for the Motion Reliability Evaluation of Flexible Mechanism. Aerosp. Sci. Technol. 2020, 107, 106342.
  10. Sheh, R. Evaluating Machine Learning Performance for Safe, Intelligent Robots. In Proceedings of the 2021 IEEE International Conference on Intelligence and Safety for Robotics (ISR), Virtual, Nagoya, Japan, 4–6 March 2021.
  11. Leitzke, P.M.; Wehrmeister, M.A. Real-Time Performance Evaluation for Robotics. J. Intell. Robot. Syst. 2021, 101, 37.
  12. Jia, L.; Chen, S.; Feng, L. An Optimization Evaluation Approach to Enhance the Reliability of Intelligent System. In Proceedings of the 2023 10th International Forum on Electrical Engineering and Automation (IFEEA 2023), Nanjing, China, 3–5 November 2023; pp. 1208–1211.
  13. Linstone, H.A. The Delphi Technique. In Environmental Impact Assessment, Technology Assessment, and Risk Analysis; Covello, V.T., Mumpower, J.L., Stallen, P.J.M., Uppuluri, V.R.R., Eds.; Springer: Berlin/Heidelberg, Germany, 1985; pp. 621–649.
  14. Nguyen, G. The Analytic Hierarchy Process: A Mathematical Model for Decision Making Problems. Bachelor’s Thesis, The College of Wooster, Wooster, OH, USA, 2014.
  15. Roger, T.A.J.L. Modeling Social Influence through Network Autocorrelation: Constructing the Weight Matrix. Soc. Netw. 2002, 24, 21–47.
  16. Groth, D.; Hartmann, S.; Klie, S.; Selbig, J. Principal Components Analysis. In Computational Toxicology; Reisfeld, B., Mayeno, A.N., Eds.; Methods in Molecular Biology; Humana Press: Totowa, NJ, USA, 2013; Volume 930, pp. 527–547.
  17. He, D.; Xu, J.; Chen, X. Information-Theoretic-Entropy Based Weight Aggregation Method in Multiple-Attribute Group Decision-Making. Entropy 2016, 18, 171.
  18. Diakoulaki, D.; Mavrotas, G.; Papayannakis, L. Determining Objective Weights in Multiple Criteria Problems: The Critic Method. Comput. Oper. Res. 1995, 22, 763–770.
  19. Lawley, D.N.; Maxwell, A.E. Factor Analysis as a Statistical Method. J. R. Stat. Soc. Ser. D Stat. 1962, 12, 209–229.
  20. Wu, R.M.; Zhang, Z.; Yan, W.; Fan, J.; Gou, J.; Liu, B.; Gide, E.; Soar, J.; Shen, B.; Fazal-e-Hasan, S. A Comparative Analysis of the Principal Component Analysis and Entropy Weight Methods to Establish the Indexing Measurement. PLoS ONE 2022, 17, e0262261.
  21. Ghazanfari, M.; Rouhani, S.; Jafari, M. A Fuzzy TOPSIS Model to Evaluate the Business Intelligence Competencies of Port Community Systems. Group 2014, 12, 14.
  22. Zhao, H.; Wang, Y.; Liu, X. The Evaluation of Smart City Construction Readiness in China Using CRITIC-G1 Method and the Bonferroni Operator. IEEE Access 2021, 9, 70024–70038.
  23. Moosbauer, J.; Herbinger, J.; Casalicchio, G.; Lindauer, M.; Bischl, B. Explaining Hyperparameter Optimization via Partial Dependence Plots. Adv. Neural Inf. Process. Syst. 2021, 34, 2280–2291.
  24. Okoli, C. Statistical Inference Using Machine Learning and Classical Techniques Based on Accumulated Local Effects (ALE). arXiv 2024, arXiv:2310.09877.
  25. Zafar, M.R.; Khan, N. Deterministic Local Interpretable Model-Agnostic Explanations for Stable Explainability. Mach. Learn. Knowl. Extr. 2021, 3, 525–541.
  26. Goldstein, A.; Kapelner, A.; Bleich, J.; Pitkin, E. Peeking Inside the Black Box: Visualizing Statistical Learning With Plots of Individual Conditional Expectation. J. Comput. Graph. Stat. 2015, 24, 44–65.
  27. Mangalathu, S.; Hwang, S.-H.; Jeon, J.-S. Failure Mode and Effects Analysis of RC Members Based on Machine-Learning-Based SHapley Additive exPlanations (SHAP) Approach. Eng. Struct. 2020, 219, 110927.
Figure 1. Flowchart of weight optimization method based on data feature consistency.
Figure 2. Physical image of an unmanned field exploration robot.
Figure 3. Evaluation framework for planning and decision-making capabilities of intelligent field detection robots.
Figure 4. Weight iterative optimization solution process.
Figure 5. The overall mean and variance distribution of each score before and after the optimization of the evaluation system.
Figure 6. PDP interpretability results.
Figure 7. ALE interpretability results.
Figure 8. LIME interpretability results.
Figure 9. SHAP interpretability results.
Table 1. Comparison of important algorithms in terms of average computational time, accuracy, and adaptability over the past three years.

Algorithm | Average Computational Time | Accuracy | Adaptability
AHP-FCE [1] | 120 s | Subjective judgment, low accuracy | Poor on large datasets
AHP-GA [8] | 250 s | Local optimum risk | Sensitive to initial parameters
PSO-WOA [12] | 180 s | Improved convergence, noise sensitive | Good across scenarios
CRITIC [18] | 90 s | Uneven data reduces accuracy | Limited in dynamic tasks
Table 2. Algorithm complexity comparison.

Methods | Time Complexity | Space Complexity
Proposed method | O(T × s × m × n) | O(s × n + m × n)
AHP-FCE method | O(n^3 + m × n) | O(n^2 + m × n)
Two-level analytic hierarchy model | O(n^3) | O(n^2)
OODA loop + AHP method | O(m + n^3) | O(m + n^2)
AHP-GA method | O(n^3 + p × T × w_1) | O(n^2 + p × n)
EAHP method | O(n^3) | O(n^2)
Fuzzy logic + evidence theory + fuzzy neural networks | O(m × n + n^2 + T × n) | O(n^2)
GRNN + MPGA | O(m × n + p × T × w_2) | O(m × n + p × n)
Robotstone benchmark | O(n) | O(n)
ML-based evaluation system | O(m × n × T) | O(n^2)
Table 3. Data for scenario planning decision-making capability.

Collection Serial Number | Motion Planning Decision Time | Environment Complexity | Task Decision Time | Task Decision Accuracy
1 | 58.6 s | 84% | 146 s | 85%
2 | 79.4 s | 80% | 193 s | 88%
… | … | … | … | …
1200 | 64.7 s | 89% | 179 s | 91%
Table 4. PDP algorithm.

Input: dataset $X$; feature $X_j$ to analyze
Output: PDP curve $PDP(X_j)$
S1: Initialization — Set $X_j$ as the feature to analyze.
S2: Select feature values — Determine a range of values $\{x_j^1, x_j^2, \ldots, x_j^N\}$ for $X_j$ based on the data distribution or specific intervals of interest.
S3: Iterate through feature values — For each value $x_j^i$ in the determined range of $X_j$: replace $X_j$ in the dataset $X$ with $x_j^i$, keeping the other features unchanged, and compute the expected value of the model’s prediction, $E[\hat{f}(X \mid X_j = x_j^i)]$.
S4: Record results — Store each $x_j^i$ and its corresponding $E[\hat{f}(X \mid X_j = x_j^i)]$.
S5: Plot PDP — Use the recorded pairs $\{x_j^i, E[\hat{f}(X \mid X_j = x_j^i)]\}$ to plot the PDP curve.
Table 5. ALE algorithm.

Input: dataset $X$; feature $X_j$ to analyze
Output: ALE plot
S1: Initialization — Set $X_j$ as the feature to analyze.
S2: Partition the feature range — Divide the observed range of $X_j$ into $K$ intervals (bins) $[z_{k-1}, z_k]$, typically using quantiles of $X_j$.
S3: Compute local effects — For each interval $[z_{k-1}, z_k]$, take the samples whose $X_j$ value falls in the interval, replace $X_j$ with the lower and then the upper interval boundary, and average the resulting prediction differences: $\mathrm{eff}_k = \mathrm{mean}\left[\hat{f}(X \mid X_j = z_k) - \hat{f}(X \mid X_j = z_{k-1})\right]$ over the samples in the interval.
S4: Accumulate and center — Sum the local effects over the intervals to obtain the accumulated effect, then subtract the mean accumulated effect so that the curve is centered at zero.
S5: Plot ALE — Plot the centered accumulated effects against the interval boundaries to obtain the ALE curve.
Table 6. LIME algorithm.

Input: instance $x_0$ (with access to the black-box model)
Output: local explanation plot
S1: Select instance — Choose a specific instance $x_0$ from the dataset for which the model prediction is to be explained.
S2: Generate perturbations — Perturb $x_0$ to create a dataset of similar instances (perturbed samples) by sampling around $x_0$ and introducing small changes in the feature values.
S3: Model prediction — Obtain predictions from the black-box model for the perturbed samples.
S4: Fit interpretable model — Fit an interpretable model (e.g., linear regression, decision tree) locally around $x_0$ to explain the predictions made by the black-box model on the perturbed samples.
S5: Interpretation — Interpret the coefficients or feature importance of the interpretable model to understand which features and values contribute most to the prediction for $x_0$.
Table 7. SHAP algorithm.

Input: instance $x_0$; background dataset $X$
Output: SHAP values
S1: Initialization — Compute the model prediction $f(x_0)$.
S2: Calculate baseline — Compute the average prediction over the background dataset: $E[f(X)] = \frac{1}{N}\sum_{i=1}^{N} f(x_i)$, where $N$ is the number of instances in $X$.
S3: Iterate through features — For each feature $X_j$, create subsets of instances by conditioning on $X_j$ being present or absent in $x_0$: ① subsets where $X_j = 1$ (present) in $x_0$; ② subsets where $X_j = 0$ (absent) in $x_0$.
S4: Calculate contributions (SHAP values) — For each feature $X_j$: compute the difference in predictions due to the presence of $X_j$, ① $f_j(x) = f(x_{j=1}) - f(x_{j=0})$; ② weight the contribution using Shapley values: $\phi_j(x_0) = \sum_{S \subseteq J \setminus \{j\}} \frac{|S|!\,(|J| - |S| - 1)!}{|J|!}\left[f_j(S \cup \{j\}) - f_j(S)\right]$.
S5: Aggregate SHAP values — Combine the contributions to obtain the SHAP values for $x_0$: $\phi_j(x_0) = \frac{1}{M}\sum_{m=1}^{M}\phi_j^m(x_0)$, where $M$ is the number of features.
S6: Output — Return the SHAP values $\phi_j(x_0)$ for each feature $X_j$, indicating their impact on the prediction for $x_0$.
