1. Introduction
In recent years, deep learning has developed rapidly in image processing [1, 2], natural language processing [3, 4], speech recognition [5, 6], and other related fields, and it has been shown to match or surpass human performance in many areas, which makes people increasingly reliant on decisions made by AI. To meet the growing demands of fields such as healthcare [7, 8] and the precision industry, model complexity has also increased significantly. However, the more complex a model is, the harder it is for people to understand its structure and to explain why it makes a particular decision, and as a result people begin to distrust the model's decisions. To address this problem, explainable artificial intelligence (XAI) has become an active research field. Methods in this field help people reason about why a model behaves as it does. Such analysis helps identify and preserve the important features in images, and these features can then be used to train other models, which reduces computational cost. It also helps people better understand the model and improve its service quality.
In the field of image processing, XAI methods can be divided into ante hoc interpretability and post hoc interpretability. Ante hoc interpretability usually involves data preprocessing and model selection; the purpose of the former is to show the distribution of features, and the purpose of the latter is to explain the decision-making process by constructing a structurally interpretable model, such as a linear model [9] or a decision tree [10, 11]. Post hoc interpretability visualizes the decision-making process of a trained model or analyzes the importance of features [12, 13, 14]. Although ante hoc methods are relatively simple and inexpensive, the structure of many deep learning models is very complex and generally unknown to users, so post hoc interpretability has more advantages in the field of deep learning.
Post hoc interpretability also contains two categories: global and local. Global interpretation focuses on how the model operates as a whole and includes activation maximization [15, 16] and knowledge distillation [17, 18], while local interpretation focuses on the impact of the sample itself when the model makes a decision and includes pixel-level [19, 20], concept-level [21, 22], and image-level [23, 24] methods. Although global interpretation can effectively improve the transparency of the model, local interpretation is often easier for users to understand and lends itself more readily to mining because of the direct relationship between features.
In the field of XAI, especially for white-box models, there are already some methods for mining the relationships between features. These methods partly solve the problem of considering only the importance of features while ignoring the relationships between them [25]; however, their results are not intuitive, and to a certain extent they cannot be adapted to XAI research on deep learning models. For black-box models, relationships between features are effectively mined by XAI methods for tabular or text data, but the way these relationships are obtained and evaluated does not transfer well to image data [26, 27]. Therefore, the idea of this paper is to combine the mining of relationships between features with LIME (local interpretable model-agnostic explanations) [12], a black-box XAI method that obtains explanations by locally approximating the model with a simple one. We use LIME to obtain important superpixel blocks as feature blocks, obtain the relationships between these feature blocks, and optimize the visualization of the results, that is, visualize the importance of features and the relationships between features at the same time. The contributions of this paper can be summarized as follows:
- (1) We analyze the shortcomings of existing XAI methods for obtaining relationships between features. To address these shortcomings, we propose a masking-based interpretation method that considers relationships between features when interpreting a model, which makes the interpretation more complete and improves its credibility.
- (2) We perform extensive experiments, and the results support the correctness of the relationships between features obtained in this paper and show that our method achieves higher accuracy, fidelity, and consistency than LIME.
2. Related Work
The proposed method's characteristics are mainly reflected in two aspects: (1) whether a method targets black-box or white-box models; and (2) how a method obtains relationships between features. We therefore analyze the existing methods from these two aspects.
In terms of explaining black-box models, J. H. Friedman proposed partial dependence plots (PDPs) in 2001 [28]. This method shows the marginal effect of one or two features on a model's predictions, that is, the predicted probability of a specific category at different values of a feature, and thus shows whether the relationship between the target and the feature is linear, monotonic, or more complex. In 2016, Ribeiro et al. proposed the LIME method, which trains a local surrogate model to interpret a single prediction. It takes an instance of interest as the input of the original model and then perturbs this instance to generate a new data set composed of perturbed samples and the corresponding predictions of the black-box model. On this new data set, LIME trains an interpretable model in which each perturbed sample is weighted by its proximity to the instance of interest. Building on LIME, Zhou et al. proposed the S-LIME method, which uses a hypothesis testing framework based on the central limit theorem to determine the number of perturbation points required to ensure the stability of the resulting interpretation rather than relying on purely random perturbation. Also building on LIME, in 2018, Ribeiro et al. proposed the model-agnostic method Anchors [29], which searches for a minimal subset of features such that any instance containing this subset receives the same black-box prediction regardless of its other features. This subset serves as the explanation and can be regarded as an anchor that accurately explains the local behavior of a relatively complex black-box model.
The Shapley value can be understood as a method of distributing a total payout among players according to their contributions. In XAI, the features are the players, and the model prediction is the total payout. Because this method distributes the difference between the prediction and the average prediction exactly among the features, it has become very popular for explaining black-box model predictions. In 2016, Lundberg et al. proposed the SHAP method, which replaces LIME's weighting of samples by their proximity to the original instance with weights derived from the coalitions used in Shapley value estimation. In 2020, Messalas et al. proposed the MASHAP method [30]. It first builds a global surrogate model on the instance of interest, then passes this surrogate to the Tree SHAP method in place of the original model, and finally generates an explanation. Because this method simplifies the original model in SHAP, it also produces results faster than SHAP and LIME.
However, the above methods do not adequately consider the relationships between features for image data, which reduces the credibility of the explanation. Therefore, in this paper, we introduce a method for calculating the relationships between features at the image data level in order to mine these relationships and improve the reliability of the interpretation.
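The LIME procedure described above (perturb the instance, query the black box, weight samples by proximity, fit a simple model) can be illustrated with a minimal sketch. It is not the reference LIME implementation: the function name `lime_sketch`, the binary on/off feature encoding, the exponential kernel, and the `kernel_width` value are all assumptions made for illustration.

```python
import numpy as np

def lime_sketch(black_box, n_features, n_samples=500, kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear surrogate around one instance.

    black_box: callable mapping a 0/1 vector (feature kept / removed) to a score.
    Returns (per-feature weights, intercept) of the surrogate.
    """
    rng = np.random.default_rng(seed)
    # Each row is one perturbed sample: 1 keeps a feature, 0 removes it.
    Z = rng.integers(0, 2, size=(n_samples, n_features))
    Z[0] = 1  # keep the unperturbed instance in the data set
    y = np.array([black_box(z) for z in Z], dtype=float)

    # Assumed proximity measure: fraction of removed features, passed
    # through an exponential kernel so that closer samples weigh more.
    distance = 1.0 - Z.mean(axis=1)
    sample_weight = np.exp(-(distance ** 2) / kernel_width ** 2)

    # Weighted least squares with an intercept column.
    X = np.hstack([np.ones((n_samples, 1)), Z])
    root_w = np.sqrt(sample_weight)
    coef, *_ = np.linalg.lstsq(X * root_w[:, None], y * root_w, rcond=None)
    return coef[1:], coef[0]

# Toy black box (hypothetical): features 0 and 2 matter, 1 and 3 do not.
weights, intercept = lime_sketch(lambda z: 0.1 + 0.5 * z[0] - 0.2 * z[2], n_features=4)
```

For image data, each entry of the 0/1 vector would correspond to one superpixel being shown or hidden; the surrogate weights then rank the superpixels by importance.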
There are also many studies in XAI on obtaining relationships between features. In terms of white-box models, Wang et al. [31] proposed the spatial activation concept vector, which considers spatial location relationships. Ge et al. [32] demonstrated the relationships between features by extracting important visual concepts related to a specific category and representing the image as a structured visual concept graph. They proposed a visual reasoning explanation framework (VRX) that can obtain structural concept graphs similar to that shown in Figure 1, in which the colors of the components' scores, from high to low, are blue, green, and pink.
However, these methods apply only to a limited range of models, and they cannot directly show how relationships between features affect the models' predictions, so their degree of visualization is also limited. In terms of black-box models, there are also many methods that compute feature relationships for the classic black-box XAI methods. For example, for LIME, Dikopoulou et al. proposed the GLIME method [25], which combines LIME with the graphical least absolute shrinkage and selection operator (graphical lasso) to generate an undirected Gaussian graphical model. In addition, regularization shrinks small partial correlation coefficients to zero to provide a sparser and more interpretable graphical explanation. For the Shapley value, Aas et al. proposed a method [26] that extends the Kernel SHAP method to handle dependent features. Both kinds of methods can effectively determine the relationships between features, and Aas et al. verify the correctness of the relationships found through experimental comparison. Although these methods address the problems of limited model applicability and non-intuitive results, they are limited to text and tabular data. Applying them to images raises two problems. First, the values of tabular and text features can easily be changed to affect the model prediction, whereas a pixel or pixel block used as an image feature cannot simply be changed to observe the impact on the prediction; generally, an image feature has only two possible states: present or absent. Second, the features in tabular and text data can be set artificially so that they have known relationships, such as Gaussian dependence, but this is difficult to do for images, which makes it hard to design an indicator that verifies the correctness of the relationships.
Therefore, in this paper, we draw on ideas for finding feature associations from black-box XAI methods for text and tabular data, combine the idea of masking features with the LIME method, and propose a black-box interpretation method for image data that obtains the relationships between features through combined masking of feature blocks. This not only makes the relationships between features more intuitive but also improves the universality of the method.
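To make the idea of combined masking concrete, the sketch below builds the four masked variants of an image for one pair of feature blocks. It is an illustration only, not the procedure of this paper: the helper names `mask_blocks` and `combination_masks` are hypothetical, the superpixel label map `segments` is assumed to come from any segmentation step, and masking with black (zero) pixels is assumed.

```python
import numpy as np

def mask_blocks(image, segments, masked_ids, fill=0):
    """Return a copy of `image` with the listed superpixels filled (black by default).

    image:    array of shape (H, W, C)
    segments: integer array of shape (H, W) giving the superpixel id of each pixel
    """
    out = image.copy()
    hidden = np.isin(segments, list(masked_ids))
    out[hidden] = fill  # the (H, W) boolean mask applies to every channel
    return out

def combination_masks(image, segments, i, j):
    """Variants needed to study blocks i and j: none, one, or both masked."""
    return {
        "full": mask_blocks(image, segments, []),
        "without_i": mask_blocks(image, segments, [i]),
        "without_j": mask_blocks(image, segments, [j]),
        "without_both": mask_blocks(image, segments, [i, j]),
    }

# Toy 2x2 image with three superpixels (ids 0, 1, 2).
image = np.ones((2, 2, 3))
segments = np.array([[0, 1], [2, 2]])
variants = combination_masks(image, segments, 0, 2)
```

Feeding each variant to the black-box model and comparing the predicted probabilities is what allows the effect of a pair of blocks to be separated from the effect of each block alone.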
4. Results
This paper evaluates the proposed algorithm with indicators commonly used for XAI algorithms, namely sensitivity [33], fidelity/accuracy [34], and stability/consistency [35], and compares and analyzes the results mainly against classic XAI methods based on black-box models. We select InceptionV3 and ResNet50 as the deep neural network models and conduct experiments on the ImageNet data set.
4.1. Analysis of Results
As shown in Figure 5, we set the number of features required for interpretation to K = 3. Compared to LIME, the algorithm in this paper can visualize the relationships between important features. A blue line indicates that the relationship between two feature blocks has a direct, positive effect on the model prediction, while a red line indicates a direct, negative effect; the stronger the effect, the wider the line. Compared to the result in Figure 1, which can only display which features are related, our method has a better visual effect.
In Figure 6, we construct an undirected graph by using superpixels as vertices and the strength of the relationships between superpixels as edge weights, which makes the relationships between superpixels more intuitive, especially when the superpixels are close together.
Combining this with the feature relationship matrix in Table 1, it can be seen that, unlike the feature relationships found by white-box XAI methods, the method in this paper obtains the magnitude of the relationship between feature blocks in the image, and this magnitude shows the direct impact of the relationship between two features on the model prediction rather than simply their positional relationship or degree of correlation. This makes the interpretation more intuitive and more consistent with the general idea of black-box interpretation methods. For example, the relationship between feature 24 and feature 35 increases the probability of the model predicting the result class by 0.32375.
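The visual encoding described in this subsection (superpixels as vertices, signed relationship strength as edge weight, blue for positive, red for negative, width proportional to strength) can be sketched as a small data structure. The function name `build_relationship_graph` and the `min_strength` threshold are hypothetical; only the value 0.32375 for features 24 and 35 comes from the text, and the second edge is an invented example.

```python
def build_relationship_graph(relations, min_strength=0.0):
    """Turn signed pairwise relationships into drawable undirected edges.

    relations: dict mapping (block_a, block_b) -> signed effect on the
               predicted probability of the explained class.
    """
    edges = []
    for (a, b), value in relations.items():
        if abs(value) <= min_strength:
            continue  # drop relationships too weak to draw
        edges.append({
            "nodes": tuple(sorted((a, b))),           # undirected: order is irrelevant
            "weight": value,
            "color": "blue" if value > 0 else "red",  # positive vs. negative effect
            "width": abs(value),                      # stronger effect, wider line
        })
    # Strongest relationships first.
    return sorted(edges, key=lambda e: e["width"], reverse=True)

graph = build_relationship_graph({
    (24, 35): 0.32375,  # value reported in the text
    (17, 24): -0.05,    # hypothetical negative relationship
})
```

Any plotting library can then draw `graph` on top of the image by connecting the centroids of the two superpixels named in each edge.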
4.2. Fidelity/Accuracy Analysis
Currently, the methods used to demonstrate the fidelity/accuracy of XAI algorithms are mainly based on the idea of perturbation. Therefore, the evaluation of accuracy in this article follows an idea derived from the SHAP method: the main effects of the features are subtracted from the total effect to obtain the pure interaction effect, which gives the size of the feature association. This idea is expressed in Formula (3), in which S is a subset of pixel blocks, i and j are feature blocks, and the remaining terms are the predicted result of the model and the features' relationship based on the above idea.
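As an illustration of "total effect minus main effects", one common way to write the pure interaction of blocks i and j for a single fixed context S is the second-order difference f(S ∪ {i, j}) - f(S ∪ {i}) - f(S ∪ {j}) + f(S). The symbol f and this exact form are assumptions borrowed from the SHAP interaction literature and may differ from Formula (3) in weighting or in how S is chosen. The sketch below evaluates it on a hypothetical toy model.

```python
def pairwise_interaction(predict, context, i, j):
    """Second-order difference for blocks i and j given visible blocks `context`.

    predict: callable taking a frozenset of visible block ids and returning
             the model's probability for the explained class.
    """
    S = frozenset(context) - {i, j}
    total = predict(S | {i, j}) - predict(S)   # effect of adding both blocks
    main_i = predict(S | {i}) - predict(S)     # effect of adding i alone
    main_j = predict(S | {j}) - predict(S)     # effect of adding j alone
    return total - main_i - main_j

def toy_predict(visible):
    """Hypothetical model: blocks 1 and 2 help alone and help more together."""
    score = 0.1
    score += 0.2 if 1 in visible else 0.0
    score += 0.1 if 2 in visible else 0.0
    score += 0.3 if {1, 2} <= visible else 0.0  # pure interaction term
    return score

interaction = pairwise_interaction(toy_predict, context={5, 6}, i=1, j=2)
```

In the toy model the main effects (0.2 and 0.1) cancel, so only the joint term of 0.3 remains, up to floating-point rounding.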
The main purpose of this article is to obtain the pairwise, low-order relationships between superpixel blocks. Therefore, to minimize the impact of higher-order relationships between superpixel blocks, S in the experiments contains only a small number of unimportant superpixel blocks with a small spatial correlation (relatively distant). We use the results obtained from Formula (3) as a benchmark and calculate the similarity between the correlation between features obtained by this paper's method and the size of the relationship between features obtained during validation. The results are shown in Table 2.
The experiments show that the relationships between features obtained by the method in this paper are highly similar to those obtained by Formula (3), which means that our method produces results similar to those of the SHAP-based calculation. The results are not identical, possibly because using black pixel blocks for occlusion introduces artifacts into the image and because higher-order relationships between pixel blocks may exist, both of which change the output of the model. However, after eliminating these effects as much as possible, the experiments support the correctness of the relationships obtained in this article.
4.3. Stability/Consistency Analysis
This section mainly verifies whether the interpretation results of the algorithm remain the same for the same input sample with constant parameters, that is, whether the algorithm achieves better stability/consistency. The method used in this paper is to calculate the similarity of the selected feature sets. First, we compare with LIME and define an evaluation index from T1 and T, where T1 corresponds to the interpretation we believe to be the most accurate and T is the number of experiments. The results are as follows.
In Table 3, it can be seen that when the important feature selection method in this article replaces the one in LIME, the proportion of interpretation results consistent with the standard increases.
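A consistency index of this kind can be sketched as the share of repeated runs that return the standard feature set. The sketch assumes the index is the ratio T1 / T, with T1 counted as the number of runs matching the standard and the standard taken to be the most frequent result; it also compares feature sets without regard to order. These are assumptions for illustration, and the function name `consistency` is hypothetical.

```python
from collections import Counter

def consistency(runs):
    """Return (standard feature set, share of runs that reproduce it).

    runs: list of selected-feature collections, one per repeated explanation
          of the same image with the same parameters.
    """
    keyed = [frozenset(run) for run in runs]            # ignore feature order
    standard, t1 = Counter(keyed).most_common(1)[0]     # most frequent result
    return standard, t1 / len(keyed)                    # T1 / T

# Four repeated explanations with K = 3 (hypothetical block ids).
runs = [[24, 35, 7], [35, 24, 7], [24, 35, 9], [7, 24, 35]]
standard, score = consistency(runs)
```

Here three of the four runs select blocks 7, 24, and 35, so the index is 0.75; a perfectly stable method would score 1.0.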
We also chose quantitative testing with concept activation vectors (TCAV) [21] for the second comparative experiment. This method clusters superpixels and then uses concept vector scores to select important concepts. As before, we take the result with the highest number of occurrences as the correct result and calculate the same evaluation index. The results are as follows.
In Table 4, it can be seen that our method obtains more stable results than LIME and TCAV.
Finally, we choose randomized input sampling for explanation (RISE) [36] for a comparative experiment. This method generates multiple masks through Monte Carlo sampling and then weights the masks to obtain the result. Because this method outputs heat maps, we convert the results of our method into heat maps using the superpixels' weights, and we choose the structural similarity index measure (SSIM) as the evaluation index. SSIM calculates a similarity score between two images by comparing their luminance, contrast, and structure. The results are as follows.
In Table 5, it can be seen that, in terms of selecting features, our method achieves higher stability/consistency. In terms of heat map similarity, our method obtains higher SSIM scores than RISE and results similar to those of LIME. The above experiments show that our method obtains better stability/consistency, which indicates that considering the correlation between features can effectively improve the stability/consistency of an interpretable algorithm.
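The heat map comparison can be sketched in two steps: paint each superpixel with its weight, then score two heat maps with SSIM. The sketch uses a simplified global SSIM computed over the whole image in a single window, whereas standard implementations average the same expression over local windows, so the scores are not directly comparable to those in Table 5. The names `weights_to_heatmap` and `global_ssim` are hypothetical, and the constants k1 = 0.01 and k2 = 0.03 are the conventional defaults.

```python
import numpy as np

def weights_to_heatmap(segments, weights):
    """Paint every pixel with the weight of the superpixel it belongs to."""
    heat = np.zeros(segments.shape, dtype=float)
    for block_id, weight in weights.items():
        heat[segments == block_id] = weight
    return heat

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM combining luminance, contrast, and structure terms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

# Two explanations of the same 2x2 image that disagree on which block matters.
segments = np.array([[0, 0], [1, 1]])
heat_a = weights_to_heatmap(segments, {0: 0.8, 1: 0.2})
heat_b = weights_to_heatmap(segments, {0: 0.2, 1: 0.8})
same = global_ssim(heat_a, heat_a)
different = global_ssim(heat_a, heat_b)
```

Identical heat maps score 1 (up to rounding), while heat maps with reversed importance are negatively correlated and score below zero, which is why a higher SSIM across repeated runs indicates a more stable explanation.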
4.4. Sensitivity Analysis
This section mainly examines whether the algorithm is sensitive to parameters when the important feature selection method in LIME is replaced, that is, whether the interpretation results change significantly when the parameters change. The key parameter studied in this section is the number of generated neighborhood samples, N. For comparison, this paper again uses LIME as the baseline and varies N over N = 100, N = 500, N = 1000, N = 3000, and N = 5000 on 500 test images. The interpretation of each image is repeated 100 times. As before, the sequence that occurs most often when N = 5000 is taken as the standard interpretation, and the interpretation results of the algorithms under different values of N are compared against it. To control the other conditions, the number of selected features K is set to 3, and sensitivity is reflected by the difference in the proportion of correct explanations obtained under different values of N.
In Table 6, it can be seen that when the important feature selection method in LIME is replaced, the algorithm in this article achieves a sensitivity broadly similar to that of LIME. When N is large enough, the proportion of correct results changes less as N changes. Overall, the algorithm in this article is less sensitive to N than LIME.