Article
Peer-Review Record

An Empirical Survey on Explainable AI Technologies: Recent Trends, Use-Cases, and Categories from Technical and Application Perspectives

Electronics 2023, 12(5), 1092; https://doi.org/10.3390/electronics12051092
by Mohammad Nagahisarchoghaei 1,*, Nasheen Nur 2,*, Logan Cummins 1, Nashtarin Nur 3, Mirhossein Mousavi Karimi 1, Shreya Nandanwar 2, Siddhartha Bhattacharyya 2 and Shahram Rahimi 1,*
Reviewer 1:
Reviewer 2:
Submission received: 22 December 2022 / Revised: 20 January 2023 / Accepted: 23 January 2023 / Published: 22 February 2023
(This article belongs to the Special Issue Explainable Artificial Intelligence: Efficiency and Sustainability)

Round 1

Reviewer 1 Report

Strong points:

- concerns a valuable and modern topic, XAI, and also explains differences between concepts related to XAI, e.g., interpretability vs. explainability

- all described contributions (manuscript lines 93 to 108) are very valuable and useful

 

Weak points

- From time to time, the text is hard to read; however, I believe one more reading by the Authors or proofreading can help.

- I do not understand one contribution: "We ran our analytics on open-source datasets to evaluate the usability of various XAI techniques and tools." (line 96)

- It isn't easy to differentiate throughout the paper which dataset and results are reproduced by the Authors and which are described based on other research. I believe it should be clarified.

 

Minor tips:

- Please change the background of the figures to white, not so dark

- I do not understand line 27 on the 1st page: "..., including from governments". Maybe this part can be removed.

- line 44 on the 2nd page: "XAI should be able to explain its capabilities and understanding" - I do not understand the meaning of "understanding" in this context. Is it how the model understands the data?

- In line 11, there should be a space before the citation: "... According to De Bellis[15],..."; also in lines 179, 181, 209, 243, 253, and many others.

- "Trust, Accountability,and Fairness" section title needs a space after the 2nd comma

- Fig. 8 and Fig. 15 can be rearranged to be more visible. This categorization is valuable; however, it is tough to read and explore the notions in their current form.

Author Response

Comments and Suggestions for Authors

Strong points:

- concerns a valuable and modern topic, XAI, and also explains differences between concepts related to XAI, e.g., interpretability vs. explainability

- all described contributions (manuscript lines 93 to 108) are very valuable and useful

Weak points

- From time to time, the text is hard to read; however, I believe one more reading by the Authors or proofreading can help.

 

-> Our response: Thank you for your feedback. Our team reviewed the write-up thoroughly and made every effort to fix the complex sentences and improve the flow of the text. Additionally, we have removed redundancy and repetition from the document, resulting in a shorter document.

- I do not understand one contribution: "We ran our analytics on open-source datasets to evaluate the usability of various XAI techniques and tools." (line 96)

 

-> Our response: Thank you for your feedback. We changed the line "We ran our analytics on open-source datasets to evaluate the usability of various XAI techniques and tools." to "We provided various examples using open-source data sets to compare various XAI techniques and tools.", which makes more sense now.

- It isn't easy to differentiate throughout the paper which dataset and results are reproduced by the Authors and which are described based on other research. I believe it should be clarified.

 

-> Our response: Thank you very much for your feedback. The reproduced figures or results do not have citations in the captions. However, the figures from other literature are cited.

Minor tips:

- Please change the background of the figures to white, not so dark

 

-> Our response: Thank you very much for your feedback. We changed the background of the figures to white.

- I do not understand line 27 on the 1st page: "..., including from governments". Maybe this part can be removed.

- line 44 on the 2nd page: "XAI should be able to explain its capabilities and understanding" - I do not understand the meaning of "understanding" in this context. Is it how the model understands the data?

- In line 11, there should be a space before the citation: "... According to De Bellis[15],..."; also in lines 179, 181, 209, 243, 253, and many others.

- "Trust, Accountability,and Fairness" section title needs a space after the 2nd comma

-> Our response: Thank you for your feedback. After reviewing, we also found “including from governments” this phrase is unnecessary. We removed it.

We changed the line "XAI should be able to explain its capabilities and understanding" to "XAI technology should explain its capabilities and features to improve its usability", which makes more sense now.

We addressed all the spacing issues.

 

- Fig. 8 and Fig. 15 can be rearranged to be more visible. This categorization is valuable; however, it is tough to read and explore the notions in their current form.

 

-> Our response: Thank you very much for your feedback. We rearranged the figures for better readability. We also modified shapes and fonts in the Visio files to make them bigger.

 

 

 

Reviewer 2 Report

The submission deals with a timely topic and is interesting. More work and overview papers on XAI are necessary. However, since several already exist, the authors need to make very clear where their contribution lies.

 It might be the case that the authors try to cover too much ground. A very long article is also less useful for the reader.

 

Several serious issues remain to make the paper publishable. In my opinion, the issues lie in the structure and setup (which is connected to the contribution of the paper) and the criteria for inclusion and exclusion.

 

Maybe it would also be better to structure the entire article on XAI technology around domains (image, text, numerical data). As it is, there is first a technology-based pass, and then there comes one section: "6. XAI Applications in NLP & Language Models". It is unclear why this is helpful, because it leads to much repetition.

 

Also, the contribution of Section 7 is very weak. The introduction of this section is vague and gives no clear definition of the topics which might appear here.

 

What is different from other review papers?

"Several categorizations in Molnar's Interpretable Machine Learning book can provide insight" (line 307)

 

What is your contribution? From a literature selection strategy, we expect that a new categorization emerges. Else, explain what you contributed to Molnar's categories and why.

The authors mention 48,060 articles that were identified. How did you come to the 300 papers that are actually cited? For a review article, we expect a strategy that leads from a search to an analysis and then to a structure of the paper.

 

been reviewed in the survey paper[71] -> what follows from this statement? Did you not consider them? How did you go beyond the other survey?

 

Self-Explainable and Post-hoc Explainable Modeling -> not clear for me why these were mixed in one sub-section

 

Page 41: Why is Reinforcement learning mixed with CAMs?

 

"While searching the articles, words, including attention, were erased to prevent the dominance of the attention mechanism in our work and help other explainability methods be seen." (lines 281-282)

This decision is quite questionable. Also, it is not followed up on consistently.

 

"this survey has not included the pre-modeling explainability because" (line 349)

Another statement on inclusion and exclusion criteria. These are scattered throughout the paper. The reader needs to be informed about all criteria in one place and early in the paper.

 

 

 

"It is important to note that although the California Housing dataset is tabular, as far as the data-driven machine learning point of view, a different type of data is convertible. For example, we can train the model on textual data instead and apply Permutation Feature Importance (PFI) to illustrate the importance ranking of features, or we can employ Partial Dependence (PD) Plot to demonstrate the effect of a feature on the model's prediction." (lines 546-550)

The authors seem to suggest that “tabular” data (they mean numerical data) is equivalent to textual data. This is not useful for a technology like XAI that is highly dependent on the domain.

 

 

Explain the drop in 2022. It seems misleading to include a year without full coverage in the graph.

 

Others: speech and information retrieval are also NLP-related.

 

For Figure 4, sources other than the title could have been considered.

 

Figure 6: this statistic is not useful; it can be better expressed in the text.

 

Figure 27: why useful?

 

Figure 28: does it really explain anything?

 

"demonstrates the most important vocabularies" -> should be "most important words"; does it really explain anything?

 

 

 

Figure 7 brings little value for a reader. It might be much clearer after some pruning.

 

Figure 8: what do the colors mean?

 

 

Global and local interpretability methods help users trust a model and prediction. (line 346)

-> Statement seems to be true for all methods anyway

 

 

“are successful in theory,”  not sure, maybe in research? What is the successful AI theory?

 

 

“We ran our analytics on” very vague for a contribution

 

 

 

Figure 1 and others: do not use a black background

 

Reference [38], careful, not blue in text

 

Reference 8, better use Lipton, “The myth of …” ACM

 

 

Structure: "Examples provided in the Global and Local model-agnostic sections are all based on the California Housing dataset (table 1) in the sklearn package." (lines 545-546)

This sentence is part of 5.1 but seems also to be applicable to 5.2, so it needs to be logically put in 5.

 

Structure: the idea to run the same dataset through several systems is appealing. However, it needs more thought. What is the purpose? Just illustration or also a comparison? The reader gets confronted with several “explanations” for the same dataset. Towards the end of the subsection, this idea is not pursued anymore.

 

“human users can comprehend, appropriately trust, and effectively manage the new generation of artificially intelligent partners”  the statement is too strong, it is not clear whether and to which extent the current XAI technology is capable of this task yet.

 

 

 

Referencing style: reference in a sentence is not part of a sentence. Something like the following phrases are not correct. Check: https://libguides.murdoch.edu.au/ieee/text

introduced in [75]

is seen in [76]

[77] has talked about

 

 

 

Overall, the paper can be understood, but there are too many issues.

Here is only a small selection of spots which require attention:

 

summaries have been developed -> summaries has been developed

can’t -> cannot

 

list of uses for XAI methods

articles belongs to NLP

related to other computer science

in the black-box model

 

many missing blanks between words and references, e.g. "alternatives[24–26]", and around punctuation: "Accountability,and"

 

 

On the other hand, We can

 

Here, they are using -> explain where

been reviewed in the survey paper[71]. -> a survey paper

 

 

linear, mono- tonic, or complicated. (line 569)

 

Author Response

Comments and Suggestions for Authors

The submission deals with a timely topic and is interesting. More work and overview papers on XAI are necessary. However, since several already exist, the authors need to make very clear where their contribution lies.

It might be the case that the authors try to cover too much ground. A very long article is also less useful for the reader.

Several serious issues remain to make the paper publishable. In my opinion, the issues lie in the structure and setup (which is connected to the contribution of the paper) and the criteria for inclusion and exclusion.

Maybe it would also be better to structure the entire article on XAI technology around domains (image, text, numerical data).

 

-> Our response: Thank you so much for your comments. Our contribution lies in a broader approach: classifying existing XAI research into different categories, identifying applications of XAI in different domains based on use cases, and showing how XAI can be applied across the AI pipeline. We have also evaluated the usability of XAI approaches. Other survey papers either focused on one XAI approach or categorization, or reviewed a particular application area or domain. We point out several ways to categorize XAI research and provide examples of their applications, offering a richer repository for XAI researchers.

We have addressed your comment and emphasized our contribution. To summarize, we did the categorization based on a standard AI pipeline.

We have removed much of the repetition that we identified and that the reviewers pointed out in the new version.

As it is, there is first a technology-based pass, and then there comes one section: "6. XAI Applications in NLP & Language Models". It is unclear why this is helpful, because it leads to much repetition. Also, the contribution of Section 7 is very weak. The introduction of this section is vague and gives no clear definition of the topics which might appear here.

-> Our response: Thank you for your feedback. Our team reviewed the write-up thoroughly and made every effort to fix the complex sentences and the flow of the text. We have also removed redundancy and repetition from the document, resulting in a shorter one. As part of this effort, we removed the section "6. XAI Applications in NLP & Language Models." We moved the subsection on XAI use cases from Section 2 to the end of the paper (before the conclusion). We also removed the previous Section 7 (XAI in human-centered design) and merged a part of it with "Applications of XAI Technologies" in the new Section 6.

What is different from other review papers?

"Several categorizations in Molnar's Interpretable Machine Learning book can provide insight" (line 307)

What is your contribution? From a literature selection strategy, we expect that a new categorization emerges. Else, explain what you contributed to Molnar's categories and why.

 

-> Our response: 

Our selection criteria were influenced by the objectives stated in our contributions: to identify papers that focus on XAI applications, usability, and broadly defined categories. Also, in the literature search, we selected articles that discuss where XAI can be applied in the AI pipeline. We updated the categorization with the new types of XAI categories and sub-categories.

 

The authors mention 48,060 articles that were identified. How did you come to the 300 papers that are actually cited? For a review article, we expect a strategy that leads from a search to an analysis and then to a structure of the paper.

been reviewed in the survey paper[71] -> what follows from this statement? Did you not consider them? How did you go beyond the other survey?

 

-> Our response: Thank you for your feedback. Unfortunately, we did not elaborate on the filtering procedure because of the length of the paper. We have added the following explanation of the strategy to the appendix. After performing the queries on August 24th, 2022, 48,060 conference and journal articles were listed in the Scopus database; 8,180 of these are related to Computer Vision and 6,204 belong to NLP. We then filtered the results by excluding unrelated journals and conferences, languages other than English, unrelated research areas, and unrelated keywords with the following query:

(LIMIT-TO ( SRCTYPE , "j" ) OR LIMIT-TO ( SRCTYPE , "p" ) ) AND ( LIMIT-TO ( PUBSTAGE , "final" ) ) AND ( LIMIT-TO ( DOCTYPE , "ar" ) OR LIMIT-TO ( DOCTYPE , "cp" ) ) AND ( LIMIT-TO ( LANGUAGE , "english" ) ) AND ( LIMIT-TO ( EXACTKEYWORD , "attention mechanisms" ) OR LIMIT-TO ( EXACTKEYWORD , "deep learning" ) OR LIMIT-TO ( EXACTKEYWORD , "machine learning" ) OR LIMIT-TO ( EXACTKEYWORD , "attention mechanism" ) OR LIMIT-TO ( EXACTKEYWORD , "interpretability" ) OR LIMIT-TO ( EXACTKEYWORD , "learning systems" ) OR LIMIT-TO ( EXACTKEYWORD , "forecasting" ) OR LIMIT-TO ( EXACTKEYWORD , "artificial intelligence" ) OR LIMIT-TO ( EXACTKEYWORD , "classification (of information)" ) OR LIMIT-TO ( EXACTKEYWORD , "convolutional neural networks" ) OR LIMIT-TO ( EXACTKEYWORD , "convolution" ) OR LIMIT-TO ( EXACTKEYWORD , "algorithms" ) OR LIMIT-TO ( EXACTKEYWORD , "neural networks" ) OR LIMIT-TO ( EXACTKEYWORD , "deep neural networks" ) OR LIMIT-TO ( EXACTKEYWORD , "algorithm" ) OR LIMIT-TO ( EXACTKEYWORD , "feature extraction" ) OR LIMIT-TO ( EXACTKEYWORD , "convolutional neural network" ) OR LIMIT-TO ( EXACTKEYWORD , "data mining" ) OR LIMIT-TO ( EXACTKEYWORD , "long short-term memory" ) OR LIMIT-TO ( EXACTKEYWORD , "decision making" ) OR LIMIT-TO ( EXACTKEYWORD , "decision trees" ) OR LIMIT-TO ( EXACTKEYWORD , "speech recognition" ) OR LIMIT-TO ( EXACTKEYWORD , "prediction" ) OR LIMIT-TO ( EXACTKEYWORD , "natural language processing systems" ) OR LIMIT-TO ( EXACTKEYWORD , "state of the art" ) OR LIMIT-TO ( EXACTKEYWORD , "computer vision" ) OR LIMIT-TO ( EXACTKEYWORD , "image enhancement" ) OR LIMIT-TO ( EXACTKEYWORD , "speech communication" ) OR LIMIT-TO ( EXACTKEYWORD , "learning algorithms" ) OR LIMIT-TO ( EXACTKEYWORD , "signal processing" ) OR LIMIT-TO ( EXACTKEYWORD , "recurrent neural networks" ) OR LIMIT-TO ( EXACTKEYWORD , "image segmentation" ) OR LIMIT-TO ( EXACTKEYWORD , "regression analysis" ) OR LIMIT-TO ( EXACTKEYWORD , "classification" ) OR LIMIT-TO ( EXACTKEYWORD , "image processing" ) OR LIMIT-TO ( EXACTKEYWORD , "computer simulation" ) OR LIMIT-TO ( EXACTKEYWORD , "embeddings" ) OR LIMIT-TO ( EXACTKEYWORD , "state-of-the-art methods" ) OR LIMIT-TO ( EXACTKEYWORD , "sensitivity and specificity" ) OR LIMIT-TO ( EXACTKEYWORD , "computational linguistics" ) OR LIMIT-TO ( EXACTKEYWORD , "pattern recognition" ) OR LIMIT-TO ( EXACTKEYWORD , "image analysis" ) OR LIMIT-TO ( EXACTKEYWORD , "intelligibility" ) OR LIMIT-TO ( EXACTKEYWORD , "artificial neural network" ) OR LIMIT-TO ( EXACTKEYWORD , "explainable ai" ) OR LIMIT-TO ( EXACTKEYWORD , "neural-networks" ) OR LIMIT-TO ( EXACTKEYWORD , "mathematical models" ) OR LIMIT-TO ( EXACTKEYWORD , "object detection" ) OR LIMIT-TO ( EXACTKEYWORD , "neural networks, computer" ) OR LIMIT-TO ( EXACTKEYWORD , "speech processing" ) OR LIMIT-TO ( EXACTKEYWORD , "image classification" ) OR LIMIT-TO ( EXACTKEYWORD , "support vector machines" ) OR LIMIT-TO ( EXACTKEYWORD , "feature selection" ) OR LIMIT-TO ( EXACTKEYWORD , "visualization" ) OR LIMIT-TO ( EXACTKEYWORD , "convolutional networks" ) OR LIMIT-TO ( EXACTKEYWORD , "object recognition" ) ). 

After filtering, 1,038 papers remained in the publication pool. We sorted the results by citation count. Moreover, we screened the papers for relevance by reading the title, abstract, and keywords, and finalized the list of papers.
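For illustration, the triage step could be scripted roughly as follows. This is a minimal sketch assuming a standard Scopus CSV export; the file name and relevance terms are hypothetical, and the final screening was done by manually reading titles, abstracts, and keywords:

import pandas as pd

# Load a Scopus CSV export (hypothetical file name). Standard exports
# include "Title", "Abstract", "Author Keywords", and "Cited by" columns.
df = pd.read_csv("scopus_export.csv")

# Sort the publication pool by citation count, most-cited first.
df = df.sort_values("Cited by", ascending=False)

# Pre-screen for relevance on title, abstract, and keywords using
# illustrative XAI-related stems (assumed terms, not the exact filter).
terms = ["explainab", "interpretab", "xai"]
text = (df["Title"].fillna("") + " " +
        df["Abstract"].fillna("") + " " +
        df["Author Keywords"].fillna("")).str.lower()
candidates = df[text.apply(lambda s: any(t in s for t in terms))]
print(len(candidates), "papers kept for manual screening")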



Self-Explainable and Post-hoc Explainable Modeling -> not clear for me why these were mixed in one sub-section

-> Our response: Thank you for your feedback. We separated them into two sub-sections. 




Page 41: Why is Reinforcement learning mixed with CAMs?

 

-> Our response: Thank you for the feedback. It was an unintentional mistake. We removed it.

"While searching the articles, words, including attention, were erased to prevent the dominance of the attention mechanism in our work and help other explainability methods be seen." (lines 281-282) This decision is quite questionable. Also, it is not followed up on consistently.

 

-> Our response: Thank you for the feedback. We considered all the words, and this sentence was removed.  

"this survey has not included the pre-modeling explainability because" (line 349)

Another statement on inclusion and exclusion criteria. These are scattered throughout the paper. The reader needs to be informed about all criteria in one place and early in the paper.

 

-> Our response: Thank you for the feedback. We did not include pre-modeling explainability because the goal of the pre-modeling stage is to gain more useful insights from the data and use them for model development. Moreover, it can be seen as a set of approaches from classical statistics for better understanding the data rather than the model itself, which is out of the scope of this review paper.

"It is important to note that although the California Housing dataset is tabular, as far as the data-driven machine learning point of view, a different type of data is convertible. For example, we can train the model on textual data instead and apply Permutation Feature Importance (PFI) to illustrate the importance ranking of features, or we can employ Partial Dependence (PD) Plot to demonstrate the effect of a feature on the model's prediction." (lines 546-550)

 

The authors seem to suggest that “tabular” data (they mean numerical data) is equivalent to textual data. This is not useful for a technology like XAI that is highly dependent on the domain.

 

Explain the drop in 2022. It seems misleading to include a year without full coverage in the graph.

 

-> Our response: Thank you for catching the mistake. We replaced "tabular" data with "numerical" data. We also removed the sentence spanning lines 546-550.

We considered eight months of 2022 data; we gathered data until August 24th, 2022, since we started writing the paper at that point. We have now removed the 2022 data to avoid confusing the audience.

Others: speech and information retrieval are also NLP-related.

For Figure 4, sources other than the title could have been considered.

Figure 6: this statistic is not useful; it can be better expressed in the text.

Figure 27: why useful?

Figure 28: does it really explain anything?

 

-> Our response: Thank you for catching these issues. We modified the graph. The rationale for considering the title was that when researchers use explainability synonyms in the title of a paper, it is highly likely that the paper belongs to the XAI domain. We removed Figure 6. We also removed Figure 27.

 

Figure 28 shows two models performing poorly; they are not able to distinguish the panda from the background pixels and need to be trained for more epochs.

  

"demonstrates the most important vocabularies" -> should be "most important words"; does it really explain anything?

Figure 7 brings little value for a reader. It might be much clearer after some pruning.

 

-> Our response: Thank you for the feedback. We revised the caption of Figure 27 (textual LIME) to "The words deemed important for predicting the 'Tech' class are highlighted in red (positive influence) and blue (negative influence)." In other words, it shows which words influenced the model to classify the example as the "Tech" class.
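For readers who want to reproduce this kind of textual LIME output, a minimal sketch using the lime package follows; the 20 Newsgroups pipeline and category choice are illustrative assumptions, not the exact setup used in the paper:

from lime.lime_text import LimeTextExplainer
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Train a small illustrative text classifier; any model exposing
# predict_proba over raw strings works with LimeTextExplainer.
cats = ["sci.electronics", "rec.autos"]
train = fetch_20newsgroups(subset="train", categories=cats)
pipe = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
pipe.fit(train.data, train.target)

# LIME perturbs the document by removing words, queries the model,
# and fits a local linear surrogate; the signed weights mark words
# that push toward (positive) or away from (negative) a class.
explainer = LimeTextExplainer(class_names=cats)
exp = explainer.explain_instance(train.data[0], pipe.predict_proba,
                                 num_features=10)
print(exp.as_list())  # [(word, weight), ...] most influential words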

Figure 8: what do the colors mean?

Global and local interpretability methods help users trust a model and prediction. (line 346)

-> Statement seems to be true for all methods anyway

“are successful in theory,” not sure, maybe in research? What is the successful AI theory?

“We ran our analytics on” very vague for a contribution

Figure 1 and others: do not use a black background

 

-> Our response: Thank you for the feedback. Blue means Self-Explainable Modeling, black means Model-Agnostic Explainability, and red means Model-Specific Explainability.

The global methods help users trust the model, and the local ones help establish causality between the input data and the prediction.

We changed "theory" to "research."

We changed the contribution statement to "We provided various examples using open-source data sets to compare various XAI techniques and tools."

We changed the background of the figures to white.

 

Reference [38], careful, not blue in text

Reference 8, better use Lipton, “The myth of ...” ACM

 

-> Our response: Thank you for the feedback. All blue text was removed and cited properly.

Reference 8 was replaced. 

 

Structure: "Examples provided in the Global and Local model-agnostic sections are all based on the California Housing dataset (table 1) in the sklearn package." (lines 545-546)

This sentence is part of 5.1 but seems also to be applicable to 5.2, so it needs to be logically put in 5.

 

-> Our response: We addressed these comments in the new draft.
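For context, the kind of global model-agnostic example built on this dataset can be sketched with scikit-learn's public API as follows; the random-forest model choice here is illustrative, not necessarily the paper's exact setup:

from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import PartialDependenceDisplay, permutation_importance
from sklearn.model_selection import train_test_split

# The California Housing dataset referenced in Sections 5.1/5.2.
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Global method 1: Permutation Feature Importance (PFI) ranks features
# by how much the test score drops when each column is shuffled.
pfi = permutation_importance(model, X_test, y_test, n_repeats=5,
                             random_state=0)
for name, imp in sorted(zip(X.columns, pfi.importances_mean),
                        key=lambda p: -p[1]):
    print(f"{name}: {imp:.3f}")

# Global method 2: a Partial Dependence (PD) plot shows the average
# effect of one feature (median income) on the predicted house value.
PartialDependenceDisplay.from_estimator(model, X_test, ["MedInc"])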

Structure: the idea to run the same dataset through several systems is appealing. However, it needs more thought. What is the purpose? Just illustration or also a comparison? The reader gets confronted with several “explanations” for the same dataset. Towards the end of the subsection, this idea is not pursued anymore.

“human users can comprehend, appropriately trust, and effectively manage the new generation of artificially intelligent partners” the statement is too strong, it is not clear whether and to which extent the current XAI technology is capable of this task yet.

Referencing style: reference in a sentence is not part of a sentence. Something like the following phrases are not correct. Check: https://libguides.murdoch.edu.au/ieee/text

introduced in [75]

is seen in [76]

[77] has talked about

Overall, the paper can be understood, but there are too many issues.

Here is only a small selection of spots which require attention:

summaries have been developed -> summaries has been developed

can’t -> cannot

list of uses for XAI methods

articles belongs to NLP

related to other computer science

in the black-box model

 

many missing blanks between words and references, e.g. “alternatives[24–26]” and around punctuation: “Accountability,and”

On the other hand, We can

Here, they are using -> explain where

been reviewed in the survey paper[71]. -> a survey paper

linear, mono- tonic, or complicated. (line 569)

 

-> Our response: The aforementioned issues have been addressed in the new draft. We are thoroughly checking all of the issues.

 

Round 2

Reviewer 2 Report

The paper has improved much.

Changes are shown in the paper; however, the revision report is not very detailed.

"We have addressed your comment and emphasized our contribution. To summarize, we did the categorization based on a standard AI pipeline."  Statements like this are too general and vague to really check the improvement on this.

 

Figure 30: caption, "California hosing" (should be "California Housing")

 

 
