Explainable Artificial Intelligence for Trustworthy Machine Learning and Deep Learning Models in Healthcare

A special issue of Diagnostics (ISSN 2075-4418). This special issue belongs to the section "Machine Learning and Artificial Intelligence in Diagnostics".

Deadline for manuscript submissions: closed (31 July 2024) | Viewed by 15935

Special Issue Editor


Dr. Shaker El-Sappagh
Guest Editor
Information Systems Department, Faculty of Computers and Artificial Intelligence, Benha University, Banha 13518, Egypt
Interests: explainable AI; deep learning; machine learning; trustworthy AI; medical informatics; uncertainty quantification

Special Issue Information

Dear Colleagues,

With the advent of machine learning (ML)- and deep learning (DL)-empowered applications in critical domains such as healthcare, explainability has become one of the most heavily debated topics. The black-box nature of many DL and ML models is a roadblock to clinical utilization; to gain the trust of clinicians and patients, we need to provide explanations for the decisions of DL and ML models. In this Special Issue, we welcome research articles and reviews on explainable and interpretable ML techniques for healthcare applications. The objective of this Special Issue is to explore recent advances and techniques in the area of explainable artificial intelligence (XAI). Research topics of interest include (but are not limited to) the following:

  • Transparent-by-design machine learning models;
  • Transparent machine learning pipeline, from data collection to training, testing, and production;
  • Ante-hoc and post-hoc XAI approaches in the medical domain;
  • Context-sensitive, human-in-the-loop, and human-centric XAI algorithms;
  • Explainable and interpretable state-of-the-art neural network architectures and algorithms (e.g., transformers) and non-neural network models (e.g., trees, kernel methods, clustering algorithms) for healthcare applications;
  • Interactive XAI using chatbots;
  • Human–computer interaction for designing user interfaces for explainability;
  • Black-box model auditing using XAI;
  • Knowledge representation for human-centric explanations;
  • Knowledge-enhanced semantic explanations;
  • Role of fuzzy knowledge representation in XAI;
  • Detecting data bias and algorithmic bias using XAI methods;
  • Visualizing causal relationships;
  • Integrating social and ethical aspects of explainability;
  • Designing new explanation modalities;
  • Multimodal XAI;
  • Design, development, and evaluation of responsible XAI;
  • Role of natural language generation in XAI;
  • Novel criteria to evaluate explanation and interpretability;
  • Applications of ontologies for explainability and trustworthiness in specific domains;
  • Factual and counterfactual explanations;
  • Causal thinking, reasoning, and modeling;
  • Uncertainty quantification in XAI algorithms;
  • Exploring existing and proposing new theoretical aspects of explainability and interpretability;
  • Fairness, accountability, and transparency in healthcare XAI;
  • Explainable AI, big data, electronic health record, and clinical decision support systems;
  • Role of ontologies and knowledge in XAI;
  • Empirical studies of (human-centric) XAI applications in healthcare, bioinformatics, and medical informatics.

Dr. Shaker El-Sappagh
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Diagnostics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (8 papers)


Research

19 pages, 786 KiB  
Article
Forecasting Patient Early Readmission from Irish Hospital Discharge Records Using Conventional Machine Learning Models
by Minh-Khoi Pham, Tai Tan Mai, Martin Crane, Malick Ebiele, Rob Brennan, Marie E. Ward, Una Geary, Nick McDonald and Marija Bezbradica
Diagnostics 2024, 14(21), 2405; https://doi.org/10.3390/diagnostics14212405 - 29 Oct 2024
Viewed by 235
Abstract
Background/Objectives: Predicting patient readmission is an important task for healthcare risk management, as it can help prevent adverse events, reduce costs, and improve patient outcomes. In this paper, we compare various conventional machine learning models and deep learning models on a multimodal dataset of electronic discharge records from an Irish acute hospital. Methods: We evaluate the effectiveness of several widely used machine learning models that leverage patient demographics, historical hospitalization records, and clinical diagnosis codes to forecast future clinical risks. Our work focuses on addressing two key challenges in the medical field, data imbalance and the variety of data types, in order to boost the performance of machine learning algorithms. Furthermore, we employ SHapley Additive Explanations (SHAP) value visualization to interpret the model predictions and identify both the key data features and disease codes associated with readmission risks, identifying a specific set of diagnosis codes that are significant predictors of readmission within 30 days. Results: Through extensive benchmarking and the application of a variety of feature engineering techniques, we improved the area under the receiver operating characteristic curve (AUROC) score from 0.628 to 0.7 across our models on the test dataset. We also revealed that specific diagnoses, including cancer, COPD, and certain social factors, are significant predictors of 30-day readmission risk. Conversely, bacterial carrier status appeared to have minimal impact due to lower case frequencies. Conclusions: Our study demonstrates how we effectively utilize routinely collected hospital data to forecast patient readmission through the use of conventional machine learning while applying explainable AI techniques to explore the correlation between data features and patient readmission rate. Full article
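The SHAP values used above rest on the Shapley additive-attribution idea: each feature's contribution is its average marginal effect over all feature coalitions, and the contributions sum exactly to the difference between the prediction and a baseline. The sketch below brute-forces exact Shapley values for a hypothetical three-feature "readmission score" (the model, feature meanings, and baseline are invented for illustration and are not the paper's pipeline):

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values for model f at point x, relative to a baseline.

    Features absent from a coalition are set to their baseline value.
    Brute-force over all coalitions, so only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                # Shapley kernel weight for a coalition of this size.
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                z_without = [x[j] if j in S else baseline[j] for j in range(n)]
                z_with = list(z_without)
                z_with[i] = x[i]
                phi[i] += weight * (f(z_with) - f(z_without))
    return phi

# Hypothetical linear "readmission score": age, prior admissions, high-risk diagnosis flag.
model = lambda z: 0.02 * z[0] + 0.30 * z[1] + 0.50 * z[2]
patient = [70, 3, 1]      # age 70, 3 prior admissions, high-risk diagnosis present
reference = [50, 0, 0]    # hypothetical cohort baseline

phi = shapley_values(model, patient, reference)
# Local accuracy: attributions sum to f(x) - f(baseline).
assert abs(sum(phi) - (model(patient) - model(reference))) < 1e-9
```

Libraries such as `shap` approximate the same quantities efficiently for tree ensembles and neural networks; the brute-force version here is only meant to make the additivity property concrete.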

18 pages, 2149 KiB  
Article
Interpretable Clinical Decision-Making Application for Etiological Diagnosis of Ventricular Tachycardia Based on Machine Learning
by Min Wang, Zhao Hu, Ziyang Wang, Haoran Chen, Xiaowei Xu, Si Zheng, Yan Yao and Jiao Li
Diagnostics 2024, 14(20), 2291; https://doi.org/10.3390/diagnostics14202291 - 16 Oct 2024
Viewed by 481
Abstract
Background: Ventricular tachycardia (VT) can broadly be categorised into ischemic heart disease, non-ischemic structural heart disease, and idiopathic VT. There are few studies related to the application of machine learning for the etiological diagnosis of VT, and the interpretable methods are still in the exploratory stage for clinical decision-making applications. Objectives: The aim is to propose a machine learning model for the etiological diagnosis of VT. Interpretable results based on models are compared with expert knowledge, and interpretable evaluation protocols for clinical decision-making applications are developed. Methods: A total of 1305 VT patient data from 1 January 2013 to 1 September 2023 at the Arrhythmia Centre of Fuwai Hospital were included in the study. Clinical data collected during hospitalisation included demographics, medical history, vital signs, echocardiographic results, and laboratory test outcomes. Results: The XGBoost model demonstrated the best performance in VT etiological diagnosis (precision, recall, and F1 were 88.4%, 88.5%, and 88.4%, respectively). A total of four interpretable machine learning methods applicable to clinical decision-making were evaluated in terms of visualisation, clinical usability, clinical applicability, and efficiency with expert knowledge interpretation. Conclusions: The XGBoost model demonstrated superior performance in the etiological diagnosis of VT, and SHAP and decision tree interpretable methods are more favoured by clinicians for decision-making. Full article

16 pages, 2960 KiB  
Article
Improving the Generalizability and Performance of an Ultrasound Deep Learning Model Using Limited Multicenter Data for Lung Sliding Artifact Identification
by Derek Wu, Delaney Smith, Blake VanBerlo, Amir Roshankar, Hoseok Lee, Brian Li, Faraz Ali, Marwan Rahman, John Basmaji, Jared Tschirhart, Alex Ford, Bennett VanBerlo, Ashritha Durvasula, Claire Vannelli, Chintan Dave, Jason Deglint, Jordan Ho, Rushil Chaudhary, Hans Clausdorff, Ross Prager, Scott Millington, Samveg Shah, Brian Buchanan and Robert Arntfield
Diagnostics 2024, 14(11), 1081; https://doi.org/10.3390/diagnostics14111081 - 22 May 2024
Viewed by 879
Abstract
Deep learning (DL) models for medical image classification frequently struggle to generalize to data from outside institutions. Additional clinical data are also rarely collected to comprehensively assess and understand model performance amongst subgroups. Following the development of a single-center model to identify the lung sliding artifact on lung ultrasound (LUS), we pursued a validation strategy using external LUS data. As annotated LUS data are relatively scarce—compared to other medical imaging data—we adopted a novel technique to optimize the use of limited external data to improve model generalizability. Externally acquired LUS data from three tertiary care centers, totaling 641 clips from 238 patients, were used to assess the baseline generalizability of our lung sliding model. We then employed our novel Threshold-Aware Accumulative Fine-Tuning (TAAFT) method to fine-tune the baseline model and determine the minimum amount of data required to achieve predefined performance goals. A subgroup analysis was also performed and Grad-CAM++ explanations were examined. The final model was fine-tuned on one-third of the external dataset to achieve 0.917 sensitivity, 0.817 specificity, and 0.920 area under the receiver operating characteristic curve (AUC) on the external validation dataset, exceeding our predefined performance goals. Subgroup analyses identified LUS characteristics that most greatly challenged the model’s performance. Grad-CAM++ saliency maps highlighted clinically relevant regions on M-mode images. We report a multicenter study that exploits limited available external data to improve the generalizability and performance of our lung sliding model while identifying poorly performing subgroups to inform future iterative improvements. This approach may contribute to efficiencies for DL researchers working with smaller quantities of external validation data. Full article

12 pages, 2587 KiB  
Article
Automated Prediction of Neoadjuvant Chemoradiotherapy Response in Locally Advanced Cervical Cancer Using Hybrid Model-Based MRI Radiomics
by Hua Yang, Yinan Xu, Mohan Dong, Ying Zhang, Jie Gong, Dong Huang, Junhua He, Lichun Wei, Shigao Huang and Lina Zhao
Diagnostics 2024, 14(1), 5; https://doi.org/10.3390/diagnostics14010005 - 19 Dec 2023
Cited by 1 | Viewed by 1253
Abstract
Background: This study aimed to develop a model that automatically predicts the neoadjuvant chemoradiotherapy (nCRT) response for patients with locally advanced cervical cancer (LACC) based on T2-weighted MR images and clinical parameters. Methods: A total of 138 patients were enrolled, and T2-weighted MR images and clinical information of the patients before treatment were collected. Clinical information included age, stage, pathological type, squamous cell carcinoma (SCC) level, and lymph node status. A hybrid model extracted the domain-specific features from the computational radiomics system, the abstract features from the deep learning network, and the clinical parameters. Then, it employed an ensemble learning classifier weighted by logistic regression (LR) classifier, support vector machine (SVM) classifier, K-Nearest Neighbor (KNN) classifier, and Bayesian classifier to predict the pathologic complete response (pCR). The area under the receiver operating characteristics curve (AUC), accuracy (ACC), true positive rate (TPR), true negative rate (TNR), and precision were used as evaluation metrics. Results: Among the 138 LACC patients, 74 were in the pCR group, and 64 were in the non-pCR group. There was no significant difference between the two cohorts in terms of tumor diameter (p = 0.787), lymph node status (p = 0.068), and stage before radiotherapy (p = 0.846). The 109-dimension domain features and 1472-dimension abstract features from MR images were used to form a hybrid model. The average AUC, ACC, TPR, TNR, and precision of the proposed hybrid model were about 0.80, 0.71, 0.75, 0.66, and 0.71, while the AUC values of using clinical parameters, domain-specific features, and abstract features alone were 0.61, 0.67, and 0.76, respectively. The AUC value of the model without an ensemble learning classifier was 0.76.
Conclusions: The proposed hybrid model can predict the radiotherapy response of patients with LACC, which might help radiation oncologists create personalized treatment plans for patients. Full article

24 pages, 3970 KiB  
Article
Deep Learning-Based Approaches for Enhanced Diagnosis and Comprehensive Understanding of Carpal Tunnel Syndrome
by Marwa Elseddik, Khaled Alnowaiser, Reham R. Mostafa, Ahmed Elashry, Nora El-Rashidy, Shimaa Elgamal, Ahmed Aboelfetouh and Hazem El-Bakry
Diagnostics 2023, 13(20), 3211; https://doi.org/10.3390/diagnostics13203211 - 14 Oct 2023
Cited by 4 | Viewed by 1753
Abstract
Carpal tunnel syndrome (CTS) is a prevalent medical condition resulting from compression of the median nerve in the hand, often caused by overuse or age-related factors. In this study, a total of 160 patients participated, including 80 individuals with CTS presenting varying levels of severity across different age groups. Numerous studies have explored the use of machine learning (ML) and deep learning (DL) techniques for CTS diagnosis. However, further research is required to fully leverage the potential of artificial intelligence (AI) technology in CTS diagnosis, addressing the challenges and limitations highlighted in the existing literature. In our work, we propose a novel approach for CTS diagnosis, prediction, and monitoring disease progression. The proposed framework consists of three main layers. Firstly, we employ three distinct DL models for CTS diagnosis. Through our experiments, the proposed approach demonstrates superior performance across multiple evaluation metrics, with an accuracy of 0.969, precision of 0.982, and recall of 0.963. The second layer focuses on predicting the cross-sectional area (CSA) at 1, 3, and 6 months using ML models, aiming to forecast disease progression during therapy. The best-performing model achieves an accuracy of 0.9522, an R2 score of 0.667, a mean absolute error (MAE) of 0.0132, and a median squared error (MdSE) of 0.0639. The highest predictive performance is observed after 6 months. The third layer concentrates on assessing significant changes in the patients’ health status through statistical tests, including significance tests, the Kruskal-Wallis test, and a two-way ANOVA test. These tests aim to determine the effect of injections on CTS treatment. The results reveal a highly significant reduction in symptoms, as evidenced by scores from the Symptom Severity Scale and Functional Status Scale, as well as a decrease in CSA after 1, 3, and 6 months following the injection.
SHAP is then utilized to provide an understandable explanation of the final prediction. Overall, our study presents a comprehensive approach for CTS diagnosis, prediction, and monitoring, showcasing promising results in terms of accuracy, precision, and recall for CTS diagnosis, as well as effective prediction of disease progression and evaluation of treatment effectiveness through statistical analysis. Full article
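The Kruskal-Wallis test used in the study's third layer compares three or more independent groups without assuming normality and is available in SciPy. The sketch below runs it on synthetic symptom-severity scores invented for illustration (not the study's measurements):

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)
# Hypothetical symptom-severity scores at baseline and 1/6 months after injection;
# the means and spread here are made up purely to demonstrate the test.
baseline = rng.normal(3.5, 0.4, 40)
month_1 = rng.normal(2.8, 0.4, 40)
month_6 = rng.normal(1.9, 0.4, 40)

# H statistic compares mean ranks across the groups.
stat, p = kruskal(baseline, month_1, month_6)
print(f"H = {stat:.2f}, p = {p:.2e}")
if p < 0.05:
    print("At least one time point differs significantly.")
```

A significant result only says some group differs; post-hoc pairwise comparisons (e.g., Dunn's test) would be needed to say which time points drive the effect.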

24 pages, 6781 KiB  
Article
Empowering Foot Health: Harnessing the Adaptive Weighted Sub-Gradient Convolutional Neural Network for Diabetic Foot Ulcer Classification
by Abdullah Alqahtani, Shtwai Alsubai, Mohamudha Parveen Rahamathulla, Abdu Gumaei, Mohemmed Sha, Yu-Dong Zhang and Muhammad Attique Khan
Diagnostics 2023, 13(17), 2831; https://doi.org/10.3390/diagnostics13172831 - 1 Sep 2023
Cited by 4 | Viewed by 1420
Abstract
In recent times, DFU (diabetic foot ulcer) has become a universal health problem that affects many diabetes patients severely. DFU requires immediate proper treatment to avert amputation. Clinical examination of DFU is a tedious process and complex in nature. Concurrently, DL (deep learning) methodologies can show prominent outcomes in the classification of DFU because of their efficient learning capacity. Though traditional systems have tried using DL-based models to procure better performance, there is room for enhancement in accuracy. Therefore, the present study uses the AWSg-CNN (Adaptive Weighted Sub-gradient Convolutional Neural Network) method to classify DFU. A DFUC dataset is considered, and several processes are involved in the present study. Initially, the proposed method starts with pre-processing, excluding inconsistent and missing data, to enhance dataset quality and accuracy. Further, for classification, the proposed method utilizes the process of RIW (random initialization of weights) and log softmax with the ASGO (Adaptive Sub-gradient Optimizer) for effective performance. In this process, RIW efficiently learns the shift of feature space between the convolutional layers. To avoid the underflow of gradients, the log softmax function is used. When log softmax with the ASGO is used as the activation function, the gradient steps are controlled. An adaptive modification of the proximal function simplifies the learning rate significantly, and optimal proximal functions are produced. Due to such merits, the proposed method can perform better classification. The predicted results are displayed on the webpage through the HTML, CSS, and Flask frameworks. The effectiveness of the proposed system is evaluated with accuracy, recall, F1-score, and precision to confirm its effectual performance. Full article

11 pages, 1518 KiB  
Article
Usefulness of Heat Map Explanations for Deep-Learning-Based Electrocardiogram Analysis
by Andrea M. Storås, Ole Emil Andersen, Sam Lockhart, Roman Thielemann, Filip Gnesin, Vajira Thambawita, Steven A. Hicks, Jørgen K. Kanters, Inga Strümke, Pål Halvorsen and Michael A. Riegler
Diagnostics 2023, 13(14), 2345; https://doi.org/10.3390/diagnostics13142345 - 11 Jul 2023
Cited by 2 | Viewed by 2845
Abstract
Deep neural networks are complex machine learning models that have shown promising results in analyzing high-dimensional data such as those collected from medical examinations. Such models have the potential to provide fast and accurate medical diagnoses. However, the high complexity makes deep neural networks and their predictions difficult to understand. Providing model explanations can be a way of increasing the understanding of “black box” models and building trust. In this work, we applied transfer learning to develop a deep neural network to predict sex from electrocardiograms. Using the visual explanation method Grad-CAM, heat maps were generated from the model in order to understand how it makes predictions. To evaluate the usefulness of the heat maps and determine if the heat maps identified electrocardiogram features that could be recognized to discriminate sex, medical doctors provided feedback. Based on the feedback, we concluded that, in our setting, this mode of explainable artificial intelligence does not provide meaningful information to medical doctors and is not useful in the clinic. Our results indicate that improved explanation techniques that are tailored to medical data should be developed before deep neural networks can be applied in the clinic for diagnostic purposes. Full article
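For readers unfamiliar with how Grad-CAM heat maps like these are produced, the computation reduces to a few array operations once the convolutional-layer activations and the gradients of the target score with respect to them are available. The NumPy sketch below uses synthetic arrays rather than an actual ECG model:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heat map from one conv layer's activations and the gradients
    of the target class score w.r.t. those activations.

    activations, gradients: arrays of shape (C, H, W) for a single input.
    Returns an (H, W) map normalised to [0, 1].
    """
    # Channel importance weights: global-average-pool the gradients.
    alpha = gradients.mean(axis=(1, 2))  # shape (C,)
    # Weighted sum of activation maps, then ReLU to keep positive evidence only.
    cam = np.maximum((alpha[:, None, None] * activations).sum(axis=0), 0.0)
    if cam.max() > 0:
        cam /= cam.max()
    return cam

# Synthetic example: 4 channels of 8x8 feature maps standing in for a real network.
rng = np.random.default_rng(1)
acts = rng.random((4, 8, 8))
grads = rng.normal(size=(4, 8, 8))
heat = grad_cam(acts, grads)
assert heat.shape == (8, 8)
```

In practice the low-resolution map is upsampled and overlaid on the input (here, an ECG trace rendered as an image); as the paper's clinician feedback shows, producing such a map is far easier than making it clinically meaningful.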

21 pages, 2183 KiB  
Article
Polycystic Ovary Syndrome Detection Machine Learning Model Based on Optimized Feature Selection and Explainable Artificial Intelligence
by Hela Elmannai, Nora El-Rashidy, Ibrahim Mashal, Manal Abdullah Alohali, Sara Farag, Shaker El-Sappagh and Hager Saleh
Diagnostics 2023, 13(8), 1506; https://doi.org/10.3390/diagnostics13081506 - 21 Apr 2023
Cited by 23 | Viewed by 5286
Abstract
Polycystic ovary syndrome (PCOS) has been classified as a severe health problem common among women globally. Early detection and treatment of PCOS reduce the possibility of long-term complications, such as increasing the chances of developing type 2 diabetes and gestational diabetes. Therefore, effective and early PCOS diagnosis will help the healthcare systems to reduce the disease’s problems and complications. Machine learning (ML) and ensemble learning have recently shown promising results in medical diagnostics. The main goal of our research is to provide model explanations to ensure efficiency, effectiveness, and trust in the developed model through local and global explanations. Feature selection methods were combined with different types of ML models (logistic regression (LR), random forest (RF), decision tree (DT), naive Bayes (NB), support vector machine (SVM), k-nearest neighbor (KNN), XGBoost, and AdaBoost) to obtain the optimal feature set and best model. Stacking ML models that combine the best base ML models with a meta-learner are proposed to improve performance. Bayesian optimization is used to optimize the ML models. Combining SMOTE (Synthetic Minority Oversampling Technique) and ENN (Edited Nearest Neighbour) solves the class imbalance. The experiments used a benchmark PCOS dataset with two splitting ratios, 70:30 and 80:20. The results showed that the stacking ML with RFE feature selection recorded the highest accuracy at 100% compared to other models. Full article
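The stacking idea described above (base learners whose out-of-fold predictions feed a meta-learner) is directly available in scikit-learn. The sketch below is an illustrative stand-in, not the paper's exact configuration: it uses synthetic imbalanced data rather than the benchmark PCOS dataset, a reduced set of base models, and no SMOTE-ENN resampling or Bayesian optimization:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced binary data standing in for the PCOS benchmark.
X, y = make_classification(n_samples=600, n_features=12,
                           weights=[0.7, 0.3], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=42)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("dt", DecisionTreeClassifier(random_state=42)),
        ("knn", KNeighborsClassifier()),
    ],
    # Meta-learner fit on cross-validated predictions of the base models.
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
print(f"test accuracy: {acc:.3f}")
```

Because the meta-learner is trained on out-of-fold base-model predictions, stacking can correct systematic errors of individual models rather than simply averaging them, which is the usual motivation for preferring it over plain voting.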
