A Feature-Augmented Explainable Artificial Intelligence Model for Diagnosing Alzheimer’s Disease from Multimodal Clinical and Neuroimaging Data

Fatima Hasan Al-bakri; Wan Mohd Yaakob Wan Bejuri; Mohamed Nasser Al-Andoli; Raja Rina Raja Ikram; Hui Min Khor; Yus Sholva; Umi Kalsom Ariffin; Noorayisahbe Mohd Yaacob; Zuraida Abal Abas; Zaheera Zainal Abidin; Siti Azirah Asmai; Asmala Ahmad; Ahmad Fadzli Nizam Abdul Rahman; Hidayah Rahmalan; Md Fahmi Abd Samad

doi:10.3390/diagnostics15162060

Abstract

Background/Objectives: This study presents a survey-based evaluation of an explainable AI (Feature-Augmented) approach, which was designed to support the diagnosis of Alzheimer’s disease (AD) by integrating clinical data, MMSE scores, and MRI scans. The approach combines rule-based reasoning and example-based visualization to improve the explainability of AI-generated decisions. Methods: Five doctors participated in the survey: two with 6 to 10 years of experience and three with more than 10 years of experience in the medical field and expertise in AD. The participants evaluated different AI outputs, including clinical feature-based interpretations, MRI-based visual heat maps, and a combined interpretation approach. Results: The model achieved a 100% trust score, with 20% of the participants reporting full trust and 80% expressing conditional trust, understanding the diagnosis but seeking further clarification. Overall, the participants reported that the integrated explanation format improved their understanding of the model decisions and enhanced their confidence in using AI-assisted diagnosis. Conclusions: To our knowledge, this paper is the first to gather the views of medical experts to evaluate the explainability of an AI decision-making model when diagnosing AD. These preliminary findings suggest that explainability plays a key role in building trust and ease of use of AI tools in clinical settings, especially when used by experienced clinicians to support complex diagnoses, such as AD.

Keywords:

Alzheimer’s disease; artificial intelligence; explainable AI; ensemble learning; meta-model; clinical data; MRI; MMSE; lateral ventricles; CNN; clinical decision; SHAP; Grad-CAM

1. Introduction

Alzheimer’s disease (AD) presents major diagnostic challenges, particularly in its early stages [1,2,3]. Recent developments in artificial intelligence (AI) have led to the development of decision support tools [4,5,6,7]. Some enable doctors to analyze clinical data and medical imaging [2,8,9]. However, the ambiguous nature of many AI models, known as the “black box” problem, limits their use in clinical settings. Explainable AI (XAI) aims to address this limitation by making AI decisions explainable and reliable for end users [10,11,12,13]. Explainability is especially important in clinical contexts [1,14,15], where clinicians must understand and justify diagnoses based on both medical knowledge and patient data [8,13,16,17].

To improve the explainability of diagnostic models for AD, we introduce the Feature-Augmented framework, an AI framework that integrates rule-based explanations with example-based visualizations. This approach combines the strengths of interpreting clinical data with visual signals from MRI scans to provide a comprehensive explanation of the model’s decision-making process. This paper presents the results of a small-scale survey conducted with experienced clinicians to assess the effectiveness of the “Feature-Augmented” approach. This study explores how integrated explanation mechanisms can support clinical decision-making and enhance clinicians’ confidence in AI-based diagnostics.

2. Related Work

2.1. Background of Explainable Artificial Intelligence (XAI) in Alzheimer’s Diagnosis

Recent years have seen significant development in the use of AI models for the early detection of AD [18,19,20,21], particularly using MRI data and clinical data, such as the Mini-Mental State Examination (MMSE). Despite the remarkable success in improving the accuracy of predictive models, many of these models rely on deep and complex neural networks [22,23,24,25,26,27,28,29,30,31,32,33], whose decisions are difficult to interpret, reinforcing the “black box” problem in sensitive medical systems [20]. For example, some models have relied on processing all brain segments to extract predictive features [26,28,34], which has complicated the output and made it difficult for non-specialized clinicians to track the logic of decision-making [35].

To address the explainability problem, several studies have used XAI tools in AD systems. SHAP techniques are effective at displaying the relative importance of each patient’s clinical characteristics, enabling clinicians to follow the model’s rationale more clearly. Furthermore, Grad-CAM has been utilized with CNN models to analyze MRI images and identify brain regions that affect diagnosis, thereby providing visual interpretations that support model decisions [35]. Some studies [36,37] have built hybrid models to combine outputs and achieve high accuracy. However, interpretation remains challenging when presenting the results to users [17,23]. The use of ensemble learning with parallel learning enables the filtering of irrelevant features and focuses on those most strongly associated with AD [38,39]. This approach ensures that XAI outputs are generated based on more reliable predictions, thereby increasing clinicians’ trust in the model.

However, while these approaches have improved interpretability to some extent, the evaluation of how well clinicians understand and trust XAI outputs remains largely unexplored. To our knowledge, existing XAI studies in Alzheimer’s diagnosis (e.g., those integrating SHAP or Grad-CAM with clinical datasets) have not incorporated formal metrics for trust or usability, highlighting an important area for future work in evaluating explainability effectiveness.

Figure 1 is a conceptual illustration comparing two approaches to interpreting XAI-based clinical decisions. The upper section illustrates a typical XAI scenario, where feature extraction involves both essential and non-essential features, often relying on a single model, all MRI slices, or limited clinical data, resulting in insufficient annotation methods and low physician confidence. In contrast, the bottom section represents our proposed Feature-Augmented XAI approach, which uses ensemble learning with parallel learning to retain only the most important features, and applies multiple annotation methods, including SHAP for rule-based feature attribution and Grad-CAM for MRI saliency mapping, to make AI decisions more transparent and actionable for healthcare providers.

Figure 1. Conceptual comparison of standard and proposed XAI feature extraction approaches for clinical decision support.

2.2. Problem Formulation

Despite significant advances in the performance of AI models in diagnosing AD, the “black box” problem remains a major challenge in adopting these models in actual clinical contexts. Most deep learning-based models are highly accurate but lack explainability, which undermines physicians’ confidence in their results and hinders their integration into clinical decisions [24,31,32,40,41].

Previous studies relied on fully processing MRI using all slices; this approach contributed to the complexity of clinical interpretation of the results, as it was not clear whether all slices were of equal importance in diagnosis or not [18]. This leads many current AI methods, such as Grad-CAM [27,29] or LIME [28,34], to highlight areas outside the brain as most important for classification, reducing trust in such AI tools within the medical domain. Moreover, many interpretation tools provide results that are difficult for end users to understand. For example, applying SHAP to MRI data often produces outputs that are challenging to interpret [26]. Current explanation tools are limited to uninterpretable heatmaps generated by Grad-CAM or to numerical values produced by SHAP without a direct connection to the clinical context [30].

Most current studies focus on providing only one explanation using SHAP or Grad-CAM [26,27,28,29,30,34], which is insufficient to build real confidence in the model’s decisions, particularly in an Alzheimer’s diagnosis, a sensitive medical application [14]. While several recent studies have integrated SHAP or Grad-CAM with clinical datasets and MRI, routine deployment within hospital workflows remains rare due to limited trust in the decision-making process and concerns over the reliability of AI-generated explanations.

It is essential to provide multiple, integrated explanations—visual, textual, and numerical—to meet the needs of clinicians, researchers, and practitioners. Such an approach would enhance the model’s interpretability and user confidence, while also supporting local interpretation of individual cases, as clinical decisions are often based on assessing a specific patient’s condition.

Explanation methods in XAI are classified according to several key dimensions that help with choosing the most appropriate tool or methodology based on the type of model and the purpose of the explanation [35]. (1) In terms of scope, explanations can be global, explaining the decision-making mechanism of the model as a whole, or local, focusing on explaining a specific decision for a specific data point. (2) In terms of the implementation method, some methods are applied after training the model (post hoc) to interpret it without modification, while others are integrated into the model structure from the beginning (ante hoc). (3) In terms of the form of explanation, results may be visual (e.g., heat maps), textual (human-readable descriptions), numerical (highlighting the quantitative impact of each variable), or logical (rule-based). (4) In terms of applicability, there are model-agnostic methods usable with any machine learning model and model-specific methods tailored to certain architectures.

By reviewing [35], our model contributes to addressing several gaps and criticisms raised by this reference. This gap was addressed by developing a Feature-Augmented approach, which combines multiple interpretive tools to present accurate and understandable results for users in real-world clinical settings. Figure 2 illustrates the structure of the proposed Feature-Augmented approach, showing how SHAP-based clinical explanations and Grad-CAM MRI visualizations are combined into a single, integrated interface that enhances both the accuracy and interpretability for clinical decision-making.

Figure 2. Classification of explanation methods used in the proposed Feature-Augmented approach.

The reliance on single explanations, highlighting irrelevant brain regions, and producing outputs disconnected from the clinical context are limitations in current XAI tools for AD diagnosis. These limitations motivated the development of our proposed framework, which integrates parallel ensemble learning with multimodal explanations to enhance the trust and interpretability. The innovation in our study was the exclusive reliance on the mid-slice to reduce the model complexity and enhance the transparency, enabling clinicians to understand model decisions more easily. In addition, clinical data and MRI results were integrated within the parallel ensemble learning model and subsequently employed (SHAP and Grad-CAM) to provide targeted visual and textual interpretations for different categories of users, including doctors and medical students.

3. Methods

3.1. Objective and Theoretical Foundation of the Feature-Augmented Framework

The main objective of this research is to improve the explainability and usability of AI tools for medical experts in real-world clinical settings. To achieve this, the Feature-Augmented approach is proposed for early AD classification. This framework integrates rule-based interpretations (SHAP) with example-based visual interpretations (Grad-CAM) into a unified system that supports clinical decision-making in crowded or resource-limited environments where specialists may be absent.

The approach combines feature attribution techniques and image saliency maps, enhanced with textual explanations, to help general practitioners and non-specialists identify key clinical and imaging indicators of AD. To ensure accurate interpretation, clinical data are integrated with imaging information from only the middle MRI slices that show the lateral ventricles, as ventricular dilatation is a well-established hallmark of AD progression [42]. Focusing on this slice provides a high signal-to-noise ratio by excluding the less informative upper and lower brain regions, thereby reducing irrelevant activation in the Grad-CAM heatmaps. From a diagnostic perspective, concentrating on the mid-slice allows the model to learn patterns closely aligned with radiological practice, which improves both the classification accuracy and interpretability without requiring full-volume MRI processing [42].

After extracting these relevant features, ensemble learning is applied to emphasize the important attributes and filter out irrelevant ones, ensuring reliable predictions before presenting the interpretation results.

The theoretical formulation of the Feature-Augmented approach relies on three key components: First, a confidence score is provided to reflect the reliability of the prediction. Second, the feature attribution explanation is generated using SHAP values, which quantify the contribution of each clinical feature to the model’s prediction:

S_{c l i n i c a l} (x) = φ_{1} \cdot x_{1} + φ_{2} \cdot x_{2} + \dots + φ ₙ \cdot x ₙ

(1)

The most important values of SHAP on which the interpretation is based are explained. Third, the visual explanation is provided using Grad-CAM, which highlights important regions in the MRI slice that influenced the model’s decision:

S_{i m a g e} (I) = \sum ₖ α ₖ \cdot A ₖ

(2)

This visual map is displayed alongside a textual description of the affected brain regions. Since the MRI was trained on only the middle slice of the brain, which clearly shows the lateral ventricles, Grad-CAM correctly visualizes them as the regions that most influenced the decision, increasing the confidence in the model. Thus, the explanation is reliable and can be used clinically.

Table 1 provides an organized overview of the study phases conducted in this study. Each stage addresses a specific aspect of the proposed enhancement. The table summarizes the research topic, methodology, performance measures, and validation approach taken at each stage.

Table 1. Overview of the research phases, methodology, and evaluation for the Feature-Augmented approach in this study.

3.2. Feature-Augmented Explainable Artificial Intelligence

This section describes the methodology employed in this research, which comprises three main phases: design, implementation, and evaluation. These stages are summarized in Figure 3, which displays a schematic diagram of the general framework.

Figure 3. Block diagram of the design, implementation, and evaluation stages for the Feature-Augmented approach in this study.

3.2.1. Model Architecture and Ensemble Design

As detailed in our previous work [42], clinical data are preprocessed by encoding categorical variables, creating additional ratio features, and scaling numeric values. MRIs are processed separately by extracting deep features using a pretrained ResNet50 model, followed by normalization. After preprocessing, the clinical and MRI features are horizontally concatenated to form a combined feature vector representing both data types for each patient. This fusion allows base classifiers to learn from the integrated information in a unified input space. Prediction probabilities from each base model are then combined, and a logistic regression meta-learner learns to balance them for improved accuracy. To preserve interpretability, explanations are generated separately for each data modality (SHAP for clinical features and Grad-CAM for MRI images), ensuring that combining the outputs does not reduce the clarity of the explanations presented to the end user.

We used an ensemble learning strategy [38,39,43,44,45,46,47]; ensemble learning allows each model within our selected set to focus on specific, high-impact features of both clinical and MRI data to produce clearer interpretations.

Our approach was based on four basic classifiers: Random Forest (RF), Extreme Gradient Boosting (XGB), Support Vector Machine (SVM), and Gradient Boosting (GB). The RF classifier aggregates predictions from multiple decision trees using majority voting. The final prediction

ŷ

is given by majority voting [45,48]:

ŷ = m o d e (y_{1}, y_{2}, \dots, y ₙ)

(3)

where

y ₙ

is the class predicted by the

i^{t h}

tree.

XGB sequentially builds trees and minimizes a regularized objective function [49]:

L (t) \approx \sum_{\{i = 1\}}^{\{N\}} [g ᵢ \cdot f ₜ (x ᵢ) + (1 / 2) \cdot h ᵢ \cdot {f ₜ}^{2} (x ᵢ)] + Ω (f ₜ)

(4)

SVMs are effective for well-separated data. The binary decision function is defined as follows [50]:

f (x) = s i g n (\sum_{\{i = 1\}}^{\{T\}} α ᵢ y ᵢ K (x ᵢ, x) + b)

(5)

where

α ᵢ

are Lagrange multipliers, yᵢ ∈ {±1},

k

is the kernel function, and

b

is the bias.

GB iteratively fits new learners to the negative gradient of the loss function [49]:

g_{t (x)} = E_{y} [\begin{matrix} \frac{\partial ψ (y, f (x))}{\partial f (x)} \end{matrix}| x] w h e r e f (x) = f^{\{t - 1\} (x)}

(6)

Ensemble learning works by aggregating predictions from baseline models, reducing unnecessary features, and enhancing the model explainability. Our meta-model learns optimal weights to combine predictions through five-fold cross-validation. It performed the final classification into three classes: Alzheimer’s disease (AD), mild cognitive impairment (MCI), and normal cognitive impairment (CN). This architecture improves the performance by leveraging tabular and image-based features while maintaining explainability. The equation of the loss function used to train the meta-learner in our ensemble model is as follows [43]:

L (θ) = - \sum_{\{i = 1\}}^{\{N\}} [y ᵢ l o g (ŷ ᵢ) + (1 - y ᵢ) l o g (1 - ŷ ᵢ)] + {λ ‖ θ ‖}^{2}

(7)

Binary cross-entropy (logarithmic loss) measures how closely the predicted probabilities

ŷ ᵢ

match the correct binary labels

y ᵢ

. The

L 2

regularization threshold is controlled by the parameter

λ

, which penalizes large weights in the model to reduce overfitting. This combination helps the model make accurate predictions and achieve better generalization.

3.2.2. Explainability Framework

We adopted a dual interpretation strategy using two integrated XAI methods. For the clinical model, SHAP (SHapley Additive Interpretation) was used to generate feature importance values. As for the CNN-based MRI model, Grad-CAM (Gradient-Weighted Class Activation Mapping) was used to generate heatmaps indicating which image regions contribute to the model’s prediction.

This dual framework addresses concerns about ambiguity resulting from using multiple XAI tools on the same pattern of data. By assigning SHAP to clinical data and Grad-CAM to images, we avoided interpretive inconsistencies and enhanced the clarity of the interpretations.

4. Experimental Results and Discussion

This dual framework addresses concerns about ambiguity resulting from using multiple XAI tools on the same pattern of data. By assigning SHAP to clinical data and Grad-CAM to images, we avoided interpretive inconsistencies and enhanced the clarity of interpretations.

4.1. Data Processing

For the MRI data, we specifically selected medial slices that prominently showed the lateral ventricles since these regions are clinically associated with brain atrophy in AD. This approach is consistent with recommendations in previous references to focus on regions with clearer underlying anatomical facts to obtain more interpretable output.

Figure 4 displays the Grad-CAM heatmaps produced by the proposed improved model for each category. The visualizations highlight the lateral ventricles, which are the primary focus areas used by the model for decision-making. These regions correspond to known clinical indicators of AD progression.

Figure 4. Grad-CAM heatmap showing the highlighted lateral ventricles in a mid-slice MRI; this was produced by the proposed Feature-Augmented XAI approach to explain how the diagnosis of AD was made.

Clinical data were also used, with clinical characteristics selected based on their proven relationship in the medical literature with the stages of AD progression, along with dilatation of the lateral ventricles on MRI images. Clinical features included demographic factors, such as age and gender, given their direct impact on the risk of developing AD, as well as years of education as an indicator of cognitive reserve, which may influence resistance to cognitive decline. The Mini-Mental State Examination (MMSE) scores, a primary standardized tool for assessing cognitive function, and the Clinical Dementia Score (CDR), which helps classify the severity of the condition, were also included. In addition, other indicators were used, such as delayed memory performance and the level of independence in daily activities, as represented by the FAQ scale.

All MRIs were scaled and normalized, and the clinical data underwent processing steps, such as missing value imputation and normalization. The final dataset was weighted to address problems of class imbalance observed in previous studies. The final model, trained on clinical data and MRI mid-slices, achieved 99.00% accuracy. Moreover, the use of ensemble learning significantly improved the computational efficiency and reduced the processing time. After the high-performance prediction phase, the next crucial step is the explanation process, where the XAI module is activated to provide clear and explainable justifications for every decision the model makes.

The full implementation code used for this study is available at https://github.com/F-H5/diagnosis_explanation_module (accessed on 15 May 2025).

4.2. Model Configuration and XAI Integration

The proposed model accepts two main types of input: (1) clinical features, including demographic and cognitive test results, and (2) intermediate MRI slices, specifically selected to capture the lateral ventricles, which are known to show structural changes in AD patients.

To ensure transparency, each data type is associated with a custom annotation method:

SHAP is applied to clinical data. This model quantifies the contribution of each feature to the final prediction and presents the results in graphical and textual form, making them easier for clinicians to understand.
Grad-CAM is used for MRI images. It produces heat maps superimposed on the original slices to highlight the brain regions that influenced the classification, helping non-radiologists visualize relevant anatomical patterns.

The outputs of the Feature-Augmented XAI approach consist of three key components: (1) A confidence score is provided to reflect the reliability of the prediction. (2) The feature attribution explanation is generated using SHAP values, which quantify the contribution of each clinical feature to the model’s prediction. The most important values of SHAP on which the interpretation is based are explained. (3) The visual explanation is provided using Grad-CAM, which highlights important regions in the MRI slice that influenced the model’s decision. This visual map is displayed alongside a textual description of the affected brain regions. Since the MRI data were trained on only the middle slice of the brain, which clearly shows the lateral ventricles, Grad-CAM correctly visualizes them as the regions that most influenced the decision. This increases confidence in the model and supports clinical use.

4.3. Performance Evaluation

4.3.1. Questionnaire Design and Expert Involvement

To evaluate the clarity, usefulness, and reliability of the explanations generated by the Feature-Based Explainable, a case study was designed and conducted [51]. The study targeted healthcare professionals to assess how different user groups perceive and interact with the outcomes of the explanations. Participants were shown model predictions, SHAP-based feature importance, Grad-CAM visualizations, and accompanying textual explanations, and then asked to provide feedback through a structured questionnaire assessing the interpretability, confidence, and overall satisfaction.

A total of five participants specialized in the field of AD, each with 6 to 10 years of experience, completed the questionnaire. Each case was accompanied by one of the following:

i.: SHAP-based clinical explanation only: Highlighting key clinical features.
ii.: Grad-CAM heatmaps only: Showing important brain regions in the MRI.
iii.: Feature-Augmented XAI explanation: Combining SHAP-based features, Grad-CAM heatmaps, and textual explanations.

While the expert validation in this study provides valuable insights, it is important to recognize the limitations imposed by the small sample size (n = 5). Since this study was designed as a pilot study, the main objective was to explore the feasibility and initial impressions of the proposed Feature-Augmented approach rather than to achieve statistical generalizability. The limited number of participants is primarily due to time constraints and the availability of experienced physicians willing to participate, as noted in similar research challenges [14]. Therefore, the results should be interpreted with caution, as they may not fully represent the views of the wider medical community.

Evaluation criteria: These criteria represent the evaluation instrument used to assess participants’ perceptions of the XAI outputs. The participants were asked to rate the model’s interpretability and usefulness based on these four dimensions:

i.: Clarity: How easy the interpretations were to understand.
ii.: Clinical relevance: The extent to which interpretations align with known biomarkers of AD.
iii.: Decision support: The extent to which interpretations contribute to clinical decision-making.
iv.: Confidence in AI: Participants’ degree of confidence in AI-assisted diagnosis after reviewing the interpretations.

A Likert scale (1–5) was used, and qualitative feedback was collected to gain deeper insights into participants’ perspectives.

4.3.2. Expert Validation Results

The main performance indicator employed in this phase was the trust score [52], which quantifies expert confidence in the model’s explanations. The model achieved a 100% trust score, with 20% of participants reporting full trust and 80% expressing conditional trust. Feedback from Alzheimer’s specialists confirmed its potential to support clinical decision-making, especially in contexts with limited access to specialists.

The proposed Feature-Augmented XAI demonstrated improved interpretability and clinical relevance by integrating SHAP for clinical features and Grad-CAM for MRI imaging, providing both visual evidence and quantitative feature contributions.

Figure 5 compares the three interpretation approaches: SHAP alone, Grad-CAM alone, and Feature-Augmented XAI. SHAP alone was found difficult to understand without additional explanation. Grad-CAM was visually understandable but insufficient for diagnostic confidence, while Feature-Augmented XAI improved both understanding and trust (Figure 6).

Figure 5. Comparison of expert understanding between SHAP, Grad-CAM, and the proposed Feature-Augmented approach in this study.

Figure 6. Comparison of expert trust levels across SHAP-Only, Grad-CAM-Only, and proposed Feature-Augmented explainability methods.

Trust means that the specialist can understand the interpretation and trust the diagnosis based on it. To calculate the trust score in this study, a direct measurement approach was adopted based on the experts’ responses. This method is consistent with the approach used by [52], who calculated the trust score as the average of self-reported confidence levels using a Likert scale ranging from 1 to 5. The formula for the direct trust score is defined as follows [52]:

T r u s t S c o r e = \frac{N u m b e r o f p a r t i c i p a n t s w h o t r u s t t h e m o d e l}{T o t a l n u m b e r o f p a r t i c i p a n t s} \times 100 %

(8)

Based on the participants’ responses, since all participants indicated trust, either directly or conditionally, the overall trust score for Feature-Augmented XAI is considered 100%, with a note that most users desire enhanced interpretability.

As shown in Table 2, all experts had more than ten years of experience and demonstrated familiarity with AD diagnosis and XAI, with varying degrees of confidence in the proposed model.

Table 2. Summary of expert participants’ background and confidence in XAI.

Key findings from the expert evaluations include the following:

All five experts agreed that integrating clinical, MMSE, and MRI data improves the diagnostic accuracy.
Four out of five agreed that AI models like ours help detect subtle disease patterns that are not easily visible through human interpretation.
All experts rated SHAP and Grad-CAM explanations as either “understandable” or “very understandable” and described them as useful for gaining insight into the model decisions.
Three out of five experts suggested that the textual explanations could be made clearer, especially when intended for non-expert users.
All five participants emphasized that such explainability tools should be used to support, not replace, human physicians and showed strong support for integrating these models into future clinical practice.

As noted in the comments, experts generally found the model explanations clear and helpful; however, some have suggested further simplifying the textual output for non-expert audiences. Suggested improvements include using simpler medical terminology, shorter sentences, and adjusting the level of detail to suit the needs of different clinical populations to ensure explanations are clear and accessible to a wider range of users.

4.3.3. Expert Reflections on Explanation and Clinical Relevance

In addition to the quantitative responses, experts provided open feedback. They confirmed that the visual interpretations—those focusing on the lateral ventricles—matched their clinical expectations for diagnosing AD. One expert noted that combining textual justifications with visual output “bridges the gap for non-specialists and supports clinical training.” Another expert highlighted the utility of SHAP outputs in “understanding the contribution of MMSE subitems, especially in borderline MCI cases.” A common theme among the responses was an appreciation for the clear segmentation between interpretations of clinical data and interpretations of imaging, which facilitated a more intuitive understanding of the decision-making process.

One expert raised a valuable point regarding the specificity of cognitive assessments, noting that “cognitive tests may be abnormal in depression or other psychiatric conditions.” This highlights the importance of considering differential diagnoses and reinforces the need for multimodal approaches that combine imaging and clinical data to reduce the potential for misclassification. Moreover, the experts agreed that these interpretive approaches are critical not only for enhancing confidence in AI models but also for promoting their adoption in multidisciplinary medical teams. These qualitative insights demonstrate the importance and usability of the Feature-Augmented XAI approach in the real world.

4.4. Discussion

In our previous study, we developed an improved Feature-Augmented XAI framework for AD detection, balancing high accuracy and ease of interpretation [42]. Our proposed model addresses limitations highlighted in prior works [15,16,35,53,54,55] by involving experts in validation, interpreting both clinical and imaging data, and avoiding overlap between explanation tools.

Addressing XAI Challenges Identified in Prior Work

Based on the limitations highlighted in [4,5,35], our model introduces several solutions that directly address existing gaps in XAI applications for AD diagnosis:

Involvement of medical experts: Specialists with over 10 years of experience evaluated the outputs through a structured questionnaire. The model achieved a 100% trust score, confirming improved confidence in AI decisions.
Integrating multiple modalities: SHAP was applied to clinical data and Grad-CAM to MRI slices, ensuring comprehensive modality-specific interpretations.
Reducing ambiguity from multiple XAI tools: Using several explainability frameworks on the same data can lead to contradictory outputs [35]. Our model avoids this issue by assigning SHAP exclusively to clinical data and Grad-CAM to image data, preventing interpretive overlap and maintaining clarity.
Focusing on specific MRI slices to enhance explainability: Our model restricts analysis to middle slices that show the lateral ventricles, which are well-known indicators of AD-related atrophy. This strategy enhances the explainability and clinical relevance of the visual output.
Formulating tailored explanations: Textual interpretations were adapted for physicians and medical students to improve the clarity and trust across user groups.

In addition to these advancements, it is important to consider how the mode of presenting explanations, whether SHAP-only, Grad-CAM-only, or the integrated Feature-Augmented approach, impacts clinicians’ cognitive load and decision-making speed, especially in busy clinical settings. While the current study focuses on accuracy and interpretability, future evaluations should incorporate usability testing to assess how sequential versus combined presentation of explanations affects physicians’ comprehension, mental effort, and diagnostic efficiency. Optimizing the delivery of explanations to minimize cognitive load has the potential to enhance user experience and facilitate wider adoption of the model in real-world clinical practice.

5. Conclusions

By integrating multi-modal data and relevant clinical characteristics, the Feature-Augmented approach addresses the “black box” problem and helps non-specialists make informed diagnostic decisions. The Feature-Augmented approach successfully achieved a balance between the model accuracy and explainability, an achievement that distinguishes our work from previous studies, which often sacrifice explainability for high accuracy or compromise accuracy to obtain clearer explanations. The Feature-Augmented approach lays the foundation for future developments in transparent AI in healthcare.

Expert feedback confirmed the transparency of the Feature-Augmented explanations, with a 100% trust score indicating strong confidence in the model’s decisions. The provided explanations were found to enhance understanding of the model predictions, facilitate cross-validation with medical knowledge, and increase confidence in the model decisions. The use of regions showing the lateral ventricles in the middle MRI slices, a key marker of AD progression, improved the agreement between Grad-CAM maps and known clinical markers of AD.

As a result, the key outputs of the Feature-Augmented approach were achieved: (1) enhanced explanation quality, (2) reliable clinical decision support, and (3) clear justification. Furthermore, textual explanations tailored to different audience categories have been praised for improving the explainability across user types.

Future Work

It is important to acknowledge that this study has some limitations, such as the small number of experts involved in the validation phase and reliance on a single data source; these factors affect the generalizability of the results.

Future work aims to incorporate data collected from local populations to validate the tool’s performance across diverse demographic and genetic backgrounds, thereby ensuring the broader generalizability of the results.

Moreover, in future work, multiple models could be trained, each focusing on a single region of interest for AD diagnosis on MRIs. These models could then be combined into a single model that offers multiple interpretations, with each one specific to the model responsible for the training region, and displays textual information for each region to support decision-making.

In addition, future development will explore ways to simplify the model’s textual explanations by using more accessible medical terminology and shorter sentences to meet the needs of different medical audiences, including non-specialists and medical students.

Further evaluations will include a broader range of healthcare professionals, such as general practitioners, nurses, and neurologists, as well as participants from diverse clinical backgrounds, to improve the generalizability and robustness of the conclusions and evaluate the tool’s performance across various clinical aspects and specialties. Future works will compare the Feature-Augmented approach with other XAI approaches using standardized explainability metrics and user satisfaction surveys to position it within the broader landscape of XAI in medicine.

Future work could also focus on deploying this approach within portable diagnostic devices to use in neurology departments and clinical decision support systems. Through these approaches, the Feature-Augmented approach aims to develop into a robust and widely applicable solution that bridges the gap between high-performance AI models and real-world clinical needs, particularly in resource-limited or high-stress settings.

Author Contributions

Conceptualization, F.H.A.-b.; Methodology, F.H.A.-b., W.M.Y.W.B. and M.N.A.-A.; Validation, W.M.Y.W.B. and M.N.A.-A.; Formal analysis, F.H.A.-b. and H.M.K.; Investigation, F.H.A.-b., W.M.Y.W.B., M.N.A.-A. and H.M.K.; Data curation, F.H.A.-b. and H.M.K.; Writing—original draft preparation, F.H.A.-b.; Writing—review and editing, F.H.A.-b., W.M.Y.W.B., M.N.A.-A., R.R.R.I., H.M.K., Y.S., U.K.A., N.M.Y., Z.A.A., Z.Z.A., S.A.A., A.A., A.F.N.A.R., H.R. and M.F.A.S.; Visualization, F.H.A.-b.; Supervision, W.M.Y.W.B.; Co-supervision and guidance on ensemble learning and XAI, M.N.A.-A.; Medical validation and clinical review, H.M.K. All authors have read and agreed to the published version of the manuscript.

Funding

Research Management Center (RMC), Multimedia University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The source code used to preprocess the data, train the models, and generate the diagnostic explanations is publicly available at the following GitHub repository (Python 3.10): https://github.com/F-H5/diagnosis_explanation_module (accessed on 15 May 2025). The data used in this study were collected through a structured survey of five experts. The survey responses were used solely for research purposes. Due to privacy and ethical considerations, the raw survey data is not publicly available. However, aggregated data and analysis results are available upon reasonable requests from the corresponding author.

Acknowledgments

The authors would like to thank the Optimas Research Group, Center for Advanced Computing Technology (C-ACT), Fakulti Kecerdasan Buatan dan Keselamatan Siber (FAIX), Fakulti Teknologi Maklumat dan Komunikasi (FTMK), and the Centre for Research and Innovation Management (CRIM), Universiti Teknikal Malaysia Melaka (UTeM), for providing the facilities and support for this research activities. The authors also thank the Research Management Center (RMC), Multimedia University, for their valuable support in this research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Fabrizio, C.; Termine, A.; Caltagirone, C.; Sancesario, G. Artificial Intelligence for Alzheimer’s Disease: Promise or Challenge? Diagnostics 2021, 11, 1473. [Google Scholar] [CrossRef]
Vrahatis, A.G.; Skolariki, K.; Krokidis, M.G.; Lazaros, K.; Exarchos, T.P.; Vlamos, P. Revolutionizing the Early Detection of Alzheimer’s Disease through Non-Invasive Biomarkers: The Role of Artificial Intelligence and Deep Learning. Sensors 2023, 23, 4184. [Google Scholar] [CrossRef]
Majd, S.; Power, J.; Majd, Z.; Majd, S.; Power, J.; Majd, Z. Alzheimer’s Disease and Cancer: When Two Monsters Cannot Be Together. Front. Neurosci. 2019, 13, 155. [Google Scholar] [CrossRef]
Khan, P.; Kader, F.; Islam, S.M.R.; Rahman, A.B.; Kamal, S.; Toha, M.U.; Kwak, K.-S. Machine Learning and Deep Learning Approaches for Brain Disease Diagnosis: Principles and Recent Advances. IEEE Access 2021, 9, 37622–37655. [Google Scholar] [CrossRef]
Garcia, S.d.l.F.; Ritchie, C.W.; Luz, S. Artificial Intelligence, Speech, and Language Processing Approaches to Monitoring Alzheimer’s Disease: A Systematic Review. J. Alzheimer’s Dis. 2020, 78, 1547–1574. [Google Scholar] [CrossRef]
Ben Hassen, S.; Neji, M.; Hussain, Z.; Hussain, A.; Alimi, A.M.; Frikha, M. Deep learning methods for early detection of Alzheimer’s disease using structural MR images: A survey. Neurocomputing 2024, 576, 127325. [Google Scholar] [CrossRef]
Zakaria, M.M.A.; Doheir, M.; Akmaliah, N.; Yaacob, N.B.M. Infinite potential of AI chatbots: Enhancing user experiences and driving business transformation in e-commerce: Case of Palestinian e-commerce. J. Ecohumanism 2024, 3, 216–229. [Google Scholar] [CrossRef]
Villain, N.; Fouquet, M.; Baron, J.-C.; Mézenge, F.; Landeau, B.; de La Sayette, V.; Viader, F.; Eustache, F.; Desgranges, B.; Chételat, G. Sequential relationships between grey matter and white matter atrophy and brain metabolic abnormalities in early Alzheimer’s disease. Brain 2010, 133, 3301–3314. [Google Scholar] [CrossRef] [PubMed]
Battista, P.; Salvatore, C.; Berlingeri, M.; Cerasa, A.; Castiglioni, I. Artificial intelligence and neuropsychological measures: The case of Alzheimer’s disease. Neurosci. Biobehav. Rev. 2020, 114, 211–228. [Google Scholar] [CrossRef] [PubMed]
Al Olaimat, M.; Martinez, J.; Saeed, F.; Bozdag, S.; Initiative, A.D.N. PPAD: A deep learning architecture to predict progression of Alzheimer’s disease. Bioinformatics 2023, 39, i149–i157. [Google Scholar] [CrossRef]
Arafa, D.A.; Moustafa, H.E.-D.; Ali-Eldin, A.M.T.; Ali, H.A. Early detection of Alzheimer’s disease based on the state-of-the-art deep learning approach: A comprehensive survey. Multimedia Tools Appl. 2022, 81, 23735–23776. [Google Scholar] [CrossRef]
Martínez-Murcia, F.; Górriz, J.; Ramírez, J.; Puntonet, C.; Salas-González, D. Computer Aided Diagnosis tool for Alzheimer’s Disease based on Mann–Whitney–Wilcoxon U-Test. Expert Syst. Appl. 2012, 39, 9676–9685. [Google Scholar] [CrossRef]
Woodbright, M.D.; Morshed, A.; Browne, M.; Ray, B.; Moore, S. Toward Transparent AI for Neurological Disorders: A Feature Extraction and Relevance Analysis Framework. IEEE Access 2024, 12, 37731–37743. [Google Scholar] [CrossRef]
Saif, F.H.; Al-Andoli, M.N.; Bejuri, W.M.Y.W. Explainable AI for Alzheimer Detection: A Review of Current Methods and Applications. Appl. Sci. 2024, 14, 10121. [Google Scholar] [CrossRef]
Bazarbekov, I.; Razaque, A.; Ipalakova, M.; Yoo, J.; Assipova, Z.; Almisreb, A. A review of artificial intelligence methods for Alzheimer’s disease diagnosis: Insights from neuroimaging to sensor data analysis. Biomed. Signal Process. Control 2024, 92, 106023. [Google Scholar] [CrossRef]
Arya, A.D.; Verma, S.S.; Chakarabarti, P.; Chakrabarti, T.; Elngar, A.A.; Kamali, A.-M.; Nami, M. A systematic review on machine learning and deep learning techniques in the effective diagnosis of Alzheimer’s disease. Brain Inform. 2023, 10, 17. [Google Scholar] [CrossRef] [PubMed]
Abadir, P.; Oh, E.; Chellappa, R.; Choudhry, N.; Demiris, G.; Ganesan, D.; Karlawish, J.; Marlin, B.; Li, R.M.; Dehak, N.; et al. Artificial Intelligence and Technology Collaboratories: Innovating aging research and Alzheimer’s care. Alzheimer’s Dement. 2024, 20, 3074–3079. [Google Scholar] [CrossRef]
Fujita, K.; Katsuki, M.; Takasu, A.; Kitajima, A.; Shimazu, T.; Maruki, Y. Development of an artificial intelligence-based diagnostic model for Alzheimer’s disease. Aging Med. 2022, 5, 167–173. [Google Scholar] [CrossRef]
Bordin, V.; Coluzzi, D.; Rivolta, M.W.; Baselli, G. Explainable AI Points to White Matter Hyperintensities for Alzheimer’s Disease Identification: A Preliminary Study. In Proceedings of the 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Glasgow, UK, 11–15 July 2022. [Google Scholar] [CrossRef]
Angelov, P.P.; Soares, E.A.; Jiang, R.; Arnold, N.I.; Atkinson, P.M. Explainable artificial intelligence: An analytical review. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2021, 11, e1424. [Google Scholar] [CrossRef]
Kamal, M.S.; Chowdhury, L.; Nimmy, S.F.; Rafi, T.H.H.; Chae, D.-K. An Interpretable Framework for Identifying Cerebral Microbleeds and Alzheimer’s Disease Severity using Multimodal Data. In Proceedings of the 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Sydney, Australia, 24–27 July 2023. [Google Scholar] [CrossRef]
Yousefzadeh, N.; Tran, C.; Ramirez-Zamora, A.; Chen, J.; Fang, R.; Thai, M.T. Neuron-level explainable AI for Alzheimer’s Disease assessment from fundus images. Sci. Rep. 2024, 14, 7710. [Google Scholar] [CrossRef]
Jahan, S.; Abu Taher, K.; Kaiser, M.S.; Mahmud, M.; Rahman, S.; Hosen, A.S.M.S.; Ra, I.-H.; Mridha, M.F. Explainable AI-based Alzheimer’s prediction and management using multimodal data. PLoS ONE 2023, 18, e0294253. [Google Scholar] [CrossRef]
Achilleos, K.G.; Leandrou, S.; Prentzas, N.; Kyriacou, P.A.; Kakas, A.C.; Pattichis, C.S. Extracting Explainable Assessments of Alzheimer’s disease via Machine Learning on brain MRI imaging data. In Proceedings of the 2020 IEEE 20th International Conference on Bioinformatics and Bioengineering (BIBE), Cincinnati, OH, USA, 26–28 October 2020. [Google Scholar] [CrossRef]
Guan, H.; Wang, C.; Cheng, J.; Jing, J.; Liu, T. A parallel attention-augmented bilinear network for early magnetic resonance imaging-based diagnosis of Alzheimer’s disease. Hum. Brain Mapp. 2022, 43, 760–772. [Google Scholar] [CrossRef]
Yilmaz, D. Development and Evaluation of an Explainable Diagnostic AI for Alzheimer’s Disease. In Proceedings of the 2023 International Conference on Artificial Intelligence Science and Applications in Industry and Society (CAISAIS), Galala, Egypt, 3–5 September 2023. [Google Scholar] [CrossRef]
Mansouri, D.; Echtioui, A.; Khemakhem, R.; Hamida, A.B. Explainable AI Framework for Alzheimer’s Diagnosis Using Convolutional Neural Networks. In Proceedings of the 2024 IEEE 7th International Conference on Advanced Technologies, Signal and Image Processing (ATSIP), Sousse, Tunisia, 11–13 July 2024. [Google Scholar] [CrossRef]
Shad, H.A.; Rahman, Q.A.; Asad, N.B.; Bakshi, A.Z.; Mursalin, S.; Reza, T.; Parvez, M.Z. Exploring Alzheimer’s Disease Prediction with XAI in various Neural Network Models. In Proceedings of the TENCON 2021-2021 IEEE Region 10 Conference (TENCON), Auckland, New Zealand, 7–10 December 2021. [Google Scholar] [CrossRef]
Tima, J.; Wiratkasem, C.; Chairuean, W.; Padongkit, P.; Pangkhiao, K.; Pikulkaew, K. Early Detection of Alzheimer’s Disease: A Deep Learning Approach for Accurate Diagnosis. In Proceedings of the 2024 21st International Joint Conference on Computer Science and Software Engineering (JCSSE), Phuket, Thailand, 19–22 June 2024. [Google Scholar] [CrossRef]
Haddada, K.; Khedher, M.I.; Jemai, O.; Khedher, S.I.; El-Yacoubi, M.A. Assessing the Interpretability of Machine Learning Models in Early Detection of Alzheimer’s Disease. In Proceedings of the 2024 16th International Conference on Human System Interaction (HSI), Paris, France, 8–11 July 2024. [Google Scholar] [CrossRef]
Alarjani, M. Alzheimer’s Disease Detection based on Brain Signals using Computational Modeling. In Proceedings of the 2024 Seventh International Women in Data Science Conference at Prince Sultan University (WiDS PSU), Riyadh, Saudi Arabia, 3–4 March 2024. [Google Scholar] [CrossRef]
Alvarado, M.; Gómez, D.; Nuñez, A.; Robles, A.; Marecos, H.; Ticona, W. Implementation of an Early Detection System for Neurodegenerative Diseases Through the use of Artificial Intelligence. In Proceedings of the 2023 IEEE XXX International Conference on Electronics, Electrical Engineering and Computing (INTERCON), Lima, Peru, 2–4 November 2023. [Google Scholar] [CrossRef]
Bloch, L.; Friedrich, C.M.; Alzheimer’s Disease Neuroimaging Initiative. Machine Learning Workflow to Explain Black-box Models for Early Alzheimer’s Disease Classification Evaluated for Multiple Datasets. SN Comput. Sci. 2022, 3, 509. [Google Scholar] [CrossRef]
Deshmukh, A.; Kallivalappil, N.; D’souza, K.; Kadam, C. AL-XAI-MERS: Unveiling Alzheimer’s Mysteries with Explainable AI. In Proceedings of the 2024 Second International Conference on Emerging Trends in Information Technology and Engineering (ICETITE), Vellore, India, 22–23 February 2024. [Google Scholar] [CrossRef]
Viswan, V.; Shaffi, N.; Mahmud, M.; Subramanian, K.; Hajamohideen, F. Explainable Artificial Intelligence in Alzheimer’s Disease Classification: A Systematic Review. Cogn. Comput. 2023, 16, 1–44. [Google Scholar] [CrossRef]
Umeda-Kameyama, Y.; Kameyama, M.; Tanaka, T.; Son, B.-K.; Kojima, T.; Fukasawa, M.; Iizuka, T.; Ogawa, S.; Iijima, K.; Akishita, M. Screening of Alzheimer’s disease by facial complexion using artificial intelligence. Aging 2021, 13, 1765–1772. [Google Scholar] [CrossRef]
Nguyen, K.; Nguyen, M.; Dang, K.; Pham, B.; Huynh, V.; Vo, T.; Ngo, L.; Ha, H. Early Alzheimer’s disease diagnosis using an XG-Boost model applied to MRI images. Biomed. Res. Ther. 2023, 10, 5896–5911. [Google Scholar] [CrossRef]
AlMohimeed, A.; Saad, R.M.A.; Mostafa, S.; El-Rashidy, N.M.; Farrag, S.; Gaballah, A.; Elaziz, M.A.; El-Sappagh, S.; Saleh, H. Explainable Artificial Intelligence of Multi-Level Stacking Ensemble for Detection of Alzheimer’s Disease Based on Particle Swarm Optimization and the Sub-Scores of Cognitive Biomarkers. IEEE Access 2023, 11, 123173–123193. [Google Scholar] [CrossRef]
Zhang, Y.; Liu, T.; Lanfranchi, V.; Yang, P. Explainable Tensor Multi-Task Ensemble Learning Based on Brain Structure Variation for Alzheimer’s Disease Dynamic Prediction. IEEE J. Transl. Eng. Heal. Med. 2022, 11, 1–12. [Google Scholar] [CrossRef]
Syed, M.R.; Kothari, N.; Joshi, Y.; Gawade, A. Eadda: Towards Novel and Explainable Deep Learning for Early Alzheimer’s Disease Diagnosis Using Autoencoders. Int. J. Intell. Syst. Appl. Eng. 2023, 11, 234–246. [Google Scholar]
Qu, Y.; Wang, P.; Liu, B.; Song, C.; Wang, D.; Yang, H.; Zhang, Z.; Chen, P.; Kang, X.; Du, K.; et al. AI4AD: Artificial intelligence analysis for Alzheimer’s disease classification based on a multisite DTI database. Brain Disord. 2021, 1, 100005. [Google Scholar] [CrossRef]
Al-Bakri, F.H.; Bejuri, W.M.Y.W.; Al-Andoli, M.N.; Ikram, R.R.R.; Khor, H.M.; Tahir, Z.; Initiative, T.A.D.N. A Meta-Learning-Based Ensemble Model for Explainable Alzheimer’s Disease Diagnosis. Diagnostics 2025, 15, 1642. [Google Scholar] [CrossRef]
Ganaie, M.; Hu, M.; Malik, A.; Tanveer, M.; Suganthan, P. Ensemble deep learning: A review. Eng. Appl. Artif. Intell. 2022, 115, 105151. [Google Scholar] [CrossRef]
Al-Andoli, M.; Cheah, W.P.; Tan, S.C. Deep learning-based community detection in complex networks with network partitioning and reduction of trainable parameters. J. Ambient. Intell. Humaniz. Comput. 2020, 12, 2527–2545. [Google Scholar] [CrossRef]
Rashmi, U.; Singh, T.; Ambesange, S. MRI image based Ensemble Voting Classifier for Alzheimer’s Disease Classification with Explainable AI Technique. In Proceedings of the 2023 IEEE 8th International Conference for Convergence in Technology (I2CT), Lonavla, India, 7–9 April 2023. [Google Scholar] [CrossRef]
Al-Andoli, M.N.; Tan, S.C.; Cheah, W.P. Distributed parallel deep learning with a hybrid backpropagation-particle swarm optimization for community detection in large complex networks. Inf. Sci. 2022, 600, 94–117. [Google Scholar] [CrossRef]
Al-Andoli, M.N.; Tan, S.C.; Sim, K.S.; Lim, C.P.; Goh, P.Y. Parallel Deep Learning with a hybrid BP-PSO framework for feature extraction and malware classification. Appl. Soft Comput. 2022, 131, 109756. [Google Scholar] [CrossRef]
Ravikiran, H.K.; Deepak, R.; Deepak, H.A.; Prapulla Kumar, M.S.; Sharath, S.; Yogeesh, G.H. A robust framework for Alzheimer’s disease detection and staging: Incorporating multi-feature integration, MRMR feature selection, and Random Forest classification. Multimed. Tools Appl. 2024, 84, 24903–24931. [Google Scholar] [CrossRef]
Uddin, K.M.M.; Alam, M.J.; Anawar, J.E.; Uddin, A.; Aryal, S. A Novel Approach Utilizing Machine Learning for the Early Diagnosis of Alzheimer’s Disease. Biomed. Mater. Devices 2023, 1, 882–898. [Google Scholar] [CrossRef]
Menagadevi, M.; Mangai, S.; Madian, N.; Thiyagarajan, D. Automated prediction system for Alzheimer detection based on deep residual autoencoder and support vector machine. Optik 2022, 272, 170212. [Google Scholar] [CrossRef]
van der Waa, J.; Nieuwburg, E.; Cremers, A.; Neerincx, M. Evaluating XAI: A comparison of rule-based and example-based explanations. Artif. Intell. 2021, 291, 103404. [Google Scholar] [CrossRef]
Rong, Y.; Castner, N.; Bozkir, E.; Kasneci, E. User Trust on an Explainable AI-based Medical Diagnosis Support System. arXiv 2022, arXiv:2204.12230. [Google Scholar] [CrossRef]
Vimbi, V.; Shaffi, N.; Mahmud, M. Interpreting artificial intelligence models: A systematic review on the application of LIME and SHAP in Alzheimer’s disease detection. Brain Inform. 2024, 11, 10. [Google Scholar] [CrossRef] [PubMed]
González-Alday, R.; García-Cuesta, E.; Kulikowski, C.A.; Maojo, V. A Scoping Review on the Progress, Applicability, and Future of Explainable Artificial Intelligence in Medicine. Appl. Sci. 2023, 13, 10778. [Google Scholar] [CrossRef]
Yang, K.; Mohammed, E.A. A Review of Artificial Intelligence Technologies for Early Prediction of Alzheimer’s Disease. arXiv 2020, arXiv:2101.01781. [Google Scholar] [CrossRef]

Figure 1. Conceptual comparison of standard and proposed XAI feature extraction approaches for clinical decision support.

Figure 2. Classification of explanation methods used in the proposed Feature-Augmented approach.

Figure 3. Block diagram of the design, implementation, and evaluation stages for the Feature-Augmented approach in this study.

Figure 4. Grad-CAM heatmap showing the highlighted lateral ventricles in a mid-slice MRI; this was produced by the proposed Feature-Augmented XAI approach to explain how the diagnosis of AD was made.

Figure 5. Comparison of expert understanding between SHAP, Grad-CAM, and the proposed Feature-Augmented approach in this study.

Figure 6. Comparison of expert trust levels across SHAP-Only, Grad-CAM-Only, and proposed Feature-Augmented explainability methods.

Table 1. Overview of the research phases, methodology, and evaluation for the Feature-Augmented approach in this study.

Phase	Subject Matter	Methodology	Measure of Performance	Result Validation
Phase 1	Enhanced XAI (Feature-Augmented Design)	Develop an ensemble meta-model using mid-slice MRI and clinical data; apply SHAP and Grad-CAM for explainability	Accuracy, recall, precision, and F1-score	Compared against standard XAI models; validated using ADNI and OASIS datasets
Phase 2	Integration of Tabular and Image Features	Extract predefined clinical features (e.g., MMSE, age); train a hybrid model with SHAP for interpretation	Explanation coverage and clinical explainability	Measured improvements in classification, accuracy, and explanation quality
Phase 3	Limitations of Current XAI Techniques	Literature gap analysis; compare SHAP-only and Grad-CAM-only models; evaluate enhanced model	Usability score, explanation clarity, and time to understand	Theoretical and practical comparison; discuss model-agnostic vs. model-specific XAI limitations
Phase 4	Evaluation of Explanation Quality and Performance	Collect user and expert feedback via surveys to assess clarity, usefulness, and trust in explanations	Trust, clarity, and usefulness	Validated through structured feedback from experts

Table 2. Summary of expert participants’ background and confidence in XAI.

Expert	Years of Experience	Familiarity with Alzheimer’s Diagnosis	Familiarity with XAI in Medicine	Confidence in XAI
Person 1	More than 10 years	Excellent knowledge	Good	Strongly Agree
Person 2	More than 10 years	Excellent knowledge	Moderate	Agree
Person 3	More than 10 years	Excellent knowledge	Excellent	Strongly Agree
Person 4	6–10 years	Good knowledge	Moderate	Agree
Person 5	6–10 years	Good knowledge	Moderate	Neutral

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Article Metrics

Citations

Article Access Statistics

Journal Statistics

Multiple requests from the same IP address are counted as one view.