Towards Transparent Diabetes Prediction: Combining AutoML and Explainable AI for Improved Clinical Insights
Abstract
1. Introduction
- Evaluate the performance of AutoML models enhanced with XAI techniques, such as SHAP and LIME, for accurate and interpretable diabetes risk prediction.
- Ensure the model’s robustness and applicability across diverse populations through feature engineering, cross-validation, and data augmentation.
- Provide global and local interpretability using techniques like SHAP, LIME, Integrated Gradients (IG), and Counterfactual Analysis (CA).
- Leverage AutoGluon’s ensemble capabilities to optimize model configurations, balancing accuracy, robustness, and computational efficiency.
- Visualize global and local feature importance.
- Understand model predictions for individual patients.
- Explore hypothetical scenarios through CA for personalized interventions.
- By integrating AutoML with XAI techniques like SHAP, LIME, IG, and CA, it addresses the dual needs of predictive accuracy and interpretability, essential for clinical adoption.
- The interactive application provides clinicians with an intuitive platform for exploring and interpreting model predictions, enhancing usability and trust.
- Through advanced feature engineering and validation across diverse datasets, the model demonstrates strong generalization capabilities, making it suitable for deployment in various healthcare contexts.
2. Literature Review
2.1. Support for AutoML in Diabetes Prediction
2.2. Limitations of AutoML
2.3. Alternative Approaches to Improve Interpretability
Comparison of Existing Studies
2.4. Key Findings and Gaps
2.5. Gap Analysis
3. Methodology
3.1. Dataset Preparation and Data Splitting
3.1.1. Features and Target Variable
3.1.2. Train-Test Split
3.1.3. Generalization Techniques
- Feature Engineering: Features like Glucose and BMI were standardized, and categorical encoding was applied where necessary. These transformations aimed to reduce the impact of feature scale disparities on model generalization [26].
- Data Augmentation: Synthetic examples were generated for underrepresented classes, helping to balance the dataset and improve generalizability [27].
- Cross-Validation: A 5-fold stratified cross-validation was implemented, preserving class distributions across each fold [28]. This ensured that the model was evaluated on multiple data splits, reducing variance and enhancing the robustness of results (a minimal sketch follows this list).
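As a concrete illustration, the snippet below sketches the stratified 5-fold protocol with scikit-learn; the file path and the stand-in estimator are assumptions rather than the study's actual pipeline.

```python
# A minimal sketch of 5-fold stratified cross-validation on the Pima data;
# the CSV path and the Random Forest stand-in are illustrative assumptions.
import pandas as pd
from sklearn.model_selection import StratifiedKFold
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("diabetes.csv")                       # hypothetical path to the Pima dataset
X, y = df.drop(columns=["Outcome"]), df["Outcome"]

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_scores = []
for train_idx, test_idx in skf.split(X, y):            # each fold preserves class ratios
    model = RandomForestClassifier(random_state=42)
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    fold_scores.append(accuracy_score(y.iloc[test_idx], model.predict(X.iloc[test_idx])))

print(f"mean accuracy: {sum(fold_scores) / len(fold_scores):.4f}")
```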
3.2. Model Training and Architecture with AutoGluon
3.2.1. AutoGluon Presets and Configuration
- Preset: The best_quality preset was chosen to prioritize model accuracy. This preset automatically configures training parameters to optimize performance, though it may require longer computation times.
- Time Limit: The training process was capped at 600 s (10 min). This constraint helped manage computational resources while allowing enough time to explore a variety of model architectures and combinations (see the configuration sketch after this list).
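The corresponding AutoGluon configuration can be sketched as follows; the DataFrame name and the evaluation metric are assumptions based on the dataset schema described above.

```python
# Minimal sketch of the AutoGluon setup described above
# (best_quality preset, 600-second budget).
from autogluon.tabular import TabularPredictor

predictor = TabularPredictor(label="Outcome", eval_metric="accuracy").fit(
    train_data=train_df,        # pandas DataFrame with the eight features + Outcome
    presets="best_quality",     # prioritize accuracy over training speed
    time_limit=600,             # cap the model search at 600 seconds
)
print(predictor.leaderboard())  # inspect the trained models and their scores
```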
3.2.2. Neural Network Architecture
- Input Layer: Eight neurons representing the dataset features—Pregnancies, Glucose, Blood Pressure, Skin Thickness, Insulin, BMI, Diabetes Pedigree Function, and Age.
- Hidden Layers:
  - Hidden Layer 1: 64 neurons with ReLU activation.
  - Hidden Layer 2: 32 neurons with ReLU activation and a dropout rate of 0.5 to mitigate overfitting.
  - Hidden Layer 3: 16 neurons with ReLU activation and a dropout rate of 0.5 to mitigate overfitting.
- Output Layer: A single neuron with a sigmoid activation function to predict the binary outcome (diabetes risk). A standalone sketch of this architecture follows the list.
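For illustration, a standalone PyTorch module mirroring the stated layer sizes, activations, and dropout rates is sketched below; the actual network is constructed internally by AutoGluon, so this is a descriptive approximation only.

```python
# Descriptive PyTorch sketch of the architecture listed above, not the
# network AutoGluon actually instantiates.
import torch.nn as nn

class DiabetesMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(8, 64), nn.ReLU(),                    # hidden layer 1
            nn.Linear(64, 32), nn.ReLU(), nn.Dropout(0.5),  # hidden layer 2 with dropout
            nn.Linear(32, 16), nn.ReLU(), nn.Dropout(0.5),  # hidden layer 3 with dropout
            nn.Linear(16, 1), nn.Sigmoid(),                 # sigmoid output: P(diabetes)
        )

    def forward(self, x):
        return self.net(x)
```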
3.2.3. Stacking, Bagging, and Regularization Techniques
- Level 1 Models: Base models included Random Forest, CatBoost, LightGBM, XGBoost, and neural networks. AutoGluon selects and tunes the neural network architecture through an automated search, testing a range of configurations to identify the best model for the dataset. It also optimizes hyperparameters such as the number of layers, the number of neurons per layer, activation functions, and regularization techniques, including dropout; the dropout rate is chosen dynamically during hyperparameter optimization, along with other relevant parameters, to prevent overfitting and ensure model generalization.
- Level 2 Ensemble: The Level 2 model layer took predictions from Level 1 models as additional features. Combining this information reduced variance, contributing to robust generalization.
- Bagging: Dynamic bagging involved creating multiple data splits to reduce variance, further aiding model stability and resilience on unseen data [29] (the fit-time parameters controlling bagging and stacking are sketched below).
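In AutoGluon, bagging and multi-layer stacking can also be requested explicitly at fit time, as sketched below; the fold and level counts shown are illustrative, since the best_quality preset tunes them automatically.

```python
# Sketch of explicitly enabling bagging and stacking in AutoGluon;
# the specific counts are illustrative assumptions.
from autogluon.tabular import TabularPredictor

predictor = TabularPredictor(label="Outcome").fit(
    train_data=train_df,
    num_bag_folds=5,     # bagging: 5 folds per base model to reduce variance
    num_bag_sets=1,      # number of repetitions of the bagging process
    num_stack_levels=1,  # one stacking layer fed by the Level 1 base models
)
```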
3.2.4. Integration of AutoML, XAI Techniques, and Streamlit Application
- SHAP provides global and local explanations for feature importance, quantifying each feature’s contribution to predictions. This technique identifies critical predictors, such as glucose and BMI, that influence diabetes risk, offering insights aligned with clinical knowledge.
- LIME focuses on individual predictions, creating localized surrogate models to explain why specific predictions were made. This enables healthcare professionals to understand patient-specific factors affecting the model’s output.
- IG quantifies the contribution of each feature to a specific prediction by comparing model output differences between baseline and actual feature values. This approach provides deeper insights into how features like glucose and BMI influence risk predictions.
- The Attention Mechanism (AM) assigns weights to features, highlighting those most relevant during the model’s decision-making process. This dynamic feature prioritization adds another layer of interpretability, ensuring predictions are understandable and actionable.
- CA explores hypothetical scenarios, such as how reducing a patient’s BMI might alter their diabetes risk. This method is particularly valuable for supporting personalized interventions and care planning.
- Visualization of global feature importance to understand overall model behavior.
- Case-specific interpretation of predictions using LIME and SHAP.
- Exploration of hypothetical scenarios through Counterfactual Analysis.
3.3. Implementation of Explainable AI Techniques
- LIME is utilized to provide local interpretability of individual predictions, enabling clinicians to understand why certain predictions are made [32] (a combined SHAP/LIME sketch follows this list).
- CA was performed to illustrate how changes in feature values impact model predictions, aiding clinicians in understanding the model’s decision-making logic [33]. Building upon Lenatti et al. [3], who focused on biomarker modifications to reduce diabetes risk, we adopt a comprehensive counterfactual framework. Unlike the methodology of [3], which prioritized minimal feature adjustments, our approach leverages AutoML for model optimization and integrates various XAI methods to enhance interpretability. This combined approach ensures actionable and generalizable insights for clinicians.
- IG and AM: Additionally, IG and the AM are used to provide deeper insights into the model’s decision-making process [34,35]. IG quantifies the contribution of each feature to a specific prediction, while the AM highlights which features the model focuses on most when making predictions. These techniques offer both global and local interpretability, reinforcing the model’s transparency and trustworthiness.
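To make the SHAP and LIME usage concrete, the sketch below shows representative calls, assuming `predictor`, `X_train`, and `X_test` carry over from the training sketches above; the background-sample size and the probability adapter are illustrative assumptions, not the study's exact code.

```python
# Sketch of model-agnostic SHAP and LIME explanations over the AutoGluon model.
import pandas as pd
import shap
from lime.lime_tabular import LimeTabularExplainer

def predict_proba(data):
    # Adapter: SHAP/LIME pass NumPy arrays; AutoGluon expects a DataFrame.
    return predictor.predict_proba(pd.DataFrame(data, columns=X_train.columns)).values

# SHAP: kernel explainer for global and local attributions.
background = shap.sample(X_train, 100)            # background set (assumed size)
shap_explainer = shap.KernelExplainer(predict_proba, background)
shap_values = shap_explainer.shap_values(X_test.iloc[:50])
shap.summary_plot(shap_values, X_test.iloc[:50])  # global feature-importance view

# LIME: local surrogate explanation for a single patient.
lime_explainer = LimeTabularExplainer(
    X_train.values,
    feature_names=list(X_train.columns),
    class_names=["No Diabetes", "Diabetes"],
    mode="classification",
)
explanation = lime_explainer.explain_instance(X_test.iloc[0].values, predict_proba)
print(explanation.as_list())                      # per-feature local contributions
```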
3.4. Evaluation Metrics
- Accuracy, Precision, and Recall: These metrics provided a well-rounded assessment of model correctness and relevance.
- F1 Score: To balance precision and recall, the F1 score offered insight into the model’s overall performance on imbalanced data.
- AUC-ROC Curve: The ROC-AUC metric was used to assess the model’s discriminative power across thresholds [36].
- Balanced Accuracy and MCC: Balanced accuracy and the Matthews Correlation Coefficient (MCC) were included to provide a clearer measure of performance on imbalanced data [37].
- Cross-Validation Stability: 5-fold stratified cross-validation results were analyzed for low variance across folds, confirming that the model’s performance was stable and generalizable. A scikit-learn sketch for computing these metrics follows this list.
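A minimal scikit-learn sketch for computing these metrics on the held-out test split is shown below; the variable names (`predictor`, `X_test`, `y_test`) carry over from the earlier sketches and are assumptions.

```python
# Sketch of computing the reported evaluation metrics with scikit-learn.
from sklearn.metrics import (accuracy_score, balanced_accuracy_score, f1_score,
                             matthews_corrcoef, precision_score, recall_score,
                             roc_auc_score)

y_pred = predictor.predict(X_test)
y_prob = predictor.predict_proba(X_test)[1]  # probability of the positive class (Outcome = 1)

print("Accuracy:          ", accuracy_score(y_test, y_pred))
print("Balanced accuracy: ", balanced_accuracy_score(y_test, y_pred))
print("Precision:         ", precision_score(y_test, y_pred))
print("Recall:            ", recall_score(y_test, y_pred))
print("F1 score:          ", f1_score(y_test, y_pred))
print("MCC:               ", matthews_corrcoef(y_test, y_pred))
print("ROC-AUC:           ", roc_auc_score(y_test, y_prob))
```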
3.5. Ethical Considerations in AI for Healthcare
4. Results
4.1. Model Performance
4.2. Generalization
4.2.1. Dataset Preparation and Feature Engineering
- ‘s1’ or ‘stab.glu’ → Glucose
- ‘s2’ or ‘hdl’ → Skin Thickness
- BMI Calculation:
- Blood Pressure: Calculated as the average of systolic and diastolic readings.
- Diabetes Pedigree Function (DPF):
- Insulin Estimation:
- Positive for diabetes: glyhb ≥ 6.5 (Outcome = 1)
- Negative for diabetes: glyhb < 6.5 (Outcome = 0). A pandas sketch of this mapping follows the list.
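A pandas sketch of this mapping is given below; the raw column names (`stab.glu`, `bp.1s`, `bp.1d`, `glyhb`) follow common conventions for this kind of dataset and should be treated as assumptions.

```python
# Sketch of mapping the generalization dataset onto the Pima schema and
# deriving the binary Outcome from the HbA1c threshold described above.
import pandas as pd

raw = pd.read_csv("diabetes_vanderbilt.csv")            # hypothetical file path
mapped = pd.DataFrame({
    "Glucose": raw["stab.glu"],                          # stabilized glucose
    "BloodPressure": (raw["bp.1s"] + raw["bp.1d"]) / 2,  # mean of systolic and diastolic
    # ... BMI, Skin Thickness, DPF, and Insulin derived as described above ...
})
mapped["Outcome"] = (raw["glyhb"] >= 6.5).astype(int)    # HbA1c >= 6.5 labeled diabetic
```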
4.2.2. Generalization Performance
4.2.3. Calculating Final Accuracy
- Pima Indian Diabetes Test Set: Accuracy = 76.62%, reflecting performance on the primary dataset used for model training.
- Scikit-learn Diabetes Dataset: Accuracy = 78.65%, showing generalizability to an alternative dataset.
- Rural African-American Dataset: Accuracy = 91.36%, highlighting adaptability to a demographically different population.
4.3. Cross-Validation Results
4.4. Model Leaderboard and Ensemble Comparison
4.5. Enhancing Model Transparency Through XAI Techniques
4.5.1. SHAP Analysis for Global and Local Interpretability
4.5.2. LIME Analysis for Patient-Specific Interpretations
4.5.3. Counterfactual Instance Analysis
- Original Instance:
  - Features:
    - Pregnancies: 6
    - Glucose: 148
    - Blood Pressure: 72
    - Skin Thickness: 35
    - Insulin: 0
    - BMI: 33.6
    - Diabetes Pedigree Function: 0.627
    - Age: 50
- Counterfactual Instance:
  - Features:
    - Pregnancies: 6.000
    - Glucose: 148.000
    - Blood Pressure: 72.000
    - Skin Thickness: 35.000
    - Insulin: 0.000
    - BMI: 33.600
    - Diabetes Pedigree Function: 0.627
    - Age: 50.000
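To illustrate the counterfactual mechanics, the sketch below runs a greedy single-feature search that lowers BMI in small steps until the predicted class flips; the step size, bounds, and function name are illustrative assumptions, not the study's exact counterfactual algorithm.

```python
# Greedy single-feature counterfactual search over BMI (illustration only).
import pandas as pd

def bmi_counterfactual(predictor, instance: pd.Series, step=-0.5, max_iter=40):
    """Perturb BMI until the prediction flips; return the counterfactual or None."""
    original_pred = predictor.predict(instance.to_frame().T).iloc[0]
    candidate = instance.copy()
    for _ in range(max_iter):
        candidate["BMI"] = max(candidate["BMI"] + step, 15.0)  # keep BMI plausible
        if predictor.predict(candidate.to_frame().T).iloc[0] != original_pred:
            return candidate  # first perturbed instance whose predicted class flips
    return None               # no counterfactual found within the search budget
```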
4.5.4. Feature Attribution Using Integrated Gradients
4.5.5. Dynamic Feature Prioritization via Attention Mechanism
4.5.6. Interactive Visualization of Predictions Through Streamlit
- Analyze global feature importance with SHAP.
- Explore patient-specific explanations using LIME.
- Simulate hypothetical scenarios with CA (a minimal Streamlit sketch follows this list).
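The sketch below outlines the shape of such a Streamlit front end; the widget labels, placeholder values, and saved-model path are illustrative assumptions rather than the deployed application's code.

```python
# Minimal Streamlit sketch of the interactive prediction interface.
import pandas as pd
import streamlit as st
from autogluon.tabular import TabularPredictor

predictor = TabularPredictor.load("ag_models/")   # hypothetical saved-model directory

st.title("Diabetes Risk Prediction")
glucose = st.slider("Glucose", 0, 200, 120)
bmi = st.slider("BMI", 10.0, 60.0, 32.0)
# Sliders for the remaining six features would follow the same pattern;
# fixed placeholder values are used here to keep the sketch short.
row = pd.DataFrame([{
    "Pregnancies": 1, "Glucose": glucose, "BloodPressure": 70,
    "SkinThickness": 20, "Insulin": 80, "BMI": bmi,
    "DiabetesPedigreeFunction": 0.4, "Age": 33,
}])

if st.button("Predict"):
    prob = predictor.predict_proba(row)[1].iloc[0]  # probability of Outcome = 1
    st.metric("Predicted diabetes probability", f"{prob:.1%}")
```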
4.6. Confusion Matrix for Prediction Performance
5. Discussion
5.1. Model Performance and Classification Metrics
5.1.1. Sensitivity and Specificity
5.1.2. Confusion Matrix Insights
5.2. Model Stability and Cross-Validation
Low Variance in Cross-Validation
5.3. Leaderboard Insights and Ensemble Model Efficacy
Comparative Model Performance
5.4. Insights into Prediction Transparency with XAI
5.4.1. Global and Local Interpretability
5.4.2. LIME Analysis
5.4.3. Counterfactual Analysis
5.4.4. Quantifying Prediction Contributions with Integrated Gradients
5.4.5. Personalized Prediction Insights Using Attention Mechanism
5.5. Comparison with Similar Studies
- Existing studies rely on manual model selection, which requires significant expertise and may introduce bias in choosing algorithms. By integrating AutoML, our approach automates model development, ensuring optimal performance across datasets while reducing the technical barriers to implementing machine learning in healthcare.
- While SHAP and LIME provide robust global and local interpretability, CA adds a new dimension by enabling clinicians to simulate how changes in specific features (e.g., glucose or BMI) might alter outcomes. This capability supports personalized, preventative care strategies, which are less explored in previous works.
- IG and the AM provide a holistic view of the model’s decision-making, making it easier for clinicians to interpret and trust the model’s predictions. AM reveals the features most emphasized by the model, while IG quantifies their individual contributions.
- Our Streamlit application bridges the gap between machine learning advancements and clinical usability, offering a practical, accessible interface for healthcare professionals to interpret predictions and act on them in real time. This aspect of our work emphasizes the need for clinician-friendly tools, which is often missing in theoretical studies.
5.6. Methodological Innovations
- Traditional machine learning workflows rely heavily on manual model selection, which can be time-consuming and require significant expertise. By automating the selection and optimization process through AutoML, this study ensures robust performance while democratizing access to advanced machine learning techniques.
- In addition to SHAP and LIME, which provide global and local interpretability, the inclusion of CA offers a novel approach to understanding model predictions. By allowing users to explore how minor adjustments in patient features affect outcomes, CA supports individualized treatment planning, an aspect that is underexplored in previous studies.
- Unlike purely theoretical approaches, this study bridges the gap between machine learning and real-world healthcare applications by providing a user-friendly tool for clinicians. The application integrates predictive insights and interpretability methods, making it accessible and actionable for non-technical users.
5.7. Practical Implications for Diabetes Prediction
5.7.1. Transparency for Clinical Decision-Making
5.7.2. Actionable Insights for Personalized Care
5.7.3. Using Counterfactual Analysis for Tailored Interventions
5.7.4. Shared Decision-Making: Enhancing Patient-Clinician Communication
5.8. Limitations and Areas for Improvement
5.8.1. Limitations in Interpretability Techniques
5.8.2. Model Configuration Constraints
5.8.3. Future Directions
5.9. Key Contributions
- Unlike previous studies that focus solely on either predictive accuracy or interpretability, this research simultaneously addresses both challenges by combining AutoML with XAI techniques like SHAP, LIME, and CA. This integration improves model transparency while maintaining high prediction accuracy, which is essential for clinical adoption. Our approach demonstrates that AutoML can not only automate the model development process but also produce interpretable models, a crucial aspect for healthcare applications.
- A significant contribution of this work is the development of a Streamlit-based application, which allows clinicians to interact with the model, explore predictions, and interpret the importance of different features in real time. This tool bridges the gap between advanced machine learning techniques and real-world healthcare applications, making AI more accessible to healthcare professionals without machine learning expertise.
- Our model demonstrates strong generalization capabilities, achieved through data augmentation, feature engineering, and cross-validation. This robustness ensures that the model performs consistently across diverse patient populations and datasets, addressing a key limitation of many existing diabetes prediction models that struggle with generalization.
- With SHAP and LIME, we provide clinically actionable insights into model predictions. SHAP analysis offers global insights into feature importance, while LIME provides localized, case-by-case explanations. This interpretability is essential for healthcare professionals to make informed decisions based on model predictions and ensures the AI system can be trusted in a clinical context.
- When compared to prior studies, our model achieves competitive performance while addressing critical issues of transparency and interpretability. Table 1 compares the accuracy and other evaluation metrics of our model with those of leading studies in diabetes prediction. For example, while some studies like Tasin et al. [38] achieved higher accuracy using XGBoost and SMOTE techniques, they did not provide the same level of interpretability through XAI methods. In contrast, our study prioritizes both performance and transparency, ensuring that the AI model can be reliably used in clinical settings without sacrificing accuracy.
- By comparing our model’s results with those from other studies, we see that while our accuracy of 78.8% with generalization is competitive, our key contribution lies in combining predictive accuracy with interpretability. Prior studies, such as [15,38], achieved high accuracy but lacked the level of transparency that our model offers through XAI methods like SHAP and LIME. This dual focus on performance and interpretability sets our work apart and advances the field.
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
List of Abbreviations
| Abbreviation | Meaning |
|---|---|
| AI | Artificial Intelligence |
| AM | Attention Mechanism |
| AutoML | Automated Machine Learning |
| BMI | Body Mass Index |
| CA | Counterfactual Analysis |
| DPF | Diabetes Pedigree Function |
| F1-Score | F1 Score (Harmonic Mean of Precision and Recall) |
| IG | Integrated Gradients |
| MCC | Matthews Correlation Coefficient |
| ML | Machine Learning |
| ROC-AUC | Receiver Operating Characteristic – Area Under Curve |
| SHAP | SHapley Additive exPlanations |
| XAI | Explainable Artificial Intelligence |
| SVM | Support Vector Machine |
| GDPR | General Data Protection Regulation |
| LDL | Low-Density Lipoproteins |
| HDL | High-Density Lipoproteins |
| LIME | Local Interpretable Model-Agnostic Explanations |
References
- Jakka, A.; Vakula Rani, J. An Explainable AI Approach for Diabetes Prediction. Innov. Comput. Sci. Eng. 2023, 565, 15–25.
- Zhao, Y.; Chaw, J.K.; Ang, M.C.; Daud, M.M.; Liu, L. A Diabetes Prediction Model with Visualized Explainable Artificial Intelligence (XAI) Technology. Adv. Vis. Inform. 2023, 14322, 648–661.
- Lenatti, M.; Carlevaro, A.; Guergachi, A.; Keshavjee, K.; Mongelli, M.; Paglialonga, A. A novel method to derive personalized minimum viable recommendations for type 2 diabetes prevention based on counterfactual explanations. PLoS ONE 2022, 17, e0272825.
- Waring, J.; Lindvall, C.; Umeton, R. Automated machine learning: Review of the state-of-the-art and opportunities for healthcare. Artif. Intell. Med. 2020, 104, 101822.
- van der Schaar, M. AutoML and Interpretability: Powering the Machine Learning Revolution in Healthcare. In Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference, Virtual, 19–20 October 2020.
- Mustafa, A.; Rahimi Azghadi, M. Automated Machine Learning for Healthcare and Clinical Notes Analysis. Computers 2021, 10, 24.
- Thirunavukarasu, A.J.; Elangovan, K.; Gutierrez, L.; Li, Y.; Tan, I.; Keane, P.A.; Korot, E.; Ting, D.S.W. Democratizing Artificial Intelligence Imaging Analysis With Automated Machine Learning: Tutorial. J. Med. Internet Res. 2023, 25, e49949.
- Kavakiotis, I.; Tsave, O.; Salifoglou, A.; Maglaveras, N.; Vlahavas, I.; Chouvarda, I. Machine Learning and Data Mining Methods in Diabetes Research. Comput. Struct. Biotechnol. J. 2017, 15, 104–116.
- Olisah, C.C.; Smith, L.; Smith, M. Diabetes mellitus prediction and diagnosis from a data preprocessing and machine learning perspective. Comput. Methods Programs Biomed. 2022, 220, 106773.
- Ahmed Hashim, A.; Hameed Mousa, A. An evaluation framework for diabetes prediction techniques using machine learning. BIO Web Conf. 2024, 97, 125.
- Duckworth, C.; Guy, M.J.; Kumaran, A.; O’Kane, A.A.; Ayobi, A.; Chapman, A.; Marshall, P.; Boniface, M. Explainable Machine Learning for Real-Time Hypoglycemia and Hyperglycemia Prediction and Personalized Control Recommendations. J. Diabetes Sci. Technol. 2024, 18, 113–123.
- Dharmarathne, G.; Jayasinghe, T.N.; Bogahawaththa, M.; Meddage, D.P.P.; Rathnayake, U. A novel machine learning approach for diagnosing diabetes with a self-explainable interface. Healthc. Anal. 2024, 5, 100301.
- Tigga, N.P.; Garg, S. Prediction of Type 2 Diabetes using Machine Learning Classification Methods. Procedia Comput. Sci. 2020, 167, 706–716.
- Kumari, V.A.; Chitra, R. Classification of Diabetes Disease Using Support Vector Machine. Int. J. Eng. Res. Appl. 2013, 3, 1797–1801.
- Sisodia, D.; Sisodia, D.S. Prediction of Diabetes using Classification Algorithms. Procedia Comput. Sci. 2018, 132, 1578–1585.
- Behera, M.K.; Chakravarty, S. Diabetic Retinopathy Image Classification Using Support Vector Machine. In Proceedings of the 2020 International Conference on Computer Science, Engineering and Applications (ICCSEA), Gunupur, India, 13–14 March 2020; pp. 1–4.
- Wu, J.; Diao, Y.; Li, M.; Fang, Y.; Ma, D. A semi-supervised learning based method: Laplacian support vector machine used in diabetes disease diagnosis. Interdiscip. Sci. Comput. Life Sci. 2009, 1, 151–155.
- Alghurair, N.I.; Mezher, M.A. A Survey Study Support Vector Machines and K-MEAN Algorithms for Diabetes Dataset. Acad. J. Res. Sci. Publ. 2020, 2, 14–25.
- Chang, V.; Bailey, J.; Xu, Q.A.; Sun, Z. Pima Indians diabetes mellitus classification based on machine learning (ML) algorithms. Neural Comput. Appl. 2023, 35, 16157–16173.
- Guan, Y.; Tsai, C.J.; Zhang, S. Research on Diabetes Prediction Model of Pima Indian Females. In Proceedings of the 2023 4th International Symposium on Artificial Intelligence for Medicine Science, Chengdu, China, 20–22 October 2023; pp. 294–303.
- Sangroya, A.; Anantaram, C.; Rawat, M.; Rastogi, M. Using Formal Concept Analysis to Explain Black Box Deep Learning Classification Models. In Proceedings of the 7th International Workshop “What Can FCA Do for Artificial Intelligence?”, Co-Located with the International Joint Conference on Artificial Intelligence (IJCAI 2019), Macao, China, 10 August 2019.
- Dagliati, A.; Marini, S.; Sacchi, L.; Cogni, G.; Teliti, M.; Tibollo, V.; De Cata, P.; Chiovato, L.; Bellazzi, R. Machine Learning Methods to Predict Diabetes Complications. J. Diabetes Sci. Technol. 2018, 12, 295–302.
- Erickson, N.; Mueller, J.; Shirkov, A.; Zhang, H.; Larroy, P.; Li, M.; Smola, A. AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data. arXiv 2020, arXiv:2003.06505.
- Joseph, V.R. Optimal ratio for data splitting. Stat. Anal. Data Min. 2022, 15, 531–538.
- Verdonck, T.; Baesens, B.; Óskarsdóttir, M.; van den Broucke, S. Special issue on feature engineering editorial. Mach. Learn. 2024, 113, 3917–3928.
- Shorten, C.; Khoshgoftaar, T.M. A survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60.
- Bey, R.; Goussault, R.; Grolleau, F.; Benchoufi, M.; Porcher, R. Fold-stratified cross-validation for unbiased and privacy-preserving federated learning. J. Am. Med. Inform. Assoc. 2020, 27, 1244–1251.
- Shchur, O.; Turkmen, C.; Erickson, N.; Shen, H.; Shirkov, A.; Hu, T.; Wang, Y. AutoGluon-TimeSeries: AutoML for Probabilistic Time Series Forecasting. arXiv 2023, arXiv:2308.05566.
- Mathotaarachchi, K.V.; Hasan, R.; Mahmood, S. Advanced Machine Learning Techniques for Predictive Modeling of Property Prices. Information 2024, 15, 295.
- Ejiyi, C.J.; Qin, Z.; Amos, J.; Ejiyi, M.B.; Nnani, A.; Ejiyi, T.U.; Agbesi, V.K.; Diokpo, C.; Okpara, C. A robust predictive diagnosis model for diabetes mellitus using Shapley-incorporated machine learning algorithms. Healthc. Anal. 2023, 3, 100166.
- Ghosh, S.K.; Khandoker, A.H. Investigation on explainable machine learning models to predict chronic kidney diseases. Sci. Rep. 2024, 14, 3687.
- Verma, S.; Boonsanong, V.; Hoang, M.; Hines, K.; Dickerson, J.; Shah, C. Counterfactual Explanations and Algorithmic Recourses for Machine Learning: A Review. ACM Comput. Surv. 2024, 56, 1–42.
- Wang, Y.; Zhang, T.; Guo, X.; Shen, Z. Gradient based Feature Attribution in Explainable AI: A Technical Review. arXiv 2024, arXiv:2403.10415.
- Yan, R.; Shang, Z.; Wang, Z.; Xu, W.; Zhao, Z.; Wang, S.; Chen, X. Challenges and Opportunities of XAI in Industrial Intelligent Diagnosis: Priori-empowered. Ji Xie Gong Cheng Xue Bao 2024, 60, 1.
- Powers, D.M.W. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv 2020, arXiv:2010.16061.
- Chicco, D.; Tötsch, N.; Jurman, G. The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation. BioData Min. 2021, 14, 13.
- Tasin, I.; Nabil, T.U.; Islam, S.; Khan, R. Diabetes prediction using machine learning and explainable AI techniques. Healthc. Technol. Lett. 2023, 10, 1–10.
- Curia, F. Explainable and transparency machine learning approach to predict diabetes develop. Health Technol. 2023, 13, 769–780.
- Tuppad, A.; Patil, S.D. Machine learning for diabetes clinical decision support: A review. Adv. Comput. Intell. 2022, 2, 22.
- Dewage, K.A.K.W.; Hasan, R.; Rehman, B.; Mahmood, S. Enhancing Brain Tumor Detection Through Custom Convolutional Neural Networks and Interpretability-Driven Analysis. Information 2024, 15, 653.
- Ahmed, K.F.; Uz Zaman, M.S.; Peyal, H.I.; Hossain, A.; Rahman Ratul, M.T.; Abdal, M.N.; Islam, M.I. An Interpretable Framework for Predicting Type 2 Diabetes using ML and Explainable AI. In Proceedings of the 2023 26th International Conference on Computer and Information Technology (ICCIT), Cox’s Bazar, Bangladesh, 13–15 December 2023; pp. 1–6.
- Mahmud, S.M.H.; Hossin, M.A.; Ahmed, M.R.; Noori, S.R.H.; Sarkar, M.N.I. Machine Learning Based Unified Framework for Diabetes Prediction; ACM: New York, NY, USA, 2018; pp. 46–50.
- SumaLata, G.L.; Joshitha, C.; Kollati, M. Prediction of Diabetes Mellitus using Artificial Intelligence Techniques. Scalable Comput. Pract. Exp. 2024, 25, 3200–3213.
- Larabi-Marie-Sainte, S.; Aburahmah, L.; Almohaini, R.; Saba, T. Current Techniques for Diabetes Prediction: Review and Case Study. Appl. Sci. 2019, 9, 4604.
- Kibria, H.B.; Nahiduzzaman, M.; Goni, M.O.F.; Ahsan, M.; Haider, J. An Ensemble Approach for the Prediction of Diabetes Mellitus Using a Soft Voting Classifier with an Explainable AI. Sensors 2022, 22, 7268.
- Vivek Khanna, V.; Chadaga, K.; Sampathila, N.; Prabhu, S.; Chadaga, P.R.; Bhat, D.; Swathi, K.S. Explainable artificial intelligence-driven gestational diabetes mellitus prediction using clinical and laboratory markers. Cogent Eng. 2024, 11, 2330266.
- Singh, A.; Dhillon, A.; Kumar, N.; Hossain, M.S.; Muhammad, G.; Kumar, M. eDiaPredict: An Ensemble-based Framework for Diabetes Prediction. ACM Trans. Multimed. Comput. Commun. Appl. 2021, 17, 1–26.
- Tanim, S.A.; Aurnob, A.R.; Shrestha, T.E.; Emon, M.R.I.; Mridha, M.F.; Miah, M.S.U. Explainable deep learning for diabetes diagnosis with DeepNetX2. Biomed. Signal Process. Control 2025, 99, 106902.
- Hendawi, R.; Li, J.; Roy, S. A Mobile App That Addresses Interpretability Challenges in Machine Learning–Based Diabetes Predictions: Survey-Based User Study. JMIR Form. Res. 2023, 7, e50328.
- Long, C.K.; Puri, V.; Solanki, V.K.; Jeanette Rincon Aponte, G. An Explainable AI-Enabled Framework for the Diabetes Classification. In Proceedings of the 2023 IEEE International Conference on Machine Learning and Applied Network Technologies (ICMLANT), San Salvador, El Salvador, 14–15 December 2023; pp. 1–6.
| Aspect | Lenatti et al. [3] | This Study |
|---|---|---|
| Objective | Recommend personalized biomarker modifications | Integrate CA with XAI techniques for broader clinical usability |
| Dataset | Canadian EMR dataset | Pima Indian Diabetes dataset and generalization across multiple datasets |
| Features Studied | Fasting blood sugar, BMI, HDL, and triglycerides | Glucose, BMI, Age, Blood Pressure, and other risk factors |
| Counterfactual Focus | Minimal viable changes for Type 2 diabetes prevention | Hypothetical scenarios for interpretability and actionable insights |
| XAI Techniques | Counterfactual explanations only | Counterfactuals combined with SHAP, LIME, and IG |
| Clinical Applicability | Focused on biomarker reduction | Broader interpretability for both global and local model explanations |
| Strengths | Personalized preventive recommendations | Enhanced usability through AutoML and explainable frameworks |
| Limitations | Limited to specific biomarkers and a single dataset | Requires further clinical validation for deployment in diverse populations |
| # | Technique | Accuracy (%) |
|---|---|---|
| [13] | Logistic Regression | 74.4 |
| [13] | K Nearest Neighbour | 70.8 |
| [13] | Support Vector Machine | 74.4 |
| [13] | Naive Bayes | 68.9 |
| [13] | Decision Tree | 69.7 |
| [13] | Random Forest | 75.0 |
| [14] | Support Vector Machine (RBF Kernel) | 78.0 |
| [15] | Support Vector Machine | 65.1 |
| [15] | Naive Bayes | 76.3 |
| [15] | Decision Tree | 73.8 |
| [16] | J48 Decision Tree | 74.78 |
| [16] | Random Forest | 79.57 |
| [16] | Naive Bayes | 78.67 |
| [17] | Laplacian Support Vector Machine | 82.0 |
| [18] | Linear Support Vector Machine | 83.0 |
| [18] | RBF Support Vector Machine | 82.0 |
| [19] | J48 Decision Tree | 75.65 |
| [19] | Random Forest | 73.91 |
| [19] | Naïve Bayes | 77.83 |
| [20] | Logistic Regression & Regression Tree | 77.48 |
| # | ML Techniques | Key Findings | Limitations |
|---|---|---|---|
| [8] | Supervised Learning (SVM, Decision Trees) | 85% of studies used supervised algorithms; emphasizes the transformative potential of ML in diabetes management. | Lacks focus on interpretability of models for clinical application. |
| [9] | Various (Feature Selection, Imputation) | High performance metrics achieved through data preprocessing optimize diabetes prediction models. | Does not address the interpretability of predictions, crucial for clinical use. |
| [10] | Evaluation Framework | Developed a structured framework for evaluating ML techniques in diabetes detection; emphasizes rigorous assessment. | Lacks direct comparison of specific prediction techniques’ effectiveness. |
| [21] | Deep Learning | Introduced a formal concept analysis framework for explaining deep learning outcomes; addresses interpretability issues. | Limited to a two-class classification problem, limiting broader applicability. |
| [12] | SHAP, Various ML Models | Developed a self-explainable interface for diagnosing diabetes, enhancing understanding of risk factors. | Relies on extensive clinical data, necessitating further research on model complexity. |
| [11] | Machine Learning (Various) | Highlighted the capability of ML models in predicting diabetes complications, enabling early intervention. | Challenges with data quality and parameter selection remain. |
| [22] | Machine Learning (Various) | Identified key risk factors (e.g., HbA1c levels) for forecasting complications; enhances predictive power. | Missed opportunities to incorporate a wider range of risk factors. |
| Gap | Objective |
|---|---|
| Lack of interpretability in AutoML models | Develop and evaluate explainable AutoML techniques for diabetes prediction. |
| Limited clinical applicability of existing models | Investigate user-friendly interfaces that enhance clinician trust and understanding. |
| Insufficient integration of diverse risk factors | Create comprehensive models that consider a wider range of patient data for improved predictions. |
| Metric | Description | Value |
|---|---|---|
| Accuracy | Overall correctness of predictions. | 76.62% |
| Balanced Accuracy | Adjusted for class imbalance. | 74.95% |
| MCC | Correlation between observed and predicted classes. | 0.495 |
| ROC-AUC | Model’s ability to distinguish between classes, reflected in the area under the ROC curve. | 0.774 |
| F1 Score | Harmonic mean of precision and recall. | 0.679 |
| Precision | Proportion of positive predictions that are correct. | 0.667 |
| Recall | Proportion of actual positives identified correctly. | 0.691 |
| Metric | Description | Scikit-learn Dataset | Rural African-American Dataset |
|---|---|---|---|
| Accuracy | Overall correctness of predictions. | 78.65% | 91.36% |
| Balanced Accuracy | Adjusted for class imbalance. | 78.55% | 90.10% |
| MCC | Correlation between observed and predicted classes. | 0.570 | 0.818 |
| F1 Score | Harmonic mean of precision and recall. | 76.54% | 72.00% |
| Precision | Proportion of positive predictions that are correct. | 75.61% | 90.00% |
| Recall | Proportion of actual positives identified correctly. | 77.50% | 60.00% |
| Fold | Accuracy (%) |
|---|---|
| 1 | 76.1 |
| 2 | 75.5 |
| 3 | 74.8 |
| 4 | 77.3 |
| 5 | 76.7 |
| Mean | 76.08 |
| Model | Validation Accuracy (%) | Analysis |
|---|---|---|
| LightGBM Level 2 | 83.88 | High accuracy and fast training, making it ideal for real-time predictions. |
| Weighted Ensemble Level 3 | 83.88 | Improved accuracy by combining multiple models. |
| CatBoost Level 2 | 83.06 | Well-suited for non-linear data and complex patterns. |
| XGBoost Level 2 | 83.06 | Robust to imbalanced classes, valuable in medical contexts. |
| Feature | Mean Abs SHAP Value |
|---|---|
| BMI | 0.1979 |
| DiabetesPedigreeFunction | 0.1974 |
| Glucose | 0.1099 |
| Insulin | 0.0700 |
| Age | 0.0370 |
| BloodPressure | 0.0342 |
| Pregnancies | 0.0262 |
| Feature Condition | Contribution |
|---|---|
| BMI > 36.60 | +0.1622 |
| DiabetesPedigreeFunction > 0.63 | +0.1490 |
| Pregnancies ≤ 1.00 | −0.0673 |
| SkinThickness > 32.00 | +0.0581 |
| Insulin > 127.25 | −0.0456 |
| BloodPressure ≤ 62.00 | +0.0429 |
| Age between 29 and 41 | +0.0423 |
| Glucose between 117 and 140 | +0.0101 |
| Feature | Attribution Score | Impact |
|---|---|---|
| BMI | 0.0629 | Significant positive impact, highlighting that higher BMI increases the probability of a diabetes diagnosis. |
| Glucose | 0.0447 | Positive attribution, indicating that higher glucose levels are a strong indicator of diabetes risk. |
| Blood Pressure | −0.0353 | Negative contribution, suggesting that lower blood pressure may slightly reduce the risk classification in this case. |
| Insight | Description |
|---|---|
| Feature Focus | Higher attention weights indicate features with greater influence on the model’s prediction. For example, Glucose and BMI consistently receive the highest attention scores, aligning with their clinical significance in predicting diabetes. |
| Dynamic Adaptation | Attention weights vary across instances, allowing the model to emphasize different features depending on the specific input data. This enables the model to adapt to varying patient conditions and adjust its focus as needed. |