1. Introduction
Diabetes is one of the most common chronic diseases worldwide. More than 3 million Canadians, or 8.9% of the population, have been diagnosed with diabetes, and prevalence has increased at an average rate of 3.3% per year after adjusting for population aging [1]. Diabetes is characterized by high blood sugar levels resulting from insufficient or absent insulin production. There are three main types: type 1, type 2, and gestational diabetes. Early detection is important for managing type 2 diabetes through lifestyle changes, medication, and exercise. Many tools exist to treat and manage diabetes, but far fewer are available to prevent it in people who are still free of the disease yet remain at risk. Prevention is especially important because diabetes has no cure yet [2].
Machine learning (ML) algorithms can predict the risk of diabetes. ML models such as Decision Trees, Random Forest, K-Nearest Neighbors, AdaBoost, XGBoost, Naive Bayes, and neural networks have been used in previous studies to predict diabetes risk [3,4]. For example, Kaur and Kumari [5] used a supervised ML classifier, a linear-kernel support vector machine (SVM-linear), achieving a precision of 0.89 and a recall of 0.88. However, these models are black boxes: it is difficult to understand how their predictions are generated. In clinical decision-making, it is crucial to understand the underlying cause of a disease before trying to treat it; therefore, despite their precision, we decided not to employ these black-box techniques. To address this challenge, interpretable AI techniques such as SHapley Additive exPlanations (SHAP) [6] have been proposed to explain predictions for many diseases. Although SHAP has been applied in other medical contexts [7], its application to the classification and prevention of diabetes risk remains limited. Understanding patient-specific risk factors is also particularly important, as it can increase the efficiency of overloaded clinicians in a busy clinic.
In recent years, multiple AI-based models have demonstrated promising performance in predicting diabetes risk using explainable machine learning techniques. For example, Khokhar et al. (2025) combined SMOTE and SHAP in an ensemble learning setup to achieve high precision and interpretability in predicting the risk of type 2 diabetes [8]. Similarly, Khan et al. (2024) applied SHAP with ensemble models to gain transparency in feature contributions [9]. AutoML combined with SHAP and counterfactual explanations has also been proposed for personalized diabetes prediction, improving both predictive accuracy and interpretability [10]. These studies highlight the growing trend toward interpretable AI in diabetes risk classification, but many lack real-time interaction capabilities, which is the objective of our research.
Many Artificial Intelligence (AI) tools, such as large language models (LLMs), have improved significantly in recent years, with numerous breakthroughs occurring in the medical field. LLMs have revolutionized natural language processing (NLP), greatly improving tasks such as text generation, question answering, and document classification. By integrating LLM-based chatbots into the healthcare system, we can improve the user experience: patients gain meaningful insight into their risk factors and can ask the chatbot how to improve their overall well-being.
For this study, we used the state-of-the-art (SOTA) BioMistral-7B model, which is fine-tuned for medical research. We further enhanced it with prompt engineering and few-shot learning to improve its performance in diabetes-specific scenarios with our data. Prompt engineering requires little computational power; it relies on describing the model's task as effectively as possible. Few-shot learning, likewise inexpensive, specifies the task by providing examples of desired responses; in our case, the examples are based on our user data and demonstrate how to analyze risk factors and give constructive feedback on lowering the chances of diabetes through a smart chatbot.
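The prompt-engineering and few-shot setup described above can be sketched as plain prompt assembly. This is an illustrative sketch only: the system instruction, the example risk profile, and the wording are assumptions, not the exact prompts used in the study.

```python
# Sketch of few-shot prompt construction for the chatbot LLM.
# SYSTEM_TASK and FEW_SHOT contents are illustrative assumptions.

SYSTEM_TASK = (
    "You are a diabetes-prevention assistant. Given a patient's top SHAP "
    "risk factors, explain them in plain language and suggest lifestyle changes."
)

# Few-shot examples: (risk factors, desired style of answer)
FEW_SHOT = [
    (
        "polyuria=1, polydipsia=1, age=52",
        "Frequent urination and excessive thirst are strong early signs. "
        "Discuss a blood glucose test with your doctor and reduce sugary drinks.",
    ),
]

def build_prompt(patient_factors: str, question: str) -> str:
    """Assemble a few-shot prompt: task description, examples, then the query."""
    parts = [SYSTEM_TASK]
    for factors, answer in FEW_SHOT:
        parts.append(f"Patient factors: {factors}\nAnswer: {answer}")
    parts.append(f"Patient factors: {patient_factors}\nQuestion: {question}\nAnswer:")
    return "\n\n".join(parts)

prompt = build_prompt("polyuria=1, sudden weight loss=1, age=40",
                      "What are my main risk factors?")
print(prompt)
```

The same template scales to more examples by extending `FEW_SHOT`; no model retraining or fine-tuning is required, which is the low-compute advantage noted above.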
We propose to offer a smart chatbot to better understand how ML models identify diabetes risk and offer explanations for specific risk factors using LLMs. We built a novel recommender system and a personalized smart chatbot that combines explainable Artificial Intelligence (XAI) with Dash to identify and rank risk factors and display patient-specific risk factors for diabetes online. Our system focuses on security, maintainability, robustness, and usability in dynamic scenarios for people at risk of diabetes. The contributions of this work are (i) the integration of SHAP-based explainability with an LLM-driven personalized chatbot assistant, (ii) gender- and age-specific risk factor analysis, and (iii) the deployment of a secure, interactive web platform and chatbot for personalized diabetes risk explanation.
2. Methodology
2.1. Overview
We developed an online framework to predict the risk of type 2 diabetes and identify significant risk factors. Users input their data through a web application that processes the data to collect risk factors. A CatBoost classifier [6,11] was trained on these data to make predictions, and the SHAP method was used to determine the impact of each risk factor on the prediction. The results are presented to users through the web application, showing the most influential risk factors, and a smart chatbot assists users through a personalized chat based on the risk output and the associated top risks.
Figure 1 shows the schematic framework for the proposed diabetes risk recommendation system.
2.2. Dataset
In this study, the publicly available diabetes dataset [12] was used to build the prediction model. The dataset contains information on 520 patients, 16 risk factors, and a class attribute corresponding to the outcome. The class response is diabetes risk, encoded as 1 (at risk) or 0 (not at risk). The 16 risk factors are age, gender, polyuria, polydipsia, sudden weight loss, weakness, polyphagia, genital thrush, visual blurring, itching, irritability, delayed healing, partial paresis, muscle stiffness, alopecia, and obesity.
2.3. Data Preparation
Data preparation steps included data cleaning, transformation, reduction, and splitting. The dataset was divided into independent training (80%) and test (20%) sets. Before training the model, we applied Jaccard similarity to the training dataset to find the relationship of each risk factor with the others, since the dataset consists mostly of binary values.
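For binary features, the Jaccard similarity between two risk-factor columns is the number of rows where both are 1 divided by the number of rows where at least one is 1. A minimal stdlib-only sketch, with toy columns standing in for the training data:

```python
# Jaccard similarity between two binary risk-factor columns:
# J(a, b) = |a AND b| / |a OR b| over the training rows.

def jaccard(a, b):
    """Jaccard similarity between two equal-length binary columns."""
    inter = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    union = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return inter / union if union else 0.0

# Toy training columns (1 = symptom present); real values come from the dataset.
polyuria   = [1, 1, 0, 1, 0, 1]
polydipsia = [1, 1, 0, 0, 0, 1]

print(jaccard(polyuria, polydipsia))  # 3 / 4 = 0.75
```

Computing this for every pair of risk-factor columns yields a similarity matrix that shows which symptoms tend to co-occur in the training set.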
2.4. Classification Model
The classification model was created using the CatBoost classifier [6,11], which handles categorical features without requiring additional preprocessing steps. It uses ordered target statistics and ordered boosting to reduce the risk of overfitting that can occur with other gradient boosting techniques. The model was trained on the training dataset with the number of iterations set to 1000 and the learning rate set to 0.2, since this configuration was found to be the most efficient and accurate among those tried. We assessed model performance using the Area Under the ROC Curve (AUC) and Cohen's Kappa score [13,14], and compared CatBoost against other classifiers, such as Logistic Regression.
2.5. SHAP Explanation
The SHAP method was used to explain the contribution of each feature to the predictions of the AI model. SHAP assigns an importance value to each feature based on its contribution to the model's prediction [15]. By computing SHAP values for each risk factor, we could observe their influence on the results, whether the features had linear or nonlinear relationships with the target variable.
2.6. Group-Based Explanation
SHAP was used to calculate feature importance for a group of instances, resulting in a list of SHAP value arrays, one per instance in the group. Analyzing the mean and variance of these SHAP values across the group allowed us to understand how the model made predictions for that group. We created separate datasets based on gender and age: males, females, individuals younger than the average age of 48, and individuals aged 48 and older. Four CatBoost models were trained using the same settings as the original model, and SHAP was applied to determine feature importance within each dataset.
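The group-level ranking step described above reduces to aggregating per-instance SHAP rows by mean absolute value. A stdlib-only sketch with toy SHAP rows (real values would come from a SHAP explainer run on the subgroup's model):

```python
# Rank features for a subgroup by mean |SHAP| across its instances.
# shap_rows holds one SHAP value array per instance (toy numbers here).

def rank_features(feature_names, shap_rows):
    """Return feature names sorted by mean absolute SHAP value, descending."""
    n = len(shap_rows)
    mean_abs = [sum(abs(row[j]) for row in shap_rows) / n
                for j in range(len(feature_names))]
    order = sorted(range(len(feature_names)), key=lambda j: -mean_abs[j])
    return [feature_names[j] for j in order]

features = ["polyuria", "polydipsia", "itching"]
shap_rows = [[0.9,  0.4, -0.10],
             [0.7, -0.5,  0.05],
             [0.8,  0.3,  0.00]]
print(rank_features(features, shap_rows))  # ['polyuria', 'polydipsia', 'itching']
```

Taking the absolute value before averaging matters: a feature that pushes risk up for some patients and down for others would otherwise cancel out and appear unimportant.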
2.7. Local Explanation
We used SHAP to explain the predictions for specific users. The resulting SHAP values represent the contribution of each feature to the prediction of an individual, providing a localized understanding of the behavior of the model.
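A useful property of these local explanations is additivity: a patient's SHAP values sum to the model's output for that patient minus the baseline (expected) output. A sketch with illustrative numbers, not real model output:

```python
# SHAP additivity check for one patient:
# prediction = baseline + sum of per-feature SHAP values.
# All numbers below are toy values for illustration.

baseline = 0.38      # expected model output over the training set (toy)
prediction = 0.91    # model output for this patient (toy)
shap_values = {      # per-feature contributions for this patient (toy)
    "polyuria": 0.30,
    "polydipsia": 0.21,
    "partial paresis": 0.05,
    "alopecia": -0.03,
}

total = baseline + sum(shap_values.values())
assert abs(total - prediction) < 1e-9  # additivity property holds
print(f"baseline {baseline} + contributions {sum(shap_values.values()):.2f} "
      f"= prediction {total:.2f}")
```

This is what lets the dashboard present a patient's risk as a decomposition into named factors: the pieces are guaranteed to account for the whole prediction.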
2.8. Online Recommender System
We created a web-based dashboard using Python Dash (v2.18.0) to provide an interactive interface for users and clinicians to access the diabetes ML model (Figure 2A). Dash is a Python framework for building interactive, web-based data visualization applications without requiring extensive web development knowledge. It integrates with Plotly, another Python library, to create high-quality interactive graphs. Our website is protected through Cross-Site Request Forgery (CSRF) tokens built into Django, and a patient must log in to see their diabetes risk. The dashboard offers dynamic features and provides global and local explanations online. The system can be accessed at https://www.mamatjanlab.com/diabetes_home/ (accessed on 14 September 2025).
3. Results
Model evaluation: We used 80% of the dataset (416 patients) to train the model, with the number of iterations set to 1000 and the learning rate set to 0.2 to achieve the best performance. The classification model achieved a mean AUC of 0.99, a Cohen's Kappa of 0.978, and a weighted average F1 score of 0.99. Further validation was performed to evaluate the robustness and generalizability of these results. We compared the performance of the CatBoost classifier with a Logistic Regression model in Table 1; Logistic Regression achieved an average AUC of 0.918 and a Cohen's Kappa score of 0.843. These results demonstrate the superiority of CatBoost in this context.
Group-based Explanation: After training the model, we provided insight into risk classification using SHAP methods. We ranked the importance of each risk factor globally based on its SHAP values over the entire dataset. The risk factors that contributed the most were polyuria, polydipsia, gender, age, and delayed healing.
Local-based Explanation: Figure 2 shows the risk factors that influence the prognosis of an individual patient. For example, we selected a 40-year-old woman with the following characteristics: polyuria 1, polydipsia 1, sudden weight loss 0, weakness 1, polyphagia 1, genital thrush 0, visual blurring 0, itching 1, irritability 0, delayed healing 0, partial paresis 1, muscle stiffness 0, alopecia 0, and obesity 0. The model correctly predicted the individual's risk of diabetes, matching the actual class in the dataset.
Gender-based Explanation: Two models were developed using female and male samples separately. Using the SHAP technique, we obtained risk rankings for each group. For the male group, the most influential risk factors were polyuria, polydipsia, irritability, delayed healing, and blurred vision (Figure 3). For the female group, they were polyuria, alopecia, age, polydipsia, and polyphagia (Figure 4).
Age-based Explanation: We created two models using separate datasets categorized by age: one consisted of samples from individuals younger than 48 and the other of individuals aged 48 years and older. Using the SHAP technique, we identified the five risk factors that contributed most to each age group. For the younger group, these were polyuria, gender, polydipsia, age, and polyphagia. For the older group, they were polyuria, polydipsia, sudden weight loss, partial paresis, and alopecia.
SMOTE: SMOTE was applied only to the training dataset, after the 80/20 train-test split, to avoid data leakage. The pre-SMOTE gender imbalance was 64% male vs. 36% female; after oversampling, the two groups were balanced 50-50. This improved the AUC for females from 0.96 to 0.98 and the F1 score from 0.88 to 0.91.
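The oversampling idea behind SMOTE is to synthesize new minority-group rows by interpolating between existing minority samples rather than duplicating them. In practice this study would likely use a library implementation such as imbalanced-learn's `SMOTE`; the stdlib-only sketch below illustrates only the interpolation mechanism, with toy numeric rows:

```python
import random

# SMOTE-style oversampling sketch: create synthetic minority rows by
# interpolating between pairs of minority samples. Rows are toy vectors.

def smote_like(minority_rows, n_new, rng):
    """Generate n_new synthetic rows interpolated between minority samples."""
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority_rows)
        b = rng.choice(minority_rows)
        lam = rng.random()  # interpolation factor in [0, 1]
        synthetic.append([x + lam * (y - x) for x, y in zip(a, b)])
    return synthetic

rng = random.Random(0)
minority = [[40, 1, 0], [45, 1, 1], [38, 0, 1]]  # toy: [age, polyuria, polydipsia]
majority_count = 6

new_rows = smote_like(minority, majority_count - len(minority), rng)
print(len(minority) + len(new_rows))  # 6 -> minority now matches majority
```

Note that interpolation can produce fractional values for binary symptom columns; real pipelines either round them or use SMOTE variants designed for categorical data (e.g., SMOTE-NC), which is a detail the sketch leaves out.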
Evaluation of BioMistral-7B: Incorporating BioMistral-7B as a chatbot in our system provided users with personalized health recommendations and responses to diabetes queries in an easy-to-use interface. The chatbot assistant processes SHAP outputs to contextualize patient-specific risk factors and delivers targeted lifestyle recommendations in natural language (Figure 5). Few-shot learning and prompt engineering improved personalization. We assessed the chatbot's performance using user satisfaction (Figure 5), accuracy of responses, and handling of medical queries.
Evaluation and User Satisfaction: As this is a proof-of-principle study, no formal clinical trial was conducted due to the lengthy ethics approval process. A preliminary internal evaluation by the research team verified the clarity, factual alignment with diabetes guidelines, and logical consistency of the chatbot responses. We polled a set of users and professionals who used the chatbot; feedback overwhelmingly indicated that the chatbot was useful and provided concise, clear, and actionable descriptions.
Accuracy of Responses: Chatbot responses were assessed by medical professionals for accuracy and relevance and were found to be accurate and consistent with current medical knowledge. The LLM's ability to provide accurate and actionable medical information demonstrated the system's potential as an effective tool for patient education and health support.
Handling Medical Question-Answering Tasks: BioMistral-7B was integrated into our system to provide contextually appropriate patient-specific responses. The model can interpret SHAP values and provide explanations based on the patient’s specific risk factors. For example, when a patient asks ‘What are my main risk factors for diabetes?’, the model provides a customized response, considering the patient’s specific profile. Focusing on individual patient profiles ensures that users receive relevant and precise explanations.
Mitigation of hallucinations: We employed certain techniques, such as context-constrained question-answering, human oversight and validation, and continuous monitoring, to reduce hallucinations. Context-constrained question-answering maintained precision by limiting AI responses to information contained within a predefined dataset. To maintain quality and eliminate errors, human supervision and validation were used to review and verify AI output. Continuous monitoring also reduced hallucinations by detecting and correcting whenever the AI generated incorrect information. These strategies helped to improve the reliability and safety of the model in medical applications.
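The context-constrained question-answering strategy above can be sketched as a guard layer in front of the LLM: the system answers only from a predefined, validated context and otherwise returns a safe refusal. The keyword matching and texts below are illustrative assumptions, far simpler than a production retrieval setup:

```python
# Sketch of a context-constrained QA guard: answer only from a validated
# context, fall back to a safe refusal otherwise. Contents are illustrative.

ALLOWED_CONTEXT = {
    "polyuria": "Frequent urination; a strong early indicator in our model.",
    "polydipsia": "Excessive thirst; often accompanies high blood sugar.",
    "obesity": "A modifiable risk factor; gradual weight loss reduces risk.",
}

FALLBACK = ("I can only discuss the risk factors in your report. "
            "Please consult a clinician for other questions.")

def constrained_answer(question: str) -> str:
    """Answer only when the question maps onto the validated context."""
    q = question.lower()
    hits = [text for key, text in ALLOWED_CONTEXT.items() if key in q]
    return " ".join(hits) if hits else FALLBACK

print(constrained_answer("Why does polyuria matter?"))
print(constrained_answer("Which stocks should I buy?"))  # falls back safely
```

In the real system the "context" would be the patient's SHAP report rather than a keyword table, but the design point is the same: out-of-scope questions never reach free-form generation, which is where hallucinations arise.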
4. Discussion and Conclusions
This study presents an explainable AI framework for the prediction of diabetes risk, combining SHAP-based interpretability, an interactive web dashboard, and an LLM-supported chatbot. Although some machine learning models have been used to predict diabetes risk, many existing approaches in the literature lack explainability, accessibility, or interaction with patients. Our framework directly addressed these limitations by providing personalized, interpretable predictions and actionable health recommendations that bring AI-driven diagnostics closer to proactive patient engagement.
This study bridges a gap in the prediction of diabetes risk. Previous studies have demonstrated the importance of explainable AI in healthcare; for example, SHAP-based explanations have been used to identify key risk factors for diabetes, improving model transparency [16]. However, many of these studies are limited to static offline analysis and do not offer real-time interaction for patients and healthcare professionals. Our approach provides instant access through a web-based platform while maintaining strong predictive accuracy with the CatBoost classifier (AUC = 0.99) compared to Logistic Regression (AUC = 0.918). Unlike popular classifiers such as Random Forest and XGBoost, which require extensive preprocessing of categorical variables, CatBoost processes them efficiently without transformations, resulting in more streamlined and accurate model training. CatBoost has also been shown to be a highly accurate predictive model compared to other machine learning algorithms [17].
Several recent studies have further advanced the state of diabetes risk prediction using AI, with varying levels of interpretability and clinical integration. For example, multiple gradient boosting models were evaluated on a large cohort, achieving high AUC scores and demonstrating the scalability of AI in screening [18]. Kaliappan et al. (2024) compared feature selection strategies across datasets and confirmed the value of SHAP in identifying domain-relevant risk factors [19]. Although these studies emphasize explainability, our framework builds on them by offering real-time user interaction and personalized chatbot-assisted risk interpretation, filling a notable gap in translational utility.
Personalizing chatbots for patient engagement has become increasingly important. A key achievement in our framework was the integration of an LLM-powered chatbot (BioMistral-7B) to provide tailored recommendations. Unlike traditional risk prediction models, which generate a static risk score that must be interpreted by healthcare professionals, our chatbot explains the results in simple terms and can recommend evidence-based, personalized lifestyle modifications. Research has shown that AI chatbots can improve patient adherence to healthcare recommendations by improving interaction and motivation [20]. Our chatbot offers interactive, context-sensitive responses based on a patient's unique risk profile.
Another important component of our framework is the interactive web dashboard, which helps patients visualize personalized risk factors. Many previous studies have presented SHAP values without user-friendly interfaces [21]. Our dashboard fills this gap by translating complex model output into clear, interactive visualizations. Studies have shown that well-designed dashboards improve patient self-management of chronic disease, supporting the importance of this feature [22].
Many existing models focus on offline datasets without providing a real-time, personalized solution, in part because security and accessibility are critical challenges in AI-driven healthcare applications [23]. To enhance security and encourage user adoption, our framework provides a secure patient login that follows best practices in data security.
Our explainable AI framework with a personalized chatbot has significant clinical and practical benefits. By visualizing SHAP values, patients can see why they are at risk, prompting them to take preventive action. Chatbot-driven guidance further encourages interaction through personalized recommendations. Healthcare professionals also benefit, as the system helps them prioritize high-risk patients and optimize resource usage. Patient data are anonymized and stored securely with encryption and HTTPS, in line with privacy best practices (access controls), and chatbot logs are stored locally and visible only to the user.
Limitations: Despite these promising results, there are several areas for improvement. Future research should conduct clinical trials to assess the effectiveness of the framework in improving health outcomes. Furthermore, the current dataset was made up of 520 patients, which may not capture variations in diabetes risk among various populations. SMOTE may introduce synthetic bias. No prospective validation was performed; future work will include multi-center trials.
In general, our study presented an explainable AI framework that integrates SHAP-based interpretation, an interactive web dashboard, and an LLM-powered chatbot for personalized recommendations. Our diabetes classifier can serve as an additional source of information for clinicians, reducing their workload and improving efficiency: less time is spent identifying the cause of the problem, and more can be focused on treatment or mitigation plans.
Author Contributions
Conceptualization, Y.M.; methodology, E.M. and Y.M.; software, E.M., Y.M.; validation, E.M., M.A. and Y.M.; formal analysis, E.M.; investigation, Y.M.; resources, M.A. and Y.M.; data curation, E.M.; writing—original draft preparation, E.M. and Y.M.; writing—review and editing, E.M., M.A. and Y.M.; visualization, E.M.; supervision, Y.M.; project administration, Y.M. All authors have read and agreed to the published version of the manuscript.
Funding
No external funding was received for this research.
Informed Consent Statement
This online system is a research prototype intended for education and decision support. It is not a medical device and is not intended for diagnosis or clinical decision making. Patient data is anonymized and encrypted with role-based access controls.
Data Availability Statement
Acknowledgments
The authors thank Almat Bolatbekov, Nijiati Abulizi, and Noman Ahmed for their technical support in chatbot development, useful suggestions, and initial analysis of diabetes research.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
References
- Public Health Agency of Canada. Framework for Diabetes in Canada. 2022. Available online: https://www.canada.ca/en/public-health/services/publications/diseases-conditions/framework-diabetes-canada.html (accessed on 14 September 2025).
- Rubino, F.; Gagner, M. Potential of surgery for curing type 2 diabetes mellitus. Ann. Surg. 2002, 236, 554–559.
- Jian, Y.; Pasquier, M.; Sagahyroon, A.; Aloul, F. A machine learning approach to predicting diabetes complications. Healthcare 2021, 9, 1712.
- Andrabi, S.A.B.; Singh, I. A comparative study of machine learning techniques for diabetes prediction. In Proceedings of the 2022 4th International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 21–23 September 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 741–745.
- Kaur, H.; Kumari, V. Predictive modelling and analytics for diabetes using a machine learning approach. Int. J. Diabetes Res. 2019, 5, 1–7.
- Prokhorenkova, L.O.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. arXiv 2017, arXiv:1706.09516.
- Okay, F.; Yildirim, M.; Ozdemir, S. Interpretable machine learning: A case study of healthcare. In Proceedings of the 2021 International Symposium on Networks, Computers and Communications (ISNCC), Dubai, United Arab Emirates, 31 October–2 November 2021; pp. 1–6.
- Khokhar, P.B.; Pentangelo, V.; Palomba, F.; Gravino, C. Towards transparent and accurate diabetes prediction using machine learning and explainable artificial intelligence. arXiv 2025, arXiv:2501.18071.
- Khan, A.; Yadav, S.; Nand, P.; Khanday, A.M.U.D.; Bhushan, B.; Jamil, A.; Hameedkhan, A.A. An explainable predictive model for diabetes detection using Shapley Additive Explanations approach. In Recent Trends and Advances in Artificial Intelligence. ICAETA 2024. Lecture Notes in Networks and Systems; Garcia, F.P., Jamil, A., Hameed, A.A., Ortis, A., Ramirez, I.S., Eds.; Springer: Cham, Switzerland, 2024; Volume 1138.
- Hasan, R.; Dattana, V.; Mahmood, S.; Hussain, S. Towards transparent diabetes prediction: Combining AutoML and explainable AI for improved clinical insights. Information 2025, 16, 7.
- Dorogush, A.V.; Ershov, V.; Gulin, A. CatBoost: Gradient boosting with categorical features support. arXiv 2018, arXiv:1810.11363.
- Kaggle. Early Classification of Diabetes. 2022. Available online: https://www.kaggle.com/datasets/andrewmvd/early-diabetes-classification (accessed on 14 September 2025).
- Cohen, J. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 1960, 20, 37–46.
- McHugh, M.L. Interrater reliability: The kappa statistic. Biochem. Medica 2012, 22, 276–282.
- Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4768–4777.
- Tasin, I.; Nabil, T.U.; Islam, S.; Khan, R. Diabetes prediction using machine learning and explainable AI techniques. Healthc. Technol. Lett. 2022, 10, 1–10.
- Ibrahim, A.A.; Ridwan, R.L.; Muhammed, M.M.; Abdulaziz, R.O.; Saheed, G.A. Comparison of the CatBoost classifier with other machine learning methods. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 738–748.
- Hasan, M.K.; Alam, M.A.; Das, D.; Hossain, E.; Hasan, M. Diabetes prediction using ensembling of different ML classifiers. IEEE Access 2020, 13, 1274–1289.
- Kaliappan, J.; Saravana Kumar, I.J.; Sundaravelan, S.; Anesh, T.; Rithik, R.R.; Singh, Y.; Vera-Garcia, D.V.; Himeur, Y.; Mansoor, W.; Atalla, S.; et al. Analyzing classification and feature selection strategies for diabetes prediction across diverse diabetes datasets. Front. Artif. Intell. 2024, 7, 1421751.
- Aggarwal, A.; Tam, C.C.; Wu, D.; Li, X.; Qiao, S. Artificial intelligence-based chatbots for promoting health behavioral changes: Systematic review. J. Med. Internet Res. 2023, 25, e40789.
- Mohanty, P.K.; Francis, S.A.J.; Barik, R.K.; Roy, D.S.; Saikia, M.J. Leveraging Shapley additive explanations for feature selection in ensemble models for diabetes prediction. Bioengineering 2024, 11, 1215.
- van de Vijver, S.; Hummel, D.; van Dijk, A.H.; Cox, J.; Dijk, O.v.; den Broek, N.V.; Metting, E. Evaluation of a digital self-management platform for patients with chronic illness in primary care: Qualitative study of stakeholders' perspectives. JMIR Form. Res. 2022, 6, e38424.
- Murdoch, B. Privacy and artificial intelligence: Challenges for protecting health information in a new era. BMC Med. Ethics 2021, 22, 122.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).