Use of Machine Learning to Predict the Incidence of Type 2 Diabetes Among Relatively Healthy Adults: A 10-Year Longitudinal Study in Taiwan
Abstract
1. Introduction
- to select appropriate features for predicting type 2 diabetes among relatively healthy adults;
- to establish prediction models using different ML algorithms and subsequently compare their performance.
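
The second aim, comparing algorithms, depends on choosing metrics that remain informative when the outcome is rare. As a minimal sketch (not the authors' pipeline), the pure-Python helpers below score several sets of predictions against the same labels and rank them by macro F1. The function names `f1_for_class`, `evaluate`, and `rank_models`, the model names, and the toy labels are all hypothetical and serve only to illustrate the comparison.

```python
from typing import Dict, List, Sequence, Tuple


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1-score for one class, treating `cls` as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0  # no true positives: F1 is taken as 0 in this sketch
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy and macro F1 (unweighted mean of the per-class F1-scores)."""
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    classes = sorted(set(y_true))
    macro_f1 = sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)
    return {"accuracy": accuracy, "macro_f1": macro_f1}


def rank_models(
    y_true: Sequence[int], predictions: Dict[str, Sequence[int]]
) -> List[Tuple[str, Dict[str, float]]]:
    """Score every model on the same labels and sort by macro F1, best first."""
    scored = [(name, evaluate(y_true, y_pred)) for name, y_pred in predictions.items()]
    return sorted(scored, key=lambda item: item[1]["macro_f1"], reverse=True)


# Toy, hypothetical data: 18 negatives followed by 2 positives.
y_true = [0] * 18 + [1] * 2
predictions = {
    "always_negative": [0] * 20,               # never flags a case
    "finds_one_case": [0] * 17 + [1, 1, 0],    # one false alarm, one hit, one miss
}

ranking = rank_models(y_true, predictions)
for name, scores in ranking:
    print(f"{name}: accuracy={scores['accuracy']:.2f}, macro F1={scores['macro_f1']:.2f}")
```

In this toy example both sets of predictions reach the same accuracy, yet only macro F1 separates the model that finds a positive case from the one that never does.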
2. Materials and Methods
2.1. Dataset
2.2. Data Pre-Processing
2.3. Machine Learning and Prediction Model Development
2.3.1. Random Forest (RF)
2.3.2. Logistic Regression (LR)
2.3.3. XGBoost
2.4. Data Analysis Through Machine Learning
3. Results
4. Discussion
4.1. Features of Importance
4.2. Strength and Limitations
4.3. Implications
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Parameter | Mean | 25% | 50% | 75% |
---|---|---|---|---|
age (years) | 57.7 | 50 | 58 | 65 |
height (cm) | 165.7 | 159.5 | 165.9 | 171.7 |
weight (kg) | 66.2 | 56.8 | 65.6 | 74.0 |
waist (cm) | 80 | 73 | 80 | 86 |
respiration rate (/min) | 16.6 | 16 | 16 | 18 |
pulse (/min) | 70.6 | 63 | 70 | 77 |
AST (U/L) | 24.3 | 18 | 22 | 27 |
ALT (U/L) | 30.1 | 17 | 24 | 35 |
total bilirubin (mg/dL) | 0.88 | 0.6 | 0.8 | 1.0 |
direct bilirubin (mg/dL) | 0.5 | 0.5 | 0.5 | 0.5 |
γ-GT (U/L) | 33.9 | 16 | 24 | 36 |
serum creatinine (mg/dL) | 0.84 | 0.70 | 0.83 | 0.99 |
BUN (mg/dL) | 12 | 10 | 12 | 14 |
eGFR (mL/min/1.73 m²) | 97.91 | 83.09 | 94.74 | 109.11 |
FBS (mg/dL) | 89.6 | 84 | 89 | 95 |
HbA1c (%) | 5.5 | 5.3 | 5.5 | 5.7 |
TC (mg/dL) | 198.2 | 174 | 196 | 220 |
TG (mg/dL) | 127.5 | 72 | 105 | 155 |
HDL-C (mg/dL) | 55.1 | 44 | 53 | 64 |
LDL-C, measured (mg/dL) | 116.45 | 96.3 | 113.0 | 136.0 |
Hgb (g/dL) | 14.4 | 13.3 | 14.5 | 15.5 |
platelet (1000/μL) | 243.3 | 203 | 238 | 277 |
albumin (g/dL) | 4.5 | 4.3 | 4.5 | 4.7 |
total protein (g/dL) | 7.3 | 7.0 | 7.3 | 7.6 |
uric acid (mg/dL) | 6.2 | 5.0 | 6.1 | 7.2 |
fT4 (ng/dL) | 13.2 | 11.2 | 12.3 | 14.8 |
hsCRP (mg/L) | 0.1688 | 0.021 | 0.058 | 0.158 |
serum sodium (mEq/L) | 143.4 | 142 | 143 | 145 |
serum calcium (mg/dL) | 8.9 | 8.7 | 8.9 | 9.2 |
urine glucose (mg/dL) | 8.8 | 0 | 0 | 0 |
urine ketone (mg/dL) | 0.2 | 0 | 0 | 0 |

Class 0: normal cases (support: 6967). Class 1: potential diabetes cases (support: 76).

Model | Class 0 Precision | Class 0 Recall | Class 0 F1-Score | Class 1 Precision | Class 1 Recall | Class 1 F1-Score | Accuracy | Macro F1-Score |
---|---|---|---|---|---|---|---|---|
Random Forest | 0.99 | 1.00 | 0.99 | 0.25 | 0.01 | 0.03 | 0.99 | 0.51 |
Logistic regression | 0.99 | 1.00 | 0.99 | 0.00 | 0.00 | 0.00 | 0.99 | 0.50 |
XGBoost | 0.99 | 0.98 | 0.99 | 0.19 | 0.33 | 0.24 | 0.98 | 0.61 |
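
Because only 76 of the 7043 cases counted in the table's support belong to Class 1, accuracy alone says little about how well a model detects incident diabetes. The sketch below works through the arithmetic using only the figures reported in the table: the accuracy of a trivial classifier that always predicts Class 0, the approximate number of Class 1 cases each model recovers (recall × support), and the macro F1 as the unweighted mean of the two per-class F1-scores. Variable names are illustrative, and because the table values are rounded to two decimals, the derived counts are approximate.

```python
# Figures copied from the results table (rounded to two decimals there).
SUPPORT_NORMAL = 6967    # Class 0 support
SUPPORT_DIABETES = 76    # Class 1 support

reported = {
    # model: (Class 0 F1, Class 1 recall, Class 1 F1, reported macro F1)
    "Random Forest": (0.99, 0.01, 0.03, 0.51),
    "Logistic regression": (0.99, 0.00, 0.00, 0.50),
    "XGBoost": (0.99, 0.33, 0.24, 0.61),
}

# Accuracy of a trivial model that labels every case as Class 0.
total = SUPPORT_NORMAL + SUPPORT_DIABETES
baseline_accuracy = SUPPORT_NORMAL / total
print(f"Always predicting Class 0: accuracy = {baseline_accuracy:.3f}")

detected = {}  # approximate number of Class 1 cases each model identifies
macro_f1 = {}  # macro F1 recomputed from the two per-class F1-scores

for model, (f1_class0, recall_class1, f1_class1, macro_reported) in reported.items():
    detected[model] = round(recall_class1 * SUPPORT_DIABETES)
    macro_f1[model] = (f1_class0 + f1_class1) / 2
    print(
        f"{model}: ~{detected[model]} of {SUPPORT_DIABETES} Class 1 cases detected, "
        f"macro F1 = {macro_f1[model]:.3f} (reported {macro_reported:.2f})"
    )
```

The trivial baseline reaches an accuracy of about 0.989, essentially the same as the 0.98 to 0.99 reported for all three models, which is why the Class 1 recall and the macro F1-score are the more informative columns when comparing them.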
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Liu, Y.-Q.; Chang, T.-W.; Lee, L.-C.; Chen, C.-Y.; Hsu, P.-S.; Tsan, Y.-T.; Yang, C.-T.; Chu, W.-M. Use of Machine Learning to Predict the Incidence of Type 2 Diabetes Among Relatively Healthy Adults: A 10-Year Longitudinal Study in Taiwan. Diagnostics 2025, 15, 72. https://doi.org/10.3390/diagnostics15010072