A Comparative Analysis of Machine Learning Models: A Case Study in Predicting Chronic Kidney Disease
Abstract
1. Introduction
- For the first time, we included primary data from chronic kidney disease (CKD) patients in District Buner, Khyber Pakhtunkhwa, Pakistan, to encourage developing countries to adopt machine learning algorithms that reliably and efficiently distinguish healthy people from people with CKD;
- To assess the consistency of the machine learning (ML) models, three training/testing scenarios were adopted: (a) 90% training, 10% testing; (b) 75% training, 25% testing; and (c) 50% training, 50% testing. Within each scenario, the simulation was run one thousand times;
- Widely used ML models were compared for predicting CKD: logistic regression, probit regression, random forest, decision tree, k-nearest neighbor, and support vector machines (SVM) with four kernel functions (linear, Laplacian, Bessel, and radial basis function kernels);
- Model performance is evaluated with six measures: accuracy, Brier score, sensitivity, Youden index, specificity, and F1 score. In addition, the Diebold and Mariano test was performed to assess whether the differences in the models' predictive performance are statistically significant.
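The evaluation protocol in the second bullet (three training/testing proportions, each repeated one thousand times) is a form of repeated random hold-out validation. The Python sketch below illustrates only the splitting loop. It assumes simple, unstratified random splits and a caller-supplied `evaluate(train_idx, test_idx)` function; `repeated_holdout`, `mean_over_splits`, the seed handling, and the lack of stratification are illustrative assumptions, not details reported by the study.

```python
import random


def repeated_holdout(n_samples, train_frac, n_reps=1000, seed=0):
    """Yield (train_idx, test_idx) for n_reps random splits.

    Each repetition reshuffles the row indices and puts the first
    round(train_frac * n_samples) of them in the training set.
    """
    rng = random.Random(seed)
    n_train = round(train_frac * n_samples)
    indices = list(range(n_samples))
    for _ in range(n_reps):
        rng.shuffle(indices)
        yield sorted(indices[:n_train]), sorted(indices[n_train:])


# Training fractions for scenarios (a), (b) and (c).
SCENARIOS = {"a": 0.90, "b": 0.75, "c": 0.50}


def mean_over_splits(n_samples, train_frac, evaluate, n_reps=1000, seed=0):
    """Average evaluate(train_idx, test_idx) -> float over the repeated splits."""
    scores = [evaluate(tr, te)
              for tr, te in repeated_holdout(n_samples, train_frac, n_reps, seed)]
    return sum(scores) / len(scores)
```

For instance, `mean_over_splits(n, SCENARIOS["b"], evaluate)` would average a score over 1000 random 75/25 splits, given a hypothetical `evaluate` that fits a model on the training rows and scores it on the test rows. Stratifying each split by disease status is a reasonable variation when the classes are unbalanced; whether the study did so is not stated here.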
2. Materials and Methods
2.1. Description of Variables
2.2. Specification of Machine Learning Models
2.2.1. Logistic Regression (LR)
2.2.2. K-Nearest Neighbor (KNN)
2.2.3. Support Vector Machine (SVM)
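The Introduction lists four SVM kernels. Three of them have short closed forms, sketched below in plain Python for reference. The scale parameter `sigma` and its placement (`exp(-sigma * ||x - y||^2)` for the radial basis kernel, `exp(-sigma * ||x - y||)` for the Laplacian kernel) follow one common convention and are an assumption here, as are the function names. The Bessel kernel is omitted because its parameterization differs between software packages, and the hyperparameter values used in the study are not given in this excerpt.

```python
import math


def linear_kernel(x, y):
    """Linear kernel k(x, y) = <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, sigma=1.0):
    """Radial basis (Gaussian) kernel k(x, y) = exp(-sigma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sigma * sq_dist)


def laplacian_kernel(x, y, sigma=1.0):
    """Laplacian kernel k(x, y) = exp(-sigma * ||x - y||)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return math.exp(-sigma * dist)


def gram_matrix(X, kernel):
    """Kernel (Gram) matrix K[i][j] = kernel(X[i], X[j]) used by the SVM dual."""
    return [[kernel(xi, xj) for xj in X] for xi in X]
```

The radial basis and Laplacian kernels differ only in using the squared versus the plain Euclidean distance, so the Laplacian kernel decays more slowly for distant points.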
2.2.4. Decision Tree (DT)
2.2.5. Random Forest (RF)
2.3. Performance Measures
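All six measures named in the Introduction, plus the error rate reported in the results tables, can be computed from the confusion matrix and the predicted probabilities. A minimal sketch follows, assuming CKD is the positive class coded 1 (as in the variable table), a classification threshold of 0.5, and the standard definitions: Youden index J = sensitivity + specificity − 1, and Brier score = mean squared difference between the predicted probability and the 0/1 outcome. The threshold and function names are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) with class 1 (CKD) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def performance_measures(y_true, prob, threshold=0.5):
    """Compute the measures for 0/1 labels and predicted P(CKD).

    Degenerate cases (a class absent, no positive predictions, or no
    true positives) raise ZeroDivisionError.
    """
    y_pred = [1 if p >= threshold else 0 for p in prob]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    n = len(y_true)
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": accuracy,
        "error": 1 - accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "brier": sum((p - t) ** 2 for p, t in zip(prob, y_true)) / n,
    }
```

Unlike the other measures, the Brier score uses the probabilities directly, so it rewards well-calibrated predictions even when the thresholded class labels are identical.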
2.3.1. Accuracy
2.3.2. Brier Score
2.3.3. Sensitivity
2.3.4. Youden Index
2.3.5. Specificity
2.3.6. F1 Score
3. Results and Discussion
4. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Variable | Scale | Categories (Counts) or Summary | Label |
---|---|---|---|
Age | numerical | range 12 to 99 years | - |
pH | numerical | mean 5.565, SD 0.561 | - |
Specific gravity | numerical | mean 1.016, SD 0.0052 | - |
Gender | nominal | male (185) female (195) | 1 0 |
Urine color | nominal | yellow (243) p.yellow (137) | 1 0 |
Albumin | nominal | trace (227) nil (153) | 1 0 |
Glucose | nominal | trace (27) nil (353) | 1 0 |
Sugar | nominal | positive (63) nil (317) | 1 0 |
Ketone bodies | nominal | trace (64) not_trace (316) | 1 0 |
Bile pigment | nominal | present (64) absent (316) | 1 0 |
Urobilinogen | nominal | abnormal (38) normal (342) | 1 0 |
Blood | nominal | positive (62) negative (318) | 1 0 |
Pus cells/WBCs | nominal | normal (166) abnormal (214) | 0 1 |
Red cells/RBCs | nominal | normal (217) abnormal (163) | 0 1 |
Epithelial cells | nominal | nil (153) positive (227) | 0 1 |
Mucus thread | nominal | present (181) none (199) | 1 0 |
Calcium oxalate | nominal | positive (112) nil (268) | 1 0 |
Granular cast | nominal | seen (94) nil (286) | 1 0 |
Bacteria | nominal | seen (123) notseen (257) | 1 0 |
Calcium carbonate | nominal | found (335) not found (45) | 1 0 |
Disease status | nominal | ckd (240) notckd (142) | 1 0 |
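The Label column gives the binary coding of each nominal variable, in the same order as the listed categories (for example, male = 1 and female = 0, while for Pus cells/WBCs normal = 0 and abnormal = 1). A minimal Python sketch of that mapping follows; the category spellings are copied from the table and may not match the raw data files, and `CODING` and `encode_record` are hypothetical names.

```python
# Binary coding of the nominal variables, following the Label column above
# (first listed category -> first code). Category spellings are copied from
# the table; the spellings in the raw data files may differ (assumption).
CODING = {
    "Gender": {"male": 1, "female": 0},
    "Urine color": {"yellow": 1, "p.yellow": 0},
    "Albumin": {"trace": 1, "nil": 0},
    "Glucose": {"trace": 1, "nil": 0},
    "Sugar": {"positive": 1, "nil": 0},
    "Ketone bodies": {"trace": 1, "not_trace": 0},
    "Bile pigment": {"present": 1, "absent": 0},
    "Urobilinogen": {"abnormal": 1, "normal": 0},
    "Blood": {"positive": 1, "negative": 0},
    "Pus cells/WBCs": {"normal": 0, "abnormal": 1},
    "Red cells/RBCs": {"normal": 0, "abnormal": 1},
    "Epithelial cells": {"nil": 0, "positive": 1},
    "Mucus thread": {"present": 1, "none": 0},
    "Calcium oxalate": {"positive": 1, "nil": 0},
    "Granular cast": {"seen": 1, "nil": 0},
    "Bacteria": {"seen": 1, "notseen": 0},
    "Calcium carbonate": {"found": 1, "not found": 0},
    "Disease status": {"ckd": 1, "notckd": 0},
}


def encode_record(record):
    """Return a copy of record with nominal values mapped to 0/1.

    Numerical fields (Age, pH, Specific gravity) pass through unchanged;
    an unknown category raises KeyError.
    """
    return {key: CODING[key][value] if key in CODING else value
            for key, value in record.items()}
```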
Models | Accuracy | Specificity | Sensitivity | Youden Index | Brier Score | Error | F1 Score |
---|---|---|---|---|---|---|---|
Logistic | 0.8945 | 0.8736 | 0.9073 | 0.7809 | 0.0699 | 0.1055 | 0.9135 |
Probit | 0.8942 | 0.8736 | 0.9073 | 0.7809 | 0.0686 | 0.1058 | 0.9139 |
D-Tree | 0.8839 | 0.8411 | 0.9096 | 0.7507 | 0.0953 | 0.1161 | 0.8985 |
KNN | 0.6309 | 0.4826 | 0.7209 | 0.2035 | 0.2457 | 0.3691 | 0.7800 |
SVM-RB | 0.8995 | 0.8794 | 0.9127 | 0.7921 | 0.0671 | 0.1005 | 0.9202 |
SVM-L | 0.8978 | 0.9219 | 0.8846 | 0.8065 | 0.0644 | 0.1022 | 0.9108 |
SVM-LAP | 0.9171 | 0.8671 | 0.9484 | 0.8155 | 0.0643 | 0.0829 | 0.9319 |
SVM-B | 0.8961 | 0.8751 | 0.9096 | 0.7846 | 0.0672 | 0.1039 | 0.9175 |
RF | 0.9129 | 0.8808 | 0.9330 | 0.8138 | 0.0652 | 0.0871 | 0.9270 |
Models | Accuracy | Specificity | Sensitivity | Youden Index | Brier Score | Error | F1 Score |
---|---|---|---|---|---|---|---|
Logistic | 0.8923 | 0.8736 | 0.9073 | 0.7809 | 0.0760 | 0.1077 | 0.9135 |
Probit | 0.8927 | 0.8686 | 0.9088 | 0.7774 | 0.0742 | 0.1073 | 0.9137 |
D-Tree | 0.8722 | 0.8297 | 0.9072 | 0.7369 | 0.1040 | 0.1278 | 0.9024 |
KNN | 0.6809 | 0.5811 | 0.7423 | 0.3233 | 0.1933 | 0.3191 | 0.7800 |
SVM-RB | 0.9005 | 0.8761 | 0.9151 | 0.7913 | 0.0686 | 0.0995 | 0.9191 |
SVM-L | 0.8918 | 0.9143 | 0.8841 | 0.7984 | 0.0673 | 0.1082 | 0.9125 |
SVM-LAP | 0.9135 | 0.8643 | 0.9468 | 0.8111 | 0.0663 | 0.0865 | 0.9328 |
SVM-B | 0.8972 | 0.8729 | 0.9115 | 0.7845 | 0.0691 | 0.1028 | 0.9163 |
RF | 0.9084 | 0.8764 | 0.9318 | 0.8082 | 0.0672 | 0.0916 | 0.9284 |
Models | Accuracy | Specificity | Sensitivity | Youden Index | Brier Score | Error | F1 Score |
---|---|---|---|---|---|---|---|
Logistic | 0.8858 | 0.8736 | 0.9073 | 0.7809 | 0.0929 | 0.1142 | 0.9135 |
Probit | 0.8873 | 0.8628 | 0.9090 | 0.7718 | 0.0900 | 0.1127 | 0.9125 |
D-Tree | 0.8457 | 0.8021 | 0.9065 | 0.7086 | 0.1207 | 0.1543 | 0.8950 |
KNN | 0.7145 | 0.6492 | 0.7553 | 0.4045 | 0.1554 | 0.2855 | 0.7685 |
SVM-RB | 0.9001 | 0.8712 | 0.9182 | 0.7893 | 0.0705 | 0.0999 | 0.9196 |
SVM-L | 0.8878 | 0.9077 | 0.8843 | 0.7920 | 0.0720 | 0.1122 | 0.9110 |
SVM-LAP | 0.9052 | 0.8585 | 0.9449 | 0.8034 | 0.0701 | 0.0948 | 0.9304 |
SVM-B | 0.8978 | 0.8692 | 0.9144 | 0.7836 | 0.0714 | 0.1022 | 0.9171 |
RF | 0.9021 | 0.8694 | 0.9314 | 0.8008 | 0.0712 | 0.0979 | 0.9264 |
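Two columns in each results table are not independent measures: the Youden index equals sensitivity + specificity − 1, and the error equals 1 − accuracy. Because both identities are linear, they should also hold for values averaged over the repeated splits, up to the four-decimal rounding of the reported figures. The short check below uses a few rows copied from the first table; the tolerance is an assumption chosen to absorb that rounding.

```python
# (accuracy, specificity, sensitivity, youden, error) copied from rows of the
# first results table above; the remaining columns are not needed here.
ROWS = {
    "Logistic": (0.8945, 0.8736, 0.9073, 0.7809, 0.1055),
    "KNN": (0.6309, 0.4826, 0.7209, 0.2035, 0.3691),
    "SVM-LAP": (0.9171, 0.8671, 0.9484, 0.8155, 0.0829),
    "RF": (0.9129, 0.8808, 0.9330, 0.8138, 0.0871),
}


def derived_columns_consistent(acc, spec, sens, youden, error, tol=2e-4):
    """Check J = sensitivity + specificity - 1 and error = 1 - accuracy.

    tol absorbs the four-decimal rounding of the reported values.
    """
    return abs(sens + spec - 1 - youden) <= tol and abs(1 - acc - error) <= tol


for name, row in ROWS.items():
    print(name, derived_columns_consistent(*row))
```

Running it prints `True` for each of the four rows.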
Models | Logistic | Probit | D-Tree | KNN | SVM-RB | SVM-L | SVM-LAP | SVM-B | RF |
---|---|---|---|---|---|---|---|---|---|
Logistic | - | 0.01 | 0.99 | 0.99 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
Probit | 0.99 | - | 0.99 | 0.99 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
D-Tree | 0.01 | 0.01 | - | 0.99 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
KNN | 0.01 | 0.01 | 0.01 | - | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
SVM-RB | 0.99 | 0.99 | 0.99 | 0.99 | - | 0.99 | 0.01 | 0.99 | 0.01 |
SVM-L | 0.99 | 0.99 | 0.99 | 0.99 | 0.01 | - | 0.01 | 0.01 | 0.01 |
SVM-LAP | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | - | 0.99 | 0.99 |
SVM-B | 0.99 | 0.99 | 0.99 | 0.99 | 0.01 | 0.99 | 0.01 | - | 0.01 |
RF | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.01 | 0.99 | - |
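The table above reports Diebold and Mariano test results for each pair of models. As a reference for how such a test is computed, a minimal sketch under the usual asymptotic normal approximation follows. It assumes per-observation losses for two models evaluated on the same test cases (for example, squared errors of predicted probabilities); the loss function actually used, the horizon `h`, and the direction convention behind the one-sided values in the table are not stated in this excerpt, so the sketch returns both a two-sided p-value and a one-sided p-value for the alternative that the second model has lower expected loss.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def dm_test(loss1, loss2, h=1):
    """Diebold-Mariano test of equal predictive accuracy.

    loss1, loss2: per-observation losses of model 1 and model 2 on the same cases.
    h: forecast horizon; autocovariances up to lag h - 1 enter the variance.
    Returns (statistic, two-sided p-value, one-sided p-value for
    H1: model 2 has lower expected loss than model 1).
    """
    if len(loss1) != len(loss2):
        raise ValueError("loss sequences must have equal length")
    d = [a - b for a, b in zip(loss1, loss2)]  # loss differential d_t
    T = len(d)
    d_bar = sum(d) / T

    def autocov(k):
        # Sample autocovariance of d at lag k (divided by T).
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, T)) / T

    long_run_var = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))
    if long_run_var <= 0:
        raise ValueError("non-positive variance estimate for the loss differential")
    stat = d_bar / math.sqrt(long_run_var / T)
    p_two_sided = 2.0 * (1.0 - normal_cdf(abs(stat)))
    p_one_sided = 1.0 - normal_cdf(stat)  # large stat: model 1 has larger loss
    return stat, p_two_sided, p_one_sided
```

With `h = 1` the variance estimate reduces to the sample variance of the loss differential, which is the natural choice when the test cases are independent patients rather than a sequence of time-series forecasts.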
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Iftikhar, H.; Khan, M.; Khan, Z.; Khan, F.; Alshanbari, H.M.; Ahmad, Z. A Comparative Analysis of Machine Learning Models: A Case Study in Predicting Chronic Kidney Disease. Sustainability 2023, 15, 2754. https://doi.org/10.3390/su15032754