Evaluating the Performance of Automated Machine Learning (AutoML) Tools for Heart Disease Diagnosis and Prediction
Abstract
1. Introduction
2. Related Works
3. Methodology
3.1. AutoML Tools
3.2. Details of the Selected AutoML Tools
3.3. Dataset
3.4. Details of the Selected Datasets
3.5. Performance Metrics Used
3.6. Applying Traditional Steps of Manually Generating the Well-Performing Model
- (1) Data Cleaning: Rows with missing information in the “ca” and “thal” columns were removed to ensure data integrity.
- (2) Data Type Conversion: All fields were converted to numeric data types to facilitate subsequent analysis and modeling.
- (3) Correlation Analysis: The correlations between the fields and the target label were analyzed. Four fields (“chol”, “fbs”, “trestbps”, “restecg”) with correlations below 0.2 were identified and subsequently dropped from the dataset.
- (4) Data Scaling: The remaining data were scaled to normalize the feature values and ensure comparability across different variables.
- (5) Cross-Validation: Cross-validation (k = 5) accuracy scores were calculated for 10 different machine learning algorithms: stochastic gradient descent (SGD), logistic regression, support vector machine with a linear kernel, support vector machine with an RBF kernel, decision tree classifier, random forest classifier, extra trees classifier, AdaBoost classifier, gradient boosting classifier, and XGBoost.
- (6) Hyperparameter Tuning: The top-performing algorithms (AdaBoost, random forest, gradient boosting, XGBoost) were selected for further improvement through hyperparameter tuning. A grid search was performed using various combinations of hyperparameters, including n_estimators (100, 200, 300, 400, 500), learning_rate (0.3, 0.1, 0.05), max_features (1, 0.7, 0.5, 0.4, 0.3), subsample (1, 0.5, 0.3), max_samples (1, 0.5, 0.3, 0.2), and bootstrap (True, False).
- (7) Ensemble Voting Classifier: Based on the fine-tuned estimators (AdaBoost, random forest, gradient boosting, XGBoost) and the other top-performing estimators (SVC, SGD, logistic regression), an ensemble voting classifier was constructed. This ensemble classifier combined the predictions of multiple models, leveraging their collective knowledge to make a final classification decision.
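The manual workflow above (scaling, cross-validation, grid search, and voting) can be sketched as follows. This is a minimal scikit-learn illustration on synthetic data, not the study's actual code; the estimators and grid values are a small subset of those listed above.

```python
# Minimal sketch of steps 4-7 of the manual pipeline, using scikit-learn
# on synthetic data; not the study's exact code or full hyperparameter grid.
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=9, random_state=42)

# Step 4: scale the features.
X = StandardScaler().fit_transform(X)

# Step 5: 5-fold cross-validation accuracy for candidate models.
for name, model in [("logreg", LogisticRegression(max_iter=1000)),
                    ("rf", RandomForestClassifier(random_state=42))]:
    print(name, cross_val_score(model, X, y, cv=5).mean())

# Step 6: grid search over a subset of the hyperparameters listed above.
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    {"n_estimators": [100, 200], "learning_rate": [0.1, 0.05]},
    cv=5,
)
grid.fit(X, y)

# Step 7: soft-voting ensemble of the tuned and baseline estimators.
vote = VotingClassifier(
    [("gb", grid.best_estimator_),
     ("rf", RandomForestClassifier(random_state=42)),
     ("logreg", LogisticRegression(max_iter=1000))],
    voting="soft",
)
print("ensemble CV accuracy:", cross_val_score(vote, X, y, cv=5).mean())
```

Soft voting averages the estimators' predicted class probabilities, which generally works better than hard voting when all members expose well-calibrated `predict_proba` outputs.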
4. Results
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Gaidai, O.; Cao, Y.; Loginov, S. Global Cardiovascular Diseases Death Rate Prediction. Curr. Probl. Cardiol. 2023, 48, 101622. [Google Scholar] [CrossRef]
- Laslett, L.J.; Alagona, P.; Clark, B.A.; Drozda, J.P.; Saldivar, F.; Wilson, S.R.; Poe, C.; Hart, M. The Worldwide Environment of Cardiovascular Disease: Prevalence, Diagnosis, Therapy, and Policy Issues. J. Am. Coll. Cardiol. 2012, 60, S1–S49. [Google Scholar] [CrossRef]
- Luo, C.; Tong, Y. Comprehensive study and review of coronary artery disease. In Proceedings of the Second International Conference on Biological Engineering and Medical Science (ICBioMed 2022), Oxford, UK, 7–13 November 2022. [Google Scholar] [CrossRef]
- Absar, N.; Das, E.K.; Shoma, S.N.; Khandaker, M.U.; Miraz, M.H.; Faruque, M.R.I.; Tamam, N.; Sulieman, A.; Pathan, R.K. The Efficacy of Machine-Learning-Supported Smart System for Heart Disease Prediction. Healthcare 2022, 10, 1137. [Google Scholar] [CrossRef]
- Rani, U. Analysis of Heart Diseases Dataset Using Neural Network Approach. Int. J. Data Min. Knowl. Manag. Process 2011, 1, 1–8. [Google Scholar] [CrossRef]
- Singh, P.; Singh, S.; Pandi-Jain, G.S. Effective heart disease prediction system using data mining techniques. Int. J. Nanomed. 2018, 13, 121–124. [Google Scholar] [CrossRef]
- Ismail, A.; Ravipati, S.; Gonzalez-Hernandez, D.; Mahmood, H.; Imran, A.; Munoz, E.J.; Naeem, S.; Abdin, Z.U.; Siddiqui, H.F. Carotid Artery Stenosis: A Look into the Diagnostic and Management Strategies, and Related Complications. Cureus 2023, 15, e38794. [Google Scholar] [CrossRef]
- Pol, U.R.; Sawant, T.U. Automl: Building a classification model with PyCaret. YMER 2021, 20, 547–552. [Google Scholar] [CrossRef]
- Ferreira, L.; Pilastri, A.; Martins, C.M.; Pires, P.M.; Cortez, P. A Comparison of AutoML Tools for Machine Learning, Deep Learning and XGBoost. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; pp. 1–8. [Google Scholar] [CrossRef]
- Lenkala, S.; Marry, R.; Gopovaram, S.R.; Akinci, T.C.; Topsakal, O. Comparison of Automated Machine Learning (AutoML) Tools for Epileptic Seizure Detection Using Electroencephalograms (EEG). Computers 2023, 12, 197. [Google Scholar] [CrossRef]
- Topsakal, O.; Akinci, T.C. Classification and Regression Using Automatic Machine Learning (AutoML)–Open Source Code for Quick Adaptation and Comparison. Balk. J. Electr. Comput. Eng. 2023, 11, 257–261. [Google Scholar] [CrossRef]
- Hazra, A.; Mandal, S.K.; Gupta, A.; Mukherjee, A.; Mukherjee, A. Heart disease diagnosis and prediction using machine learning and data mining techniques: A review. Adv. Comput. Sci. Technol. 2017, 10, 2137–2159. [Google Scholar]
- Khan, Y.; Qamar, U.; Yousaf, N.; Khan, A. Machine learning techniques for heart disease datasets: A survey. In Proceedings of the 2019 11th International Conference on Machine Learning and Computing (ICMLC ’19), Zhuhai, China, 22–24 February 2019; ACM: New York, NY, USA, 2019; pp. 27–35. [Google Scholar] [CrossRef]
- Marimuthu, M.; Abinaya, M.; Hariesh, K.S.; Madhankumar, K.; Pavithra, V. A review on heart disease prediction using machine learning and data analytics approach. Int. J. Comput. Appl. 2018, 181, 20–25. [Google Scholar] [CrossRef]
- Nagavelli, U.; Samanta, D.; Chakraborty, P. Machine Learning Technology-Based Heart Disease Detection Models. J. Healthc. Eng. 2022, 2022, 7351061. [Google Scholar] [CrossRef] [PubMed]
- Li, Y.; Shen, F.; Hu, L.; Lang, Z.; Liu, L.D.; Cai, F.; Fu, L. A Stare-Down Video-Rate High-Throughput Hyperspectral Imaging System and Its Applications in Biological Sample Sensing. IEEE Sens. J. 2023, 23, 23629–23637. [Google Scholar] [CrossRef]
- Shen, F.; Deng, H.; Yu, L.; Cai, F. Open-source mobile multispectral imaging system and its applications in biological sample sensing. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2022, 280, 121504. [Google Scholar] [CrossRef] [PubMed]
- Squiers, J.J.; Thatcher, J.E.; Bastawros, D.S.; Applewhite, A.J.; Baxter, R.D.; Yi, F.; Quan, P.; Yu, S.; DiMaio, J.M.; Gable, D.R. Machine learning analysis of multispectral imaging and clinical risk factors to predict amputation wound healing. J. Vasc. Surg. 2022, 75, 279–285. [Google Scholar] [CrossRef]
- Staszak, K.; Tylkowski, B.; Staszak, M. From Data to Diagnosis: How Machine Learning Is Changing Heart Health Monitoring. Int. J. Environ. Res. Public Health 2023, 20, 4605. [Google Scholar] [CrossRef]
- Padmanabhan, M.; Yuan, P.; Chada, G.; Nguyen, H.V. Physician-friendly machine learning: A case study with cardiovascular disease risk prediction. J. Clin. Med. 2019, 8, 1050. [Google Scholar] [CrossRef]
- Valarmathi, R.; Sheela, T. Heart disease prediction using hyperparameter optimization (HPO) tuning. Biomed. Signal Process. Control 2021. [CrossRef]
- Romero, R.A.A.; Deypalan, M.N.Y.; Mehrotra, S.; Jungao, J.T.; Sheils, N.E.; Manduchi, E. Benchmarking AutoML frameworks for disease prediction using medical claims. BioData Min. 2022, 15, 15. [Google Scholar] [CrossRef]
- Wang, X.; Zhang, Z.; Zhu, W. Automated graph machine learning: Approaches, libraries, and directions. arXiv 2022, arXiv:2201.01288. [Google Scholar] [CrossRef]
- Bu, C.; Lu, Y.; Liu, F. Automatic Graph Learning with Evolutionary Algorithms: An Experimental Study. In PRICAI 2021: Trends in Artificial Intelligence. PRICAI 2021, Hanoi, Vietnam, 8–12 November 2021; Pham, D.N., Theeramunkong, T., Governatori, G., Liu, F., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2021; Volume 13031. [Google Scholar] [CrossRef]
- Alamin, M.A. Democratizing Software Development and Machine Learning Using Low Code Applications. Master’s Thesis, University of Calgary, Calgary, AB, Canada, 2022. [Google Scholar]
- Topsakal, O.; Dobratz, E.J.; Akbas, M.I.; Dougherty, W.M.; Akinci, T.C.; Celikoyar, M.M. Utilization of Machine Learning for the Objective Assessment of Rhinoplasty Outcomes. IEEE Access 2023, 11, 42135–42145. [Google Scholar] [CrossRef]
- Madhugiri, D. Beginner’s Guide to AutoML with an Easy AutoGluon Example. Analytics Vidhya, 18 September 2022. Available online: https://www.analyticsvidhya.com/blog/2021/10/beginners-guide-to-automl-with-an-easy-autogluon-example/ (accessed on 9 September 2023).
- Jin, H.; Chollet, F.; Song, Q.; Hu, X. AutoKeras: An AutoML Library for Deep Learning. J. Mach. Learn. Res. 2023, 24, 1–6. [Google Scholar]
- Budjac, R.; Nikmon, M.; Schreiber, P.; Zahradnikova, B.; Janacova, D. Automated machine learning overview. Sciendo 2019, 27, 107–112. [Google Scholar] [CrossRef]
- Koh, J.C.O.; Spangenberg, G.; Kant, S. Automated Machine Learning for High-Throughput Image-Based Plant Phenotyping. Remote Sens. 2021, 13, 858. [Google Scholar] [CrossRef]
- Singh, V.K.; Joshi, K. Automated Machine Learning (AutoML): An overview of opportunities for application and research. J. Inf. Technol. Case Appl. Res. 2022, 24, 75–85. [Google Scholar] [CrossRef]
- Lee, S.; Kim, J.; Bae, J.H.; Lee, G.; Yang, D.; Hong, J.; Lim, K.J. Development of Multi-Inflow Prediction Ensemble Model Based on Auto-Sklearn Using Combined Approach: Case Study of Soyang River Dam. Hydrology 2023, 10, 90. [Google Scholar] [CrossRef]
- Pushparaj, S.N.; Sivasankaran, S.M.; Thamizh Chemmal, S. Prediction of Heart Disease Using a Hybrid of CNN-LSTM Algorithm. J. Surv. Fish. Sci. 2023, 10, 5700–5710. [Google Scholar]
- Ferreira, L.; Pilastri, A.L.; Henrique, C.; Santos, P.A.; Cortez, P. A Scalable and Automated Machine Learning Framework to Support Risk Management. Lect. Notes Comput. Sci. 2020, 12613, 291–307. [Google Scholar] [CrossRef]
- Egger, R. Machine Learning in Tourism: A Brief Overview. In Applied Data Science in Tourism; Springer: Berlin/Heidelberg, Germany, 2022. [Google Scholar] [CrossRef]
- Yang, S.; Bhattacharjee, D.; Kumar, V.B.Y.; Chatterjee, S.; De, S.; Debacker, P.; Verkest, D.; Mallik, A.; Catthoor, F. AERO: Design Space Exploration Framework for Resource-Constrained CNN Mapping on Tile-Based Accelerators. IEEE J. Emerg. Sel. Top. Circuits Syst. 2022, 12, 508–521. [Google Scholar] [CrossRef]
- Sarangpure, N.; Dhamde, V.; Roge, A.; Doye, J.; Patle, S.; Tamboli, S. Automating the Machine Learning Process using PyCaret and Streamlit. In Proceedings of the 2023 2nd International Conference for Innovation in Technology (INOCON), Bangalore, India, 3–5 March 2023; pp. 1–5. [Google Scholar] [CrossRef]
- Vinicius, M.; Paulo, N.; Cecilia, M. Auto machine learning to predict pregnancy after fresh embryo transfer following in vitro fertilization. World J. Adv. Res. Rev. 2022, 16, 621–626. [Google Scholar] [CrossRef]
- Olson, R.S. TPOT. Available online: http://epistasislab.github.io/tpot/ (accessed on 3 March 2023).
- Gurdo, N.; Volke, D.C.; McCloskey, D.; Nikel, P.I. Automating the design-build-test-learn cycle towards next-generation bacterial cell factories. New Biotechnol. 2023, 74, 1–15. [Google Scholar] [CrossRef] [PubMed]
- Erickson, N.; Mueller, J.; Shirkov, A.; Zhang, H.; Larroy, P.; Li, M.; Smola, A. AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data. arXiv 2020, arXiv:2003.06505. [Google Scholar]
- Ali, A.A.; Khedr, A.M.; El-Bannany, M.; Kanakkayil, S. A Powerful Predicting Model for Financial Statement Fraud Based on Optimized XGBoost Ensemble Learning Technique. Appl. Sci. 2023, 13, 2272. [Google Scholar] [CrossRef]
- Gaur, S.; Kalani, P.; Mohan, M. Harmonic-to-noise ratio as a speech biomarker for fatigue: K-nearest neighbour machine learning algorithm. Med. J. Armed Forces India 2023. [Google Scholar] [CrossRef]
- Jawad, B.J.; Shaker, S.M.; Altintas, I.; Eugen-Olsen, J.; Nehlin, J.; Andersen, O.; Kallemose, T. Development and validation of prognostic machine learning models for short- and long-term mortality among acutely hospitalized patients. Eur. PMC 2023. [Google Scholar] [CrossRef]
- Suresh, K.; Elkahwagi, M.A.; Garcia, A.; Naples, J.G.; Corrales, C.E.; Crowson, M.G. Development of a Predictive Model for Persistent Dizziness Following Vestibular Schwannoma Surgery. Laryngoscope 2023, 133, 3534–3539. [Google Scholar] [CrossRef] [PubMed]
- Ortiz-Perez, A.; Izquierdo Lozano, C.; Meijers, R.; Grisoni, F.; Albertazzi, L. Identification of fluorescently-barcoded nanoparticles using machine learning. Nanoscale Adv. 2023, 5, 2307–2317. [Google Scholar] [CrossRef]
- Ehlers, M.R.; Lonsdorf, T.B. Data sharing in experimental fear and anxiety research: From challenges to a dynamically growing database in 10 simple steps. Neurosci. Biobehav. Rev. 2022, 143, 104958. [Google Scholar] [CrossRef]
- Lu, P.J.; Chuang, J.-H. Fusion of Multi-Intensity Image for Deep Learning-Based Human and Face Detection. IEEE Access 2022, 10, 8816–8823. [Google Scholar] [CrossRef]
- Maghfour, J.; Ceresnie, M.; Olson, J.; Lim, H.W. The association between frontal fibrosing alopecia, sunscreen, and moisturizers: A systematic review and meta-analysis. J. Am. Acad. Dermatol. 2022, 87, 395–396. [Google Scholar] [CrossRef]
- Datasets|Kaggle. Kaggle.com. 2019. Available online: https://www.kaggle.com/datasets (accessed on 25 April 2023).
- UCI Machine Learning Repository: Data Sets. Uci.edu. 2009. Available online: https://archive.ics.uci.edu/dataset/45/heart+disease (accessed on 18 April 2023).
- Price, W.N., II; Cohen, I.G. Privacy in the age of medical big data. Nat. Med. 2019, 25, 37–43. [Google Scholar] [CrossRef] [PubMed]
- Cleveland, Hungarian, Switzerland, and VA Datasets. Available online: https://archive.ics.uci.edu/ml/datasets/heart+disease (accessed on 9 September 2023).
- Pathare, A.; Mangrulkar, R.; Suvarna, K.; Parekh, A.; Thakur, G.; Gawade, A. Comparison of tabular synthetic data generation techniques using propensity and cluster log metric. Int. J. Inf. Manag. Data Insights 2023, 3, 100177. [Google Scholar] [CrossRef]
- El-Bialy, R.; Salamay, M.A.; Karam, O.H.; Khalifa, M.E. Feature analysis of coronary artery heart disease data sets. Procedia Comput. Sci. 2015, 65, 459–468. [Google Scholar] [CrossRef]
- Sarra, R.R.; Dinar, A.M.; Mohammed, M.A.; Abdulkareem, K.H. Enhanced heart disease prediction based on machine learning and χ2 statistical optimal feature selection model. Designs 2022, 6, 87. [Google Scholar] [CrossRef]
- Ahmed, I. A Study of Heart Disease Diagnosis Using Machine Learning and Data Mining. Master’s Thesis, California State University, San Bernardino, CA, USA, 2022. Volume 1591. Available online: https://scholarworks.lib.csusb.edu/etd/1591 (accessed on 9 September 2023).
- AutoML Comparison for Heart Disease Diagnosis GitHub Page. Available online: https://github.com/researchoutcome/automl-comparison-heart/ (accessed on 4 July 2023).
- Chandrasekhar, N.; Peddakrishna, S. Enhancing Heart Disease Prediction Accuracy through Machine Learning Techniques and Optimization. Processes 2023, 11, 1210. [Google Scholar] [CrossRef]
- Mayor, J.M.; Preventza, O.; McGinigle, K.; Mills, J.L.; Montero-Baker, M.; Gilani, R.; Pallister, Z.; Chung, J. Persistent under-representation of female patients in United States trials of common vascular diseases from 2008 to 2020. J. Vasc. Surg. 2022, 75, 30–36. [Google Scholar] [CrossRef] [PubMed]
- Finkelhor, R.S.; Newhouse, K.E.; Vrobel, T.R.; Miron, S.D.; Bahler, R.C. The ST segment/heart rate slope as a predictor of coronary artery disease: Comparison with quantitative thallium imaging and conventional ST segment criteria. Am. Heart J. 1986, 112, 296–304. [Google Scholar] [CrossRef]
- Islam, M.M.; Haque, M.R.; Iqbal, H.; Hasan, H.M.M.; Hasan, M.; Kabir, M.N. Breast cancer prediction: A comparative study using machine learning techniques. SN Comput. Sci. 2020, 1, 290. [Google Scholar] [CrossRef]
- Alaa, A.M.; van der Schaar, M. AutoPrognosis: Automated clinical prognostic modeling via Bayesian optimization with structured kernel learning. arXiv 2018, arXiv:1802.07207. [Google Scholar] [CrossRef]
- Imrie, F.; Cebere, B.; McKinney, E.F.; van der Schaar, M. AutoPrognosis 2.0: Democratizing diagnostic and prognostic modeling in healthcare with automated machine learning. arXiv 2022, arXiv:2210.12090. [Google Scholar] [CrossRef]
- Liu, G.; Lu, D.; Lu, J. Pharm-AutoML: An open-source, end-to-end automated machine learning package for clinical outcome prediction. CPT Pharmacomet. Syst. Pharmacol. 2021, 10, 478–488. [Google Scholar] [CrossRef] [PubMed]
- Alaa, A.M.; van der Schaar, M. Cardiovascular disease risk prediction using automated machine learning: A prospective study of 423,604 UK Biobank participants. PLoS ONE 2019, 14, e0213653. [Google Scholar] [CrossRef] [PubMed]
Attribute | Description |
---|---|
Age | Age in years |
Sex | Sex (1 = male; 0 = female) |
Cp | Chest pain type |
Trestbps | Resting blood pressure (in mm Hg on admission to hospital) |
Chol | Serum cholesterol in mg/dL |
Fbs | Fasting blood sugar > 120 mg/dL (1 = true; 0 = false) |
Restecg | Resting electrocardiographic results |
Thalach | Maximum heart rate achieved |
Exang | Exercise-induced angina (1 = yes; 0 = no) |
Oldpeak | ST depression induced by exercise relative to rest |
Slope | The slope of the peak exercise ST segment |
Ca | Number of major vessels (0–3) colored by fluoroscopy |
Thal | 3 = normal; 6 = fixed defect; 7 = reversible defect |
Num | Diagnosis of heart disease (angiographic disease status) |
Attribute | Cleveland | Hungarian | Switzerland | VA |
---|---|---|---|---|
Trestbps | 0 | 1 | 2 | 56 |
Chol | 0 | 23 | 0 | 7 |
Fbs | 0 | 8 | 75 | 7 |
Thalach | 0 | 1 | 1 | 53
Exang | 0 | 1 | 1 | 53 |
Oldpeak | 0 | 0 | 6 | 53 |
Slope | 0 | 190 | 17 | 102 |
Ca | 4 | 290 | 118 | 198 |
Thal | 2 | 266 | 52 | 166 |
Attribute | Cleveland | Hungarian | Combined |
---|---|---|---|
Thalach | −0.417167 | −0.331074 | −0.385972 |
Fbs | 0.025264 | 0.162869 | dropped |
Chol | 0.085164 | 0.202372 | −0.234679
Trestbps | 0.150825 | 0.139582 | 0.103828 |
Restecg | 0.169202 | −0.031988 | 0.062304 |
Age | 0.223120 | 0.159315 | 0.282700 |
Sex | 0.276816 | 0.272781 | 0.307284 |
Slope | 0.339213 | dropped | dropped |
Cp | 0.414446 | 0.505864 | 0.471712 |
Oldpeak | 0.424510 | 0.545700 | 0.373382 |
Exang | 0.431894 | 0.584541 | 0.443433 |
Ca | 0.460033 | dropped | dropped |
Thal | 0.522057 | dropped | dropped |
Machine Learning Algorithm | Accuracy (Correlated) | Accuracy (Unreduced) |
---|---|---|
Stochastic Gradient Descent (SGD) | 0.59 | 0.58 |
Logistic Regression | 0.59 | 0.59 |
Support Vector Machine (SVM) (Linear Kernel) | 0.55 | 0.57 |
Support Vector Machine (SVM) (RBF Kernel) | 0.57 | 0.56
Decision Tree | 0.52 | 0.49 |
Random Forest | 0.62 | 0.59 |
Extra Trees | 0.57 | 0.57 |
AdaBoost | 0.58 | 0.57 |
Gradient Boosting | 0.60 | 0.59 |
XGBoost | 0.55 | 0.56 |
Ensemble of the following: AdaBoost, Random Forest, Gradient Boosting, XGBoost, SVM-Linear, SGD, Logistic Regression | 0.60 | 0.58 |
Cleveland | |||
---|---|---|---|
Accuracy | F1 Score | Best Model | |
Unreduced: | |||
PyCaret | 0.8525 0.8215 0.8180 | 0.8037 0.7998 0.7939 | 1. Linear Discriminant Analysis 2. Ridge Classifier 3. Naïve Bayes |
AutoGluon | 0.8688 0.8688 0.8524 | 0.8709 0.8709 0.8524 | 1. WeightedEnsemble_L2 2. RandomForestGini 3. RandomForestEntr |
AutoKeras | 0.8033 | 0.8182 | N/A |
Correlated: | |||
PyCaret | 0.8137 0.8048 0.8008 | 0.8012 0.7814 0.7775 | 1. Logistic Regression 2. Linear Discriminant Analysis 3. Ridge Classifier |
AutoGluon | 0.7868 0.8524 0.8360 | 0.7796 0.8474 0.8333 | 1. WeightedEnsemble_L2 2. RandomForestGini 3. RandomForestEntr |
AutoKeras | 0.5410 | 0.6667 | N/A |
Combined | |||
---|---|---|---|
Accuracy | F1 Score | Best Model | |
Unreduced: | |||
PyCaret | 0.6873 0.6833 0.6832 | 0.6678 0.6839 0.6484 | 1. Logistic Regression 2. Linear Discriminant Analysis 3. Ridge Classifier |
AutoGluon | 0.8478 0.8478 0.8423 | 0.8691 0.8691 0.8651 | 1. WeightedEnsemble_L2 2. RandomForestEntr 3. ExtraTreesGini |
AutoKeras | 0.8152 | 0.8365 | N/A |
Correlated: | |||
PyCaret | 0.7826 0.7459 0.7432 | 0.7311 0.7168 0.7260 | 1. Random Forest Classifier 2. Ridge Classifier 3. Logistic Regression |
AutoGluon | 0.8423 0.8423 0.8369 | 0.8638 0.8638 0.8369 | 1. WeightedEnsemble_L2 2. RandomForestEntr 3. RandomForestGini |
AutoKeras | 0.8315 | 0.8545 | N/A |
Hungarian | |||
---|---|---|---|
Accuracy | F1 Score | Best Model | |
Unreduced: | |||
PyCaret | 0.6976 0.6806 0.6766 | 0.6465 0.6092 0.6334 | 1. Logistic Regression 2. Ridge Classifier 3. Linear Discriminant Analysis |
AutoGluon | 0.8475 0.8474 0.8305 | 0.7804 0.7804 0.7619 | 1. WeightedEnsemble_L2 2. RandomForestEntr 3. ExtraTreesEntr
AutoKeras | 0.8305 | 0.7059 | N/A |
Correlated: | |||
PyCaret | 0.8304 0.8303 0.8263 | 0.7516 0.7506 0.7470 | 1. Ridge Classifier 2. Logistic Regression 3. Linear Discriminant Analysis
AutoGluon | 0.8983 0.8983 0.8644 | 0.8500 0.8500 0.8095 | 1. WeightedEnsemble_L2 2. RandomForestEntr 3. ExtraTreesGini |
AutoKeras | 0.8305 | 0.7059 | N/A |
Run 1 | Run 2 | Run 3 | Mean ± St. Dev. | |||||
---|---|---|---|---|---|---|---|---|
Accuracy | Run Time | Accuracy | Run Time | Accuracy | Run Time | |||
Cleveland | Unreduced | 0.8197 | 40 m 51 s | 0.7705 | 6 m 5 s | 0.8033 | 6 m 35 s | 0.7978 ± 0.0251
Correlated | 0.7541 | 18 m 47 s | 0.8197 | 6 m 29 s | 0.7705 | 11 m 10 s | 0.7814 ± 0.0341 |
Hungarian | Unreduced | 0.8644 | 7 m 36 s | 0.7966 | 10 m 29 s | 0.6610 | 52 m 27 s | 0.7740 ± 0.1035
Correlated | 0.8136 | 5 m 34 s | 0.8644 | 7 m 55 s | 0.6610 | 7 m 23 s | 0.7797 ± 0.1059 |
Combined | Unreduced | 0.8478 | 18 m 24 s | 0.8207 | 15 m 57 s | 0.8478 | 12 m 54 s | 0.8388 ± 0.0156
Correlated | 0.7989 | 35 m 51 s | 0.8152 | 10 m 41 s | 0.7880 | 24 m 02 s | 0.8007 ± 0.0137
Layer (Type) | Output Shape | Param |
---|---|---|
Input1 (InputLayer) | [(None, 13)] | 0 |
MultiCategoryEncoding (MultiCategoryEncoding) | (None, 13) | 0 |
Normalization (Normalization) | (None, 13) | 27 |
Dense (Dense) | (None, 64) | 896 |
Relu (ReLU) | (None, 64) | 0 |
Dense1 (Dense) | (None, 256) | 16,640 |
Relu1 (ReLU) | (None, 256) | 0 |
Dense2 (Dense) | (None, 128) | 32,896 |
Relu2 (ReLU) | (None, 128) | 0 |
Dense3 (Dense) | (None, 1) | 129 |
Classificationhead1 (Activation) | (None, 1) | 0 |
Total params: | 50,588 | |
Trainable params: | 50,561 | |
Non-trainable params: | 27 |
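The parameter counts in this summary are mutually consistent: a fully connected layer with n_in inputs and n_out units has n_in × n_out + n_out parameters, which, starting from 13 input features, implies dense layer widths of 64, 256, 128, and 1. A quick standard-library check:

```python
# Cross-check the AutoKeras model summary: a Dense layer with n_in inputs
# and n_out units has n_in * n_out + n_out trainable parameters.
def dense_params(n_in, n_out):
    return n_in * n_out + n_out

widths = [13, 64, 256, 128, 1]  # input features, then inferred dense widths
dense = [dense_params(a, b) for a, b in zip(widths, widths[1:])]
print(dense)  # [896, 16640, 32896, 129], matching the summary rows

normalization = 27  # non-trainable params of the Normalization layer
total = sum(dense) + normalization
print(total)  # 50588, matching "Total params"
```

The 27 non-trainable parameters belong to the Normalization layer (its per-feature statistics), which is why "Trainable params" is exactly 27 less than the total.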
Model | Score_Test | Score_Val | Pred_Time_Test | Pred_Time_Val | Fit_Time | Pred_Time_ Test_Marginal | Pred_Time_ Val_Marginal | Fit_Time_ Marginal
---|---|---|---|---|---|---|---|---|
0 | RandomForestEntr | 0.847826 | 0.797297 | 0.036896 | 0.024798 | 0.2945 | 0.036896 | 0.024798 |
1 | WeightedEnsemble_L2 | 0.847826 | 0.810811 | 0.123012 | 0.074949 | 0.979291 | 0.001744 | 0.000401 |
2 | RandomForestGini | 0.842391 | 0.77027 | 0.037669 | 0.025544 | 0.297175 | 0.037669 | 0.025544 |
3 | ExtraTreesGini | 0.842391 | 0.783784 | 0.041508 | 0.024531 | 0.28753 | 0.041508 | 0.024531 |
4 | ExtraTreesEntr | 0.826087 | 0.77027 | 0.046702 | 0.024206 | 0.282546 | 0.046702 | 0.024206 |
5 | XGBoost | 0.809783 | 0.722973 | 0.005913 | 0.003386 | 0.023183 | 0.005913 | 0.003386 |
6 | KNeighborsDist | 0.690217 | 0.668919 | 0.00367 | 0.001662 | 0.004442 | 0.00367 | 0.001662 |
7 | KNeighborsUnif | 0.690217 | 0.662162 | 0.014753 | 0.002047 | 0.012023 | 0.014753 | 0.002047 |
Share and Cite
Paladino, L.M.; Hughes, A.; Perera, A.; Topsakal, O.; Akinci, T.C. Evaluating the Performance of Automated Machine Learning (AutoML) Tools for Heart Disease Diagnosis and Prediction. AI 2023, 4, 1036-1058. https://doi.org/10.3390/ai4040053