Early Prediction in Classification of Cardiovascular Diseases with Machine Learning, Neuro-Fuzzy and Statistical Methods
Simple Summary
Abstract
1. Introduction and Objectives
1.1. Introduction and Bibliographical Review
1.2. Contributions and Plan of the Article
- ML, ANFIS and statistical classification tools supported by the Gifi method are used to predict CVDs early and more precisely.
- The effects of seventeen parameters on CVDs are investigated in depth using response surface methodology (RSM).
- The findings are compared comprehensively with state-of-the-art studies.
- Sensitivity analyses are carried out for ANFIS and SVR to determine the influence of significant factors such as age, BMI, glucose, cholesterol, RBC and HDL/LDL cholesterol levels on CVD.
- The results of the statistical approaches combined with the Gifi method are reported for statistical classification tools, including linear discriminant analysis.
- The Nash–Sutcliffe model efficiency (NSE) coefficient is used to quantitatively describe and assess the model output’s predictive accuracy.
- We compare the capability of an adaptive elastic net logistic regression (AENLR) [31] and Gifi transformation with ML techniques (SVR, MARS, M5Tree and ANNs).
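As an illustration of the NSE coefficient mentioned in the list above, the following minimal sketch uses the standard definition, NSE = 1 − Σ(oᵢ − pᵢ)² / Σ(oᵢ − ō)², where oᵢ are observed values, pᵢ are model outputs and ō is the mean of the observations. The function name and the toy data are hypothetical and are not taken from the article.

```python
from typing import Sequence


def nash_sutcliffe_efficiency(observed: Sequence[float],
                              predicted: Sequence[float]) -> float:
    """Standard NSE: 1 - SS_residual / SS_total (hypothetical helper name)."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    # Squared error of the model against the observations.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Squared deviation of the observations from their own mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("NSE is undefined when all observations are identical")
    return 1.0 - ss_res / ss_tot


# Toy example (made-up values): a perfect model scores 1, and a model that
# always predicts the observed mean scores 0.
labels = [1, 0, 1, 0]
print(nash_sutcliffe_efficiency(labels, [1, 0, 1, 0]))
print(nash_sutcliffe_efficiency(labels, [0.5, 0.5, 0.5, 0.5]))
```

Values close to 1 indicate that the model explains most of the variance in the observations, while values at or below 0 indicate that it does no better than predicting the observed mean.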
2. Methodology
2.1. Dataset and Framework of the Study and Patients
2.2. Gifi System for Data Transformation
2.3. The Support Vector Machines Method
2.4. Fuzzy Rules and Membership Functions
2.5. The ANFIS Approach
2.6. Response Surface Method for Factor Assessment and Sensitivity Analysis
2.7. Statistical Approaches for CVD Classification
2.8. Flowchart of the Methodology
3. Results and Findings
3.1. Exploratory Data Analysis
3.2. ANFIS for CVD Prediction
3.3. Fuzzy Rules and Membership Functions
3.4. The ANFIS Approach for CVD Prediction
3.5. Elastic Net Modeling for CVD Prediction
3.6. ANNs and Pattern Recognition
3.7. Response Surface Method for Factor Assessment
3.8. Sensitivity Analysis
3.9. CVD Prediction
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Nomenclature
AENLR | Adaptive elastic net logistic regression |
ANFIS | Adaptive neuro-fuzzy inference system |
ANN | Artificial neural network |
ANOVA | Analysis of variance |
AI | Artificial intelligence |
BFG | Broyden–Fletcher–Goldfarb–Shanno quasi-Newton back propagation |
BMI | Body mass index |
BPML | Backpropagation multiple layer |
BPNN | Back propagation in neural network |
BR | Bayesian regularization |
CVD | Cardiovascular disease |
DF | Desirability function |
DP | Differential probability |
DT | Decision trees |
ECG | Electrocardiogram |
EEG | Electroencephalogram |
FFB | Feed forward back propagation |
FIS | Fuzzy inference system |
HbA1c | Glycated hemoglobin |
HDL | High density lipoprotein |
kNN | k-nearest neighbor |
LDA | Linear discriminant analysis |
LDL | Low density lipoprotein |
LM | Levenberg–Marquardt |
MAE | Mean absolute error |
MARS | Multivariate adaptive regression splines |
MBE | Mean bias error |
ME | Mean error |
MESA | Multi-ethnic study of atherosclerosis |
MF | Membership functions |
ML | Machine learning |
NB | Naive Bayes |
NSE | Nash–Sutcliffe efficiency |
PMH | Past medical history |
QDA | Quadratic discriminant analysis |
RBC | Red blood cell |
RBFNN | Radial basis function neural network |
RSM | Response surface methodology |
RMSE | Root mean square error |
SCG | Scaled conjugate gradient |
SD | Standard deviation |
SF | Sensitivity factor |
SS | Sum of squares |
ST | Subject to |
SVM | Support vector machines |
SVR | Support vector regression |
WHO | World Health Organization |
References
- Pluta, K.; Porębska, K.; Urbanowicz, T.; Gąsecka, A.; Olasińska-Wiśniewska, A.; Targoński, R.; Krasińska, A.; Filipiak, K.J.; Jemielity, M.; Krasiński, Z. Platelet–Leucocyte Aggregates as Novel Biomarkers in Cardiovascular Diseases. Biology 2022, 11, 224.
- World Health Organization. Cardiovascular Diseases (CVDs). 2021. Available online: www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds) (accessed on 23 September 2022).
- Gandin, I.; Scagnetto, A.; Romani, S.; Barbati, G. Interpretability of time-series deep learning models: A study in cardiovascular patients admitted to Intensive care unit. J. Biomed. Inform. 2021, 121, 103876.
- Alaa, A.M.; Bolton, T.; Di Angelantonio, E.; Rudd, J.H.; Van der Schaar, M. Cardiovascular disease risk prediction using automated machine learning: A prospective study of 423,604 UK Biobank participants. PLoS ONE 2019, 14, e0213653.
- Alkadya, W.; ElBahnasy, K.; Leiva, V.; Gad, W. Classifying COVID-19 based on amino acids encoding with machine learning algorithms. Chemom. Intell. Lab. Syst. 2022, 224, 104535.
- Sardar, I.; Akbar, M.A.; Leiva, V.; Alsanad, A.; Mishra, P. Machine learning and automatic ARIMA/Prophet models-based forecasting of COVID-19: Methodology, evaluation and case study in SAARC countries. Stoch. Environ. Res. Risk Assess. 2023, in press.
- Chaouch, H.; Charfeddine, S.; Aoun, S.B.; Jerbi, H.; Leiva, V. Multiscale monitoring using machine learning methods: New methodology and an industrial application to a photovoltaic system. Mathematics 2022, 10, 890.
- Nikam, A.; Bhandari, S.; Mhaske, A.; Mantri, S. Cardiovascular disease prediction using machine learning models. In Proceedings of the 2020 IEEE Pune Section International Conference, Pune, India, 16–18 December 2020; pp. 22–27.
- Meshref, H. Cardiovascular disease diagnosis: A machine learning interpretation approach. Int. J. Adv. Comput. Sci. Appl. 2019, 10, 258–269.
- Şahin Sadık, E.; Saraoğlu, H.M.; Canbaz Kabay, S.; Tosun, M.; Keskinkılıç, C.; Akdağ, G. Investigation of the effect of rosemary odor on mental workload using EEG: An artificial intelligence approach. Signal Image Video Process. 2022, 16, 497–504.
- Krittanawong, C.; Virk, H.U.H.; Bangalore, S.; Wang, Z.; Johnson, K.W.; Rachel, P.; Hong, Z.; Scott, K.; Bharat, N.; Takeshi, K. Machine learning prediction in cardiovascular diseases: A meta-analysis. Sci. Rep. 2020, 10, 16057.
- Aruna, S.; Rajagopalan, S.P.; Nandakishore, L.V. Knowledge based analysis of various statistical tools in detecting breast cancer. Comput. Sci. Inf. Technol. 2011, 2, 37–45.
- Ahmad, T.; Lund, L.H.; Rao, P.; Ghosh, R.; Warier, P.; Vaccaro, B.; Dahlström, U.; O’Connor, C.M.; Felker, G.M.; Desai, N.R. Machine learning methods improve prognostication, identify clinically distinct phenotypes and detect heterogeneity in response to therapy in a large cohort of heart failure patients. J. Am. Heart Assoc. 2018, 7, e008081.
- Weng, S.F.; Reps, J.; Kai, J.; Garibaldi, J.M.; Qureshi, N. Can machine-learning improve cardiovascular risk prediction using routine clinical data? PLoS ONE 2017, 12, e0174944.
- Ambale-Venkatesh, B.; Yang, X.; Wu, C.O.; Liu, K.; Hundley, W.G.; McClelland, R.; Gomes, A.S.; Folsom, A.R.; Shea, S.; Guallar, E.; et al. Cardiovascular event prediction by machine learning: The multi-ethnic study of atherosclerosis. Circ. Res. 2017, 121, 1092–1101.
- Gonsalves, A.H.; Thabtah, F.; Mohammad, R.M.A.; Singh, G. Prediction of coronary heart disease using machine learning: An experimental analysis. In Proceedings of the 3rd International Conference on Deep Learning Technologies, Kochi, India, 17–20 October 2019; pp. 51–56.
- Elsayed, H.A.G.; Syed, L. An automatic early risk classification of hard coronary heart diseases using framingham scoring model. In Proceedings of the Second International Conference on Internet of Things, Data and Cloud Computing, Cambridge, UK, 22–23 March 2017; pp. 1–8.
- El Bialy, R.; Salama, M.A.; Karam, O. An ensemble model for heart disease datasets: A generalized model. In Proceedings of the 10th International Conference on Informatics and Systems, Giza, Egypt, 9–11 May 2016; pp. 191–196.
- Rajliwall, N.S.; Davey, R.; Chetty, G. Machine learning based models for cardiovascular risk prediction. In Proceedings of the 2018 International Conference on Machine Learning and Data Engineering, Sydney, Australia, 3–7 December 2018; pp. 142–148.
- Rahim, A.; Rasheed, Y.; Azam, F.; Anwar, M.W.; Rahim, M.A.; Muzaffar, A.W. An integrated machine learning framework for effective prediction of cardiovascular diseases. IEEE Access 2021, 9, 106575–106588.
- Krishnani, D.; Kumari, A.; Dewangan, A.; Singh, A.; Naik, N.S. Prediction of coronary heart disease using supervised machine learning algorithms. In Proceedings of the 2019 IEEE Region 10 Conference, Kochi, India, 17–20 October 2019; pp. 367–372.
- Bede, B. Mathematics of Fuzzy Sets and Fuzzy Logic; Springer: Berlin, Germany, 2013.
- Taylan, O.; Taskin, H. Fuzzy modeling of a production system. J. Nav. Sci. Eng. 2003, 1, 1.
- Taylan, O.; Darrab, I.A. Determining optimal quality distribution of latex weight using adaptive neuro-fuzzy modeling and control systems. Comput. Ind. Eng. 2011, 61, 686–696.
- Taylan, O.; Darrab, I.A. Fuzzy control charts for process quality improvement and product assessment in tip shear carpet industry. J. Manuf. Technol. Manag. 2012, 23, 402–420.
- Taylan, O.; Karagözoğlu, B. An adaptive neuro-fuzzy model for prediction of student’s academic performance. Comput. Ind. Eng. 2009, 57, 732–741.
- Ziasabounchi, N.; Askerzade, I. ANFIS based classification model for heart disease prediction. Int. J. Electr. Comput. Sci. 2014, 14, 7–12.
- Aghdam, A.D.; Dabanloo, N.J.; Sattari, M.; Attarodi, G.; Hemmati, N. Design and processing of a novel algorithm using ANFIS for new generation of cardiac pacemakers. In Proceedings of the 2017 Computing in Cardiology, Rennes, France, 24–27 September 2017; pp. 1–4.
- Bhuvaneswari, N.G.A. An intelligent approach based on principal component analysis and adaptive neuro-fuzzy inference system for predicting the risk of cardiovascular diseases. In Proceedings of the 2013 Fifth International Conference on Advanced Computing, Chennai, India, 18–20 December 2013; pp. 241–245.
- Lawal, A.I.; Kwon, S. Application of artificial intelligence to rock mechanics: An overview. J. Rock Mech. Geotech. Eng. 2021, 13, 248–266.
- Zou, H.; Zhang, H.H. On the adaptive elastic-net with a diverging number of parameters. Ann. Stat. 2009, 37, 1733–1751.
- Liao, D.; Cooper, L.; Cai, J.; Toole, J.; Bryan, N.; Burke, G.; Heiss, G. The prevalence and severity of white matter lesions, their relationship with age, ethnicity, gender and cardiovascular disease risk factors: The ARIC study. Neuroepidemiology 1997, 16, 149–162.
- Roeters van Lennep, J.E.; Westerveld, H.T.; Erkelens, D.W.; van der Wall, E.E. Risk factors for coronary heart disease: Implications of gender. Cardiovasc. Res. 2002, 53, 538–549.
- Anderssen, S.A.; Cooper, A.R.; Riddoch, C.; Sardinha, L.B.; Harro, M.; Brage, S.; Andersen, L.B. Low cardiorespiratory fitness is a strong predictor for clustering of cardiovascular disease risk factors in children independent of country, age and sex. Eur. J. Prev. Cardiol. 2007, 14, 526–531.
- Dahlof, B. Cardiovascular disease risk factors: Epidemiology and risk assessment. Am. J. Cardiol. 2010, 105, 3A–9A.
- Kurian, A.K.; Cardarelli, K.M. Racial and ethnic differences in cardiovascular disease risk factors. Ethn. Dis. 2007, 17, 143–152.
- Sibai, A.M.; Nasreddine, L.; Mokdad, A.H.; Adra, N.; Tabet, M.; Hwalla, N. Nutrition transition and cardiovascular disease risk factors in Middle East and North Africa countries: Reviewing the evidence. Ann. Nutr. Metab. 2010, 57, 193–203.
- Hertz, J.T.; Kweka, G.L.; Bloomfield, G.S.; Limkakeng, A.T., Jr.; Loring, Z.; Temu, G.; Sakita, F.M. Patterns of emergency care for possible acute coronary syndrome among patients with chest pain or shortness of breath at a Tanzanian referral hospital. Glob. Heart 2020, 15, 9.
- Stampfer, M.J.; Willett, W.C.; Colditz, G.A.; Speizer, F.E.; Hennekens, C.H. A prospective study of past use of oral contraceptive agents and risk of cardiovascular diseases. N. Engl. J. Med. 1988, 319, 1313–1317.
- Denes, P.; Larson, J.C.; Lloyd-Jones, D.M.; Prineas, R.J.; Greenland, P. Major and minor ECG abnormalities in asymptomatic women and risk of cardiovascular events and mortality. JAMA 2007, 297, 978–985.
- Naghavi-Behzad, M.; Alizadeh, M.; Azami, S.; Foroughifar, S.; Ghasempour-Dabbaghi, K.; Karzad, N.; Naghavi-Behzad, A. Risk factors of congenital heart diseases: A case-control study in Northwest Iran. J. Cardiovasc. Thorac. Res. 2013, 5, 5.
- Weycker, D.; Nichols, G.A.; O’Keeffe-Rosetti, M.; Edelsberg, J.; Khan, Z.M.; Kaura, S.; Oster, G. Risk-factor clustering and cardiovascular disease risk in hypertensive patients. Am. J. Hypertens. 2007, 20, 599–607.
- Twisk, J.W.; Kemper, H.C.; van Mechelen, W. Tracking of activity and fitness and the relationship with cardiovascular disease risk factors. Med. Sci. Sport. Exerc. 2000, 32, 1455–1461.
- Eisenmann, J.C. Physical activity and cardiovascular disease risk factors in children and adolescents: An overview. Can. J. Cardiol. 2004, 20, 295–301.
- Barroso, T.A.; Marins, L.B.; Alves, R.; Gonçalves, A.C.S.; Barroso, S.G.; Rocha, G.D.S. Association of central obesity with the incidence of cardiovascular diseases and risk factors. Int. J. Cardiovasc. Sci. 2017, 30, 416–424.
- Borg, R.; Kuenen, J.C.; Carstensen, B.; Zheng, H.; Nathan, D.M.; Heine, R.J.; Witte, D.R. HbA1c and mean blood glucose show stronger associations with cardiovascular disease risk factors than do postprandial glycaemia or glucose variability in persons with diabetes: The A1C-derived average glucose (ADAG) study. Diabetologia 2011, 54, 69–72.
- Kameneva, M.V.; Garrett, K.O.; Watach, M.J.; Borovetz, H.S. Red blood cell aging and risk of cardiovascular diseases. Clin. Hemorheol. Microcirc. 1998, 18, 67–74.
- Rosiek, A.; Leksowski, K. The risk factors and prevention of cardiovascular disease: The importance of electrocardiogram in the diagnosis and treatment of acute coronary syndrome. Ther. Clin. Risk Manag. 2016, 12, 1223.
- Michailidis, G.; de Leeuw, J. The Gifi system of descriptive multivariate analysis. Stat. Sci. 1998, 13, 307–336.
- Bozdogan, H. Mixture-model cluster analysis using a new informational complexity and model selection criteria. In Multivariate Statistical Modeling; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1994; pp. 69–113.
- Bozdogan, H. Akaike’s information criterion and recent developments in information complexity. J. Math. Psychol. 2000, 44, 62–91.
- Bozdogan, H. Intelligent statistical data mining with information complexity and genetic algorithms. In Statistical Data Mining and Knowledge Discovery; Chapman and Hall/CRC: New York, NY, USA, 2003; pp. 15–56.
- Bozdogan, H. A new class of information complexity (ICOMP) criteria with an application to customer profiling and segmentation. İstanbul Üniversitesi İşletme Fakültesi Dergisi 2010, 39, 370–398.
- Gifi, A. Nonlinear Multivariate Analysis; Wiley: Chichester, UK, 1990.
- Brereton, R.G.; Lloyd, G.R. Support vector machines for classification and regression. Analyst 2010, 135, 230–267.
- Lu, C.J. Sales forecasting of computer products based on variable selection scheme and support vector regression. Neurocomputing 2014, 128, 491–499.
- Taylan, O. Neural and fuzzy model performance evaluation of a dynamic production system. Int. J. Prod. Res. 2006, 44, 1093–1105.
- Sariev, E.; Germano, G. Bayesian regularized artificial neural networks for the estimation of the probability of default. Quant. Financ. 2020, 20, 311–328.
- Dwivedi, A.K. Performance evaluation of different machine learning techniques for prediction of heart disease. Neural Comput. Appl. 2018, 29, 685–693.
- Mohan, S.; Thirumalai, C.; Srivastava, G. Effective heart disease prediction using hybrid machine learning techniques. IEEE Access 2019, 7, 81542–81554.
- Haq, A.U.; Li, J.P.; Memon, M.H.; Nazir, S.; Sun, R. A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms. Mob. Inf. Syst. 2018, 2018, 3860146.
- Kim, J.K.; Kang, S. Neural network-based coronary heart disease risk prediction using feature correlation analysis. J. Healthc. Eng. 2017, 2017, 2780501.
- Dinesh, K.G.; Arumugaraj, K.; Santhosh, K.D.; Mareeswari, V. Prediction of cardiovascular disease using machine learning algorithms. In Proceedings of the 2018 International Conference on Current Trends towards Converging Technologies, Coimbatore, India, 1–3 March 2018; pp. 1–7.
- Sun, W.; Zhang, P.; Wang, Z.; Li, D. Prediction of cardiovascular diseases based on machine learning. ASP Trans. Internet Things 2021, 1, 30–35.
- Asif, M.; Nishat, M.M.; Faisal, F.; Dip, R.R.; Udoy, M.H.; Shikder, M.; Ahsan, R. Performance evaluation and comparative analysis of different machine learning algorithms in predicting cardiovascular disease. Eng. Lett. 2021, 29, 2.
- Bustos, N.; Tello, M.; Droppelmann, G.; García, N.; Feijoo, F.; Leiva, V. Machine learning techniques as an efficient alternative diagnostic tool for COVID-19 cases. Signa Vitae 2022, 18, 23–33.
Variable | Notation | Data Type | Coding and Description |
---|---|---|---|
Gender | | Nominal | Female (1); Male (0) |
Age | | Continuous | Age of patients |
Nationality | | Nominal | SA = 1; EG = 2; SU = 3; YE = 4; IND = 5; JOR = 6; PAK = 7; PAL = 8; ETH = 9; CAN = 10; PHL = 11; TUN = 12; SY = 13 |
Symptoms | | Nominal | SOB: shortness of breath; PMH: past medical history |
PMH | | Ordinal | PMH: Past medical history (DM: Diabetes mellitus = 1); HTN: Hypertension = 1; NAD: No abnormality detected = 0; DM and HTN = 3 |
Smoking | | Ordinal | No = 0; Yes = 1 |
Activity | | Ordinal | Low = 1; Normal = 0 |
BMI | | Continuous | Body mass index |
Systolic | | Continuous | Systolic blood pressure |
Diastolic | | Continuous | Diastolic blood pressure |
F-glucose | | Continuous | Blood sugar (glucose) level |
HbA1c | | Continuous | Three-month average blood glucose (sugar) level |
Cholesterol | | Continuous | Cholesterol test |
RBC | | Continuous | Red blood cell |
LDL | | Continuous | Low density lipoprotein |
HDL | | Continuous | High density lipoprotein |
ECG | | Ordinal | Electrocardiogram test; Normal = 1; Otherwise = 0 |
Diagnosis (CVD) | | Nominal | Cardio disease = 1; No cardio disease = 0 |
Variable | Source |
---|---|
Gender | Liao et al., 1997 [32]; Roeters van Lennep et al., 2002 [33]; Anderssen et al., 2007 [34] |
Age | Anderssen et al., 2007 [34]; Dahlof, 2010 [35] |
Nationality | Kurian and Cardarelli, 2007 [36]; Sibai et al., 2010 [37] |
Symptoms | Hertz et al., 2020 [38] |
PMH | Stampfer et al., 1988 [39]; Denes et al., 2007 [40]; Naghavi-Behzad et al., 2013 [41] |
Smoking | Weycker et al., 2007 [42]; Dahlof, 2010 [35] |
Activity | Twisk, 2000 [43]; Eisenmann, 2004 [44] |
BMI | Weycker et al., 2007 [42]; Barroso et al., 2017 [45] |
Systolic blood pressure | Weycker et al., 2007 [42] |
Diastolic blood pressure | Denes et al., 2007 [40]; Weycker et al., 2007 [42] |
F-glucose | Weycker et al., 2007 [42] |
HbA1c | Weycker et al., 2007 [42]; Borg et al., 2011 [46] |
Cholesterol | Dahlof, 2010 [35] |
RBC | Kameneva et al., 1998 [47]; Dahlof, 2010 [35] |
LDL | Weycker et al., 2007 [42] |
HDL | Weycker et al., 2007 [42] |
ECG | Dahlof, 2010 [35]; Rosiek and Leksowski, 2016 [48] |
Variable | Notation | n | Mean | Standard Deviation | Median | Minimum | Maximum | Range | Skewness | Kurtosis |
---|---|---|---|---|---|---|---|---|---|---|
Age | | 159 | 55.21 | 14.7 | 56 | 17 | 82 | 65 | −0.25 | −0.61 |
BMI | | 159 | 26.45 | 6.82 | 26 | 14 | 42 | 28 | 0.37 | −0.6 |
Systolic BP | | 159 | 139.5 | 18.74 | 140 | 96 | 179 | 83 | −0.16 | −0.52 |
Diastolic BP | | 159 | 81.8 | 11.56 | 86 | 50 | 103 | 53 | −0.56 | −0.37 |
F-glucose | | 159 | 6.33 | 1.45 | 6.1 | 3.89 | 10.5 | 6.61 | 0.48 | −0.46 |
HbA1c | | 159 | 6.54 | 1.65 | 6.4 | 3.6 | 11.7 | 8.1 | 0.66 | 0.08 |
Cholesterol | | 159 | 4.95 | 1.16 | 4.9 | 2.69 | 7.4 | 4.71 | 0.07 | −0.88 |
RBC | | 159 | 2.42 | 1.02 | 2.3 | 0.37 | 4.6 | 4.23 | 0.05 | −0.83 |
LDL | | 159 | 3.24 | 1.02 | 3.45 | 0.96 | 5.38 | 4.42 | −0.21 | −0.58 |
HDL | | 159 | 1.44 | 0.55 | 1.34 | 0.09 | 3.9 | 3.81 | 1.83 | 5.44 |
Variable | Notation | Values or Categories | | |
---|---|---|---|---|
Gender | X1 | Female: 61.01% | Male: 38.99% | |
Symptoms | X4 | SOB: 61.01% | PMH: 30.82% | NN: 8.18% |
PMH | X5 | DM: 41.51% | HTN: 33.33% | DM,HTN: 25.16% |
Smoking | X6 | NO: 64.78% | PAST: 10.06% | YES: 25.16% |
Activity | X7 | LOW: 54.72% | NORMAL: 45.28% | |
ECG | X17 | Normal: 49.06% | Change,STE,STD,SVT: 50.94% | |
Diagnosis (CVD) | - | No heart diseases: 60.38% | Heart diseases: 39.62% | |
Model | MAE | RMSE | MBE | DF | NSE |
---|---|---|---|---|---|
Errors for Training Phase of Models | |||||
SVR | 0.0387 | 0.0389 | 0.0046 | 0.9583 | 0.9195 |
MARS | 0.2700 | 0.3402 | 0.0021 | 0.6560 | 0.4383 |
M5Tree | 0.2541 | 0.3382 | 0.0060 | 0.6782 | 0.4714 |
ANN–BR | 0.2744 | 0.3557 | 0.0035 | 0.6399 | 0.4291 |
ANN–SCG | 0.0847 | 0.1232 | 0.0008 | 0.9066 | 0.8237 |
ANN–BFG | 0.0923 | 0.1253 | 0.0002 | 0.8987 | 0.8079 |
ANN–LM | 0.1246 | 0.1564 | 0.0012 | 0.8620 | 0.7407 |
RBFNN | 0.2185 | 0.2980 | 0.0021 | 0.7346 | 0.5455 |
ANFIS | 0.0165 | 0.0697 | 0.0028 | 0.9829 | 0.9656 |
Errors for Testing Phase of Models | |||||
SVR | 0.2165 | 0.2965 | −0.0041 | 0.7163 | 0.5382 |
MARS | 0.2329 | 0.2870 | −0.1011 | 0.6928 | 0.5032 |
M5Tree | 0.2002 | 0.3055 | −0.1059 | 0.7493 | 0.5730 |
ANN–BR | 0.1774 | 0.2622 | −0.0914 | 0.7779 | 0.6215 |
ANN–SCG | 0.2720 | 0.3842 | −0.0344 | 0.7101 | 0.4198 |
ANN–BFG | 0.2653 | 0.3677 | −0.0489 | 0.7191 | 0.4339 |
ANN–LM | 0.3500 | 0.4289 | −0.1516 | 0.6629 | 0.2533 |
RBFNN | 0.2553 | 0.3395 | −0.0950 | 0.6857 | 0.4554 |
ANFIS | 0.2085 | 0.3292 | 0.0062 | 0.7600 | 0.5551 |
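The MAE, RMSE and MBE columns in the table above can be illustrated with the following minimal sketch. It uses the usual textbook definitions and assumes that MBE is the mean of (predicted − observed); the article's sign convention is not stated here, so that choice is an assumption. Function names and the sample data are hypothetical.

```python
import math
from typing import Sequence


def _errors(observed: Sequence[float], predicted: Sequence[float]) -> list:
    """Signed errors, predicted minus observed (assumed sign convention)."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and equal in length")
    return [p - o for o, p in zip(observed, predicted)]


def mean_absolute_error(observed, predicted) -> float:
    errs = _errors(observed, predicted)
    return sum(abs(e) for e in errs) / len(errs)


def root_mean_square_error(observed, predicted) -> float:
    errs = _errors(observed, predicted)
    return math.sqrt(sum(e * e for e in errs) / len(errs))


def mean_bias_error(observed, predicted) -> float:
    errs = _errors(observed, predicted)
    return sum(errs) / len(errs)


# Made-up binary labels and continuous model outputs, for illustration only.
obs = [1, 0, 1, 0]
pred = [0.9, 0.1, 0.8, 0.2]
print(mean_absolute_error(obs, pred))
print(root_mean_square_error(obs, pred))
print(mean_bias_error(obs, pred))
```

MAE and RMSE measure the size of the errors, with RMSE penalizing large errors more heavily, whereas MBE indicates whether a model tends to over- or under-predict on average.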
Modeling Approach | Number of Rules and MFs | Training RMSE | Number of Rules and MFs | Training RMSE |
---|---|---|---|---|
ANFIS | 9 | 0.080 | 4 | 17.442 |
ANFIS | 11 | 11.119 | 6 | 6.444 |
ANFIS | 21 | 19.759 | 7 | 0.0697 |
ANFIS | 15 | 17.585 | 5 | 15.525 |
0.05 | 0.1 | 0.2 | 0.05 | 0.1 | 0.2 | 0.05 | 0.1 | 0.2 | |
0.50 | 0.0487 | 0.0924 | 0.1853 | 0.0487 | 0.0924 | 0.1853 | 0.0498 | 0.0996 | 0.1996 |
0.75 | 0.0486 | 0.0957 | 0.1917 | 0.0485 | 0.0958 | 0.1917 | 0.0389 | 0.0958 | 0.1907 |
1.00 | 0.049 | 0.0936 | 0.1753 | 0.0488 | 0.0935 | 0.1751 | 0.0725 | 0.1032 | 0.1838 |
1.50 | 0.0436 | 0.0934 | 0.1693 | 0.0484 | 0.0934 | 0.1691 | 0.1367 | 0.1512 | 0.1963 |
Statistic | SVR | MARS | M5Tree | ANN–BR | ANN–SCG | ANN–BFG | ANN–LM | RBFNN | ANFIS |
---|---|---|---|---|---|---|---|---|---|
ME | 0.0028 | −0.0187 | −0.0165 | −0.0156 | −0.0063 | −0.0097 | −0.0296 | −0.0174 | 0.0035 |
SD | 0.1375 | 0.3297 | 0.3315 | 0.3386 | 0.2044 | 0.1992 | 0.2359 | 0.3063 | 0.1602 |
Statistic | Observation | SVR | MARS | M5Tree | ANN–BR | ANN–SCG | ANN–BFG | ANN–LM | RBFNN | ANFIS |
---|---|---|---|---|---|---|---|---|---|---|
Correlation | 1 | 0.962 | 0.739 | 0.737 | 0.725 | 0.909 | 0.915 | 0.886 | 0.780 | 0.947 |
SD | 0.489 | 0.437 | 0.348 | 0.335 | 0.327 | 0.469 | 0.476 | 0.499 | 0.394 | 0.495 |
True Outputs | SVR | MARS | M5Tree | ANN–BR | ANN–SCG | ANN–BFG | ANN–LM | RBFNN | ANFIS |
---|---|---|---|---|---|---|---|---|---|
1 | 0.96 | 0.923 | 0.662 | 0.704 | 0.999 | 1.052 | 1.083 | 0.603 | 0.999 |
0 | 0.04 | 0.550 | 0.208 | 0.380 | 0.015 | 0.014 | −0.027 | 0.256 | 0.000 |
1 | 0.96 | 0.277 | 0.122 | 0.128 | 0.872 | 0.859 | 0.698 | 0.337 | 1.000 |
0 | 0.04 | −0.048 | 0.179 | 0.060 | 0.044 | 0.088 | −0.083 | 0.077 | 0.000 |
1 | 0.96 | 0.928 | 0.567 | 0.647 | 1.003 | 1.113 | 0.915 | 0.899 | 1.000 |
0 | 0.021 | 0.035 | 0.038 | 0.152 | −0.082 | −0.060 | 0.016 | 0.097 | 0.000 |
1 | 0.96 | 0.785 | 0.872 | 0.864 | 0.975 | 0.932 | 1.001 | 0.785 | 1.000 |
0 | 0.04 | 0.426 | 0.104 | 0.260 | 0.220 | 0.029 | 0.037 | 0.007 | 0.000 |
1 | 0.96 | 0.471 | 0.730 | 0.475 | 0.947 | 0.964 | 0.931 | 0.732 | 1.000 |
0 | −0.03 | 0.059 | 0.047 | 0.032 | −0.047 | 0.009 | 0.215 | 0.037 | 0.000 |
1 | 0.96 | 0.405 | 0.541 | 0.421 | 0.806 | 0.805 | 0.647 | 0.741 | 1.000 |
0 | 0.04 | 0.247 | 0.294 | 0.343 | 0.201 | 0.169 | 0.042 | 0.240 | 0.000 |
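The continuous outputs in the table above can be turned into class labels by thresholding. The sketch below does this for the MARS and ANFIS columns copied from the table. The cut-off of 0.5 is an assumption made for illustration and is not stated in the table, and the function names are hypothetical.

```python
from typing import Sequence

# True outputs and two model columns copied from the table above.
TRUE_OUTPUTS = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
MARS_OUTPUTS = [0.923, 0.550, 0.277, -0.048, 0.928, 0.035,
                0.785, 0.426, 0.471, 0.059, 0.405, 0.247]
ANFIS_OUTPUTS = [0.999, 0.000, 1.000, 0.000, 1.000, 0.000,
                 1.000, 0.000, 1.000, 0.000, 1.000, 0.000]


def to_labels(outputs: Sequence[float], threshold: float = 0.5) -> list:
    """Map continuous outputs to 0/1 labels (the 0.5 threshold is assumed)."""
    return [1 if value >= threshold else 0 for value in outputs]


def accuracy(true_labels: Sequence[int], predicted_labels: Sequence[int]) -> float:
    """Fraction of positions where the predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels) or len(true_labels) == 0:
        raise ValueError("inputs must be non-empty and equal in length")
    hits = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return hits / len(true_labels)


mars_acc = accuracy(TRUE_OUTPUTS, to_labels(MARS_OUTPUTS))
anfis_acc = accuracy(TRUE_OUTPUTS, to_labels(ANFIS_OUTPUTS))
print(f"MARS: {mars_acc:.3f}, ANFIS: {anfis_acc:.3f}")
```

On these twelve rows only, and under the assumed threshold, ANFIS labels every case correctly and MARS labels eight of twelve correctly. A different threshold would change the MARS figure, so it should not be read as a result of the study.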
Indicator | Estimated Value | Standard Error Value | t Statistic | p Value | |
---|---|---|---|---|---|
Y-intercept | −4.828 | 3.862 | −1.250 | 0.211 | |
Gender | 51.094 | 19.834 | 2.576 | 0.001 | |
Age | 0.041 | 0.031 | 1.301 | 0.193 | |
Nationality | 5.221 | 4.773 | 1.094 | 0.274 | |
Symptoms | 7.198 | 8.669 | 0.830 | 0.406 | |
PMH | −2.703 | 10.288 | −0.263 | 0.792 | |
Smoking | −15.231 | 17.806 | −0.855 | 0.392 | |
Activity | −1311 | 492.99 | −2.659 | 0.007 | |
BMI | −0.022 | 0.050 | −0.436 | 0.663 | |
Systolic | 0.021 | 0.028 | 0.736 | 0.462 | |
Diastolic | −0.015 | 0.040 | −0.383 | 0.702 | |
F-glucose | 0.201 | 0.414 | 0.485 | 0.627 | |
HbA1c | 0.113 | 0.413 | 0.274 | 0.784 | |
Cholesterol | −0.163 | 0.361 | −0.451 | 0.652 | |
RBC | 0.063 | 0.406 | 0.154 | 0.877 | |
LDL | −0.023 | 0.479 | −0.048 | 0.961 | |
HDL | −0.640 | 0.544 | −1.177 | 0.239 | |
ECG | 30.677 | 10.85 | 2.827 | 0.004 |
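The t statistics in the table above are consistent with the usual ratio of an estimate to its standard error, t = estimate / SE. The sketch below checks this for a few rows copied from the table. That the authors computed the statistic exactly this way is an assumption, and the helper name is hypothetical.

```python
# (estimate, standard error, reported t) for rows copied from the table above.
ROWS = {
    "Gender": (51.094, 19.834, 2.576),
    "Activity": (-1311.0, 492.99, -2.659),
    "ECG": (30.677, 10.85, 2.827),
}


def t_statistic(estimate: float, standard_error: float) -> float:
    """Ratio of a coefficient estimate to its standard error."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    return estimate / standard_error


for name, (estimate, se, reported_t) in ROWS.items():
    t = t_statistic(estimate, se)
    print(f"{name}: computed t = {t:.3f}, reported t = {reported_t:.3f}")
```

For rows whose estimate and standard error are shown with only two or three significant digits (for example Age), the recomputed ratio can differ slightly from the reported t statistic because of rounding in the displayed values.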
Method | TN | FN | TP | FP | CA | CER | Sensitivity | Specificity |
---|---|---|---|---|---|---|---|---|
LDA | 80 | 6 | 57 | 16 | 86.16 | 0.1383 | 90.47 | 6.25 |
QDA | 78 | 7 | 56 | 18 | 84.27 | 0.1572 | 88.88 | 7.29 |
kNN | 79 | 7 | 56 | 17 | 84.91 | 0.1509 | 88.88 | 7.29 |
NB | 77 | 6 | 57 | 19 | 84.27 | 0.1572 | 30.16 | |
DT | 79 | 6 | 57 | 17 | 85.53 | 0.1446 | 82.29 | 26.98 |
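As a reading aid for the table above, the sketch below derives summary measures from the four confusion-matrix counts using the conventional definitions: accuracy = (TP + TN) / n, error rate = 1 − accuracy, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). It is assumed that CA and CER correspond to accuracy (in percent) and error rate. The tabulated sensitivity and specificity columns may follow a different convention, so only CA and CER are compared here. Class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tn: int  # true negatives
    fn: int  # false negatives
    tp: int  # true positives
    fp: int  # false positives


def classification_summary(c: ConfusionCounts) -> dict:
    """Conventional accuracy, error rate, sensitivity and specificity."""
    total = c.tn + c.fn + c.tp + c.fp
    if total == 0 or (c.tp + c.fn) == 0 or (c.tn + c.fp) == 0:
        raise ValueError("counts must include both positive and negative cases")
    accuracy = (c.tp + c.tn) / total
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
    }


# LDA counts copied from the table above (TN = 80, FN = 6, TP = 57, FP = 16).
lda = classification_summary(ConfusionCounts(tn=80, fn=6, tp=57, fp=16))
print(f"accuracy = {100 * lda['accuracy']:.2f}%, error rate = {lda['error_rate']:.4f}")
```

For the LDA row, the computed accuracy of about 86.16% matches the tabulated CA, and the computed error rate of about 0.138 agrees with the tabulated CER.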
Methods | Accuracy (%) | Miss Rate (%) |
---|---|---|
Naive Bayes [60] | 75.80 | 24.20 |
HRFLM [60] | 88.40 | 11.60 |
Decision tree [60] | 85.00 | 15.00 |
SVM [61] | 88.00 | 12.00 |
Fuzzy-based ML | 91.30 | 8.70 |
Framingham risk score [62] | 87.04 | 12.96 |
Logistic regression [61] | 89.00 | 11.00 |
Logistic regression [62] | 86.11 | 13.89 |
ANFIS, our findings | 94.70 | 5.30 |
ANN-LM, our findings | 96.20 | 3.80 |
ANN-BFG, our findings | 91.50 | 8.50 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Taylan, O.; Alkabaa, A.S.; Alqabbaa, H.S.; Pamukçu, E.; Leiva, V. Early Prediction in Classification of Cardiovascular Diseases with Machine Learning, Neuro-Fuzzy and Statistical Methods. Biology 2023, 12, 117. https://doi.org/10.3390/biology12010117