Big Data Analytics Framework Using Squirrel Search Optimized Gradient Boosted Decision Tree for Heart Disease Diagnosis
Abstract
1. Introduction
- Development of a novel squirrel search-optimized Gradient Boosted Decision Tree (SS-GBDT) framework for the early diagnosis of heart disease;
- Development of a comprehensive methodology that includes data preprocessing, feature extraction using Word2vec and classification using SS-GBDT;
- Validation of the proposed SS-GBDT method using various performance indicators and comparing it with other state-of-the-art ML-based techniques, thereby highlighting the superiority and applicability of the proposed approach.
2. Literature Survey
3. Research Methodology
3.1. Dataset
3.2. Preprocessing Using Min–Max Normalization
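As an illustration of this preprocessing step, min–max normalization rescales a feature value x to (x − min)/(max − min), so that every feature lies in [0, 1]. The following is a minimal sketch rather than the authors' implementation: the function name `min_max_normalize` and the sample ages are hypothetical, and only the range of 29 to 77 is taken from the age feature of the dataset.

```python
def min_max_normalize(values, lo=None, hi=None):
    """Rescale values linearly so that lo maps to 0 and hi maps to 1."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        # A constant feature carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical ages, scaled with the dataset's reported age range (29 to 77).
ages = [29, 41, 53, 77]
scaled_ages = min_max_normalize(ages, lo=29, hi=77)
print(scaled_ages)  # [0.0, 0.25, 0.5, 1.0]
```

Passing the range explicitly keeps the scaling identical for training and test records; computing it from each subset separately would map the same age to different values.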
3.3. Feature Extraction Using Word2vec
3.4. Classification Using Squirrel Search-Optimized Gradient Boosted Decision Tree
- Stage 1 (Initialize the Variables): At the outset of the squirrel search optimization procedure, the following parameters are set: the total number of iterations, the population size, the total number of decision variables, the probability that a predator is present, the scaling factor, the gliding constant and the upper and lower bounds for the decision variables.
- Stage 2 (Random Initialization of Flying Squirrels): As in other population-based algorithms, squirrel search optimization starts from random positions. The forest contains a fixed number of flying squirrels, and a uniform distribution is used to set each flying squirrel's starting location inside the forest. The coordinates are initialized at random in the search space as in Equation (2):
- Stage 3 (Fitness Evaluation): The fitness of each flying squirrel (FLS) is assessed by inputting the values of its decision variables into a user-defined fitness function (FF) and calculating the associated fitness value (FV). The FV of an FLS's location indicates the kind of food source it has found (an optimal one, a normal one or none) and, therefore, its chances of survival. The FV of each location is determined as in Equation (3):
- Stage 4 (Declaration, Sorting and Random Selection):
- Stage 5 (Generating New Positions through Aerodynamic Gliding):
- Stage 6 (Stopping Criterion):
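The stages above can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation. It follows the commonly described form of squirrel search, in which the best location is treated as the hickory tree and the next three as acorn trees; that role assignment, the function name `squirrel_search`, the default parameter values and the 50/50 choice of target are all assumptions. The gliding distance is drawn at random from an assumed scaled range instead of being computed from an aerodynamic model, and seasonal monitoring is omitted.

```python
import random


def squirrel_search(fitness, n_vars, lower, upper, pop_size=20,
                    max_iter=100, p_predator=0.1, glide_const=1.9, seed=0):
    """Minimize `fitness` over the box [lower, upper]^n_vars (simplified)."""
    if pop_size < 5:
        raise ValueError("pop_size must be at least 5")
    rng = random.Random(seed)

    def random_location():
        # Stage 2: uniform random position inside the bounds.
        return [lower + rng.random() * (upper - lower) for _ in range(n_vars)]

    def glide(pos, target):
        # Stage 5: if a predator appears, relocate at random;
        # otherwise glide toward the target tree.
        if rng.random() < p_predator:
            return random_location()
        d_g = rng.uniform(0.5, 1.11)  # assumed scaled gliding distance
        moved = [x + d_g * glide_const * (t - x) for x, t in zip(pos, target)]
        # Keep the new position inside the bounds.
        return [min(max(x, lower), upper) for x in moved]

    squirrels = [random_location() for _ in range(pop_size)]
    for _ in range(max_iter):  # Stage 6: stop after max_iter iterations
        # Stages 3 and 4: evaluate fitness and sort in ascending order.
        squirrels.sort(key=fitness)
        best, acorn = squirrels[0], squirrels[1:4]
        new_pop = [best]  # the best location is kept unchanged
        for s in acorn:
            new_pop.append(glide(s, best))
        for s in squirrels[4:]:
            # Remaining squirrels head for the best or for an acorn tree.
            target = best if rng.random() < 0.5 else rng.choice(acorn)
            new_pop.append(glide(s, target))
        squirrels = new_pop
    return min(squirrels, key=fitness)


def sphere(p):
    """Toy fitness function with its minimum at the origin."""
    return sum(x * x for x in p)


best = squirrel_search(sphere, n_vars=2, lower=-5.0, upper=5.0)
print(best, sphere(best))
```

Because the best location is carried over unchanged in every iteration, the best fitness value never gets worse from one iteration to the next.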
Algorithm 1: Squirrel Search (SS)
Input: Population locations initialized at random between the lower and upper bounds.
Result: Highly integrated solution.
Stage 1: Generate random positions for the flying squirrels.
Stage 2: Evaluate the FV of each position on the supplied feature values for N samples, based on the k-nearest neighbors and the error rate.
Stage 3: Sort the flying squirrel locations in ascending order of FV.
Stage 4: Create new locations by gliding; else, move to a random location.
Stage 5: Repeat Stages 1 through 4 until the maximum number of iterations is reached.
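Stage 2 of Algorithm 1 bases the fitness value on k-neighbors and the error rate. One plausible reading, shown here as a sketch under stated assumptions rather than as the authors' fitness function, is a leave-one-out k-nearest-neighbor error rate, where a lower error means a fitter location. The function name `knn_error_rate` and the toy data are hypothetical.

```python
from collections import Counter


def knn_error_rate(samples, labels, k=3):
    """Leave-one-out k-nearest-neighbour error rate (lower is fitter)."""
    errors = 0
    for i, x in enumerate(samples):
        # Squared Euclidean distance from sample i to every other sample.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, y)), j)
            for j, y in enumerate(samples) if j != i
        )
        # Majority vote among the k closest samples.
        votes = Counter(labels[j] for _, j in dists[:k])
        predicted = votes.most_common(1)[0][0]
        errors += predicted != labels[i]
    return errors / len(samples)


# Hypothetical, well-separated two-class data.
X = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.9, 1.0], [1.0, 0.9], [0.8, 1.0]]
y = [0, 0, 0, 1, 1, 1]
error = knn_error_rate(X, y, k=1)
print(error)
```

A function of this kind can be passed to the optimizer as the fitness function, with each candidate location deciding which features or parameter values are used to build `samples`.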
- Stage 1: The model's starting constant value is supplied.
- Stage 2: Iterate over the boosting rounds, m = 1 to M.
- Stage 2.1: Based on Equation (7), the step size and minimal loss reduction for averaging the weights of different trees are determined as follows:
- Stage 2.2: The model is updated as follows:
- Stage 3: After the M additive functions have been applied, the final model is returned.
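The boosting loop described above can be illustrated with a small, self-contained sketch. It is not the authors' model: it assumes a squared-error loss, one-dimensional inputs, depth-one trees (stumps) and a fixed step size, and it leaves out the minimal loss reduction term. The names `fit_stump` and `gradient_boost` are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit the best single-threshold regression stump to the residuals.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate split thresholds
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lv) ** 2 for r in left)
               + sum((r - rv) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv


def gradient_boost(xs, ys, n_rounds=50, step=0.1):
    """Additive model: a constant plus n_rounds scaled stumps."""
    f0 = sum(ys) / len(ys)  # Stage 1: constant starting value
    stumps = []
    preds = [f0] * len(xs)
    for _ in range(n_rounds):  # Stage 2: m = 1 to M
        # For squared-error loss the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)  # Stage 2.1 (fixed step size here)
        stumps.append(h)
        # Stage 2.2: update the model with the scaled new tree.
        preds = [p + step * h(x) for p, x in zip(preds, xs)]
    # Stage 3: return the final additive model.
    return lambda x: f0 + step * sum(h(x) for h in stumps)


# Hypothetical toy data: the target switches from 0 to 1 between x=3 and x=4.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
model = gradient_boost(xs, ys)
print(model(2), model(5))
```

On this toy data each round shrinks the remaining residual by the factor (1 − step), so the predictions approach 0 and 1 geometrically as the number of rounds grows.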
Algorithm 2: Squirrel Search-Optimized Gradient Boosted Decision Tree (SS-GBDT)
Input: Iterations, population size, decision variables, likelihood of predator presence, scaling factor, gliding constant, upper and lower bounds for decision variables
Squirrel search:
  For each flying squirrel and each decision variable: initialize the position at random
  For each flying squirrel: evaluate the fitness value
  For each FLS:
    If no predator is present: glide to a new position
    Else: move to a random location
  Limit the new positions to the lower and upper bounds
Gradient boosted decision tree:
  Stage 1: Set the initial constant value for the model
  Stage 2: For m in range 1 to M:
    2.1: Determine step size and minimal loss reduction for averaging the weights of different trees
    2.2: Update the model
  Stage 3: Return the final model
4. Results and Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Ahsan, M.; Siddique, Z. Machine learning-based heart disease diagnosis: A systematic literature review. Artif. Intell. Med. 2022, 128, 102289.
2. Diwakar, M.; Tripathi, A.; Joshi, K.; Memoria, M.; Singh, P.; Kumar, N. Latest trends on heart disease prediction using machine learning and image fusion. Mater. Today Proc. 2021, 37, 3213–3218.
3. Hassan, S.; Dhali, M.; Zaman, F.; Tanveer, M. Big data and predictive analytics in healthcare in Bangladesh: Regulatory challenges. Heliyon 2021, 7, e07179.
4. Konstantonis, G.; Singh, K.V.; Sfikakis, P.P.; Jamthikar, A.D.; Kitas, G.D.; Gupta, S.K.; Saba, L.; Verrou, K.; Khanna, N.N.; Ruzsa, Z.; et al. Cardiovascular disease detection using machine learning and carotid/femoral arterial imaging frameworks in rheumatoid arthritis patients. Rheumatol. Int. 2022, 42, 215–239.
5. Ahmed, I.; Ahmad, M.; Jeon, G.; Piccialli, F. A Framework for Pandemic Prediction Using Big Data Analytics. Big Data Res. 2021, 25, 100190.
6. Ramesh, T.R.; Lilhore, U.K.; Poongodi, M.; Simaiya, S.; Kaur, A.; Hamdi, M. Predictive analysis of heart diseases with Machine Learning approaches. Malays. J. Comput. Sci. 2022, 132–148.
7. Chang, V.; Bhavani, V.R.; Xu, A.Q.; Hossain, A. An artificial intelligence model for heart disease detection using machine learning algorithms. Healthc. Anal. 2022, 2, 100016.
8. Rehman, A.; Naz, S.; Razzak, I. Leveraging big data analytics in healthcare enhancement: Trends, challenges and opportunities. Multimedia Syst. 2022, 28, 1339–1371.
9. Nagavelli, U.; Samanta, D.; Chakraborty, P. Machine Learning Technology-Based Heart Disease Detection Models. J. Healthc. Eng. 2022, 2022, 7351061.
10. Ketu, S.; Mishra, P.K. Empirical Analysis of Machine Learning Algorithms on Imbalance Electrocardiogram Based Arrhythmia Dataset for Heart Disease Detection. Arab. J. Sci. Eng. 2022, 47, 1447–1469.
11. Anooj, P.K. Clinical decision support system: Risk level prediction of heart disease using weighted fuzzy rules. J. King Saud Univ.-Comput. Inf. Sci. 2012, 24, 27–40.
12. Dewan, A.; Sharma, M. Prediction of heart disease using a hybrid technique in data mining classification. In Proceedings of the 2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 11–13 March 2015.
13. Sharanyaa, S.; Lavanya, S.; Chandhini, M.R.; Bharathi, R.; Madhulekha, K. Hybrid Machine Learning Techniques for Heart Disease Prediction. Int. J. Adv. Eng. Res. Sci. 2020, 7, 44–48.
14. Rajendran, N.A.; Vincent, D.R. Heart disease prediction system using ensemble of machine learning algorithms. Recent Pat. Eng. 2021, 15, 130–139.
15. Shorewala, V. Early detection of coronary heart disease using ensemble techniques. Inform. Med. Unlocked 2021, 26, 100655.
16. Tiwari, A.; Chugh, A.; Sharma, A. Ensemble framework for cardiovascular disease prediction. Comput. Biol. Med. 2020, 146, 105624.
17. Yoon, T.; Kang, D. Multi-Modal Stacking Ensemble for the Diagnosis of Cardiovascular Diseases. J. Pers. Med. 2023, 13, 373.
18. Menshawi, A.; Hassan, M.M.; Allheeib, N.; Fortino, G. A Hybrid Generic Framework for Heart Problem Diagnosis Based on a Machine Learning Paradigm. Sensors 2023, 23, 1392.
19. Reddy, K.V.V.; Elamvazuthi, I.; Aziz, A.A.; Paramasivam, S.; Chua, H.N.; Pranavanand, S. Heart disease risk prediction using machine learning classifiers with attribute evaluators. Appl. Sci. 2021, 11, 8352.
20. Baccouche, A.; Garcia-Zapirain, B.; Olea, C.C.; Elmaghraby, A. Ensemble deep learning models for heart disease classification: A case study from Mexico. Information 2020, 11, 207.
21. Almulihi, A.; Saleh, H.; Hussien, A.M.; Mostafa, S.; El-Sappagh, S.; Alnowaiser, K.; Ali, A.A. Refaat Hassan. Ensemble Learning Based on Hybrid Deep Learning Model for Heart Disease Early Prediction. Diagnostics 2022, 12, 3215.
22. Cenitta, D.; Arjunan, R.V.; Prema, K.V. Ischemic Heart Disease Prediction Using Optimized Squirrel Search Feature Selection Algorithm. IEEE Access 2022, 10, 122995–123006.
23. Bharti, R.; Khamparia, A.; Shabaz, M.; Dhiman, G.; Pande, S.; Singh, P. Prediction of Heart Disease Using a Combination of Machine Learning and Deep Learning. Comput. Intell. Neurosci. 2021, 2021, 8387680.
24. Ko, Y.-F.; Kuo, P.-H.; Wang, C.-F.; Chen, Y.-J.; Chuang, P.-C.; Li, S.-Z.; Chen, B.-W.; Yang, F.-C.; Lo, Y.-C.; Yang, Y.; et al. Quantification Analysis of Sleep Based on Smartwatch Sensors for Parkinson’s Disease. Biosensors 2022, 12, 74.
25. Miao, K.H.; Miao, J.H. Coronary heart disease diagnosis using deep neural networks. Int. J. Adv. Comput. Sci. Appl. 2018, 9, 10.
26. Nawaz, M.S.; Shoaib, B.; Ashraf, M.A. Intelligent Cardiovascular Disease Prediction Empowered with Gradient Descent Optimization. Heliyon 2021, 7, e06948.
27. Eisa, M.M.; Alnaggar, M.H. Hybrid Rough-Genetic Classification Model for IoT Heart Disease Monitoring System. In Digital Transformation Technology: Proceedings of ITAF 2020; Springer: Singapore, 2022; pp. 437–451.
28. Nandy, S.; Adhikari, M.; Balasubramanian, V.; Menon, V.G.; Li, X.; Zakarya, M. An intelligent heart disease prediction system based on swarm-artificial neural network. Neural Comput. Appl. 2021, 1–15.
29. Kasbe, T.; Pippal, R.S. Enhancement in diagnosis of coronary artery disease using fuzzy expert system. Int. J. Sci. Res. Comput. Sci. Eng. Informat. Technol. 2018, 3, 1324–1331.
30. Hernandez, A.F.; Albert, N.M.; Allen, L.A.; Ahmed, R.; Averina, V.; Boehmer, J.P.; Cowie, M.R.; Chien, C.V.; Galvao, M.; Klein, L.; et al. Multiple cArdiac seNsors for mAnaGEment of Heart Failure (MANAGE-HF)—Phase I Evaluation of the Integration and Safety of the HeartLogic Multisensor Algorithm in Patients With Heart Failure. J. Card. Fail. 2022, 28, 1245–1254.
31. Shakya, S.; Joby, P.P. Heart disease prediction using fog computing based wireless body sensor networks (WSNs). IRO J. Sustain. Wirel. Syst. 2021, 3, 49–58.
32. Subahi, A.F.; Khalaf, O.I.; Alotaibi, Y.; Natarajan, R.; Mahadev, N.; Ramesh, T. Modified Self-Adaptive Bayesian Algorithm for Smart Heart Disease Prediction in IoT System. Sustainability 2022, 14, 14208.
Feature | Symbol | Value |
---|---|---|
Age | age | 29 to 77 |
Sex | sex | Female (0) Male (1) |
Chest Pain | cp | Angina (typical) (1) Angina (atypical) (2) Non-anginal (3) Asymptomatic (4) |
Resting blood pressure | trestbps | 94 to 200 mm Hg |
Serum cholesterol | chol | 126 to 564 mg/dL |
Fasting blood sugar | fbs | <120 mg/dL (0) >120 mg/dL (1) |
Resting ECG result | restecg | Normal (0) ST-T wave abnormality (1) LV-hypertrophy (2) |
Maximum heart rate achieved | thalach | 71 to 202 |
Exercise-induced angina | exang | No (0) Yes (1) |
ST depression induced by exercise relative to rest | oldpeak | 0 to 6.2 |
Slope of the peak exercise ST segment | slope | Upsloping (1) Flat (2) Downsloping (3) |
Major vessels colored by fluoroscopy | ca | 0–3 |
Defect type | thal | Normal (3) Fixed defect (6) Reversible defect (7) |
Heart disease | target | 0–4 |
Method | Source | Accuracy % | Recall % | Precision % | F1-Measure % |
---|---|---|---|---|---|
LR | Bharti et al. [23] | 83.3 | 86.3 | - | - |
KNN | Bharti et al. [23] | 84.8 | 85 | - | - |
SVM | Bharti et al. [23] | 83.2 | 78.2 | - | - |
RF | Bharti et al. [23] | 80.3 | 78.2 | - | - |
DT | Bharti et al. [23] | 82.3 | 78.5 | - | - |
DL | Bharti et al. [23] | 94.2 | 82.3 | - | - |
ML | Ko et al. [24] | 70 | 73 | 72 | 79 |
DNN | Miao and Miao [25] | 83.67 | 93.51 | 79.12 | 85.71 |
GDO | Nawaz et al. [26] | 97.07 | 97.15 | - | - |
GA | Eisa and Alnaggar [27] | 64 | 66 | 65 | 70 |
Swarm-ANN | Nandy et al. [28] | 95.78 | 95.21 | 95.21 | 95.21 |
Fuzzy Expert System | Kasbe and Pippal [29] | 94.5 | - | - | - |
MANAGE-HF | Hernandez et al. [30] | 56 | 60 | 58 | 63 |
BSN | Shakya and Joby [31] | 82 | 85 | 84 | 88 |
MSABA | Subahi et al. [32] | 90 | 91 | 92 | 95 |
GBDT | Current paper | 83 | 81 | 79 | 79.99 |
SS-GBDT | Current paper | 95 | 96.8 | 95.8 | 96.3 |
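The F1-measure is the harmonic mean of precision and recall, F1 = 2PR/(P + R). As a quick consistency check, the short sketch below recomputes the F1 values of the two rows labeled "Current paper" from their reported precision and recall; the function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in %) as reported in the table above.
gbdt_f1 = f1_score(79, 81)
ss_gbdt_f1 = f1_score(95.8, 96.8)
print(round(gbdt_f1, 2), round(ss_gbdt_f1, 1))  # 79.99 96.3
```

Both values agree with the F1-measure column for GBDT and SS-GBDT.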
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Shaik, K.; Ramesh, J.V.N.; Mahdal, M.; Rahman, M.Z.U.; Khasim, S.; Kalita, K. Big Data Analytics Framework Using Squirrel Search Optimized Gradient Boosted Decision Tree for Heart Disease Diagnosis. Appl. Sci. 2023, 13, 5236. https://doi.org/10.3390/app13095236