Heart Failure Prediction Based on Bootstrap Sampling and Weighted Fusion LightGBM Model
Abstract
1. Introduction
2. Related Work
3. Mathematical Model
3.1. Classification Model Based on LightGBM
3.2. Application Strategy of Bootstrap Sampling in Model
1. Determine the number of bootstrap sampling rounds.
2. Draw a bootstrap sample from the training set.
3. Further divide the bootstrap-sampled training set into training and validation subsets.
4. Train the model on the bootstrap-sampled data and make predictions.
5. Combine the predictions of the bootstrap-sampled models by voting.
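The steps above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' implementation: the helper names (`bootstrap_sample`, `split_indices`, `majority_vote`), the number of rounds, the 20% validation fraction, and the tie-breaking rule are all assumptions, and the sub-model predictions are hard-coded stand-ins for what trained LightGBM sub-models would return.

```python
import random
from collections import Counter


def bootstrap_sample(n_rows, rng):
    """Draw n_rows row indices with replacement from range(n_rows)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def split_indices(indices, val_fraction, rng):
    """Shuffle the sampled indices and split off a validation part."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def majority_vote(per_model_predictions):
    """Combine 0/1 predictions, given as one list per sub-model."""
    voted = []
    for votes in zip(*per_model_predictions):
        counts = Counter(votes)
        # Ties go to class 1 here; this tie rule is an assumption.
        voted.append(1 if counts[1] >= counts[0] else 0)
    return voted


rng = random.Random(0)   # fixed seed so the sketch is repeatable
n_samplings = 3          # step 1: number of bootstrap rounds (assumed value)
n_rows = 10              # size of a toy training set

# Step 2: one bootstrap sample of row indices per round.
samples = [bootstrap_sample(n_rows, rng) for _ in range(n_samplings)]

# Step 3: split one bootstrap sample into training and validation indices.
train_idx, val_idx = split_indices(samples[0], 0.2, rng)

# Step 4 would train one sub-model per sample; these predictions are stand-ins.
sub_model_predictions = [[1, 0, 1], [1, 1, 0], [0, 0, 1]]

# Step 5: combine the sub-model predictions by voting.
voted = majority_vote(sub_model_predictions)
print(voted)
```

Using an odd number of rounds avoids ties in binary voting, which is why the tie rule rarely matters in practice.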
3.3. Model Fusion Scheme
- (1) Synthesis of prediction results based on bootstrap sampling and voting
- (2) Model fusion based on evaluation index weight
- (3) Weighted average fusion to obtain the prediction result
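As a rough illustration of items (2) and (3), the sketch below derives fusion weights from one evaluation score per model and then averages the models' predicted probabilities with those weights. It assumes the weights are simply the scores normalized to sum to 1 and that a 0.5 threshold turns the fused probability into a class label; neither detail is stated in this outline, and the function names and numbers are hypothetical.

```python
def metric_weights(scores):
    """Normalize per-model evaluation scores so the weights sum to 1."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return [s / total for s in scores]


def weighted_fusion(prob_lists, weights, threshold=0.5):
    """Weighted average of per-model probabilities, then thresholding."""
    fused = [
        sum(w * p for w, p in zip(weights, probs))
        for probs in zip(*prob_lists)
    ]
    labels = [1 if p >= threshold else 0 for p in fused]
    return fused, labels


# Hypothetical validation scores (for example F1) of three sub-models.
weights = metric_weights([0.8, 0.6, 0.6])

# Hypothetical predicted probabilities of class 1 for two samples.
prob_lists = [
    [0.9, 0.2],   # model 1
    [0.6, 0.4],   # model 2
    [0.3, 0.7],   # model 3
]
fused, labels = weighted_fusion(prob_lists, weights)
print(weights, fused, labels)
```

Compared with plain voting, this lets a sub-model that scored better on validation data pull the fused probability further toward its own prediction.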
3.4. Parameter Optimization of Model
3.4.1. Optimization Approach
3.4.2. Parameter Optimization Strategy
- (1) Determination and range setting of hyperparameters
- (2) Implementation process of grid search
- (3) Training and evaluation of the model
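The three items above describe an exhaustive grid search. A minimal sketch of that loop follows. `num_leaves` and `learning_rate` are real LightGBM parameter names, but the candidate values are assumptions, and `toy_score` is a hypothetical placeholder for "train the model with these parameters and return its validation score".

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every parameter combination and keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)   # train and evaluate with these settings
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# (1) Hyperparameters and their candidate ranges (assumed values).
param_grid = {
    "num_leaves": [15, 31, 63],
    "learning_rate": [0.01, 0.05, 0.1],
}


def toy_score(params):
    """Placeholder score that peaks at num_leaves=31, learning_rate=0.05."""
    return (-abs(params["num_leaves"] - 31)
            - 100 * abs(params["learning_rate"] - 0.05))


# (2) and (3): run the search, scoring every combination.
best_params, best_score = grid_search(param_grid, toy_score)
print(best_params, best_score)
```

The cost grows with the product of the range sizes (9 evaluations here), so the ranges in item (1) directly bound the search time.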
3.4.3. Parameter Optimization Process
- (1) Obtain the training set and validation set data for the current split according to the split index.
- (2) Perform bootstrap sampling, split off a validation set, create the LightGBM data set objects, set the model parameters, train the sub-model, and predict on the validation set.
- (3) Calculate the accuracy, precision, recall, and F1 score on this validation set and save these evaluation results.
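The four evaluation indicators in the last step follow directly from the confusion-matrix counts. The sketch below computes them for binary labels; the function name and the toy labels are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (sketch convention).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy validation labels: 3 true positives, 3 true negatives,
# 1 false positive and 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
results = binary_metrics(y_true, y_pred)
print(results)
```

Saving one such dictionary per split makes it easy to average the indicators across splits afterwards.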
4. Experimental Settings
4.1. Data Set Generation
4.1.1. Data Source
4.1.2. Potential Bias in the Data Set
4.1.3. Initial Data Exploration
- (1) Data distribution
- (2) The influence of various indicators on the predicted value
4.1.4. Data Preprocessing
- (1) Feature discretization
- (2) Feature encoding
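To make the two preprocessing steps concrete, the sketch below bins a numeric attribute and one-hot encodes a categorical one. The chest pain codes (TA, ATA, NAP, ASY) come from the data set's attribute table; the age cut points, the choice of one-hot encoding, and the function names are assumptions made for illustration and are not taken from the paper.

```python
from bisect import bisect_right

AGE_EDGES = [40, 55, 70]                        # hypothetical cut points
CHEST_PAIN_TYPES = ["TA", "ATA", "NAP", "ASY"]  # codes from the attribute table


def discretize(value, edges):
    """Return the index of the bin that value falls into (0 .. len(edges))."""
    return bisect_right(edges, value)


def one_hot(category, categories):
    """Encode a category as a 0/1 list with a single 1 at its position."""
    if category not in categories:
        raise ValueError(f"unknown category: {category!r}")
    return [1 if c == category else 0 for c in categories]


# (1) Feature discretization: ages mapped to bin indices.
age_bins = [discretize(age, AGE_EDGES) for age in [39, 40, 54, 55, 72]]

# (2) Feature encoding: one chest pain code turned into indicator columns.
encoded = one_hot("NAP", CHEST_PAIN_TYPES)
print(age_bins, encoded)
```

A value equal to a cut point falls into the higher bin here, so age 40 lands in bin 1 rather than bin 0.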
4.2. Result Evaluation Metrics
4.3. Experimental Environment
5. Experimental Results and Analysis
5.1. Feature Selection
- (1) Whether the feature is divergent: If a feature varies very little across samples, it does little to distinguish one sample from another. Such features are generally regarded as invalid or irrelevant and are eliminated during feature selection to improve the prediction performance of the model.
- (2) Correlation between features and targets: The correlation between a feature and the target is a very important factor in feature selection. The higher this correlation, the stronger the feature's ability to predict the target.
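Both criteria can be turned into a simple filter: drop near-constant features, then keep features whose correlation with the target is strong enough. The sketch below uses population variance and the Pearson coefficient; the thresholds, the function names, and the toy columns are assumptions, and the paper may measure divergence and correlation differently.

```python
import math


def variance(xs):
    """Population variance of a numeric column."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either column is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs)
    sy = sum((y - my) ** 2 for y in ys)
    if sx == 0 or sy == 0:
        return 0.0
    return cov / math.sqrt(sx * sy)


def select_features(features, target, min_variance=1e-8, min_abs_corr=0.1):
    """Keep features that vary and that correlate with the target."""
    kept = {}
    for name, column in features.items():
        if variance(column) <= min_variance:   # criterion (1): not divergent
            continue
        r = pearson(column, target)
        if abs(r) >= min_abs_corr:             # criterion (2): correlated
            kept[name] = r
    return kept


# Toy columns with hypothetical names.
features = {
    "constant": [1, 1, 1, 1],
    "informative": [1, 2, 3, 4],
    "unrelated": [1, 2, 2, 1],
}
target = [0, 0, 1, 1]
kept = select_features(features, target)
print(kept)
```

The absolute value is used because a strongly negative correlation is as useful for prediction as a strongly positive one.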
5.2. Comparative Analysis of Model Effects
5.2.1. The Setting of the Experimental Sample
5.2.2. Selection of Basic Model
5.2.3. Experiment and Comparison of Fusion Model
5.3. Discussion on Model Deployment with Incomplete External Verification
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Attribute | Attribute Information |
---|---|
Age | age of the patient [years] |
Sex | sex of the patient [M: male, F: female] |
ChestPainType | chest pain type [TA: typical angina, ATA: atypical angina, NAP: non-anginal pain, ASY: asymptomatic] |
RestingBP | resting blood pressure [mm Hg] |
Cholesterol | serum cholesterol [mg/dL] |
FastingBS | fasting blood sugar 1 |
RestingECG | resting electrocardiogram results 2 |
MaxHR | maximum heart rate achieved 3 |
ExerciseAngina | exercise-induced angina [Y: Yes, N: No] |
Oldpeak | oldpeak = ST [numeric value measured in depression] |
ST_Slope | slope of the peak exercise ST segment 4 |
HeartDisease | output class [1: heart disease, 0: Normal] |
Combination Type | Accuracy |
---|---|
Age + Sex + ChestPainType + ExerciseAngina | About 79% |
ChestPainType + ExerciseAngina + Oldpeak + ST_Slope | 79–81% |
Oldpeak + ST_Slope + RestingECG + ExerciseAngina | About 84% |
RestingECG + MaxHR + ExerciseAngina | 83–84% |
Oldpeak + ST_Slope + RestingECG | 81–83% |
Full feature combination | 82–85% |
Model | LightGBM | SVM | KNN | Decision Tree | GBDT | XGBoost |
---|---|---|---|---|---|---|
LightGBM | 0.000000 | 0.002467 | 0.000839 | 0.017154 | 0.943512 | 0.079960 |
Model | Accuracy | Recall | Precision | F1 score |
---|---|---|---|---|
SVM | 0.735507 | 0.769230 | 0.733333 | 0.750853 |
KNN | 0.702899 | 0.762238 | 0.694268 | 0.726667 |
Decision Tree | 0.818841 | 0.853147 | 0.807947 | 0.829932 |
GBDT | 0.855072 | 0.874126 | 0.850340 | 0.862069 |
XGBoost | 0.829710 | 0.853147 | 0.824324 | 0.838488 |
LightGBM | 0.829710 | 0.853147 | 0.824324 | 0.838488 |
Bootstrap sampling and weighted fusion LightGBM | 0.847826 | 0.888112 | 0.830065 | 0.858108 |
Relative increase of the fusion model over the LightGBM baseline (%) | 2.183413 | 4.098356 | 0.696449 | 2.339926 |
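Two relationships in the table above can be checked by hand. The F1 score is the harmonic mean of precision and recall, and the last row matches the relative increase of the fusion model over the LightGBM row. The sketch below reproduces both from the tabulated values; the function names are hypothetical.

```python
def f1_from(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def relative_increase_pct(new, baseline):
    """Relative increase of new over baseline, in percent."""
    return (new - baseline) / baseline * 100


# SVM row: precision 0.733333 and recall 0.769230.
svm_f1 = f1_from(0.733333, 0.769230)

# Fusion model versus LightGBM: accuracy and recall columns.
acc_gain = relative_increase_pct(0.847826, 0.829710)
recall_gain = relative_increase_pct(0.888112, 0.853147)

print(round(svm_f1, 6), round(acc_gain, 4), round(recall_gain, 4))
```

The results agree with the tabulated F1 score for SVM (0.750853) and with the increases of 2.183413% and 4.098356% to within rounding of the inputs.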
Comparison | t | p |
---|---|---|
Fusion model and LightGBM model | 1.637964 | 0.102573 |
Fusion model and SVM model | 0.266811 | 0.789815 |
Fusion model and KNN model | −0.619480 | 0.536112 |
Fusion model and Decision Tree model | 1.990495 | 0.047526 |
Fusion model and GBDT model | 1.511057 | 0.131922 |
Fusion model and XGBoost model | 1.416791 | 0.157675 |
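This outline does not state which t-test produced the values above. As one plausible reading, the sketch below computes a paired t statistic from per-sample correctness indicators (1 if a model classified the sample correctly, 0 otherwise) of two models on the same test set. The function name and the toy data are hypothetical, and converting t into a p-value would additionally require the Student t distribution, which is omitted here.

```python
import math


def paired_t_statistic(a, b):
    """Paired t statistic for two equally long lists of measurements."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two lists of equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)   # sample variance
    if var == 0:
        raise ValueError("differences have zero variance")
    return mean / math.sqrt(var / n)


# Toy correctness indicators for two models on the same eight samples.
model_a_correct = [1, 1, 1, 0, 1, 1, 0, 1]
model_b_correct = [1, 0, 1, 0, 0, 1, 0, 1]
t_stat = paired_t_statistic(model_a_correct, model_b_correct)
print(round(t_stat, 4))
```

At the conventional 0.05 level, only the comparison with the Decision Tree model in the table above (p = 0.047526) would count as significant.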
Model | Accuracy Difference | Precision Difference | Recall Difference | F1 Difference |
---|---|---|---|---|
Fusion model | 0.126202 | 0.135431 | 0.100929 | 0.119140 |
LightGBM | 0.170290 | 0.175676 | 0.146853 | 0.161512 |
SVM | −0.008093 | 0.020677 | 0.003372 | 0.012340 |
KNN | 0.075918 | 0.096855 | 0.067899 | 0.083494 |
Decision Tree | 0.242753 | 0.224638 | 0.251748 | 0.238434 |
GBDT | 0.098199 | 0.091612 | 0.103956 | 0.097608 |
XGBoost | 0.170290 | 0.175676 | 0.146853 | 0.161512 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, Y.; Cao, H. Heart Failure Prediction Based on Bootstrap Sampling and Weighted Fusion LightGBM Model. Appl. Sci. 2025, 15, 4360. https://doi.org/10.3390/app15084360