Enhancing Cardiovascular Risk Prediction: Development of an Advanced Xgboost Model with Hospital-Level Random Effects
Abstract
1. Introduction
Related Work
2. Methods
2.1. Dataset and Patient Population
2.2. Exploratory Analysis
2.3. Xgboost BME Approach
- Build a forest of trees using a standard Xgboost algorithm, taking the training-set responses in logit scale as the target and the corresponding training-set covariates as the inputs. Because the logit-scale responses are continuous while Xgboost is used here as a binary classifier, the values were converted back to binary labels using the median as the threshold. Given the high class imbalance, with the outcome class (mortality) constituting fewer than 3% of the data, thresholding at the median adapts the decision boundary so that rare positive instances are detected more readily. Because Xgboost now models only the fixed-effects component of the response, the hyperparameters had to be re-tuned. Random stratified 3-fold grid-search cross-validation was applied to the training dataset with the same hyperparameter search criteria as for the Xgboost NC model, similar to previous studies [1,3]. A maximum of 30 combinations was imposed to allow for variability in parameters across iterations.
- Obtain an estimate of the fixed-effects component, in logit scale, by applying Xgboost to the training data.
- Estimate the random effects using Gauss-Hermite quadrature, following an approach similar to Simchoni et al. [12]. The number of quadrature points was set at 80, as determined through pilot experiments, satisfying k < 2m − 1, where k represents the degree of the polynomial for numerical integration and m is the adjustment parameter, taken as the number of random effect levels.
- Update the fixed component of the response by subtracting the estimated random effect from the logit-scale response for each observation i = 1, …, n; the updated response is then re-binarized to 0 and 1 using the median as the threshold.
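The quadrature and update steps in this list can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a logistic random-intercept model with a known standard deviation `sigma`, takes the fixed-effects logits as given (in the paper they come from Xgboost), and uses the hypothetical function names `posterior_mean_intercept` and `update_fixed_response`. Using the posterior mean as the random-effect estimate is also an assumption; the estimator in Simchoni et al. [12] may be defined differently. The sketch relies on the standard property that an m-node Gauss-Hermite rule integrates polynomials of degree up to 2m − 1 exactly against the weight exp(−x²).

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping logits to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))


def posterior_mean_intercept(y, fixed_logit, sigma, n_nodes=80):
    """Posterior mean of one hospital's random intercept b ~ N(0, sigma^2).

    ASSUMPTION: logistic model P(y = 1) = sigmoid(fixed_logit + b).
    """
    y = np.asarray(y, dtype=float)
    fixed_logit = np.asarray(fixed_logit, dtype=float)
    # Nodes x and weights w for integrals of the form: integral g(x) exp(-x^2) dx.
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variables so the nodes cover the N(0, sigma^2) prior.
    b = np.sqrt(2.0) * sigma * x
    # Bernoulli log-likelihood of this hospital's outcomes at each candidate b.
    p = np.clip(sigmoid(fixed_logit[None, :] + b[:, None]), 1e-12, 1 - 1e-12)
    loglik = (y * np.log(p) + (1.0 - y) * np.log(1.0 - p)).sum(axis=1)
    # Subtract the maximum before exponentiating for numerical stability.
    post = w * np.exp(loglik - loglik.max())
    return float((post * b).sum() / post.sum())


def update_fixed_response(y_logit, hospital, b_hat):
    """Subtract each row's hospital effect, then re-binarize at the median."""
    y_star = y_logit - b_hat[hospital]
    labels = (y_star > np.median(y_star)).astype(int)
    return y_star, labels


if __name__ == "__main__":
    # Toy data (hypothetical): two hospitals, three patients each.
    hospital = np.array([0, 0, 0, 1, 1, 1])
    y = np.array([1, 1, 0, 0, 0, 0])
    fixed_logit = np.full(6, -1.0)  # stand-in for Xgboost output in logit scale
    b_hat = np.array([
        posterior_mean_intercept(y[hospital == h], fixed_logit[hospital == h], sigma=1.0)
        for h in range(2)
    ])
    # Hypothetical logit-scale responses used only to show the update step.
    y_logit = np.array([2.0, 2.0, -2.0, -2.0, -2.0, -2.0])
    y_star, labels = update_fixed_response(y_logit, hospital, b_hat)
    print("estimated hospital effects:", b_hat)
    print("re-binarized fixed-component labels:", labels)
```

In a full loop these two functions would alternate with refitting the fixed-effects learner on the re-binarized labels until the estimates stabilize; the stopping rule is not specified here and would be a further assumption.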
2.4. Validation Approach
2.4.1. Xgboost BME and NC Variant Models
2.4.2. Performance by Sample Size
2.4.3. Visualization of Parameters
2.4.4. Baseline Models
3. Results
3.1. Exploratory Analysis
3.2. Model Validation: Comparison Using All Samples
3.2.1. Xgboost BME and NC Variant Models
3.2.2. Glmer and Glm Variant Models
3.3. Performance by Sample Size
3.3.1. Unstandardized Xgboost BME and Standardized Xgboost NC Models
3.3.2. Unstandardized Glm and Standardized Glmer Models
3.4. Visualization of Parameters
4. Discussion
4.1. Technical Perspective
4.2. Relevance to Clinical Practice
4.2.1. Cardiac Surgery Perspective
4.2.2. Cardiology Perspective
5. Future Work and Limitations
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
References
1. Sinha, S.; Dong, T.; Dimagli, A.; Vohra, H.A.; Holmes, C.; Benedetto, U.; Angelini, G.D. Comparison of Machine Learning Techniques in Prediction of Mortality Following Cardiac Surgery: Analysis of over 220,000 Patients from a Large National Database. Eur. J. Cardio-Thorac. Surg. 2023, 63, ezad183.
2. Dong, T.; Sinha, S.; Zhai, B.; Fudulu, D.; Chan, J.; Narayan, P.; Judge, A.; Caputo, M.; Dimagli, A.; Benedetto, U.; et al. Performance Drift in Machine Learning Models for Cardiac Surgery Risk Prediction: Retrospective Analysis. JMIRx Med. 2024, 5, e45973.
3. Dong, T.; Sinha, S.; Zhai, B.; Fudulu, D.P.; Chan, J.; Narayan, P.; Judge, A.; Caputo, M.; Dimagli, A.; Benedetto, U.; et al. Cardiac Surgery Risk Prediction Using Ensemble Machine Learning to Incorporate Legacy Risk Scores: A Benchmarking Study. Digit. Health 2023, 9, 20552076231187605.
4. Kumar, N.K.; Sindhu, G.S.; Prashanthi, D.K.; Sulthana, A.S. Analysis and Prediction of Cardio Vascular Disease Using Machine Learning Classifiers. In Proceedings of the 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 6–7 March 2020; pp. 15–21.
5. Tiwari, P.; Colborn, K.L.; Smith, D.E.; Xing, F.; Ghosh, D.; Rosenberg, M.A. Assessment of a Machine Learning Model Applied to Harmonized Electronic Health Record Data for the Prediction of Incident Atrial Fibrillation. JAMA Netw. Open 2020, 3, e1919396.
6. Mehrtash, A.; Wells, W.M.; Tempany, C.M.; Abolmaesumi, P.; Kapur, T. Confidence Calibration and Predictive Uncertainty Estimation for Deep Medical Image Segmentation. IEEE Trans. Med. Imaging 2020, 39, 3868–3878.
7. Huang, C.; Li, S.-X.; Caraballo, C.; Masoudi, F.A.M.; Rumsfeld, J.S.; Spertus, J.A.; Normand, S.-L.T.; Mortazavi, B.J.; Krumholz, H.M.M. Performance Metrics for the Comparative Analysis of Clinical Risk Prediction Models Employing Machine Learning. Circ. Cardiovasc. Qual. Outcomes 2021, 14, e007526.
8. Steyerberg, E.W.; Vickers, A.J.; Cook, N.R.; Gerds, T.; Gonen, M.; Obuchowski, N.; Pencina, M.J.; Kattan, M.W. Assessing the Performance of Prediction Models: A Framework for Traditional and Novel Measures. Epidemiology 2010, 21, 128–138.
9. Allyn, J.; Allou, N.; Augustin, P.; Philip, I.; Martinet, O.; Belghiti, M.; Provenchere, S.; Montravers, P.; Ferdynus, C. A Comparison of a Machine Learning Model with EuroSCORE II in Predicting Mortality after Elective Cardiac Surgery: A Decision Curve Analysis. PLoS ONE 2017, 12, e0169772.
10. Gregorich, M.; Strohmaier, S.; Dunkler, D.; Heinze, G. Regression with Highly Correlated Predictors: Variable Omission Is Not the Solution. Int. J. Environ. Res. Public Health 2021, 18, 4259.
11. Ng, S.-K.; McLachlan, G.J. Extension of Mixture-of-Experts Networks for Binary Classification of Hierarchical Data. Artif. Intell. Med. 2007, 41, 57–67.
12. Simchoni, G.; Rosset, S. Integrating Random Effects in Deep Neural Networks. J. Mach. Learn. Res. 2024, 24, 156:7402–156:7458.
13. Hajjem, A.; Bellavance, F.; Larocque, D. Mixed-Effects Random Forest for Clustered Data. J. Stat. Comput. Simul. 2014, 84, 1313–1328.
14. Dong, T.; Sinha, S.; Fudulu, D.P.; Chan, J.; Zhai, B.; Narayan, P.N.; Caputo, M.; Judge, A.; Dimagli, A.; Benedetto, U.; et al. Random Effects Adjustment in Machine Learning Models for Cardiac Surgery Risk Prediction: A Benchmarking Study. medRxiv 2023.
15. Kang, X. The Effect of Color on Short-Term Memory in Information Visualization. In Proceedings of the 9th International Symposium on Visual Information Communication and Interaction, Dallas, TX, USA, 24–26 September 2016; ACM: Dallas, TX, USA, 2016; pp. 144–145.
16. Dong, T.; Benedetto, U.; Sinha, S.; Fudulu, D.; Dimagli, A.; Chan, J.; Caputo, M.; Angelini, G. Deep Recurrent Reinforced Learning Model to Compare the Efficacy of Targeted Local versus National Measures on the Spread of COVID-19 in the UK. BMJ Open 2022, 12, e048279.
17. McCulloch, C.E.; Searle, S.R. Generalized, Linear, and Mixed Models; Wiley Series; Wiley: Hoboken, NJ, USA, 2001; ISBN 0-471-19364-X.
18. Kokol, P.; Kokol, M.; Zagoranski, S. Machine Learning on Small Size Samples: A Synthetic Knowledge Synthesis. Sci. Prog. 2022, 105, 00368504211029777.
19. Marin, J. Evaluating Synthetically Generated Data from Small Sample Sizes: An Experimental Study. arXiv 2022, arXiv:2211.10760.
20. Lutakamale, A.S.; Manyesela, Y.Z. Machine Learning-Based Fingerprinting Positioning in Massive MIMO Networks: Analysis on the Impact of Small Training Sample Size to the Positioning Performance. SN Comput. Sci. 2023, 4, 286.
21. Lu, G.; Li, B.; Yang, W.; Yin, J. Unsupervised Feature Selection with Graph Learning via Low-Rank Constraint. Multimed. Tools Appl. 2018, 77, 29531–29549.
22. Soppa, G.; Theodoropoulos, P.; Bilkhu, R.; Harrison, D.; Alam, R.; Beattie, R.; Bleetman, D.; Hussain, A.; Jones, S.; Kenny, L.; et al. Variation between Hospitals in Outcomes Following Cardiac Surgery in the UK. Ann. R. Coll. Surg. Engl. 2019, 101, 333–341.
23. Fowler, A.J.; Abbott, T.E.F.; Prowle, J.; Pearse, R.M. Age of Patients Undergoing Surgery. Br. J. Surg. 2019, 106, 1012–1018.
24. Stoller, N.; Wertli, M.M.; Haynes, A.G.; Chiolero, A.; Rodondi, N.; Panczak, R.; Aujesky, D. Large Regional Variation in Cardiac Closure Procedures to Prevent Ischemic Stroke in Switzerland a Population-Based Small Area Analysis. PLoS ONE 2024, 19, e0291299.
25. Schenker, C.; Wertli, M.M.; Räber, L.; Haynes, A.G.; Chiolero, A.; Rodondi, N.; Panczak, R.; Aujesky, D. Regional Variation and Temporal Trends in Transcatheter and Surgical Aortic Valve Replacement in Switzerland: A Population-Based Small Area Analysis. PLoS ONE 2024, 19, e0296055.
26. Baquedano, M.; de Jesus, S.E.; Rapetto, F.; Murphy, G.J.; Angelini, G.; Benedetto, U.; Caldas, P.; Srivastava, P.K.; Uzun, O.; Luyt, K.; et al. Outcome Monitoring and Risk Stratification after Cardiac Procedure in Neonates, Infants, Children and Young Adults Born with Congenital Heart Disease: Protocol for a Multicentre Prospective Cohort Study (Children OMACp). BMJ Open 2023, 13, e071629.
27. Schmid, C.H.; Stark, P.C.; Berlin, J.A.; Landais, P.; Lau, J. Meta-Regression Detected Associations between Heterogeneous Treatment Effects and Study-Level, but Not Patient-Level, Factors. J. Clin. Epidemiol. 2004, 57, 683–697.
28. Cook, D.A.; Oh, S.-Y.; Pusic, M.V. Accuracy of Physicians’ Electrocardiogram Interpretations. JAMA Intern. Med. 2020, 180, 1–11.
29. Pecchia, L.; Melillo, P.; Sansone, M.; Bracale, M. Heart Rate Variability in Healthy People Compared with Patients with Congestive Heart Failure. In Proceedings of the 2009 9th International Conference on Information Technology and Applications in Biomedicine, Larnaka, Cyprus, 4–7 November 2009; pp. 1–4.
30. Pecchia, L.; Melillo, P.; Sansone, M.; Bracale, M. Discrimination Power of Short-Term Heart Rate Variability Measures for CHF Assessment. IEEE Trans. Inf. Technol. Biomed. 2011, 15, 40–46.
31. Melillo, P.; Fusco, R.; Sansone, M.; Bracale, M.; Pecchia, L. Discrimination Power of Long-Term Heart Rate Variability Measures for Chronic Heart Failure Detection. Med. Biol. Eng. Comput. 2011, 49, 67–74.
32. Putnikovic, M.; Jordan, Z.; Munn, Z.; Borg, C.; Ward, M. Use of Electrocardiogram Monitoring in Adult Patients Taking High-Risk QT Interval Prolonging Medicines in Clinical Practice: Systematic Review and Meta-Analysis. Drug Saf. 2022, 45, 1037–1048.
33. Brindle, R.C.; Ginty, A.T.; Phillips, A.C.; Carroll, D. A Tale of Two Mechanisms: A Meta-Analytic Approach toward Understanding the Autonomic Basis of Cardiovascular Reactivity to Acute Psychological Stress. Psychophysiology 2014, 51, 964–976.
34. Sela, R.J.; Simonoff, J.S. RE-EM Trees: A Data Mining Approach for Longitudinal and Clustered Data. Mach. Learn. 2012, 86, 169–207.
35. Ankenman, B.E.; Avilés, A.I.; Pinheiro, J.C. Optimal Designs for Mixed-Effects Models with Two Random Nested Factors. Stat. Sin. 2003, 13, 385–401.
36. Snijders, T.; Bosker, R. Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling. 2012. Available online: https://www.stats.ox.ac.uk/~snijders/mlbook.htm (accessed on 30 August 2024).
37. Bates, D.; Mächler, M.; Bolker, B.; Walker, S. Fitting Linear Mixed-Effects Models Using Lme4. J. Stat. Softw. 2015, 67, 1–48.
38. Zuur, A.F.; Ieno, E.N.; Walker, N.J.; Saveliev, A.A.; Smith, G.M. Mixed Effects Modelling for Nested Data. In Mixed Effects Models and Extensions in Ecology With R; Zuur, A.F., Ieno, E.N., Walker, N., Saveliev, A.A., Smith, G.M., Eds.; Springer: New York, NY, USA, 2009; pp. 101–142. ISBN 978-0-387-87458-6.
39. Bauer, D.J.; McNeish, D.M.; Baldwin, S.A.; Curran, P.J. Analyzing Nested Data: Multilevel Modeling and Alternative Approaches. In The Cambridge Handbook of Research Methods in Clinical Psychology; Cambridge Handbooks in Psychology; Cambridge University Press: New York, NY, USA, 2020; pp. 426–443. ISBN 978-1-316-63952-8.
40. Fernández-Castilla, B.; Jamshidi, L.; Declercq, L.; Beretvas, S.N.; Onghena, P.; Van den Noortgate, W. The Application of Meta-Analytic (Multi-Level) Models with Multiple Random Effects: A Systematic Review. Behav. Res. 2020, 52, 2031–2052.
41. Rasouli, B.; Chubak, J.; Floyd, J.S.; Psaty, B.M.; Nguyen, M.; Walker, R.L.; Wiggins, K.L.; Logan, R.W.; Danaei, G. Combining High Quality Data with Rigorous Methods: Emulation of a Target Trial Using Electronic Health Records and a Nested Case-Control Design. BMJ 2023, 383, e072346.
42. Ioannidis, J.P.A.; Adami, H.-O. Nested Randomized Trials in Large Cohorts and Biobanks: Studying the Health Effects of Lifestyle Factors. Epidemiology 2008, 19, 75.
43. Koczkodaj, W.W.; Kakiashvili, T.; Szymańska, A.; Montero-Marin, J.; Araya, R.; Garcia-Campayo, J.; Rutkowski, K.; Strzałka, D. How to Reduce the Number of Rating Scale Items without Predictability Loss. Scientometrics 2017, 111, 581–593.
Model Category | ECE | AUC | Brier | F1 | Net Benefit | CEM | CEM Lower 95% CI | CEM Upper 95% CI |
---|---|---|---|---|---|---|---|---|
standardized Xgboost BME | 0.998 | 0.854 | 0.977 | 0.293 | 0.908 | 0.739 | 0.7391 | 0.7397 |
unstandardized Xgboost BME | 0.997 | 0.854 | 0.977 | 0.294 | 0.908 | 0.740 | 0.7396 | 0.7402 |
standardized Xgboost NC | 0.997 | 0.854 | 0.977 | 0.295 | 0.908 | 0.741 | 0.7405 | 0.7411 |
unstandardized Xgboost NC | 0.997 | 0.854 | 0.977 | 0.293 | 0.908 | 0.740 | 0.7394 | 0.7400 |
Model Category | ECE | AUC | Brier | F1 | Net Benefit | CEM | CEM Lower 95% CI | CEM Upper 95% CI |
---|---|---|---|---|---|---|---|---|
standardized glmer | 0.993 | 0.827 | 0.973 | 0.269 | 0.889 | 0.719 | 0.7182 | 0.7188 |
unstandardized glmer | 0.993 | 0.827 | 0.973 | 0.269 | 0.889 | 0.718 | 0.7178 | 0.7184 |
unstandardized glm | 0.994 | 0.826 | 0.973 | 0.270 | 0.889 | 0.719 | 0.7183 | 0.7189 |
standardized glm | 0.994 | 0.826 | 0.973 | 0.269 | 0.889 | 0.718 | 0.7181 | 0.7187 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Dong, T.; Oronti, I.B.; Sinha, S.; Freitas, A.; Zhai, B.; Chan, J.; Fudulu, D.P.; Caputo, M.; Angelini, G.D. Enhancing Cardiovascular Risk Prediction: Development of an Advanced Xgboost Model with Hospital-Level Random Effects. Bioengineering 2024, 11, 1039. https://doi.org/10.3390/bioengineering11101039