XGBoost and SHAP-Based Analysis of Risk Factors for Hypertension Classification in Korean Postmenopausal Women
Abstract
1. Introduction
- To analyze risk factors for hypertension in postmenopausal women using machine learning (ML) and explainable artificial intelligence (XAI) techniques.
- To identify factors that directly affect the development of hypertension by minimizing the confounding effects of other diseases in a group of postmenopausal women in whom all clinical parameters except blood pressure were within the normal range.
- To evaluate the effect of changes in waist circumference (WC), a modifiable risk factor, on blood pressure prediction using SHAP (Shapley Additive exPlanations), and to explore the potential for developing personalized health management strategies based on this analysis.
2. Related Work
3. Materials and Methods
3.1. Dataset
3.1.1. Study Population
3.1.2. Definition of Variables
3.2. Data Preprocessing
3.2.1. Data Normalization
3.2.2. Balancing Data
3.3. Feature Importance Analysis
3.3.1. Explainable Artificial Intelligence (XAI)
3.3.2. Shapley Additive exPlanations (SHAP)
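As a minimal sketch of the idea behind SHAP, the block below computes exact Shapley values for a three-feature toy model by averaging each feature's marginal contribution over all feature orderings, then repeats the calculation with a smaller waist circumference. Everything in it is an assumption made for illustration: `toy_risk`, its coefficients, the baseline profile, and the example individual are hypothetical and are not the fitted XGBoost model or data from this study. SHAP libraries for tree models use faster algorithms than this brute-force enumeration.

```python
from itertools import permutations


def toy_risk(age, wc, tg):
    """Hypothetical risk score; NOT the fitted XGBoost model from the study."""
    return (0.02 * (age - 60) + 0.03 * (wc - 75) + 0.001 * (tg - 90)
            + 0.0005 * (age - 60) * (wc - 75))


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x` relative to a single baseline.

    Each feature's value is its marginal contribution, averaged over every
    ordering in which features are switched from baseline to observed value.
    """
    names = list(x)
    orders = list(permutations(names))
    phi = {name: 0.0 for name in names}
    for order in orders:
        current = dict(baseline)
        previous = model(**current)
        for name in order:
            current[name] = x[name]
            updated = model(**current)
            phi[name] += (updated - previous) / len(orders)
            previous = updated
    return phi


baseline = {"age": 60, "wc": 75, "tg": 90}   # assumed reference profile
person = {"age": 65, "wc": 82, "tg": 100}    # made-up individual
what_if = dict(person, wc=78)                # same person with a smaller WC

phi = shapley_values(toy_risk, person, baseline)
phi_what_if = shapley_values(toy_risk, what_if, baseline)

# Additivity: the contributions sum to the gap between prediction and baseline.
gap = toy_risk(**person) - toy_risk(**baseline)
print("contributions:", phi)
print("sum of contributions:", sum(phi.values()), "prediction gap:", gap)
print("WC contribution before/after:", phi["wc"], phi_what_if["wc"])
```

The additivity check is what makes a per-feature, per-person reading possible: the change in the WC contribution between `person` and `what_if` can be read as the part of the predicted risk attributable to that change in WC, under the toy model's assumptions.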
3.4. Performance Evaluations
3.5. Algorithmic Enhancement
4. Results
4.1. Baseline Characteristics and Correlation Analysis
4.2. Hypertension Classification Performance
4.3. SHAP-Based Feature Importance and Local Insights for Blood Pressure Prediction
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
ALT | Alanine aminotransferase |
ANN | Artificial neural network |
AST | Aspartate aminotransferase |
BMI | Body mass index |
BST | Fasting glucose |
CVD | Cardiovascular disease |
DBP | Diastolic blood pressure |
Hb | Hemoglobin |
HDL-C | High-density lipoprotein cholesterol |
LDL-C | Low-density lipoprotein cholesterol |
ML | Machine learning |
PP | Pulse pressure |
rGTP | Gamma-glutamyl transpeptidase |
SBP | Systolic blood pressure |
SHAP | Shapley Additive exPlanations |
SVM | Support vector machine |
TC | Total cholesterol |
TG | Triglyceride |
WC | Waist circumference |
XAI | Explainable artificial intelligence |
XGBoost | eXtreme Gradient Boosting |
Measured Value | Reference Range |
---|---|
Body Mass Index (BMI) | <25 kg/m² |
Waist Circumference (WC) | <85 cm |
Fasting Glucose (BST) | 80–130 mg/dL |
Total Cholesterol (TC) | 150–250 mg/dL |
Triglyceride (TG) | 30–135 mg/dL |
High-Density Lipoprotein Cholesterol (HDL-C) | 30–65 mg/dL |
Low-Density Lipoprotein Cholesterol (LDL-C) | 70–169 mg/dL |
Hemoglobin (Hb) | 12.5–15.5 g/dL |
Aspartate Aminotransferase (AST) | 0–40 IU/L |
Alanine Aminotransferase (ALT) | 0–40 IU/L |
Gamma-Glutamyl Transpeptidase (rGTP) | 8–35 IU/L |
Creatinine | 0.8–1.7 mg/dL |
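The inclusion rule described in the objectives (all clinical parameters except blood pressure within the normal range) can be sketched as a simple range check against the reference table above. This is an illustrative sketch, not the authors' code: the field names, the inclusive treatment of range endpoints, and the reading of WC as an upper bound of 85 cm (consistent with cohort means of about 74 to 76 cm) are assumptions.

```python
import math

# (low, high, strict_upper): strict_upper=True means the value must be < high.
# Field names are hypothetical; limits follow the reference-range table.
REFERENCE_RANGES = {
    "bmi": (-math.inf, 25.0, True),         # kg/m^2
    "wc": (-math.inf, 85.0, True),          # cm (assumed upper bound)
    "bst": (80.0, 130.0, False),            # fasting glucose, mg/dL
    "tc": (150.0, 250.0, False),            # mg/dL
    "tg": (30.0, 135.0, False),             # mg/dL
    "hdl_c": (30.0, 65.0, False),           # mg/dL
    "ldl_c": (70.0, 169.0, False),          # mg/dL
    "hb": (12.5, 15.5, False),              # g/dL
    "ast": (0.0, 40.0, False),              # IU/L
    "alt": (0.0, 40.0, False),              # IU/L
    "rgtp": (8.0, 35.0, False),             # IU/L
    "creatinine": (0.8, 1.7, False),        # mg/dL
}


def within_reference(record):
    """Return True if every listed measurement lies inside its reference range."""
    for name, (low, high, strict_upper) in REFERENCE_RANGES.items():
        value = record[name]
        if value < low or value > high:
            return False
        if strict_upper and value == high:
            return False
    return True


# Example record built from the normotensive group means in the baseline table.
example = {
    "bmi": 21.72, "wc": 74.11, "bst": 97.63, "tc": 195.55, "tg": 88.13,
    "hdl_c": 55.15, "ldl_c": 122.59, "hb": 13.51, "ast": 25.05,
    "alt": 19.05, "rgtp": 17.74, "creatinine": 0.86,
}
print(within_reference(example))
print(within_reference(dict(example, tg=150.0)))
```

Blood pressure is deliberately absent from the dictionary, since it is the outcome used to define the two groups rather than an inclusion criterion.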
Variable | Normotensive (N = 2487) | Hypertensive (N = 752) | p Value |
---|---|---|---|
Age (years) | 61.12 ± 5.334 | 63.58 ± 6.152 | <0.001 |
BMI (kg/m²) | 21.72 ± 1.865 | 22.10 ± 1.708 | <0.001 |
Height (cm) | 154.03 ± 5.530 | 152.86 ± 5.700 | <0.001 |
Weight (kg) | 51.59 ± 5.328 | 51.68 ± 4.956 | 0.660 |
WC (cm) | 74.11 ± 5.512 | 75.52 ± 5.104 | <0.001 |
Systolic BP (mmHg) | 118.81 ± 11.492 | 146.68 ± 10.577 | <0.001 |
Diastolic BP (mmHg) | 72.03 ± 8.229 | 86.20 ± 8.851 | <0.001 |
Fasting glucose (mg/dL) | 97.63 ± 10.019 | 99.37 ± 10.709 | <0.001 |
TC (mg/dL) | 195.55 ± 25.646 | 193.67 ± 25.922 | 0.080 |
TG (mg/dL) | 88.13 ± 25.687 | 90.78 ± 24.377 | 0.012 |
HDL-C (mg/dL) | 55.15 ± 6.980 | 55.46 ± 6.992 | 0.276 |
LDL-C (mg/dL) | 122.59 ± 24.882 | 119.79 ± 25.382 | 0.007 |
Hemoglobin (g/dL) | 13.51 ± 0.680 | 13.56 ± 0.699 | 0.053 |
AST (U/L) | 25.05 ± 5.375 | 24.93 ± 5.496 | 0.618 |
ALT (U/L) | 19.05 ± 6.188 | 18.86 ± 6.301 | 0.469 |
rGTP (U/L) | 17.74 ± 5.690 | 17.87 ± 5.851 | 0.580 |
Creatinine (mg/dL) | 0.86 ± 0.084 | 0.86 ± 0.094 | 0.473 |
Algorithm | Parameter Settings |
---|---|
XGBoost | max_depth = 6, n_estimators = 200, learning_rate = 0.1 |
SVM | C = 10, kernel = rbf, gamma = scale |
ANN | activation = relu, solver = adam, hidden_layer_sizes = (50, 50) |
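The settings in the table above map naturally onto constructor arguments of common Python implementations. The sketch below assumes `XGBClassifier` from the `xgboost` package and `SVC` and `MLPClassifier` from scikit-learn; the source lists only the parameter names and values, so the choice of classes is an assumption, and all unlisted parameters are left at library defaults. The SVM regularization setting is passed as scikit-learn's `C` argument.

```python
# Hyperparameters copied from the parameter-settings table.
PARAMS = {
    "XGBoost": {"max_depth": 6, "n_estimators": 200, "learning_rate": 0.1},
    "SVM": {"C": 10, "kernel": "rbf", "gamma": "scale"},
    "ANN": {
        "activation": "relu",
        "solver": "adam",
        "hidden_layer_sizes": (50, 50),
    },
}


def build_models(params=PARAMS):
    """Instantiate the three classifiers (requires xgboost and scikit-learn).

    Imports are deferred so the parameter table can be inspected even when
    the optional libraries are not installed.
    """
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC
    from xgboost import XGBClassifier

    return {
        "XGBoost": XGBClassifier(**params["XGBoost"]),
        "SVM": SVC(**params["SVM"]),
        "ANN": MLPClassifier(**params["ANN"]),
    }
```

Calling `build_models()` returns unfitted estimators; each exposes the usual `fit(X, y)` and `predict(X)` methods, so the three can be trained and compared in one loop.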
Classifier | Accuracy (%) | Specificity (%) | Sensitivity (%) | Precision (%) | AUC (%) | F1-Score (%) | MCC |
---|---|---|---|---|---|---|---|
XGBoost | 84.73 | 78.09 | 92.43 | 78.44 | 92.12 | 84.86 | 0.71 |
SVM | 65.71 | 60.84 | 71.35 | 61.11 | 72.47 | 65.84 | 0.32 |
ANN | 75.09 | 71.56 | 79.19 | 70.60 | 81.03 | 74.65 | 0.51 |
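For reference, the threshold-based metrics in the table above follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the confusion-matrix counts in the example are made up for illustration and are not taken from the study. Because F1 is the harmonic mean of precision and sensitivity, the reported F1 values can also be cross-checked directly from the table's precision and sensitivity columns, and they agree to within rounding.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)              # recall on the positive class
    specificity = tn / (tn + fp)              # recall on the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


def f1_from(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (same units in and out)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical counts, for illustration only.
print(classification_metrics(tp=90, fp=20, tn=80, fn=10))

# Cross-check of the reported F1 scores: (precision, sensitivity, reported F1).
reported = {
    "XGBoost": (78.44, 92.43, 84.86),
    "SVM": (61.11, 71.35, 65.84),
    "ANN": (70.60, 79.19, 74.65),
}
for name, (precision, sensitivity, f1) in reported.items():
    print(name, round(f1_from(precision, sensitivity), 2), "reported:", f1)
```

MCC is the only metric here that uses all four cells symmetrically, which is why it is often reported alongside accuracy when class sizes differ.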
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kim, H.; Khomidov, M.; Lee, J.-H. XGBoost and SHAP-Based Analysis of Risk Factors for Hypertension Classification in Korean Postmenopausal Women. Bioengineering 2025, 12, 659. https://doi.org/10.3390/bioengineering12060659