Article

A Predictive Model of Ischemic Heart Disease in Middle-Aged and Older Women Using Data Mining Technique

Department of Health Care & Science, Donga University, Busan 49315, Republic of Korea
J. Pers. Med. 2023, 13(4), 663; https://doi.org/10.3390/jpm13040663
Submission received: 18 February 2023 / Revised: 5 April 2023 / Accepted: 12 April 2023 / Published: 13 April 2023
(This article belongs to the Section Epidemiology)

Abstract

This study was conducted to identify ischemic heart disease-related factors and vulnerable groups in Korean middle-aged and older women using data from the Korea National Health and Nutrition Examination Survey (KNHANES). Among the 24,229 people who participated in the 2017–2019 survey, 7249 women aged 40 and over were included in the final analysis. The data were analyzed using IBM SPSS and SAS Enterprise Miner by chi-squared analysis, logistic regression analysis, and decision tree analysis. The prevalence of ischemic heart disease, defined as a diagnosis of myocardial infarction or angina, was 2.77%. The factors associated with ischemic heart disease in middle-aged and older women were age, family history, hypertension, dyslipidemia, stroke, arthritis, and depression. The group most vulnerable to ischemic heart disease comprised women who had hypertension, had a family history of ischemic heart disease, and were menopausal. Based on these results, effective management should be achieved by applying customized medical services and health management services for each relevant factor, taking into account the characteristics of the groups with potential risks. This study can serve as basic data to support national policy decision making for the management of chronic diseases.

1. Introduction

Ischemic heart disease is a representative disease with a high social burden that causes many deaths and much disability [1]. The prevalence of cardiovascular disease increases rapidly in women after the age of 40 due to menopause-related hormonal changes, physical changes related to aging, and increased fat accumulation [2]. Previous studies reported that women with risk factors for cardiovascular disease had a 19% increase in the incidence of myocardial infarction after 10 years and that the quality of life of middle-aged women with cardiovascular disease was poor [3,4]. The life expectancy of Korean women in 2015 was 85.2 years [5], and the prevention and management of cardiovascular disease in middle-aged women are very important in preparing for a healthy old age. According to the 2017 Statistical Annual Report of Causes of Death in Korea, cardiovascular disease is the second-highest cause of death after cancer, and the mortality rate from cardiovascular disease tends to increase sharply with age. In particular, mortality from hypertensive disease (2.3 times) and heart disease (1.1 times) was higher among women than men [6]. Cardiovascular disease is a major chronic disease. Chronic diseases have various contributing causes but no single direct cause, making early diagnosis difficult. In addition, the time of disease onset is unclear and the latent period is long [7]. Therefore, because prevention is emphasized more than treatment in chronic diseases, identifying the characteristics of groups at high risk of cardiovascular disease and providing customized interventions suited to each characteristic would make prevention and management more effective.
The risk factors for cardiovascular disease identified in previous studies included gender, age, marital status, income, education, diabetes, hypercholesterolemia, family history, smoking, drinking, obesity, lack of physical activity, and stress [8,9,10,11,12,13,14,15,16,17,18,19]. However, these studies investigated the incidence of cardiovascular disease related to a few factors by focusing on specific groups such as young men and the elderly as study subjects [8,9,12,13,14,15,16,18,19]. Studies that have confirmed various characteristics in the groups vulnerable to cardiovascular disease are lacking. Therefore, to understand the characteristics of the groups vulnerable to ischemic heart disease, research using data mining techniques is required.
Data mining technology allows for the exploration, identification, and modeling of the relationships and rules that exist in big data [20]. Recently, research methods using data mining have been used in various fields such as medical research, diagnosis, quality control, hospital management, and customer relationship management in the medical field [21,22,23]. Decision tree analysis, one data mining technique, is an effective tool for classification and prediction; therefore, it is useful for discovering hidden patterns in data [24]. Predicting cardiovascular disease risk using decision support systems can play an important role in disease prevention [24].
In this study, we aimed to analyze the factors related to ischemic heart disease in middle-aged and older women using data from the Korea National Health and Nutrition Examination Survey (KNHANES), which are representative of the population of Korean middle-aged and older women, and to develop an ischemic heart disease prediction model. The specific study purposes were as follows.
  • Identify the sociodemographic characteristics and health-related behavior characteristics of the study subjects.
  • Identify the differences in the prevalence of ischemic heart disease according to sociodemographic characteristics, health-related behaviors, and the presence of chronic diseases.
  • Identify the factors that affected ischemic heart disease in middle-aged women.
  • Utilizing data mining techniques, develop a predictive model for ischemic heart disease in middle-aged women.
The results of this study can be used as important foundational data for regional and national health policy decisions for the prevention and management of ischemic heart disease.

2. Materials and Methods

2.1. Study Population

In this study, the raw data of the Korea National Health and Nutrition Examination Survey (KNHANES) (2017–2019), a statutory survey based on Article 16 of the National Health Promotion Law, were utilized. Data use was approved according to the procedures of the Korea Disease Control and Prevention Agency (KDCA). The target population of the KNHANES was people aged one year or older residing in Korea, and a two-stage stratified cluster sampling method was applied, with the survey district and the household as the primary and secondary sampling units, respectively. The number of survey districts was 192, and 23 households were sampled in each district. Within the sampled households, all household members aged one year or older who satisfied the appropriate household size were selected as survey subjects. Among the 24,229 people who participated in the 2017–2019 survey, 7249 women aged 40 and over were included in the final analysis after excluding missing data (Figure 1). The age groups of the study subjects were 40–49 years old (1822 people), 50–59 years old (1966 people), 60–69 years old (1761 people), 70–79 years old (1257 people), and 80 years old and over (443 people).
The study was conducted in accordance with the Declaration of Helsinki. Ethical review and approval were waived for this study because it used anonymous public open data and not an individual’s personal data.

2.2. Variable Definitions

The dependent variable, the presence or absence of ischemic heart disease, was based on the answer to the question “Have you ever been diagnosed with myocardial infarction or angina by your doctor?” The characteristics of the study subjects were classified into sociodemographic factors, health behavior factors, and clinical factors. The independent variables for sociodemographic characteristics were age, marital status, education, household income, subjective health status, and stress. Marital status was divided into married and unmarried, and education levels were classified as less than elementary school, middle school, high school, and college or higher. Household income levels were divided into four categories (lower, lower middle, upper middle, and upper) based on the quartiles of equivalized household income. The independent variables for health behavior characteristics were smoking, alcohol drinking, and physical activity. Smoking status was divided into daily smoker, occasional smoker, past smoker, and non-smoker. Drinking was divided, following the classification of the raw data, into those with and without lifetime experience of drinking alcohol. The physical activity variable used the response to the question “Does your work or leisure activity involve moderate-intensity physical activity with a slight shortness of breath or moderately rapid heart rate for at least 10 min?” The clinical characteristic variables were body mass index (BMI), menopause, family history, hypertension, stroke, arthritis, diabetes mellitus, depression, renal failure, and dyslipidemia, with reference to previous studies [7,10,24,25,26]. BMI was classified as underweight for a value of less than 18.5 kg/m², normal for 18.5 kg/m² or more and less than 25.0 kg/m², and obesity for 25.0 kg/m² or more. Family history was defined as at least one parent or sibling having a history of ischemic heart disease.
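The BMI cut-offs above can be written as a small function. This is only an illustrative sketch: the function names `bmi_category` and `bmi_from_measurements` are hypothetical and do not come from the study, which derived its variables in SPSS from the KNHANES raw data.

```python
def bmi_category(bmi: float) -> str:
    """Map a BMI value (kg/m^2) to the three categories defined in the text.

    Cut-offs follow Section 2.2: < 18.5 underweight, 18.5 to < 25.0 normal,
    >= 25.0 obesity. The function name is hypothetical.
    """
    if bmi <= 0:
        raise ValueError("BMI must be positive")
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    return "obesity"


def bmi_from_measurements(weight_kg: float, height_m: float) -> float:
    """Compute BMI as weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


if __name__ == "__main__":
    # Hypothetical example values, not taken from the survey.
    for weight, height in [(50.0, 1.70), (60.0, 1.60), (70.0, 1.60)]:
        bmi = bmi_from_measurements(weight, height)
        print(f"{weight} kg, {height} m -> BMI {bmi:.1f}: {bmi_category(bmi)}")
```

Note that the boundary values 18.5 and 25.0 fall into the higher category in each case, matching the "or more" wording of the definition.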

2.3. Statistical Analysis

The data were analyzed using IBM SPSS version 25.0 (IBM Co., Armonk, NY, USA) and SAS Enterprise Miner 9.4. A chi-squared analysis was conducted to examine differences in the prevalence of ischemic heart disease according to sociodemographic characteristics, health-related behaviors, and the presence of chronic diseases. Logistic regression analysis was performed to identify the factors influencing the prevalence of ischemic heart disease. Statistical significance was set at p < 0.05 (two-sided). Interactive decision tree analysis and random forest analysis were performed to develop a predictive model of ischemic heart disease.
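To make the chi-squared step concrete, the sketch below computes a Pearson chi-squared statistic and p-value for a single 2×2 table (for example, disease status by presence of one binary characteristic). It is an assumption-laden illustration: the study ran its tests in SPSS, the text does not say whether a continuity correction or survey weights were applied, and the counts used here are made up.

```python
import math


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test without continuity correction for the table

                    disease   no disease
        exposed        a          b
        unexposed      c          d

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts, not taken from Tables 1-3.
    stat, p = chi_squared_2x2(10, 20, 20, 10)
    print(f"chi-squared = {stat:.3f}, p = {p:.4f}, significant = {p < 0.05}")
```

The closed-form p-value only holds for one degree of freedom; variables with more than two categories, such as age group or education level, would need the general chi-squared distribution.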

3. Results

3.1. General Characteristics of the Study Subjects

The distribution of ischemic heart disease cases according to the general characteristics is described in Table 1. The prevalence of ischemic heart disease was high among women aged 80 years and over (7.7%) and those with an educational attainment of less than elementary school (5.6%) (Table 1). The prevalence was also high in women with low household incomes (5.3%), women who reported very poor subjective health (9.1%), and women who experienced a lot of stress (4.8%) (Table 1). The prevalence of ischemic heart disease differed significantly according to age, education level, household income, subjective health status, and stress awareness (p < 0.05) (Table 1).

3.2. Health Behavior and Clinical Characteristics of the Study Subjects

The prevalence of ischemic heart disease according to health behavior characteristics is shown in Table 2. Among the variables of smoking, drinking, and physical activity, there was a statistically significant difference only in the prevalence of ischemic heart disease with or without drinking experience (Table 2).
The distribution of ischemic heart disease cases according to the clinical characteristics is described in Table 3. The presence of ischemic heart disease was high among obese (4.1%) and menopausal women (3.8%), and those with a family history of ischemic heart disease (5.0%) (Table 3). Moreover, the prevalence of ischemic heart disease was statistically significantly higher if there were comorbidities including hypertension, dyslipidemia, stroke, arthritis, diabetes mellitus, depression, and renal failure (p < 0.05) (Table 3).

3.3. Predictive Factors of Ischemic Heart Disease

Logistic regression analysis was performed to identify the factors related to ischemic heart disease in middle-aged and older women (Table 4). The analysis showed that ischemic heart disease was significantly associated with age, physical leisure activity, family history, hypertension, dyslipidemia, stroke, arthritis, and depression (Table 4). The odds of ischemic heart disease were 16.73 times higher in women aged 80 years and over than in those aged 40–49 years. The odds were 3.29 times (95% confidence interval (CI): 2.03–5.32) higher in those with a family history than in those without, and 1.42 times (95% CI: 1.01–2.00) higher in those with hypertension than in those without (Table 4). The odds were 1.70 times (95% CI: 1.24–2.33) higher in those with dyslipidemia and 1.93 times (95% CI: 1.18–3.18) higher in those with a previous stroke. The odds were 1.43 times (95% CI: 1.05–1.94) higher in those with arthritis and 1.66 times (95% CI: 1.09–2.51) higher in those with depression (Table 4).
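The relationship between a logistic regression coefficient, its odds ratio, and the 95% CI can be sketched as follows. The coefficients and standard errors themselves are not reported in the text, so the example back-calculates them from the published family-history interval (2.03–5.32), assuming a standard Wald interval; the helper names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta: float, se: float, z: float = Z_95) -> tuple[float, float, float]:
    """Return (odds ratio, lower bound, upper bound) for a Wald interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coefficient_from_ci(lower: float, upper: float, z: float = Z_95) -> tuple[float, float]:
    """Recover (beta, standard error) from a reported Wald CI on the odds ratio.

    Assumes the interval is symmetric on the log-odds scale.
    """
    if not 0 < lower < upper:
        raise ValueError("bounds must satisfy 0 < lower < upper")
    beta = (math.log(lower) + math.log(upper)) / 2.0
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se


if __name__ == "__main__":
    # Family history: reported odds ratio 3.29, 95% CI 2.03-5.32 (Table 4).
    beta, se = coefficient_from_ci(2.03, 5.32)
    odds_ratio, low, high = odds_ratio_ci(beta, se)
    print(f"beta = {beta:.3f}, SE = {se:.3f}")
    print(f"OR = {odds_ratio:.2f} (95% CI {low:.2f}-{high:.2f})")
```

Under the Wald assumption the point estimate is the geometric mean of the two bounds, which here agrees with the reported 3.29 to two decimal places.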
Decision tree analysis was performed to identify the ischemic heart disease risk groups among the study subjects. The trees were grown using the classification and regression tree (CRT) method, which splits each parent node so that the resulting child nodes are as homogeneous as possible (Figure 2). At the researcher’s discretion, we presented an interactive decision tree analysis focusing on health behavior and clinical characteristics, excluding age, which was too strongly associated with the outcome in the logistic regression analysis. The analysis produced eight terminal nodes, and the seventh node (16.67%) was found to be the most vulnerable to ischemic heart disease (Figure 2). The seventh node comprised women with hypertension, a family history of ischemic heart disease, and menopause. The group with hypertension had a higher risk of ischemic heart disease than the group without hypertension. Within the group without hypertension, the highest risk was found at the eleventh node (10.47%), which comprised women with diabetes and arthritis (Figure 2).
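The idea of choosing splits that make child nodes as homogeneous as possible can be illustrated with the Gini impurity, a criterion commonly used in CART-style trees. This is a sketch under assumptions: the text does not state which impurity measure was used, the class counts below are invented, and the function names are hypothetical.

```python
def gini(counts: list[int]) -> float:
    """Gini impurity of a node given its class counts, e.g. [no disease, disease]."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def split_impurity(left: list[int], right: list[int]) -> float:
    """Size-weighted Gini impurity of the two child nodes of a binary split."""
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    if n == 0:
        raise ValueError("split contains no observations")
    return (n_left / n) * gini(left) + (n_right / n) * gini(right)


def gini_gain(left: list[int], right: list[int]) -> float:
    """Impurity reduction achieved by the split; larger means purer children."""
    parent = [l + r for l, r in zip(left, right)]
    return gini(parent) - split_impurity(left, right)


if __name__ == "__main__":
    # Hypothetical [no disease, disease] counts for two candidate splits.
    candidates = {
        "split on variable A": ([900, 10], [800, 90]),
        "split on variable B": ([850, 45], [850, 55]),
    }
    for name, (left, right) in candidates.items():
        print(f"{name}: Gini gain = {gini_gain(left, right):.5f}")
```

A tree-growing procedure would evaluate every candidate variable this way at each node and keep the split with the largest gain, then repeat on the children.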
In addition to the interactive decision tree analysis, a random forest analysis was performed to develop a predictive model of ischemic heart disease. The results of the random forest algorithm for predicting the presence or absence of ischemic heart disease are shown in Table 5. In the random forest analysis, the important variables for predicting ischemic heart disease were, in order, age, dyslipidemia, education level, arthritis, hypertension, diabetes, depression, family history, menopause, and stroke (Table 5). For model comparison, prediction models were built using the logistic regression, decision tree, and random forest algorithms. The sensitivity, specificity, and accuracy of each model were calculated, and the models were evaluated using the area under the ROC curve (AUC). The closer the AUC is to 1, the better the performance of the model. A model with an AUC of 0.8 or more is regarded as stable, and all three prediction models presented in this study had AUC values of 0.8 or more. The random forest model had the highest AUC, at 0.872, and also showed the highest accuracy, sensitivity, and specificity (Table 6).
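The evaluation measures named above can be computed as in the following sketch. It uses the rank-based (Mann-Whitney) definition of AUC, which equals the area under the ROC curve; the labels and scores are made-up values, and the function names are hypothetical rather than anything taken from SAS Enterprise Miner.

```python
def confusion_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Sensitivity, specificity, and accuracy for binary labels (1 = disease)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


def auc(y_true: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and predicted probabilities.
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    predictions = [1 if s >= 0.5 else 0 for s in scores]
    print("AUC:", auc(labels, scores))
    print(confusion_metrics(labels, predictions))
```

Because only about 2.77% of the sample has the outcome, accuracy alone can look high for a model that rarely predicts disease, which is one reason to read it alongside sensitivity and AUC.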

4. Discussion

With the development of medical technology, life expectancy has increased, and women spend more than a third of their lives after middle age. Middle age in women spans the period just before and after the onset of menopause, and since health management after middle age is closely related to quality of life, active health management is necessary [27]. Therefore, this study was performed to contribute to the prevention and management of ischemic heart disease by identifying the factors related to ischemic heart disease in middle-aged and older Korean women and identifying the vulnerable groups with a high prevalence of ischemic heart disease.
The prevalence of ischemic heart disease in the study was 2.77%, including those diagnosed with myocardial infarction or angina. This was slightly higher than the estimate of a previous study [28], which suggested that about 1.72% of the world’s population is affected by ischemic heart disease. When the prevalence of ischemic heart disease was compared by age, it increased rapidly after 60 years of age compared with those aged 40–49 years. Previous studies have also shown that cardiovascular disease increased rapidly after 50 years of age [29,30]. In particular, it is known that as women transition from middle age to old age, the incidence of cardiovascular disease increases due to changes in female hormones, physical changes associated with aging, and an increase in body fat accumulation [2,27].
In this study, family history, hypertension, dyslipidemia, stroke, arthritis, and depression were found to be statistically significant clinical factors associated with ischemic heart disease, whereas smoking, drinking, and physical activity were not. Because the association was investigated in middle-aged and older women, the results differed from previous studies [9,11,29,31,32] in which smoking, drinking, and physical activity were associated with ischemic heart disease. Previous studies were conducted on both men and women with cardiovascular disease [9] and on women in their 30s or older [11], and it is thought that the results differed because those study populations were larger than the data set of cardiovascular disease patients used in this study. According to a previous study by Lim [33] using machine learning, the major risk factors affecting the occurrence of myocardial infarction and angina were age, hypertension, dyslipidemia, family history, low educational background, and gender, consistent with the results of this study. The diseases identified as risk factors for cardiovascular disease in this study were hypertension, dyslipidemia, stroke, arthritis, and depression. However, since it is difficult to clearly identify a causal relationship in a cross-sectional study, it is also possible that individuals with ischemic heart disease had a high prevalence of comorbidities such as hypertension and dyslipidemia because of more frequent health care encounters and screening opportunities. Diabetes and renal failure did not show a statistically significant association. These results were similar to previous studies [12,14,26,34,35] reporting that, among cardiovascular disease risk factors, depression and rheumatoid arthritis were significantly more common in women than in men, whereas diabetes was significantly more common in men. In a study by Seo et al. [36], Korean adults with depression had a higher prevalence of cardiovascular disease than those without depression, and a previous study confirmed depression as a significant cardiovascular disease risk factor in women compared to men [35]. Decreased renal function may increase the prevalence of cardiovascular disease and increase mortality [37]. However, in this study, it was not a risk factor in middle-aged and older women.
In the decision tree analysis performed to identify the groups vulnerable to ischemic heart disease, hypertension and family history emerged as the most relevant factors, consistent with the regression analysis. Focusing on hypertension, the most influential factor, women who had hypertension, had a family history, and were menopausal (16.67%), and women who had hypertension, had no family history, and had a previous stroke (15.44%), were found to be the groups most vulnerable to ischemic heart disease. Taken together, the risk of ischemic heart disease in middle-aged and older women increased when related factors such as hypertension, a family history of ischemic heart disease, menopause, and stroke were combined. These results are consistent with previous studies [38,39,40] reporting that the risk of cardiovascular disease increases significantly in postmenopausal women.
The study results indicated that for the prevention and effective management of ischemic heart disease in middle-aged and older women, a customized program considering the characteristics of the subjects is intensively needed. The importance of women’s health care after middle age is emphasized, but in most cases, a uniform program is applied by integrating factors affecting cardiovascular disease [7]. In previous studies [8,9,10,11,12,13,14,15,16,17,18,19], risk factors for ischemic heart disease were selected based on socioeconomic characteristics, some co-morbidities, and clinical test results. However, in this study, most of the comorbidities, socioeconomic characteristics, and lifestyle behaviors suggested to be related in previous studies were reflected and analyzed. In addition, there is a lack of previous studies that have identified factors affecting ischemic heart disease and risk groups in middle-aged women. According to the results of this study, family history, vascular disease, and depression appeared to be the biggest risk factors for cardiovascular disease in middle-aged women, rather than menopause and lifestyle, which can be seen as a different result from previous studies. Based on the results of this study, for the prevention and management of ischemic heart disease in middle-aged and older women, it is necessary to first classify the subjects according to the risk level of each vulnerable group. In addition, it is necessary to establish a customized prevention and management strategy according to the characteristics of the relevant factors in each vulnerable group. Utilization of healthcare big data can contribute to enormous cost savings in the healthcare field by providing patient-customized medical services based on data. 
E-health and m-health devices that combine technologies such as big data, data mining, and deep learning are bringing about innovation in disease prevention, diagnosis, and treatment by providing more effective and personalized solutions. If a function for identifying and managing high-risk and low-risk patients in advance using a predictive model is added to such systems, we believe that it can contribute to the management of ischemic heart disease in middle-aged women. This study is significant in that it identified the characteristics of middle-aged and older women who are vulnerable to ischemic heart disease using large-scale data representing the entire Korean population. However, it had the following limitations. First, this study was cross-sectional, making it difficult to investigate cause–effect relationships between the risk factors and ischemic heart disease. Second, there was a lack of data on clinical examinations related to ischemic heart disease. Third, since the ischemic heart disease variable used in this study was obtained from self-reported data on doctors’ diagnoses, recall bias is possible. In the future, it will be necessary to apply data mining techniques in more depth with more data and to conduct a prospective cohort study on the relationship between risk factors and ischemic heart disease that addresses the limitations of this study.

5. Conclusions

This study was conducted to identify ischemic heart disease-related factors and the vulnerable groups in Korean middle-aged and older women using data from the Korea National Health and Nutrition Examination Survey (KNHANES). The factors associated with ischemic heart disease in middle-aged and older women in this study were age, family history, hypertension, dyslipidemia, stroke, arthritis, and depression. Additionally, the group most vulnerable to ischemic heart disease comprised those with hypertension, a family history of ischemic heart disease, and menopause. This study is meaningful in that research developing a predictive model for ischemic heart disease, a disease with a high social burden, from healthcare big data can serve as basic data to support national policy decision-making for the prevention and management of chronic diseases. Based on these results, effective management should be achieved by applying customized medical services and health management services for each relevant factor, taking into account the characteristics of the groups with potential risk. In addition, health policies should incorporate active screening programs and chronic disease management education programs that consider the comorbidities of patients with ischemic heart disease.

Funding

This work was supported by the Dong-A University research fund.

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki. Ethical review and approval were waived in this study because it used anonymous public open data and not an individual’s personal data.

Informed Consent Statement

Patient consent was waived because this study utilized public open data.

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from the Korea Disease Control and Prevention Agency (KDCA) and are available from https://knhanes.kdca.go.kr/knhanes/sub03/sub03_02_05.do (accessed on 1 August 2021).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sezavar, S.; Valizade, M.; Moradi, M.; Rahbar, M. Review of early myocardial infarction and its risk factor in patients hospitalized in Rasool Akram hospital in Tehran. Hormozgan Med. J. 2010, 63, 156.
  2. Yeoum, S. The investigation on the risk factors of cardiovascular disease for postmenopausal women over 50 years. J. Korean Soc. Menopause 2003, 9, 266–272.
  3. Mosca, L.; Benjamin, E.J.; Berra, K.; Bezanson, J.L.; Dolor, R.J.; Lloyd-Jones, D.M.; Newby, L.K.; Pina, I.L.; Roger, V.L.; Shaw, L.J. Effectiveness-based guidelines for the prevention of cardiovascular disease in women—2011 update: A guideline from the American Heart Association. Circulation 2011, 123, 1243–1262.
  4. Ludt, S.; Wensing, M.; Szecsenyi, J.; Van Lieshout, J.; Rochon, J.; Freund, T.; Campbell, S.M.; Ose, D. Predictors of health-related quality of life in patients at risk for cardiovascular disease in European primary care. PLoS ONE 2011, 6, e29334.
  5. Statistics Korea. The Lives of Women Looking to 2017 Statistics [Internet]; Statistics Korea: Daejeon, Republic of Korea, 2017.
  6. Statistics Korea. Causes of Death Statistics [Internet]; Statistics Korea: Daejeon, Republic of Korea, 2020.
  7. Kang, M.J.; Yi, J.S.; Park, C.S. Factors related to the identification of middle-aged women who are disadvantaged by cardio-cerebrovascular disease. Korean J. Women Health Nurs. 2018, 24, 185–195.
  8. Kim, C.-G.; Lee, S.H.; Cha, S.K. Influencing factors on cardio-cerebrovascular disease risk factors in young men: Focusing on obesity indices. J. Korean Biol. Nurs. Sci. 2017, 19, 1–10.
  9. Hong, S.; Byeon, H.; Kim, J.; Mun, S. Development of risk prediction model for cardiovascular disease among community-dwelling elderly. Asia-Pac. J. Multimid. Serv. Converg. Art Hum. Soc. 2015, 5, 37–46.
  10. Choi, J.-Y.; Choi, S.-W. Comparison of the health behaviors according to income and education level among cardio-cerebrovascular patients; based on KNHANES data of 2010–2011. J. Korea Acad.-Ind. Coop. Soc. 2014, 15, 6223–6233.
  11. Park, K.J.; Lim, G.U.; Hwangbo, Y.; Jhang, W.G. The impact of health behaviors and social strata on the prevalence of cardio-cerebrovascular disease. Soonchunhyang Med. Sci. 2011, 17, 105–111.
  12. Barr, E.L.; Zimmet, P.Z.; Welborn, T.A.; Jolley, D.; Magliano, D.J.; Dunstan, D.W.; Cameron, A.J.; Dwyer, T.; Taylor, H.R.; Tonkin, A.M. Risk of cardiovascular and all-cause mortality in individuals with diabetes mellitus, impaired fasting glucose, and impaired glucose tolerance: The Australian Diabetes, Obesity, and Lifestyle Study (AusDiab). Circulation 2007, 116, 151–157.
  13. Cordero, A.; Andrés, E.; Ordoñez, B.; León, M.; Laclaustra, M.; Grima, A.; Luengo, E.; Moreno, J.; Bes, M.; Pascual, I. Usefulness of triglycerides-to–high-density lipoprotein cholesterol ratio for predicting the first coronary event in men. Am. J. Cardiol. 2009, 104, 1393–1397.
  14. Fried, L.F.; Shlipak, M.G.; Crump, C.; Kronmal, R.A.; Bleyer, A.J.; Gottdiener, J.S.; Kuller, L.H.; Newman, A.B. Renal insufficiency as a predictor of cardiovascular outcomes and mortality in elderly individuals. J. Am. Coll. Cardiol. 2003, 41, 1364–1372.
  15. He, Y.; Jiang, B.; Wang, J.; Feng, K.; Chang, Q.; Zhu, S.; Fan, L.; Li, X.; Hu, F.B. BMI versus the metabolic syndrome in relation to cardiovascular risk in elderly Chinese individuals. Diabetes Care 2007, 30, 2128–2134.
  16. Wong, N.D.; Lopez, V.A.; Roberts, C.S.; Solomon, H.A.; Burke, G.L.; Kuller, L.; Tracy, R.; Yanez, D.; Psaty, B.M. Combined association of lipids and blood pressure in relation to incident cardiovascular disease in the elderly: The cardiovascular health study. Am. J. Hypertens. 2010, 23, 161–167.
  17. Yun, J.W.; Lee, W.Y.; Kim, J.Y.; Park, H.D.; Lim, S.H.; Jung, C.H.; Kim, Y.C.; Kim, S.W. Relationship between body fat distribution and atherosclerotic risk factors in Korean populations. Korean J. Med. 2002, 63, 177–185.
  18. Yoo, S.Y.; Kim, M.; Kim, S.; Kim, S.H.; Ko, S.J.; Beom, J.W.; Kim, J.Y.; Jo, J.; Kim, Y.U.; Heo, D. Relationship between obesity indices and cardiovascular risk score in Korean type 2 diabetes patients. Korean J. Obes. 2013, 22, 148–154.
  19. Steptoe, A.; Kivimäki, M. Stress and cardiovascular disease. Nat. Rev. Cardiol. 2012, 9, 360–370.
  20. Fayyad, U.M.; Piatetsky-Shapiro, G.; Smyth, P. Knowledge Discovery and Data Mining: Towards a Unifying Framework. In KDD 1996, 96, 82–88.
  21. Islam, M.S.; Hasan, M.M.; Wang, X.; Germack, H.D. A systematic review on healthcare analytics: Application and theoretical perspective of data mining. Healthcare 2018, 6, 54.
  22. Kumar, V.; Mishra, B.K.; Mazzara, M.; Thanh, D.N.; Verma, A. Prediction of malignant and benign breast cancer: A data mining approach in healthcare applications. In Advances in Data Science and Management; Springer: Berlin/Heidelberg, Germany, 2020; pp. 435–442.
  23. Park, M.; Choi, S.; Shin, A.M.; Koo, C.H. Analysis of the characteristics of the older adults with depression using data mining decision tree analysis. J. Korean Acad. Nurs. 2013, 43, 1–10.
  24. Safdari, R.; Saeedi, M.G.; Arji, G.; Gharooni, M.; Soraki, M.; Nasiri, M. A model for predicting myocardial infarction using data mining techniques. Front. Health Inform. 2013, 2, 1–6.
  25. Dosi, R.; Bhatt, N.; Shah, P.; Patell, R. Cardiovascular disease and menopause. J. Clin. Diagn. Res. 2014, 8, 62.
  26. Oh, M.S.; Jeong, M.H. Sex differences in cardiovascular disease risk factors among Korean adults. Korean J. Med. 2020, 95, 266–275.
  27. Lowdermilk, D.L.; Cashion, M.C.; Perry, S.E.; Alden, K.R.; Olshansky, E. Maternity and Women’s Health Care E-Book; Elsevier Health Sciences: Amsterdam, The Netherlands, 2019.
  28. Khan, M.A.; Hashim, M.J.; Mustafa, H.; Baniyas, M.Y.; Al Suwaidi, S.K.B.M.; AlKatheeri, R.; Alblooshi, F.M.K.; Almatrooshi, M.E.A.H.; Alzaabi, M.E.H.; Al Darmaki, R.S. Global epidemiology of ischemic heart disease: Results from the global burden of disease study. Cureus 2020, 12, e9349.
  29. Bae, Y.; Lee, K. Risk factors for cardiovascular disease in adults aged 30 years and older. J. Korean Soc. Integr. Med. 2016, 4, 97–107.
  30. Joo, J.K.; Son, J.B.; Jung, J.E.; Kim, S.C.; Lee, K.S. Differences of prevalence and components of metabolic syndrome according to menopausal status. J. Korean Soc. Menopause 2012, 18, 155–162.
  31. Moon, H.-K.; Kong, J.-E. Assessment of nutrient intake for middle aged with and without metabolic syndrome using 2005 and 2007 Korean National Health and Nutrition Survey. Korean J. Nutr. 2010, 43, 69–78.
  32. Kim, S. Incidence Rate and Risk Factors of Cardio-Cerebrovascular Disease of Middle-Aged and Elderly People. Master’s Thesis, Inje University, Busan, Republic of Korea, 2016.
  33. Lim, H.K. Prediction of myocardial infarction/angina and selection of major risk factors using machine learning. J. Korean Data Anal. Soc. 2018, 20, 647–656.
  34. Turner, R.; Millns, H.; Neil, H.; Stratton, I.; Manley, S.; Matthews, D.; Holman, R. Risk factors for coronary artery disease in non-insulin dependent diabetes mellitus: United Kingdom Prospective Diabetes Study (UKPDS: 23). BMJ 1998, 316, 823–828.
  35. Jung, Y.H.; Shin, H.K.; Kim, Y.H.; Shin, H.G.; Linton, J. The association of depression and cardiovascular risk factors in Korean adults: The sixth Korea National Health and Nutrition Examination Survey, 2014. Korean J. Fam. Pract. 2017, 7, 308–314. [Google Scholar] [CrossRef]
  36. Seo, Y.; Je, Y. A comparative study on cardiovascular disease risk factors in Korean adults according to clinical depression status. Psychiatry Res. 2018, 263, 88–93. [Google Scholar] [CrossRef] [PubMed]
  37. Bae, E.H.; Lim, S.Y.; Cho, K.H.; Choi, J.S.; Kim, C.S.; Park, J.W.; Ma, S.K.; Jeong, M.H.; Kim, S.W. GFR and cardiovascular outcomes after acute myocardial infarction: Results from the Korea Acute Myocardial Infarction Registry. Am. J. Kidney Dis. 2012, 59, 795–802. [Google Scholar] [CrossRef] [PubMed]
  38. El Khoudary, S.R.; Aggarwal, B.; Beckie, T.M.; Hodis, H.N.; Johnson, A.E.; Langer, R.D.; Limacher, M.C.; Manson, J.E.; Stefanick, M.L.; Allison, M.A. Menopause transition and cardiovascular disease risk: Implications for timing of early prevention: A scientific statement from the American Heart Association. Circulation 2020, 142, e506–e532. [Google Scholar] [CrossRef] [PubMed]
  39. Zhu, D.; Chung, H.-F.; Dobson, A.J.; Pandeya, N.; Giles, G.G.; Bruinsma, F.; Brunner, E.J.; Kuh, D.; Hardy, R.; Avis, N.E. Age at natural menopause and risk of incident cardiovascular disease: A pooled analysis of individual patient data. Lancet Public Health 2019, 4, e553–e564. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  40. Nappi, R.E.; Simoncini, T. Menopause transition: A golden age to prevent cardiovascular disease. Lancet Diabetes Endocrinol. 2021, 9, 135–137. [Google Scholar] [CrossRef]
Figure 1. Study population selection process.
Figure 2. Prediction of ischemic heart disease in middle-aged and older women using a decision tree.
Table 1. Presence of ischemic heart disease according to general characteristics.

| Variables | Categories | Ischemic heart disease: Yes, n (%) | Ischemic heart disease: No, n (%) | Total (n = 7249) | p-Value ¹ |
| --- | --- | --- | --- | --- | --- |
| Age | 40–49 | 3 (0.2) | 1819 (99.8) | 1822 | <0.001 |
| | 50–59 | 17 (0.9) | 1949 (99.1) | 1966 | |
| | 60–69 | 62 (3.5) | 1699 (96.5) | 1761 | |
| | 70–79 | 85 (6.8) | 1172 (93.2) | 1257 | |
| | ≥80 | 34 (7.7) | 409 (92.3) | 443 | |
| Marital status | Yes | 198 (2.8) | 6871 (97.2) | 7069 | 0.360 |
| | No | 3 (1.7) | 177 (98.3) | 180 | |
| Education level | ≤Elementary school | 133 (5.6) | 2221 (94.4) | 2354 | <0.001 |
| | Middle school | 31 (3.4) | 875 (96.6) | 906 | |
| | High school | 27 (1.2) | 2179 (98.8) | 2206 | |
| | ≥College | 10 (0.6) | 1773 (99.4) | 1783 | |
| Household income level | Lower | 97 (5.3) | 1737 (94.7) | 1834 | <0.001 |
| | Lower middle | 41 (2.3) | 1747 (97.7) | 1788 | |
| | Upper middle | 43 (2.5) | 1695 (97.5) | 1738 | |
| | Upper | 20 (1.1) | 1869 (98.9) | 1889 | |
| Subjective health status | Very good | 1 (0.4) | 241 (99.6) | 242 | <0.001 |
| | Good | 13 (0.9) | 1427 (99.1) | 1440 | |
| | Usually | 81 (2.1) | 3776 (97.9) | 3857 | |
| | Poor | 66 (5.2) | 1203 (94.8) | 1269 | |
| | Very poor | 40 (9.1) | 401 (90.9) | 441 | |
| Stress awareness | Very much | 16 (4.8) | 315 (95.2) | 331 | <0.001 |
| | A lot | 49 (3.3) | 1426 (96.7) | 1475 | |
| | A little bit | 85 (2.1) | 3989 (97.9) | 4074 | |
| | Rarely | 51 (3.7) | 1318 (96.3) | 1369 | |

¹ p-value by chi-square test; significance level p < 0.05.
Table 2. Presence of ischemic heart disease according to health behavior.

| Variables | Categories | Ischemic heart disease: Yes, n (%) | Ischemic heart disease: No, n (%) | Total (n = 7249) | p-Value ¹ |
| --- | --- | --- | --- | --- | --- |
| Smoking | Daily smoking | 8 (3.7) | 210 (96.3) | 218 | 0.775 |
| | Occasionally smoking | 2 (2.8) | 69 (97.2) | 71 | |
| | Past smoker | 12 (3.3) | 352 (96.7) | 364 | |
| | Non-smoker | 179 (2.7) | 6417 (97.3) | 6596 | |
| Drinking | Yes | 139 (2.4) | 5661 (97.6) | 5800 | <0.001 |
| | No | 62 (4.3) | 1387 (95.7) | 1449 | |
| Moderate physical activity (work) | Yes | 9 (3.3) | 263 (96.7) | 272 | 0.583 |
| | No | 192 (2.8) | 6785 (97.2) | 6977 | |
| Moderate physical activity (leisure) | Yes | 28 (2.2) | 1252 (97.8) | 1280 | 0.160 |
| | No | 173 (2.9) | 5796 (97.1) | 5969 | |

¹ p-value by chi-square test; significance level p < 0.05.
Table 3. Presence of ischemic heart disease according to clinical characteristics.

| Variables | Categories | Ischemic heart disease: Yes, n (%) | Ischemic heart disease: No, n (%) | Total (n = 7249) | p-Value ¹ |
| --- | --- | --- | --- | --- | --- |
| BMI | Underweight | 3 (1.4) | 205 (98.6) | 208 | <0.001 |
| | Normal weight | 99 (2.1) | 4537 (97.9) | 4636 | |
| | Obesity | 99 (4.1) | 2306 (95.9) | 2405 | |
| Menopause | Yes | 192 (3.8) | 4801 (96.2) | 4993 | <0.001 |
| | No | 9 (0.4) | 2247 (99.6) | 2256 | |
| Family history | Yes | 25 (5.0) | 479 (95.0) | 504 | 0.002 |
| | No | 176 (2.6) | 6569 (97.4) | 6745 | |
| Hypertension | Yes | 133 (5.8) | 2156 (94.2) | 2289 | <0.001 |
| | No | 68 (1.4) | 4892 (98.6) | 4960 | |
| Dyslipidemia | Yes | 119 (5.8) | 1932 (94.2) | 2051 | <0.001 |
| | No | 82 (1.6) | 5116 (98.4) | 5198 | |
| Stroke | Yes | 23 (11.7) | 173 (88.3) | 196 | <0.001 |
| | No | 178 (2.5) | 6875 (97.5) | 7053 | |
| Arthritis | Yes | 112 (6.0) | 1743 (94.0) | 1855 | <0.001 |
| | No | 89 (1.6) | 5305 (98.4) | 5394 | |
| Diabetes mellitus | Yes | 56 (6.9) | 757 (93.1) | 813 | <0.001 |
| | No | 145 (2.3) | 6291 (97.7) | 6436 | |
| Depression | Yes | 34 (6.7) | 470 (93.3) | 504 | <0.001 |
| | No | 167 (2.5) | 6578 (97.5) | 6745 | |
| Kidney failure | Yes | 3 (11.5) | 23 (88.5) | 26 | 0.006 |
| | No | 198 (2.7) | 7025 (97.3) | 7223 | |

¹ p-value by chi-square test; significance level p < 0.05.
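As an illustrative check (not part of the original analysis, which used IBM SPSS), the chi-square test behind the p-values in Table 3 can be sketched for a single 2 × 2 comparison. The function name `pearson_chi2_2x2` is hypothetical, and the sketch assumes the uncorrected Pearson statistic; whether a continuity correction was applied in the study is not stated.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Family history rows of Table 3, as (IHD yes, IHD no):
# with family history 25 / 479, without family history 176 / 6569.
chi2, p = pearson_chi2_2x2(25, 479, 176, 6569)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Under these assumptions the statistic is roughly 9.6 and the p-value is close to 0.002, in line with the value reported for family history. Variables with more than two categories need the general chi-square distribution, which this shortcut does not cover.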
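Table 4. Predictive factors for the presence of ischemic heart disease.

| Variables | Categories | OR | 95% CI |
| --- | --- | --- | --- |
| Age | 40–49 | 1 (reference) | |
| | 50–59 | 2.83 | 0.71–11.25 |
| | 60–69 | 7.51 * | 1.87–30.23 |
| | 70–79 | 13.82 * | 3.34–57.25 |
| | ≥80 | 16.73 * | 3.86–72.46 |
| Marital status | Yes | 1 (reference) | |
| | No | 1.22 | 0.35–4.20 |
| Education level | ≤Elementary school | 1 (reference) | |
| | Middle school | 1.03 | 0.66–1.59 |
| | High school | 0.75 | 0.46–1.24 |
| | ≥College | 0.58 | 0.28–1.22 |
| Household income level | Upper | 1 (reference) | |
| | Upper middle | 0.92 | 0.53–1.62 |
| | Lower middle | 0.81 | 0.45–1.45 |
| | Lower | 1.49 | 0.85–2.62 |
| Subjective health status | Very good | 1 (reference) | |
| | Good | 2.49 | 0.32–19.36 |
| | Usually | 3.81 | 0.52–27.94 |
| | Poor | 6.56 | 0.89–48.44 |
| | Very poor | 6.89 | 0.91–51.95 |
| Stress awareness | Rarely | 1 (reference) | |
| | A little bit | 1.19 | 0.63–2.24 |
| | A lot | 1.13 | 0.73–1.76 |
| | Very much | 0.89 | 0.61–1.31 |
| Smoking | Non-smoker | 1 (reference) | |
| | Past smoker | 1.85 | 0.85–4.01 |
| | Occasionally smoking | 2.40 | 0.52–11.03 |
| | Daily smoking | 1.55 | 0.82–2.93 |
| Drinking | No | 1 (reference) | |
| | Yes | 1.11 | 0.80–1.54 |
| Moderate physical activity (work) | Yes | 1 (reference) | |
| | No | 0.84 | 0.41–1.73 |
| Moderate physical activity (leisure) | Yes | 1 (reference) | |
| | No | 0.63 * | 0.41–0.99 |
| BMI | Underweight | 1 (reference) | |
| | Normal weight | 1.08 | 0.33–3.58 |
| | Obesity | 1.43 | 0.43–4.77 |
| Menopause | No | 1 (reference) | |
| | Yes | 1.46 | 0.65–3.28 |
| Family history | No | 1 (reference) | |
| | Yes | 3.29 * | 2.03–5.32 |
| Hypertension | No | 1 (reference) | |
| | Yes | 1.42 * | 1.01–2.00 |
| Dyslipidemia | No | 1 (reference) | |
| | Yes | 1.70 * | 1.24–2.33 |
| Stroke | No | 1 (reference) | |
| | Yes | 1.93 * | 1.18–3.18 |
| Arthritis | No | 1 (reference) | |
| | Yes | 1.43 * | 1.05–1.94 |
| Diabetes mellitus | No | 1 (reference) | |
| | Yes | 1.27 | 0.90–1.79 |
| Depression | No | 1 (reference) | |
| | Yes | 1.66 * | 1.09–2.51 |
| Kidney failure | No | 1 (reference) | |
| | Yes | 1.86 | 0.52–6.70 |

Logistic regression analysis; classification accuracy, 97.2%. * p < 0.05. OR: odds ratio; CI: confidence interval.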
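To make the link between the odds ratios, their confidence intervals, and the significance marks in Table 4 concrete, the following minimal sketch recovers the log-scale standard error and an approximate two-sided p-value from a reported OR and its limits. It assumes Wald-type intervals (symmetric on the log scale), which is the usual default for logistic regression output but is not stated in the study; the helper name `wald_from_or_ci` is hypothetical.

```python
import math

def wald_from_or_ci(odds_ratio, lower, upper, z_crit=1.96):
    """Recover SE of log(OR), z statistic and two-sided p-value from an OR and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return se, z, p_value

# (OR, lower limit, upper limit) copied from Table 4
rows = {
    "Family history": (3.29, 2.03, 5.32),
    "Hypertension": (1.42, 1.01, 2.00),
    "Diabetes mellitus": (1.27, 0.90, 1.79),
}

results = {}
for name, (or_value, lo, hi) in rows.items():
    se, z, p = wald_from_or_ci(or_value, lo, hi)
    midpoint = math.sqrt(lo * hi)  # geometric mean of the limits should sit near the OR
    results[name] = (midpoint, p)
    print(f"{name}: OR={or_value}, sqrt(lo*hi)={midpoint:.2f}, z={z:.2f}, p={p:.3f}")
```

An interval that excludes 1 corresponds to p < 0.05, so family history and hypertension come out significant while diabetes mellitus does not, matching the asterisks in the table. Because the inputs are rounded to two decimals, the recovered p-values are approximate.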
Table 5. Prediction of ischemic heart disease in middle-aged and older women using a random forest.

| Variables | Gini Importance | Gini Importance (Out-of-Bag) |
| --- | --- | --- |
| Age | 0.012 | 0.013 |
| Dyslipidemia | 0.009 | 0.009 |
| Education level | 0.008 | 0.007 |
| Arthritis | 0.007 | 0.006 |
| Hypertension | 0.007 | 0.006 |
| Diabetes mellitus | 0.006 | 0.004 |
| Depression | 0.006 | 0.004 |
| Family history | 0.006 | 0.003 |
| Menopause | 0.005 | 0.003 |
| Stroke | 0.004 | 0.002 |
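Gini importance is built from the impurity decrease that a variable produces when it splits a node. As a rough illustration only, the sketch below computes that decrease for a single hypothetical split of the whole sample on hypertension, using the counts in Table 3. It does not reproduce the values in Table 5, which come from the random forest in SAS Enterprise Miner and aggregate over many trees and nodes; the function names are assumptions.

```python
def gini(positives, total):
    """Gini impurity of a binary node: 2 * p * (1 - p)."""
    p = positives / total
    return 2 * p * (1 - p)

def gini_decrease(parent, left, right):
    """Impurity decrease of a binary split; each node is (positives, total)."""
    n = parent[1]
    weighted_children = (left[1] / n) * gini(*left) + (right[1] / n) * gini(*right)
    return gini(*parent) - weighted_children

# Counts from Table 3: all women, then split by hypertension (yes / no)
root = (201, 7249)
hypertension_yes = (133, 2289)
hypertension_no = (68, 4960)

decrease = gini_decrease(root, hypertension_yes, hypertension_no)
print(f"Gini decrease for a hypertension split at the root: {decrease:.5f}")
```

The decrease is small in absolute terms because the outcome is rare (about 2.8% of the sample), so even the root node has a low impurity to begin with.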
Table 6. Model evaluation.

| Metric | Logistic Regression | Decision Tree | Random Forest |
| --- | --- | --- | --- |
| Sensitivity | 0.214 | 0.199 | 0.433 |
| Specificity | 0.982 | 0.977 | 0.992 |
| Accuracy | 0.861 | 0.856 | 0.877 |
| AUC (Area Under Curve) | 0.852 | 0.848 | 0.872 |
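For readers less familiar with the metrics in Table 6, the following sketch shows how sensitivity, specificity, and accuracy follow from a confusion matrix. The counts are hypothetical and are not taken from the study, whose confusion matrices are not reported; AUC is omitted because it requires the predicted scores rather than counts.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that were detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly ruled out
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Hypothetical counts for illustration only (NOT from the study)
metrics = classification_metrics(tp=20, fn=80, tn=880, fp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

When the outcome is rare, accuracy is dominated by specificity, which is why sensitivity is the more informative column when comparing the three models.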
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Lim, J. A Predictive Model of Ischemic Heart Disease in Middle-Aged and Older Women Using Data Mining Technique. J. Pers. Med. 2023, 13, 663. https://doi.org/10.3390/jpm13040663

