1. Introduction
Myocardial infarction (MI) is a significant global health concern and remains a leading cause of death worldwide [1]. Accurate prediction of MI risk is crucial for early intervention and prevention of the disease [2,3]. Metabolomics has emerged as an innovative approach for identifying potential biomarkers and risk factors associated with MI [1,4,5,6,7]. However, the availability of population-based human cohorts with comprehensive metabolite profiles for incident MI cases is often limited and imbalanced, with a scarcity of non-MI individuals [6]. For example, in a study by Nogal et al., a two-step meta-analysis was conducted using data from the COnsortium of METabolomics Studies, involving 7897 individuals, including 1373 incident MI cases from six cohorts [6]. Alternatively, nested study designs have been employed, matching cases with non-cases in a 1:1 ratio [5]. However, the selection of non-MI cases in population-based studies may introduce bias and potentially lead to disparate results [8]. Furthermore, the limited and imbalanced number of incident cases poses challenges when applying machine learning (ML) methods for prediction. Therefore, the development of a novel MI prediction model is warranted.
In recent years, ML techniques, such as support vector machine (SVM) [9] and random forest (RF) [10], have been widely used to forecast disease risk because of their improved accuracy and sensitivity, particularly with complex data. SVM and RF algorithms, for instance, have been employed to identify biomarkers associated with chronic renal disease [11] and to detect MI based on electrocardiographic (ECG) signals [12,13,14]. Additionally, decision tree (DT) and RF models have been utilized to identify key microbial species linked to colorectal cancer [15]. Moreover, k-nearest neighbor (k-NN) algorithms have been employed to identify significant indicators of breast cancer based on anthropometric and clinical characteristics [16]. These studies demonstrate the effectiveness of ML as a tool for disease diagnosis and biomarker discovery.
The current ML algorithms often treat clinical variables and metabolite profiles as separate entities, failing to capture the potential associations between these datasets, particularly when dealing with vast amounts of omics data. Deep-learning (DL) algorithms, such as the multilayer perceptron (MLP), offer a promising solution in this regard [17,18,19,20,21]. In a previous study, an MLP-based approach was employed to predict heart disease, including MI, using the Cleveland Heart Disease dataset. However, the performance of DL models can be hampered by limited datasets.
To address this limitation, a DL method known as generative adversarial network (GAN) has emerged [22,23,24,25,26]. GAN has the capability to generate new data and transfer styles by leveraging the mappings of the original data [27]. This unique characteristic empowers the GAN model to enhance features and expand the potential distribution of limited data at a relatively low cost [28].
In this study, our primary objectives are threefold. Firstly, we aim to develop a novel DL method that utilizes a GAN model for feature enhancement, enabling the establishment of a 7-year MI prediction model. This DL approach will be built upon real observational metabolomic and phenotypic data obtained from the KORA (Cooperative Health Research in the Region of Augsburg) human cohort.
Secondly, we propose a novel loss function, GAN for feature-enhanced (GFE), designed to further enhance the prediction accuracy of our DL model. By incorporating this unique loss function, we aim to improve the model’s ability to identify subtle yet critical patterns and associations within the data, leading to more robust and accurate MI risk predictions.
Lastly, we seek to interpret the DL model and leverage its capabilities to identify metabolites and clinical variables that are strongly correlated with MI. By unraveling these associations, we aim to gain valuable insights into the underlying biological mechanisms and potential biomarkers of MI, ultimately contributing to a better understanding of the disease and its predictive indicators.
2. Methods
2.6. Construction of the GAN for Feature-Enhanced Loss Function
To address the potential presence of misleading information in the generated data, which could steer the gradient away from the ground-truth optimum, we developed a novel loss function called GAN for feature-enhanced (GFE). The objective of the GFE loss function is to reduce the impact of misleading information and bring the minimum reached by gradient descent closer to the ground truth. This is accomplished by calculating the actual training direction while emphasizing the original observed incident case data.
The GFE loss function, as shown in Equation (1), is an extension of the binary cross entropy (BCE) loss function, designed to further enhance prediction accuracy. In the equation, we extract the portions of misleading information from the generated data during each training epoch, which refers to one complete pass of the entire training dataset through the learning algorithm. The discriminator (D) used in the GAN model is a previously fully trained model. To evaluate the reliability of the generated data, we utilize the accuracy of D(P), which represents the discriminator’s accuracy when evaluating the generated data. The term (1-Acc(D(P))) quantifies the misleading information contained in the generated data.
It is important to note that the discriminator is trained on incident cases and has learned information during the training process. Therefore, we also evaluate the reliability of the discriminator. Since non-MI cases are not used in any phase of the GAN training process, we use Acc(D(N)) as an appropriate evaluation method for the discriminator.
The term BCE(X’) in the GFE loss function emphasizes the ground truth information for gradient descent. X represents the combined data, which includes both observed data and generated data. X’ specifically refers to the observational positive data, which is used to generate the misleading information in the generated data. The weight W represents the weight of the observed portion in the batch size divided by the total training number.
where:
GFE () = GAN for feature enhanced loss function;
BCE () = binary cross entropy loss function;
X = combined data = observed data + generated data;
X’ = observational positive data;
W = weight of observed portion in BatchSize/Total training number;
Acc(D(P)) = accuracy of Discriminator (Observational positive data);
Acc(D(N)) = accuracy of Discriminator (Observational negative data).
By incorporating the GFE loss function, we emphasize the ground truth direction by utilizing purely observational positive data. Simultaneously, we account for and mitigate the effect of misleading information contained in the generated data during each training epoch.
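Equation (1) itself is not reproduced in the text above, so the following Python sketch only illustrates how the listed ingredients (BCE(X), BCE(X’), Acc(D(P)), Acc(D(N)), and W) could be computed and put together. The combination used in `gfe_loss_sketch` is an assumption made for illustration and should not be read as the published formula. The function names and toy numbers are hypothetical, and W is passed in directly because its exact computation is not fully specified here.

```python
import math

def bce(y_true, y_prob, eps=1e-7):
    """Mean binary cross entropy over paired labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def accuracy(labels, predictions):
    """Fraction of predictions that match the labels."""
    return sum(int(a == b) for a, b in zip(labels, predictions)) / len(labels)

def gfe_loss_sketch(bce_x, bce_x_prime, acc_d_p, acc_d_n, w):
    """Hypothetical combination of the GFE ingredients (NOT the published Equation (1)).

    Assumption: the misleading share (1 - Acc(D(P))), scaled by the discriminator
    reliability Acc(D(N)), discounts BCE(X); W * BCE(X') re-emphasizes observed positives.
    """
    misleading = (1.0 - acc_d_p) * acc_d_n
    return (1.0 - misleading) * bce_x + w * bce_x_prime

# Toy example with made-up labels and probabilities.
y_x, p_x = [1, 1, 0, 0], [0.9, 0.6, 0.2, 0.4]      # X: observed + generated data
y_xp, p_xp = [1, 1], [0.9, 0.6]                    # X': observed positive data only
loss = gfe_loss_sketch(bce(y_x, p_x), bce(y_xp, p_xp), acc_d_p=0.8, acc_d_n=0.9, w=0.5)
print(f"sketch loss: {loss:.4f}")
```

Under this assumed form, a discriminator that scores perfectly on P leaves BCE(X) undiscounted, while lower Acc(D(P)) shrinks the contribution of the combined data relative to the observed positives.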
4. Discussion
In this study, we introduced a novel GFE loss function in our deep-learning algorithms, specifically in the GAN and MLP models. The incorporation of the GFE loss function resulted in a 2% improvement in the accuracy of our 7-year myocardial infarction (MI) predictions. Our approach involved three interconnected steps to achieve this improvement. Firstly, we utilized the GAN model to generate new incident MI cases based on real observational data obtained from the population-based KORA cohort. This approach allowed us to augment the dataset and strengthen the feature representation, enhancing the predictive capabilities of the models. Secondly, we implemented the MLP model and employed a combination of oversampling techniques, including SMOTE, and undersampling techniques, such as ENN. This approach aimed to address the issue of imbalanced data by balancing the combined generated feature-enhanced data with the real observational data. Finally, we introduced the GFE loss function, which is an adaptation of the conventional BCE loss function. By incorporating additional features and information from the dataset, the GFE loss function enhanced the accuracy of MI prediction. This improvement was achieved by leveraging the discriminative power of the GAN model and the predictive capabilities of the MLP model. The successful implementation of the GFE loss function, in conjunction with the GAN and MLP models, demonstrates its efficacy in improving the accurate prediction of MI. The findings highlight the potential of our approach in identifying novel biomarkers and enhancing risk stratification models for MI prediction.
The availability of high-quality and comprehensive clinical and metabolomic data is crucial to ensure accurate prediction of MI. In this study, we leveraged data from the KORA S4 and F4 studies, which provided a valuable resource for our analysis. To ensure data integrity, we implemented a rigorous data cleaning process for both clinical variables and non-targeted metabolite profiles. Specifically, we focused on utilizing incident MI cases and non-MI participants with measured metabolite data. By selecting only these individuals, we aimed to establish a more accurate representation of the population under study. However, it is important to acknowledge that this selection process introduced certain limitations to our dataset. One limitation was the relatively small number of observational MI cases available, which affected the overall balance of the data. The class imbalance, with a majority of non-MI cases, posed a challenge in training the models effectively. Imbalanced data can lead to biased predictions and inadequate performance, particularly when the minority class (MI cases) is of interest. As demonstrated in Figure 5a, when we initially applied ML methods such as DT, RF, SVM, and DL methods such as CNN and LSTM, the prediction of MI was highly misleading. These models achieved 100% accuracy for non-MI cases but failed to accurately predict any incident MI cases.
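The pattern described above, perfect performance on non-MI cases with no MI case detected, can be made concrete with a small Python sketch. The class counts below are hypothetical and are not the KORA numbers; the point is only that overall accuracy can look high while sensitivity for the minority class is zero.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = MI, 0 = non-MI)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def summarize(y_true, y_pred):
    """Overall accuracy plus per-class rates; rates default to 0.0 for an empty class."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,            # share of MI cases detected
        "specificity": specificity,            # share of non-MI cases detected
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

# Hypothetical imbalanced cohort: 95 non-MI and 5 incident MI participants.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100                             # a model that always predicts non-MI
print(summarize(y_true, y_pred))
```

Reporting sensitivity or balanced accuracy next to overall accuracy exposes this failure mode directly.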
The limitations in our observational data, especially regarding the limited number of MI cases and the class imbalance, contributed to the challenges faced by the ML and DL models. These limitations highlight the importance of comprehensive and balanced datasets, including a sufficient number of MI cases, to improve the accuracy and reliability of predictive models. Addressing these limitations and ensuring the availability of complete and diverse data, particularly regarding clinical variables and metabolites, is essential to enhance the predictive accuracy of the models. An additional limitation of our model is that it is optimized solely for binary outcomes and does not extend to multinomial settings. Future studies should focus on expanding the dataset and incorporating more relevant variables to enhance the robustness and generalizability of MI prediction models.
To overcome the limitations arising from the limited number of incident cases, we employed a GAN model to learn from the observational data and generate new incident MI cases. However, even with the addition of the generated feature-enhanced data, the combined dataset remained largely imbalanced. To address this issue, we employed oversampling and undersampling techniques to balance the distribution space within the training dataset. Specifically, we utilized the SMOTE to generate synthetic samples, thereby increasing the representation of the minority class (incident MI). We then applied the ENN technique to remove noisy instances and further refine the dataset. This balanced dataset enabled the MLP model to effectively learn from the data and improve its performance on the minority class.
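The resampling step described above can be sketched in plain Python as follows. This is a simplified illustration of the SMOTE and ENN ideas (interpolating between minority-class neighbors, then dropping samples that disagree with their nearest neighbors), not the implementation used in the study. The toy data, function names, and the choice k = 3 are assumptions.

```python
import math
import random

def k_nearest(point, candidates, k):
    """Return indices of the k candidates closest to `point` (Euclidean distance)."""
    order = sorted(range(len(candidates)),
                   key=lambda i: math.dist(point, candidates[i]))
    return order[:k]

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating toward a near neighbor."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        j = rng.choice(k_nearest(base, others, min(k, len(others))))
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, others[j])])
    return synthetic

def enn(rows, labels, k=3):
    """Keep only rows whose label agrees with the majority of their k nearest neighbors."""
    kept_rows, kept_labels = [], []
    for i, (row, label) in enumerate(zip(rows, labels)):
        rest_rows = rows[:i] + rows[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        votes = [rest_labels[j] for j in k_nearest(row, rest_rows, k)]
        majority = max(set(votes), key=votes.count)  # use an odd k to avoid ties
        if majority == label:
            kept_rows.append(row)
            kept_labels.append(label)
    return kept_rows, kept_labels

# Toy data (hypothetical): label 0 = non-MI, label 1 = incident MI.
non_mi = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.2], [0.2, 0.8]]
mi = [[5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]

new_mi = smote(mi, n_new=len(non_mi) - len(mi))  # oversample MI up to the majority size
rows = non_mi + mi + new_mi
labels = [0] * len(non_mi) + [1] * (len(mi) + len(new_mi))
clean_rows, clean_labels = enn(rows, labels)
print(len(rows), "rows before ENN,", len(clean_rows), "rows after ENN")
```

In practice a maintained library implementation would normally be preferred over a hand-written version like this one.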
While we recognized the limitations of the generated data, we proposed a more appropriate loss function, the GFE loss function, which resulted in a 2% increase in prediction accuracy. The combination of the GAN model and the GFE loss function expanded the potential for generalization across different datasets. Dealing with imbalanced datasets, where the number of observed incident MI cases is significantly lower than that of non-MI individuals, presents challenges. Techniques like SMOTE and ENN help address this issue, but their effectiveness can vary depending on the dataset’s specific characteristics. In our study, through two rounds of training with SMOTE+ENN and the GFE loss function, we successfully employed deep-learning algorithms for the first time to predict 7-year incident MI cases in the KORA cohort data, achieving an accuracy of 70%.
However, our study also has additional limitations and potential challenges. The findings may be specific to the population and dataset used in the analysis, such as the KORA cohort. It is crucial to validate the performance of the deep-learning methods with the GFE loss function on external datasets to ensure their robustness and generalizability. Without proper validation on independent datasets, the reliability of the model’s predictions may be limited.
Deep-learning models, including generative adversarial networks, are often considered black boxes, making it challenging to interpret the underlying mechanisms and understand how the model arrives at its predictions. However, our sensitivity analyses ranked the impact weights of each clinical phenotype and metabolite in predicting MI, which may aid in the clinical understanding and acceptance of the model’s predictions. Our approach also established a novel method for enhancing the interpretability of the model, revealing numerous clinical variables and metabolites that are significantly relevant to MI. The identified clinical variables, such as the glucose tolerance groups, sex, and physical activity [46,47], align with those already recognized in current clinical tests, further demonstrating the reliability of our model. We anticipate that as generative deep-learning continues to advance, more cutting-edge techniques will emerge in the future to better target critical features and generate omics data, further improving the performance of predictive models [48,49,50,51,52].
Author Contributions
Conceptualization, S.Y. and R.W.-S.; Methodology, S.Y. and S.H.; Software, S.Y., X.L. and X.C.; Formal Analysis, S.Y. and S.H.; Investigation, M.S., M.H. (Makoto Harada) and J.G.; Resources, M.H. (Margit Heier), C.G., W.K., W.R. and A.P.; Data Curation, G.K. and K.S.; Writing—Original Draft Preparation, S.Y. and R.W.-S.; Writing—Review & Editing, S.Y., M.S., M.H. (Makoto Harada), J.G., X.L., X.C., W.R. and R.W.-S.; Visualization, S.Y. and R.W.-S.; Supervision, R.W.-S.; Funding Acquisition, W.R., A.P. and R.W.-S. All authors have read and agreed to the published version of the manuscript.
Funding
This project has received funding from the Innovative Medicines Initiative 2 Joint Undertaking (JU) under grant agreement No 821508 (CARDIATEAM). The JU receives support from the European Union’s Horizon 2020 research and innovation programme and the European Federation of Pharmaceutical Industries and Associations (EFPIA). The German Diabetes Center is supported by the German Federal Ministry of Health (Berlin, Germany) and the Ministry of Science and Culture in North-Rhine Westphalia (Düsseldorf, Germany). This study was supported in part by a grant from the German Federal Ministry of Education and Research to the German Center for Diabetes Research (DZD). The KORA study was initiated and financed by the Helmholtz Zentrum Munchen–German Research Center for Environmental Health, which is funded by the German Federal Ministry of Education and Research (BMBF) and by the State of Bavaria.
Institutional Review Board Statement
The KORA study protocol and procedures were conducted in accordance with the Declaration of Helsinki, and approved by the ethics committee of the Bavarian Medical Association, Germany (approval code: 06068 and approval date: 19 September 2006).
Informed Consent Statement
All KORA participants provided written informed consent.
Data Availability Statement
The KORA data are governed by the General Data Protection Regulation (GDPR) and national data protection laws, with additional restrictions imposed by the Ethics Committee of the Bavarian Chamber of Physicians to ensure data privacy of the study participants. Therefore, the data cannot be made freely available in a public repository. However, researchers with a legitimate interest in accessing the data may submit a request through an individual project agreement with KORA via the online portal (https://www.helmholtz-munich.de/en/epi/cohort/kora). All code is available at https://github.com/ShawnYu1996/GAN-for-MI.
Acknowledgments
We express our appreciation to all KORA study participants for their long-term commitment to the KORA study, the staff for data collection and research data management, and the members of the KORA Study Group (https://www.helmholtz-munich.de/en/epi/cohort/kora) who are responsible for the design and conduct of the KORA study. We would like to express our gratitude to OpenAI’s ChatGPT language model for its assistance in enhancing the English grammar of this manuscript during the preparation process. In addition, we thank the professional author services of Springer Nature for proofreading.
Conflicts of Interest
Xuening Li is an employee of Beijing Huanyang Bole Consulting Co., Ltd., China. The paper reflects the views of the scientist, and not the company.
References
- Ward-Caviness, C.K.; Xu, T.; Aspelund, T.; Thorand, B.; Montrone, C.; Meisinger, C.; Dunger-Kaltenback, I.; Zierer, A.; Yu, Z.; Helgadottie, I.; et al. Improvement of myocardial infarction risk prediction via inflammation-associated metabolite biomarkers. Heart 2017, 103, 1278–1285.
- Yeh, R.W.; Sidney, S.; Chandra, M.; Sorel, M.; Selby, J.V.; Go, A.S. Population trends in the incidence and outcomes of acute myocardial infarction. N. Engl. J. Med. 2010, 362, 2155–2165.
- Zhan, C.; Tang, T.; Wu, E.; Zhang, Y.; He, M.; Wu, R.; Bi, C.; Wang, J.; Zhang, Y.; Shen, B. From multi-omics approaches to personalized medicine in myocardial infarction. Front. Cardiovasc. Med. 2023, 10, 1250340.
- D’Agostino, R.B., Sr.; Vasan, R.S.; Pencina, M.J.; Wolf, P.A.; Cobain, M.; Massaro, J.M.; Kannel, W.B. General cardiovascular risk profile for use in primary care: The Framingham Heart Study. Circulation 2008, 117, 743–753.
- Shah, S.H.; Bain, J.R.; Muehlbauer, M.J.; Stevens, R.D.; Crosslin, D.R.; Haynes, C.; Dungan, J.; Newby, L.K.; Hauser, E.R.; Ginsburg, G.S.; et al. Association of a peripheral blood metabolic profile with coronary artery disease and risk of subsequent cardiovascular events. Circ. Cardiovasc. Genet. 2010, 3, 207–214.
- Nogal, A.; Alkis, T.; Lee, Y.; Kifer, D.; Hu, J.; Murphy, R.A.; Huang, Z.; Wang-Sattler, R.; Kastenmüler, G.; Linkohr, B.; et al. Predictive metabolites for incident myocardial infarction: A two-step meta-analysis of individual patient data from six cohorts comprising 7897 individuals from the COnsortium of METabolomics Studies. Cardiovasc. Res. 2023, 119, 2743–2754.
- Ganna, A.; Salihovic, S.; Sundstrom, J.; Broeckling, C.D.; Hedman, A.K.; Magnusson, P.K.; Pedersen, N.L.; Larsson, A.; Siegbahn, A.; Zilmer, M.; et al. Large-scale metabolomic profiling identifies novel biomarkers for incident coronary heart disease. PLoS Genet. 2014, 10, e1004801.
- Wang-Sattler, R.; Yu, Z.; Herder, C.; Messias, A.C.; Floegel, A.; He, Y.; Heim, K.; Campillos, M.; Holzapfel, C.; Thorand, B.; et al. Novel biomarkers for pre-diabetes identified by metabolomics. Mol. Syst. Biol. 2012, 8, 615.
- Chowdhary, C.L.; Mittal, M.P.K.; Pattanaik, P.A.; Marszalek, Z. An Efficient Segmentation and Classification System in Medical Images Using Intuitionist Possibilistic Fuzzy C-Mean Clustering and Fuzzy SVM Algorithm. Sensors 2020, 20, 3903.
- Wang, J.; Shi, L. Prediction of medical expenditures of diagnosed diabetics and the assessment of its related factors using a random forest model, MEPS 2000-2015. Int. J. Qual. Health Care 2020, 32, 99–112.
- Huang, J.; Huth, C.; Covic, M.; Troll, M.; Adam, J.; Zukunft, S.; Prehn, C.; Wang, L.; Nano, J.; Scheerer, M.F.; et al. Machine Learning Approaches Reveal Metabolic Signatures of Incident Chronic Kidney Disease in Individuals with Prediabetes and Type 2 Diabetes. Diabetes 2020, 69, 2756–2765.
- Antwi-Amoabeng, D.; Gbadebo, T.D. Limitations of ECG algorithms in paced right bundle branch block with prior myocardial infarction. HeartRhythm Case Rep. 2021, 7, 702–705.
- Sponder, M.; Ehrengruber, S.; Berghofer, A.; Schonbauer, R.; Toma, A.; Silbert, B.I.; Hengstenberg, C.; Lang, I.; Richter, B. New ECG algorithms with improved accuracy for prediction of culprit vessel in inferior ST-Segment elevation myocardial infarction. Panminerva Med. 2021, 65, 303–311.
- Yontar, O.C.; Erdogan, G.; Yenercag, M.; Gul, S.; Arslan, U.; Karagoz, A. Relationship between Selvester ECG Score and Cardiovascular Outcomes in Patients with Non-ST Elevation Myocardial Infarction. Acta Cardiol. Sin. 2021, 37, 580–590.
- Ai, D.; Pan, H.; Han, R.; Li, X.; Liu, G.; Xia, L.C. Using Decision Tree Aggregation with Random Forest Model to Identify Gut Microbes Associated with Colorectal Cancer. Genes 2019, 10, 112.
- Demirkale, Z.H.; Abali, Z.Y.; Bas, F.; Poyrazoglu, S.; Bundak, R.; Darendeliler, F. Comparison of the Clinical and Anthropometric Features of Treated and Untreated Girls with Borderline Early Puberty. J. Pediatr. Adolesc. Gynecol. 2019, 32, 264–270.
- Haghighat, F. Predicting the trend of indicators related to Covid-19 using the combined MLP-MC model. Chaos Solitons Fractals 2021, 152, 111399.
- Lee, J.; Hwang, J.; Lee, K.S. Prediction and comparison of postural discomfort based on MLP and quadratic regression. J. Occup. Health 2021, 63, e12292.
- Rajasekar, S.J.S.; Narayanan, V.; Perumal, V. Detection of COVID-19 from Chest CT Images Using CNN with MLP Hybrid Model. Stud. Health Technol. Inform. 2021, 285, 288–291.
- Qiao, L.; Li, H.; Wang, Z.; Sun, H.; Feng, G.; Yin, D. Machine learning based on SEER database to predict distant metastasis of thyroid cancer. Endocrine 2023, 82, 1–11.
- Song, H.; Yin, C.; Li, Z.; Feng, K.; Cao, Y.; Gu, Y.; Sun, H.J.M. Identification of Cancer Driver Genes by Integrating Multiomics Data with Graph Neural Networks. Metabolites 2023, 13, 339.
- Hong, K.-T.; Cho, Y.; Kang, C.H.; Ahn, K.-S.; Lee, H.; Kim, J.; Hong, S.J.; Kim, B.H.; Shim, E.J.D. Lumbar Spine Computed Tomography to Magnetic Resonance Imaging Synthesis Using Generative Adversarial Network: Visual Turing Test. Diagnostics 2022, 12, 530.
- Liu, S. SCAM-GAN: Generating brain MR images from CT scan data based on CycleGAN combined with attention module. J. Phys. Conf. Ser. 2023, 2646, 012018.
- Liu, M.; Zou, W.; Piao, C. MR imaging from CT scan data using generative adversarial network. In Proceedings of the International Conference on Image, Signal Processing, and Pattern Recognition (ISPP 2022), Guilin, China, 25–27 February 2022; pp. 42–48.
- Liu, M.; Zou, W.; Wang, W.; Jin, C.-B.; Chen, J.; Piao, C.J.S. Multi-Conditional Constraint Generative Adversarial Network-Based MR Imaging from CT Scan Data. Sensors 2022, 22, 4043.
- Guo, K.; Chen, J.; Qiu, T.; Guo, S.; Luo, T.; Chen, T.; Ren, S. MedGAN: An adaptive GAN approach for medical image generation. Comput. Biol. Med. 2023, 163, 107119.
- Hazra, D.; Byun, Y.C.; Kim, W.J. Enhancing classification of cells procured from bone marrow aspirate smears using generative adversarial networks and sequential convolutional neural network. Comput. Methods Programs Biomed. 2022, 224, 107019.
- Liang, K.; Liu, X.; Chen, S.; Xie, J.; Qing Lee, W.; Liu, L.; Kuan Lee, H. Resolution enhancement and realistic speckle recovery with generative adversarial modeling of micro-optical coherence tomography. Biomed. Opt. Express 2020, 11, 7236–7252.
- Holle, R.; Happich, M.; Lowel, H.; Wichmann, H.E.; MONICA/KORA Study Group. KORA—A research platform for population based health research. Gesundheitswesen 2005, 67 (Suppl. 1), S19–S25.
- Han, S.; Huang, J.; Foppiano, F.; Prehn, C.; Adamski, J.; Suhre, K.; Li, Y.; Matullo, G.; Schliess, F.; Gieger, C.; et al. TIGER: Technical variation elimination for metabolomics data using ensemble learning architecture. Brief. Bioinform. 2022, 23, bbab535.
- Shi, M.; Han, S.; Klier, K.; Fobo, G.; Montrone, C.; Yu, S.; Harada, M.; Henning, A.K.; Friedrich, N.; Bahls, M.; et al. Identification of candidate metabolite biomarkers for metabolic syndrome and its five components in population-based human cohorts. Cardiovasc. Diabetol. 2023, 22, 141.
- Huang, J.; Covic, M.; Huth, C.; Rommel, M.; Adam, J.; Zukunft, S.; Prehn, C.; Wang, L.; Nano, J.; Scheerer, M.F.J.M. Validation of candidate phospholipid biomarkers of chronic kidney disease in hyperglycemic individuals and their organ-specific exploration in leptin receptor-deficient db/db mouse. Metabolites 2021, 11, 89.
- Thygesen, K.; Alpert, J.S.; Jaffe, A.S.; Chaitman, B.R.; Bax, J.J.; Morrow, D.A.; White, H.D. Fourth Universal Definition of Myocardial Infarction (2018). Glob. Heart 2018, 13, 305–338.
- Palomäki, P.; Miettinen, H.; Mustaniemi, H.; Lehto, S.; Pyörälä, K.; Mähönen, M.; Tuomilehto, J. Diagnosis of acute myocardial infarction by MONICA and FINMONICA diagnostic criteria in comparison with hospital discharge diagnosis. J. Clin. Epidemiol. 1994, 47, 659–666.
- Alpert, J.S.; Thygesen, K.; Antman, E.; Bassand, J.P. Myocardial infarction redefined--a consensus document of The Joint European Society of Cardiology/American College of Cardiology Committee for the redefinition of myocardial infarction. J. Am. Coll. Cardiol. 2000, 36, 959–969.
- Adam, J.; Brandmaier, S.; Leonhardt, J.; Scheerer, M.F.; Mohney, R.P.; Xu, T.; Bi, J.; Rotter, M.; Troll, M.; Chi, S.; et al. Metformin Effect on Nontargeted Metabolite Profiles in Patients with Type 2 Diabetes and in Multiple Murine Tissues. Diabetes 2016, 65, 3776–3785.
- van Buuren, S.; Groothuis-Oudshoorn, K. mice: Multivariate Imputation by Chained Equations in R. J. Stat. Softw. 2011, 45, 1–67.
- Koenig, W.; Khuseyinova, N.; Baumert, J.; Meisinger, C. Prospective study of high-sensitivity C-reactive protein as a determinant of mortality: Results from the MONICA/KORA Augsburg Cohort Study, 1984–1998. Clin. Chem. 2008, 54, 335–342.
- Rathmann, W.; Strassburger, K.; Heier, M.; Holle, R.; Thorand, B.; Giani, G.; Meisinger, C. Incidence of Type 2 diabetes in the elderly German population and the effect of clinical and lifestyle risk factors: KORA S4/F4 cohort study. Diabet. Med. 2009, 26, 1212–1219.
- Amodio, M.; Youlten, S.E.; Venkat, A.; San Juan, B.P.; Chaffer, C.L.; Krishnaswamy, S. Single-cell multi-modal GAN reveals spatial patterns in single-cell data from triple-negative breast cancer. Patterns 2022, 3, 7040.
- Gao, M.; Ruan, N.; Shi, J.; Zhou, W. Deep Neural Network for 3D Shape Classification Based on Mesh Feature. Sensors 2022, 22, 7040.
- Muntasir Nishat, M.; Faisal, F.; Jahan Ratul, I.; Al-Monsur, A.; Ar-Rafi, A.M.; Nasrullah, S.M.; Reza, M.T.; Khan, M.R.H.J.S.P. A comprehensive investigation of the performances of different machine learning classifiers with SMOTE-ENN oversampling technique and hyperparameter optimization for imbalanced heart failure dataset. Sci. Program. 2022, 2022, 3649406.
- Kim, M.J.; Song, J.Y.; Hwang, S.H.; Park, D.Y.; Park, S.M. Electrospray mode discrimination with current signal using deep convolutional neural network and class activation map. Sci. Rep. 2022, 12, 16281.
- Rai, H.M.; Chatterjee, K.; Dashkevych, S. The prediction of cardiac abnormality and enhancement in minority class accuracy from imbalanced ECG signals using modified deep neural network models. Comput. Biol. Med. 2022, 150, 106142.
- Zhang, H.; Zhao, Y.; Kang, H.; Mei, E.; Han, H. Multi-Input Deep Convolutional Neural Network Model for Short-Term Power Prediction of Photovoltaics. Comput. Intell. Neurosci. 2022, 2022, 9350169.
- Junttila, M.J.; Barthel, P.; Myerburg, R.J.; Mäkikallio, T.H.; Bauer, A.; Ulm, K.; Kiviniemi, A.; Tulppo, M.; Perkiömäki, J.S.; Schmidt, G.; et al. Sudden cardiac death after myocardial infarction in patients with type 2 diabetes. Heart Rhythm. 2010, 7, 1396–1403.
- Bubenikova, A.; Skalicky, P.; Benes, V., Jr.; Benes, V., Sr.; Bradac, O. Overview of cerebral cavernous malformations: Comparison of treatment approaches. J. Neurol. Neurosurg. Psychiatry. 2022, 93, 475–480.
- van den Elshout, R.; Scheenen, T.W.J.; Driessen, C.M.L.; Smeenk, R.J.; Meijer, F.J.A.; Henssen, D. Diffusion imaging could aid to differentiate between glioma progression and treatment-related abnormalities: A meta-analysis. Insights Imaging 2022, 13, 158.
- Tuleasca, C.; Aboukais, R.; Vannod-Michel, Q.; Lejeune, J.P. Microsurgical resection under intraoperative MRI guidance and diffusion tractography for a cavernous malformation of the primary motor cortex. Acta Neurol. Belg. 2023, 123, 1591–1595.
- Kahraman, A.S.; Kahraman, B.; Ozdemir, Z.M.; Karaca, L.; Sahin, N.; Yilmaz, S. Diffusion-weighted imaging of the liver in assessing chronic liver disease: Effects of fat and iron deposition on ADC values. Eur. Rev. Med. Pharmacol. Sci. 2022, 26, 6620–6631.
- Arora, S.C.; Sharma, M.; Singh, V.K. Using diffusion of innovation framework with attitudinal factor to predict the future of mobility in the Indian market. Environ. Sci. Pollut. Res. 2023, 30, 98655–98670.
- Salas-Nuñez, L.F.; Barrera-Ocampo, A.; Caicedo, P.A.; Cortes, N.; Osorio, E.H.; Villegas-Torres, M.F.; González Barrios, A.F.J.M. Machine Learning to Predict Enzyme–Substrate Interactions in Elucidation of Synthesis Pathways: A Review. Sci. Rep. 2024, 14, 154.
Figure 1.
Roadmap and structure of the deep learning models. (a) Study design outlining the three steps used to train the prediction model, along with the datasets involved at each stage. (b) Overview of the DL model construction in three steps. The number below each layer gives its number of neurons. Step 1: construction of the GAN model, which comprises a discriminator and a generator. The discriminator begins with an input layer of 382 features, followed by three hidden layers of 200 neurons each, designed to progressively capture patterns. An intermediate layer followed by one subsequent layer with 100 neurons allows the network to refine its learned features. The output layer is a single neuron that outputs the prediction through a sigmoid activation function. The generator is an autoencoder consisting of an encoder and a decoder. In the encoder, Layer 1 compresses the input to 300 neurons and Layer 2 further compresses it to 75 neurons; each is followed by BatchNorm and Rectified Linear Unit (ReLU) processing. In the decoder, Layer 1 expands from 75 to 150 neurons, followed by BatchNorm and ReLU, and Layer 2 restores the original dimensionality of 382 features. Each hidden layer is equipped with ReLU activation to introduce non-linearity, aiding in learning more complex functions. Batch normalization is applied after each ReLU activation to stabilize learning and improve convergence rates. We employed the Adam optimizer for training our models, with the learning rate set at 0.005 and weight decay at 0.0001. The models were trained for 3000 steps to ensure convergence. In Step 2, to visualize the generated data, we designed another autoencoder. The encoder accepts an input of 382 features and reduces it to a 3-neuron bottleneck through five layers (400, 300, 200, 100, and 3 neurons), each followed by a LeakyReLU activation function. The decoder expands from the 3-neuron bottleneck through four layers (100, 200, 300, and finally 400 neurons). We trained this model with the Adam optimizer, with the learning rate set at 0.0001, for 2000 epochs to ensure convergence. In Step 3, we designed an MLP model for the final prediction. The model starts with an input of 382 features, which it sequentially reduces to half that size (191 neurons), then to a quarter (95 neurons), and then to one-eighth (47 neurons). The MLP model was also trained with the Adam optimizer, with the learning rate set at 0.00005 and weight decay at 0.0001, for 200 epochs to ensure convergence.
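The layer widths listed in this caption can be written down and checked with a short sketch. The Python below is illustrative only and is not the authors' code: the helper names `halving_widths` and `dense_param_count` are hypothetical, the single output neuron of the MLP is an assumption (the caption does not state the output width), and the parameter count covers only fully connected weights and biases, ignoring any BatchNorm parameters.

```python
# Illustrative sketch: layer widths transcribed from the Figure 1 caption.
# Helper names are hypothetical; this is not the authors' implementation.

INPUT_DIM = 382  # number of input features stated in the caption


def halving_widths(input_dim, steps=3):
    """Return [d, d // 2, d // 4, d // 8, ...] using floor division."""
    return [input_dim // (2 ** k) for k in range(steps + 1)]


def dense_param_count(widths):
    """Count weights and biases of a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))


# Step 1 generator (autoencoder): encoder 382 -> 300 -> 75, decoder 75 -> 150 -> 382.
generator_widths = [INPUT_DIM, 300, 75, 150, INPUT_DIM]

# Step 2 visualization encoder: 382 -> 400 -> 300 -> 200 -> 100 -> 3.
viz_encoder_widths = [INPUT_DIM, 400, 300, 200, 100, 3]

# Step 3 MLP: 382 -> 191 -> 95 -> 47, plus an ASSUMED single output neuron.
mlp_widths = halving_widths(INPUT_DIM) + [1]

if __name__ == "__main__":
    for name, widths in [
        ("generator", generator_widths),
        ("visualization encoder", viz_encoder_widths),
        ("MLP", mlp_widths),
    ]:
        print(name, widths, dense_param_count(widths))
```

Floor division reproduces the MLP widths quoted in the caption: 382 // 2 = 191, 382 // 4 = 95, and 382 // 8 = 47.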
Figure 2.
Autoencoder training results. (a) Loss values over the training epochs. As training progressed, the loss gradually decreased and eventually stabilized at a minimal level, indicating that the model was fully trained and able to capture the important features of the data. (b) Differences between the generated data and the observed incident MI cases produced by the GAN model at various training times. The green nodes represent the generated data, and the purple nodes represent the observed incident MI cases. As the training time increased, the generated data became more closely aligned with the observed incident MI cases, indicating an improvement in the model’s ability to generate data that resemble the real cases.
Figure 3.
Comparison between feature-enhanced data and observed incident myocardial infarction cases. (a) Four plots illustrate the distributions of four clinical variables (BMI, sex, physical activity, and glucose tolerance groups) in the generated and observed datasets. The distribution of a subset of eigenvalues for each variable is shown. For BMI, the generated data follow an approximately normal distribution that closely resembles the observed data; however, male sex, irregular and inactive physical activity, and prediabetes and type 2 diabetes in the glucose tolerance groups are mainly present in the generated dataset. (b) Scatter plots showing the correlations between the generated and observed data. The points lie close to the diagonal line y = x, indicating that the mean and standard deviation of the generated data are similar to those of the observed data. (c) After dimensionality reduction using principal component analysis (PCA), the trend of the feature-enhanced data is closer to the incident cases. We selected the most significant features for enhancement, and the PCA results show that the feature-enhanced data align more closely with the observed incident MI cases.
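The comparison behind panel (b) can be sketched as follows: compute the mean and standard deviation of every feature in both datasets and measure how far each pair falls from the line y = x. This is a minimal illustration with toy numbers, not the authors' code; the function names are hypothetical, and the use of the sample standard deviation is an assumption.

```python
# Illustrative sketch of the panel (b) comparison; not the authors' code.
from statistics import mean, stdev


def column_stats(rows):
    """Return per-feature (mean, sample SD) for a list of equal-length rows."""
    columns = list(zip(*rows))
    return [(mean(col), stdev(col)) for col in columns]


def max_diagonal_gap(observed_rows, generated_rows):
    """Largest vertical gap |y - x| to the line y = x over all feature means and SDs."""
    obs = column_stats(observed_rows)
    gen = column_stats(generated_rows)
    gaps = []
    for (obs_mean, obs_sd), (gen_mean, gen_sd) in zip(obs, gen):
        gaps.append(abs(gen_mean - obs_mean))
        gaps.append(abs(gen_sd - obs_sd))
    return max(gaps)


if __name__ == "__main__":
    # Toy values only: three samples with two features each.
    observed = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
    generated = [[1.1, 10.5], [2.0, 12.0], [2.9, 13.5]]
    print(max_diagonal_gap(observed, generated))
```

A gap of zero would mean that every point sits exactly on the diagonal; small gaps correspond to the close agreement described in the caption.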
Figure 4.
The loss curves of our models in various conditions as depicted by the MLP results. We replicated the entire process ten times: each time, we split the combined dataset into training, testing, and validation datasets, retrained the prediction model, and evaluated it on the test and validation datasets. (a) MLP training results with BCE loss. (b) MLP training results with GFE loss. (c–e) Variation in accuracy (ACC) during MLP training under different conditions. (c) The training curve obtained directly from the combined data. (d) The training curve obtained after a single round of SMOTE+ENN sampling; the accuracy on the validation dataset fluctuated between 64% and 68%, so the training results were unstable. (e) The training curve obtained after two rounds of SMOTE+ENN sampling; the training results were more stable, and the accuracy of MI prediction on the validation dataset was approximately 70%.
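The ten-fold replication described in this caption can be sketched as a loop over independent splits. The code below is a minimal illustration, not the authors' procedure: the 60/20/20 fractions, the per-repeat seeds, the stratification by class, and the function names are all assumptions, since the caption specifies none of them. The SMOTE+ENN resampling step is deliberately left out.

```python
# Illustrative sketch of ten repeated splits; not the authors' code.
# ASSUMPTION: the 60/20/20 fractions and the seeds are placeholders.
import random


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split sample indices into train/test/validation, keeping class ratios."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, test, validation = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * fractions[0])
        n_test = int(len(indices) * fractions[1])
        train += indices[:n_train]
        test += indices[n_train:n_train + n_test]
        validation += indices[n_train + n_test:]  # remainder goes to validation
    return train, test, validation


def repeated_splits(labels, repeats=10):
    """Yield one stratified split per repetition, each with its own seed."""
    for seed in range(repeats):
        yield stratified_split(labels, seed=seed)


if __name__ == "__main__":
    # Toy labels using the class sizes from Table 1 (78 incident MI, 1376 non-MI).
    labels = [1] * 78 + [0] * 1376
    for train, test, validation in repeated_splits(labels):
        print(len(train), len(test), len(validation))
```

In a full pipeline, model training and evaluation would take place inside the loop, once per split.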
Figure 5.
(a) Comparison of myocardial infarction prediction models (KORA Study). Because of the disparity between the numbers of incident MI cases and non-MI individuals in the test and validation datasets, the model’s accuracy is determined by averaging the accuracy values for incident MI and non-MI cases over 50 outcomes. Our method is compared with five common methods, both ML (SVM, RF, and DT) and DL (CNN and LSTM). For each method, the panel shows the accuracy in predicting incident cases and non-MI cases, as well as the overall average accuracy on the validation dataset. The orange lines indicate that the prediction results were identical and overlapped despite being trained at different times. (b) Confusion matrix of the prediction model’s performance on the validation dataset. (c) Importance of the model variables, including phenotypes and metabolites. A larger change in sensitivity, that is, a larger change in the model’s predicted values, indicates a more important variable. The blue dots correspond to all variables, and the red dots mark the top 20 variables with the largest changes in sensitivity.
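The class-averaged accuracy used in panel (a) can be illustrated for a single run with a short sketch. This is not the authors' code: the function names and the label convention (1 = incident MI, 0 = non-MI) are assumptions, the labels are toy values, and the further averaging over 50 outcomes mentioned in the caption is not shown.

```python
# Illustrative sketch of class-averaged accuracy; not the authors' code.
# ASSUMPTION: labels are coded 1 = incident MI, 0 = non-MI.


def confusion_counts(y_true, y_pred):
    """Return (tp, fn, fp, tn) for binary labels with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn


def class_averaged_accuracy(y_true, y_pred):
    """Mean of the accuracy on incident MI cases and the accuracy on non-MI cases.

    Both classes must be present in y_true, otherwise a division by zero occurs.
    """
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    acc_mi = tp / (tp + fn)
    acc_non_mi = tn / (tn + fp)
    return (acc_mi + acc_non_mi) / 2


if __name__ == "__main__":
    # Toy labels: 2 incident MI cases and 8 non-MI individuals.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    print(confusion_counts(y_true, y_pred))
    print(class_averaged_accuracy(y_true, y_pred))
```

On these toy labels, plain accuracy is 8/10 = 0.8, whereas the class-averaged value is (0.5 + 0.875)/2 = 0.6875, because missing one of only two MI cases weighs as heavily as the whole non-MI class.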
Table 1.
Characteristics of the baseline KORA S4 study participants.
Clinical Variables | Incident MI (N = 78) | Non-MI (N = 1376) | p-Value |
---|---|---|---|
Age, year | 65.50 [62.00, 70.00] | 64.00 [59.00, 68.00] | 0.005 |
Female sex, % | 25.61 | 52.07 | <0.001 |
BMI, kg/m² | 28.72 [26.79, 32.58] | 27.93 [25.63, 30.90] | 0.005 |
Waist-to-hip ratio | 0.95 [0.90, 0.99] | 0.90 [0.83, 0.96] | <0.001 |
Systolic BP, mmHg | 144.75 [128.62, 159.62] | 135.00 [122.00, 148.00] | 0.001 |
Diastolic BP, mmHg | 80.00 [73.50, 87.00] | 82.25 [74.12, 89.00] | 0.100 |
Total cholesterol, mg/dL | 237.10 [215.10, 263.15] | 242.40 [214.57, 269.60] | 0.495 |
HDL cholesterol, mg/dL | 52.80 [42.38, 59.58] | 56.50 [46.40, 67.82] | 0.003 |
LDL cholesterol, mg/dL | 152.10 [126.27, 178.00] | 153.65 [130.85, 181.93] | 0.528 |
HbA1c, % | 5.60 [5.40, 5.90] | 5.75 [5.50, 6.40] | 0.001 |
Fasting glucose, mg/dL | 99.00 [93.00, 109.00] | 107.00 [97.25, 136.00] | <0.001 |
Alcohol intake, g/day | 6.60 [0.00, 22.86] | 7.93 [0.00, 20.00] | 0.976 |
Smoker, % | 15.39 | 13.72 | 0.321 |
Physical activity, % | | | 0.001 |
Active (>2 h/week) | 12.84 | 17.41 | |
Moderate (>1 h/week) | 11.51 | 25.81 | |
Irregular (<1 h/week) | 11.52 | 15.33 | |
Inactive | 64.13 | 41.46 | |
Glucose tolerance groups, % | | | <0.001 |
Normal glucose | 39.21 | 61.30 | |
Prediabetes | 24.27 | 23.92 | |
Type 2 Diabetes | 35.12 | 14.59 | |
Fasting, % | 88.71 | 79.53 | 0.022 |
Stroke, % | 2.20 | 3.81 | 0.568 |
Statin user, % | 9.29 | 7.72 | 0.782 |
hs-CRP, mg/L | 1.67 [0.83, 3.38] | 2.55 [1.30, 7.00] | <0.001 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).