1. Introduction
ST-segment elevation myocardial infarction (STEMI) is a leading cause of heart failure and death [1]. Early diagnosis of STEMI can shorten the time to revascularization and help doctors adopt precise treatment strategies, thereby reducing the incidence of heart failure and mortality [2]. Coronary angiography (CAG) is the gold standard for diagnosing STEMI, but it is invasive, time-consuming, and expensive. Electrocardiography (ECG) is a noninvasive and effective screening tool for detecting STEMI in patients with chest pain [3]. However, given the large number of ECGs that must be read, diagnosing STEMI remains a great challenge for clinical physicians [4,5].
Although most ST elevation on the ECG indicates myocardial ischemia, many nonischemic conditions can also cause ST elevation, such as bundle branch block, ventricular hypertrophy, ventricular preexcitation, premature ventricular beats, and pacemaker rhythm [6]. These changes can mask STEMI-induced ST-segment elevation and cause true STEMI to be missed. In addition, reduced ECG amplitude, as seen in pulmonary disease, effusion, or anasarca, can lead to missed diagnoses of STEMI [7]. Moreover, the diagnostic accuracy of ECG interpretation varies with the experience of the doctor, especially in primary and community hospitals. Therefore, rapid and accurate ECG-based diagnosis of STEMI remains an urgent unresolved problem.
With the rapid growth of machine learning technologies, several automatic ECG diagnosis algorithms have achieved positive results in detecting STEMI [8]. Machine learning algorithms for ECG analysis have addressed noise reduction, feature extraction, and the detection of arrhythmia and left ventricular hypertrophy [9,10,11,12]. For instance, an artificial intelligence (AI) network can analyze STEMI ECGs through signal transformation and automated ECG feature extraction [13,14]. However, these models have important limitations, as most of them were trained and validated using data from the MIT-BIH database (PhysioNet) and the PTB database (PhysioBank) [14,15]. Moreover, some studies excluded arrhythmias that may affect QRS morphology and ST-segment changes. Recently, a machine learning model was built on real-world ECG data to detect acute coronary syndrome (ACS), but its accuracy was not confirmed against CAG [16]. For these reasons, few machine learning models can effectively detect STEMI in the presence of arrhythmias and identify the infarct-related artery in myocardial infarction.
In this study, we established a real-world ECG database in which diagnoses were confirmed by gold-standard CAG. We then built and trained a LASSO regression model to diagnose STEMI and locate the infarct-related artery, and compared the diagnostic performance of the model with that of doctors.
4. Discussion
In this study, we reported a machine learning algorithm based on 12-lead ECG to detect STEMI, which showed high sensitivity and specificity in distinguishing STEMI, with an AUC of 0.94. In addition, we demonstrated that the LASSO model improved the diagnostic accuracy of detecting LAD lesions, with a low false positive rate and a high NPV.
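As an illustration of how the reported measures relate to each other, the following minimal sketch computes sensitivity, specificity, false positive rate, NPV, and AUC from binary labels and model scores. The function names, the 0.5 cut-off, and the toy data are assumptions made for this example; they are not taken from the study.

```python
def diagnostic_metrics(y_true, y_pred):
    """Compute basic diagnostic measures for binary labels (1 = STEMI, 0 = control).

    Assumes both classes are present and at least one negative prediction exists.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "false_positive_rate": fp / (fp + tn),  # 1 - specificity
        "npv": tn / (tn + fn),                  # negative predictive value
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 4 STEMI cases and 4 controls with model scores.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

metrics = diagnostic_metrics(labels, predictions)
area = auc(labels, scores)
```

The pairwise AUC used here is quadratic in the number of cases, which is acceptable for a small illustration; a rank-based computation would be preferable for large cohorts.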
The first finding of this study was that the LASSO method was able to shrink the regression coefficients and cut 180 candidate ECG features down to 14 potential predictors in model 1 and 4 potential predictors in model 2. This approach is preferable to the traditional practice of choosing ECG indices according to the strength of their univariable association with the outcome.
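The way an L1 penalty removes predictors can be sketched with a small coordinate-descent solver. This is an illustrative example only: it minimizes a squared-error loss with an L1 penalty, whereas a diagnostic model for a binary outcome would normally use a penalized logistic loss, and the function names, penalty value, and toy data are assumptions rather than the study's implementation.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of rows (no intercept column); features are assumed to be
    centered and on comparable scales.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


# Toy data (hypothetical): three mutually orthogonal features, only the first
# of which has a strong effect on the response.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.1, 1.9, -1.9, -2.1]

beta = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]  # indices of retained features
```

In this toy case the weak and irrelevant features are set exactly to zero and only the first feature is retained, with its coefficient shrunk from 2.0 to about 1.5. This is the same mechanism by which a large candidate set can be reduced to a handful of predictors.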
Innovation in data science, especially machine learning and AI, has brought revolutionary changes to ECG diagnosis, breaking through previous diagnostic concepts [18]. Earlier work on ECG signal acquisition, filtering, and processing used ANN, SVM, AdaBoost, and naive Bayes classifiers, with accuracies reaching 99.7% [19]. These algorithms extracted the signal from the original ECG tracing and located the peak of the QRS complex with a peak-detection algorithm. However, identifying ST-segment and T wave changes is much more complex than identifying QRS complexes. To avoid overfitting, random forests can be used in practical ECG applications, especially wearable and implanted medical devices, for wave detection and arrhythmia classification [20,21]. Many neural networks use convolution to mimic how the visual cortex processes images. Unlike many other machine learning methods, deep learning models not only associate input features with outputs of interest but also learn the features from the original data [18]. Recently, a new model, STA-CRNN, was reported to recognize most arrhythmias, reaching an average F1 score of 0.835. Visualization showed that the features learned by STA-CRNN are consistent with clinical judgment [22].
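The peak-detection step mentioned above can be illustrated with a deliberately simple sketch: local maxima above an amplitude threshold, separated by a refractory period. This is not the method of any cited work, and the function name, threshold ratio, refractory period, and synthetic signal are all assumptions made for illustration.

```python
def detect_r_peaks(signal, fs, threshold_ratio=0.6, refractory_s=0.2):
    """Return sample indices of candidate R peaks.

    A sample is a candidate if it is a local maximum and at least
    threshold_ratio times the largest value in the signal. Candidates closer
    together than refractory_s seconds are merged, keeping the taller one.
    """
    threshold = threshold_ratio * max(signal)
    refractory = int(refractory_s * fs)
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i - 1] < signal[i] >= signal[i + 1]
        if not is_local_max or signal[i] < threshold:
            continue
        if peaks and i - peaks[-1] < refractory:
            if signal[i] > signal[peaks[-1]]:
                peaks[-1] = i  # replace the earlier, smaller candidate
            continue
        peaks.append(i)
    return peaks


# Synthetic 3-second trace at 100 Hz (hypothetical): narrow R waves once per
# second, each followed by a smaller T-wave-like bump.
fs = 100
ecg = [0.0] * 300
for r in (50, 150, 250):
    ecg[r - 1], ecg[r], ecg[r + 1] = 0.5, 1.0, 0.5
    ecg[r + 30] = 0.3

peaks = detect_r_peaks(ecg, fs)
rr_intervals_s = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
heart_rate_bpm = 60.0 / (sum(rr_intervals_s) / len(rr_intervals_s))
```

A fixed amplitude threshold works on a clean synthetic trace but would be fragile on real recordings with baseline wander or noise, which is one reason ST-segment and T wave analysis needs more than peak detection.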
AI technology is becoming more accurate in detecting arrhythmia, but it remains limited in the diagnosis of acute myocardial infarction. Yifan Zhao et al. proposed a ResNet block to differentiate STEMI ECGs from control ECGs, with an AUC of 0.99, which was similar to that of cardiologists [8]. However, such models cannot identify the infarct-related artery in STEMI.
The second advantage of this study is that we used real-world ECG data, which were confirmed by CAG in both the control and STEMI groups. Most previous AI algorithms were based on the MIT-BIH database (PhysioNet) [19] or the PTB database (PhysioBank) [23], both of which have small sample sizes. For instance, the PTB Diagnostic ECG Database consists of 549 records from 290 subjects, including 148 cases of myocardial infarction and 52 healthy controls, whereas the MIT-BIH Arrhythmia Database contains 48 half-hour excerpts of two-channel ambulatory ECG recordings.
Unlike previous databases, our dataset has the advantage of including abnormal ECG phenomena that affect the ST segment, such as complete left bundle branch block, complete right bundle branch block, ventricular pre-excitation, premature ventricular beats, and ventricular tachycardia. In our study, such abnormal ECG phenomena accounted for 9.5% (70/734) of Cohort 1 and 30% (26/86) of Cohort 2, with high proportions of premature ventricular beats, complete right bundle branch block, and left ventricular hypertrophy. Nestelberger et al. found that AMI occurred in approximately 30% of patients with complete left bundle branch block, and that combining the modified Sgarbossa criteria with 0/1 h or 0/2 h hs-cTnT could increase the diagnostic rate to above 90% [24]. Although previous studies have suggested that a new complete left bundle branch block should be equated with AMI only with caution, it is still necessary to identify STEMI in patients with left bundle branch block accompanied by chest pain [25]. Ventricular pre-excitation can mimic myocardial infarction, with abnormal Q waves and ST-segment elevation, or can mask a true myocardial infarction, leading to clinical misdiagnosis and missed diagnosis [26,27]. Patients with left ventricular hypertrophy have a higher incidence of myocardial infarction and stroke [28]. In survivors of myocardial infarction, left ventricular hypertrophy suggests more severe structural and functional damage to the heart [29]. In this study, our algorithm still achieved good accuracy in a dataset containing several kinds of abnormal ECG phenomena.
Compared with deep learning, the ECG features selected by LASSO regression were more interpretable. V1 (Q) and V6 (Q) suggested pathological Q waves, AVL (TB) and II (TB) suggested J-point elevation, and III (ST80) suggested ST-segment elevation. These abnormal indices make up the diagnostic model of STEMI. Pathological Q waves and ST-segment elevation are important indicators of STEMI [30]. Another new finding of this study was that prolongation of the QT interval and a decrease in the R wave peak were important markers of ECG changes in STEMI. Interestingly, we also noticed that V1 (Q), V2 (Q), V2 (TB) and V3 (ST40) contributed to the identification of LAD lesions, which is consistent with the LAD supplying the anterior ventricular septum, the left ventricular anterior wall and the right ventricular anterior wall.
There were some limitations of this study. First, our LASSO model can only discriminate between LAD and RCA/LCx as the infarct-related artery. Because occlusion of either the LCx or the RCA can cause acute inferior myocardial infarction (AIMI), it is difficult to determine from the 12-lead ECG which of the two is the infarct-related artery. Several ECG criteria have been proposed to address this problem, and we will explore a new ML model with knowledge fusion. Second, the sample size of this retrospective study was small, especially the external test dataset. Third, patients with multivessel lesions were excluded from this study, although they account for more than 40–50% of patients with myocardial infarction [31]. The ECG pattern of STEMI with multivessel disease is variable and atypical, and the ECG changes depend on the infarct area and the contribution of each vessel. Our model only addresses the differential diagnosis of infarct-related arteries in patients with single-vessel disease. Identifying the infarct-related artery in patients with multivessel coronary artery disease is still a major challenge for clinical physicians. In further studies, we will explore the diagnostic efficacy of ECG in real-world patients with multivessel disease using the LASSO method. Moreover, further research is needed to clarify the location of the lesion (proximal versus distal) and the size of the infarct-related arteries. In real-world data, the incidence of STEMI combined with abnormal ECG phenomena, such as bundle branch block (left and right) or arrhythmias (such as AF and VT), is low. Because real-world data were used in our study, the proportion of STEMI combined with the above abnormal ECG phenomena was 10% (82/821), and the AUC was 0.879 (0.797–0.961). To verify the accuracy and robustness of the algorithm, we plan to conduct a prospective study. In the future, we will embed this model into an application system so that clinicians can directly import ECG data and obtain results.
In this study, we constructed a machine learning model that performed well in detecting STEMI based on 12-lead ECG features, which were automatically extracted from a real-world database. The model achieved high diagnostic accuracy, similar to that of experienced cardiologists, especially in localizing LAD disease.