1. Introduction
Every year, more than 17 million people die from cardiac diseases, which collectively remain the leading cause of mortality worldwide. According to the World Heart Federation, approximately 75% of all cardiovascular disease (CVD) patients reside in low-income communities [1]. Electrocardiography (ECG) is the most accurate and trustworthy tool available for diagnosing cardiac disease since it is non-invasive and accurately represents the electrical rhythm of depolarization and repolarization of the cardiovascular system. ECG captures the electrical activity created by the depolarization of the heart muscle, which propagates pulsing electrical waves towards the skin. Even though the energy level involved is low, it may be successfully detected using sensors connected to the chest [2]. The earlier an irregular cardiac rhythm is identified, the less severe the consequences are, and the faster the patient recovers from the condition [3]. However, ECG signals have complicated and highly chaotic properties, making their interpretation time-consuming and laborious, even for experienced professionals [4]. As a result, computer-assisted approaches are necessary to ease human workloads and reduce misinterpretations caused by fatigue, differences between operators, and operator-specific mistakes, among other factors.
Machine learning (ML) is an essential tool for predicting and diagnosing deadly illnesses [5,6]. As a sub-branch of ML, deep learning (DL) has yielded outstanding results in the medical field, sometimes even outperforming physicians [7], because its hierarchical structure allows substantial and high-level feature extractions that improve classification accuracy.
In tackling machine learning challenges, feature learning has emerged as a critical success factor [8]. However, most ECG classification methods depend on hand-crafted techniques for feature extraction using signal processing tools and methods such as filters [9], Fourier transforms, and wavelet transformation [10,11]. ML classifiers such as support vector machines (SVMs) have been utilized for classification [12]. The main disadvantage of these methods is the separation of their feature extraction and pattern classification components. Additionally, these methods require domain experience with the processed data, and the features must be chosen manually. Furthermore, extracting features with the aid of experts takes time, and the features may not be robust to noise, resizing, or transformation, which means that they may not generalize well to new data [13]. As a result, effective models that can learn features automatically are needed in order to develop a comprehensive feature extraction and classification model. Most shallow and classical DL models rely on a single model for their initial training. Many researchers have recently shown an interest in the performance of deep neural networks in interpreting ECG signals, particularly convolutional neural networks (CNNs) that use one-dimensional (1D) and two-dimensional (2D) convolution to enhance their performance. DL models may learn invariant and hierarchical features directly from data, with ECG signals as input and class prediction as output. Recurrent neural networks (RNNs), CNNs [14], and autoencoders are utilized for 1D ECG categorization. The input ECG data are converted into images or 2D representations for 2D ECG classification. Experiments have shown that the accuracy of 2D ECG classification is better than that of 1D ECG classification [15]. This paper presents a new method for classifying ECG signals, inspired by previous work [16], that takes advantage of two types of models and avoids their disadvantages, resulting in a better overall model. The proposed ConvXGB model is a new DL model for ECG classification that combines the performance of a CNN with XGBoost. As the results of this study demonstrate, ConvXGB performs better than either CNN or XGBoost alone and performs better than state-of-the-art models. The reasons for choosing these two model types were as follows:
XGBoost is a scalable ML approach for tree boosting designed to prevent the overfitting of data. It performs well on its own and in a variety of ML contexts. However, there is some uncertainty about the effectiveness of this method with respect to feature learning.
The use of CNN, a DL class with multiple levels of hierarchical learning, improves the clarity of the results.
This combination of methods has been tested on many datasets and has been shown to solve classification problems more accurately than other methods [16]. To overcome the shortcomings of existing methods for ECG signal classification, a new model, referred to as ConvXGB, is proposed. It handles feature extraction automatically and reduces the number of parameters required, achieving the best performance among the methods compared with the lowest processing time and effort. The most significant contributions of this paper can be summarized as follows:
We propose an end-to-end method for ECG signal classification (tested on two commonly used datasets) without the need for any complicated signal pre-processing.
The proposed method is suitable for deployment on low-computational-power devices (such as mobile phones) because it needs few parameters, which keeps the prediction time low (approximately 0.6 ms per sample).
We achieved better results with this hybrid method than the existing state-of-the-art methods in terms of accuracy, precision, recall, F1-score, and specificity, together with the lowest false-negative and false-positive rates.
To demonstrate the robustness and generalization ability of the proposed method, we tested it on a dataset from a different source and achieved highly accurate results.
The remainder of this paper is structured as follows:
Section 2 provides a brief overview of studies relevant to ECG categorization. The methods used in this research are described in Section 3. Section 4 presents the experiment conducted. Section 5 discusses the results obtained. Finally, we present conclusions drawn from the results in Section 6.
2. Related Work
ML algorithms are used widely in ECG signal classification. ML classifiers such as SVMs have been shown to perform better in ECG signal classification than other algorithms, such as neural networks (NN), random forests (RF), and Bayesian algorithms [17], [18]. In [19], an SVM was merged with linear discriminant analysis to produce a classification method for six arrhythmia types. Ref. [20] presented a multi-layer perceptron (MLP) model, which exhibited good performance. Ref. [21] presented a 1D CNN technique for five types of arrhythmia classification. A wavelet is first used for noise removal, and then a 1D CNN model is used for feature extraction. A fully connected layer with a softmax activation function is used for classification. The same 1D CNN method was used in [22] to classify four ECG signal types with denoising pre-processing steps. Current methods perform signal pre-processing, feature extraction, and prediction [23,24,25,26,27]. Various algorithms are used to perform these actions [28,29,30,31,32,33]. Other studies have used a deep belief network for ECG signal classification and thereby reduced false negatives effectively [34,35,36,37]. Stacked autoencoders have been used in other studies [38,39]. In [40,41], CNN and long short-term memory (LSTM) were used to build the encoding and decoding layers. Recurrent neural networks (RNNs) have been used to classify ECG signals and handle time-series data. RNN models have also been combined with other DL models, and LSTM has been combined with CNN [42]. Additionally, bidirectional LSTM has proven to be successful in classifying ECG signals because of its ability to process data in both the backward and forward directions [43].
It can be observed that most of the works mentioned above are based either on machine learning methods applied after feature extraction steps, which require domain knowledge and are time-consuming, or on deep learning methods, which require a large amount of data and a long training time. Our proposed method combines both approaches by using a CNN for feature extraction (without training) and an XGBoost classifier for classification. This technique reduces the training time and achieves high performance.
3. Materials and Methods
The overall classification process is illustrated in Figure 1, using the MIT-BIH dataset (in the upper part of Figure 1 for classifying five ECG signal classes) and the PTB dataset (in the lower part of Figure 1 for classifying two classes). For each dataset, the signals are fed into the hybrid ConvXGB model for classification.
The proposed method has two main parts. The first part consists of three convolutional layers and a max pool layer for feature extraction, and the second part uses the XGBoost classifier for classification. Details of the architecture and model layers are shown in Figure 2.
3.1. CNN Model
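As mentioned earlier, the convolutional neural network represents the first part of the proposed method. For input data $X$ in $N$ rows and $M$ columns, $X$ can be defined as shown in Equation (1):

$$X = \begin{bmatrix} x_{11} & \cdots & x_{1M} \\ \vdots & \ddots & \vdots \\ x_{N1} & \cdots & x_{NM} \end{bmatrix} \tag{1}$$

where $x_{ij}$ is the data value at $(i, j)$. When a kernel (or filter) $K$ of dimension $k \times k$ is applied by a stride, $s$, the product can be defined as shown below:

$$(X * K)_{ij} = \sum_{a=1}^{k} \sum_{b=1}^{k} K_{ab}\, x_{(s(i-1)+a)(s(j-1)+b)}$$

For each convolutional layer $l$, a bias will be added and a convolution operation applied. For a feature map indexed by $n$, the output, $Y_n^{l}$, of the $l$th layer for the $n$th feature map is obtained from the output of the previous layer, $Y^{l-1}$, as follows:

$$Y_n^{l} = f\left(B_n^{l} + \sum_{m} K_{n,m}^{l} * Y_m^{l-1}\right)$$

where $f$ is the activation function used (ReLU), $B_n^{l}$ is the bias matrix, and $K_{n,m}^{l}$ is the filter of size $k \times k$. The output of layer $l$ for the $n$th feature map, $Y_n^{l}$, at position $(i, j)$ is therefore as follows:

$$\left(Y_n^{l}\right)_{ij} = f\left(\left(B_n^{l}\right)_{ij} + \sum_{m} \sum_{a=1}^{k} \sum_{b=1}^{k} \left(K_{n,m}^{l}\right)_{ab} \left(Y_m^{l-1}\right)_{(s(i-1)+a)(s(j-1)+b)}\right)$$

To avoid overfitting and to downsample the extracted features, a pooling layer is applied, which replaces the output with the average or maximum value of a sliding window. For example, by applying the pooling function $p$, the output will be defined as shown below:

$$Y_n^{l} = p\left(Y_n^{l-1}\right)$$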
The output is then classified using XGBoost in the next part of the method.
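The convolution, activation, and pooling steps described above can be illustrated for the 1D case with a minimal pure-Python sketch. It is not the authors' implementation: the toy signal, the single hand-picked filter, and the function names `conv1d_same`, `relu`, and `max_pool1d` are assumptions made for illustration, with a window of five, "same" zero padding, and a pooling stride of two chosen to echo the settings listed in Algorithm 1.

```python
def conv1d_same(signal, kernel, bias=0.0):
    """Slide the kernel over the signal (stride 1) with zero 'same' padding."""
    k = len(kernel)
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    padded = [0.0] * pad_left + list(signal) + [0.0] * pad_right
    # Each output value is the bias plus the kernel-weighted sum of one window.
    return [
        bias + sum(kernel[a] * padded[i + a] for a in range(k))
        for i in range(len(signal))
    ]


def relu(values):
    """Element-wise ReLU activation f(v) = max(0, v)."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, window, stride):
    """Keep the maximum of each sliding window; incomplete windows are dropped."""
    return [
        max(values[i:i + window])
        for i in range(0, len(values) - window + 1, stride)
    ]


# Toy beat segment and a hypothetical (untrained) filter of width 5.
beat = [0.0, 0.1, 0.9, -0.4, 0.2, 0.0, 0.05, 0.0]
kernel = [-1.0, 0.0, 1.0, 0.0, 0.0]

feature_map = relu(conv1d_same(beat, kernel, bias=0.0))
pooled = max_pool1d(feature_map, window=5, stride=2)
```

In the full model, many such filters would be stacked over three layers, and the pooled feature maps would be flattened into one feature vector per beat before being passed to the classifier.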
3.2. XGBoost Model
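XGBoost is a machine learning method for classification and regression problems developed by Chen and Guestrin [1]. It is a highly efficient approach that uses a tree ensemble to improve performance. The ensemble summation of all $K$ classification and regression trees (CARTs), where the $t$th tree has $E_t$ nodes, is as follows:

$$\hat{y}_i = \sum_{t=1}^{K} f_t(x_i), \quad f_t \in F$$

where $x_i$ is the training set samples, $f_t$ is the $t$th tree's leaf score, and $F$ is all classification trees' scores. The results are improved by applying regularization as follows:

$$\mathcal{L} = \sum_{i} l\left(y_i, \hat{y}_i\right) + \sum_{t=1}^{K} \Omega\left(f_t\right)$$

where $l$ is the cost (loss) function used to calculate the difference between ground truth class labels $y_i$ and the prediction label $\hat{y}_i$. $\Omega$ is a function used for penalizing the model's complexity and avoiding overfitting and can be expressed as follows:

$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^{2}$$

where $\gamma$ and $\lambda$ are constants for the degree of regularisation, $T$ is the number of leaves on each tree, and $w$ is the leaf weight.
To simplify the objective at step ($t$), a second-order Taylor expansion is used, as shown below:

$$\mathcal{L}^{(t)} \simeq \sum_{i} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega\left(f_t\right)$$

where $g_i$ and $h_i$ are the first- and second-order derivatives of the loss with respect to $\hat{y}_i^{(t-1)}$. Since $I_j = \{ i \mid q(x_i) = j \}$ represents the leaf $j$ instance set, the optimal weight $w_j^{*}$ of leaf $j$ can be calculated for a fixed structure $q(x)$ as shown below:

$$w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$$

The corresponding optimal value can thus be calculated as follows:

$$\tilde{\mathcal{L}}^{(t)}(q) = -\frac{1}{2} \sum_{j=1}^{T} \frac{\left(\sum_{i \in I_j} g_i\right)^{2}}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$

It is usually difficult to list all of the potential tree architectures $q$, so instead, a greedy method is utilized, which starts with a single leaf and iteratively adds branches to the tree. After the split, let us assume that $I_L$ and $I_R$ are the instance sets of left and right nodes. Letting $I = I_L \cup I_R$, the loss reduction after the split can be described as follows:

$$\mathcal{L}_{split} = \frac{1}{2} \left[ \frac{\left(\sum_{i \in I_L} g_i\right)^{2}}{\sum_{i \in I_L} h_i + \lambda} + \frac{\left(\sum_{i \in I_R} g_i\right)^{2}}{\sum_{i \in I_R} h_i + \lambda} - \frac{\left(\sum_{i \in I} g_i\right)^{2}}{\sum_{i \in I} h_i + \lambda} \right] - \gamma$$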
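To make the greedy split search concrete, the sketch below evaluates the optimal leaf weight and the loss reduction of a candidate split in plain Python, following the standard formulation by Chen and Guestrin. The symbols are assumptions stated here for self-containment: `grads` and `hess` hold the first- and second-order loss derivatives of the samples in a node, `lam` is the L2 regularisation constant, and `gamma` is the per-leaf penalty. The toy data and the function names are hypothetical, and the logistic-loss derivatives at an initial prediction of 0.5 are used only as an example.

```python
def leaf_weight(grads, hess, lam):
    """Optimal leaf weight: -sum(g) / (sum(h) + lambda)."""
    return -sum(grads) / (sum(hess) + lam)


def leaf_score(grads, hess, lam):
    """Structure score of one leaf: sum(g)^2 / (sum(h) + lambda)."""
    return sum(grads) ** 2 / (sum(hess) + lam)


def split_gain(grads, hess, left_idx, right_idx, lam=1.0, gamma=0.0):
    """Loss reduction obtained by splitting a node into left and right children."""
    def pick(values, idx):
        return [values[i] for i in idx]

    parent_idx = list(left_idx) + list(right_idx)
    left = leaf_score(pick(grads, left_idx), pick(hess, left_idx), lam)
    right = leaf_score(pick(grads, right_idx), pick(hess, right_idx), lam)
    parent = leaf_score(pick(grads, parent_idx), pick(hess, parent_idx), lam)
    return 0.5 * (left + right - parent) - gamma


def best_split(feature, grads, hess, lam=1.0, gamma=0.0):
    """Greedy search over all thresholds of a single feature."""
    order = sorted(range(len(feature)), key=lambda i: feature[i])
    best = None
    for cut in range(1, len(order)):
        left_idx, right_idx = order[:cut], order[cut:]
        gain = split_gain(grads, hess, left_idx, right_idx, lam, gamma)
        if best is None or gain > best[0]:
            best = (gain, left_idx, right_idx)
    return best


# Toy example: four samples, labels y, initial probability p = 0.5.
# For the logistic loss, g = p - y and h = p * (1 - p).
feature = [1.0, 2.0, 3.0, 4.0]
labels = [0, 0, 1, 1]
grads = [0.5 - y for y in labels]
hess = [0.25 for _ in labels]

gain, left_idx, right_idx = best_split(feature, grads, hess, lam=1.0, gamma=0.0)
w_left = leaf_weight([grads[i] for i in left_idx], [hess[i] for i in left_idx], 1.0)
w_right = leaf_weight([grads[i] for i in right_idx], [hess[i] for i in right_idx], 1.0)
```

In this toy case the search separates the two label groups, and the resulting leaf weights have opposite signs, pushing the predictions of each group towards its own class.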
The XGBoost model’s hyperparameters were configured as follows:
subsample, colsample_bytree, colsample_bylevel, lambda, and scale_pos_weight were set to 1; gamma and alpha were set to 0; n_estimators: 100; booster: gbtree; max_depth: 6; and learning_rate: 0.3.
Furthermore, we noticed that reducing the learning rate below 0.001 increases the training time significantly with little improvement in accuracy. Additionally, the model tends to overfit when n_estimators and max_depth are increased.
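For reference, the configuration listed above could be written as a Python dictionary, as in the sketch below. The key names follow the scikit-learn style wrapper of the `xgboost` package, in which the regularisation terms called lambda and alpha in the text are exposed as `reg_lambda` and `reg_alpha`. The variable name `xgb_params` is an assumption, and the final comment only indicates how the dictionary might be used if the package is installed.

```python
# Hyperparameters as reported in the text, keyed by the names used in
# the scikit-learn style wrapper of the xgboost package.
xgb_params = {
    "booster": "gbtree",
    "n_estimators": 100,
    "max_depth": 6,
    "learning_rate": 0.3,
    "subsample": 1,
    "colsample_bytree": 1,
    "colsample_bylevel": 1,
    "reg_lambda": 1,        # "lambda" in the text (L2 regularisation)
    "reg_alpha": 0,         # "alpha" in the text (L1 regularisation)
    "gamma": 0,             # minimum loss reduction required for a split
    "scale_pos_weight": 1,
}

# With xgboost installed, a classifier could be created along the lines of:
#     from xgboost import XGBClassifier
#     classifier = XGBClassifier(**xgb_params)
```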
3.3. ConvXGB Algorithm
Algorithm 1 describes the complete learning algorithm used in this study.
Algorithm 1: Training process of the ConvXGB model.
Input: PTB/MIT-BIH dataset D = {L, y}
Output: The well-trained hybrid neural network model
Train the model using training set DTr;
for start in range(0, length(DTr)) do
    for beat sample Li ∈ DTr do
        // Multi-lead attention module
        α1 = ReLU(W1Li + b1);
        Xi = α1 ⊗ Li;
        // ConvXGB with attention mechanism
        C1 ← Conv1D(Xi, kernels); kernel size: (5, 5) has 16 kernels with one stride;
        C1 ← activation(C1, ReLU);
        C1 ← padding(same);
        C2 ← Conv1D(C1, kernels); kernel size: (5, 5) has 32 kernels with one stride;
        C2 ← activation(C2, ReLU);
        C2 ← padding(same);
        C3 ← Conv1D(C2, kernels); kernel size: (5, 5) has 64 kernels with one stride;
        C3 ← activation(C3, ReLU);
        C3 ← padding(same);
        M ← MaxPooling(C3, window); the size of window is (5, 5) with two strides;
        XG ← MaxPooling(M);
        ypre ← XGB Classifier(XG);
    end for
end for
return well-trained model;
3.4. Performance Measurements
The suggested technique’s categorization task is ECG heartbeat categorization for arrhythmia and MI detection. The performance measures used for categorization are accuracy, precision, recall, F1-score, and specificity. These measures are calculated using the following equations:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

$$\text{Specificity} = \frac{TN}{TN + FP}$$

where $TP$ is the number of instances correctly categorized as required, $FP$ is the number of cases incorrectly categorized as required, $TN$ is the number of instances correctly categorized as not required, and $FN$ is the number of instances incorrectly categorized as not required. Additionally, the AUC–ROC curve is a performance indicator for categorization problems evaluated over variable threshold values. The ROC is a probability curve, and the AUC measures the degree of separability, that is, the model's ability to differentiate between classes. For instance, the higher the AUC is, the more reliably the model predicts class 0 as 0 and class 1 as 1. For MIT-BIH, which is a multiclass dataset, macro averaging was used, i.e., the simple arithmetic mean of the per-class performance metrics. The macro method assigns an equal weight to each class.
For the confusion matrix, a cut-off with a threshold value of 50% is used to identify the prediction class. Thus, when the class probability of the prediction is more than 50%, the sample will be classified in that class.
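The measures above, together with macro averaging for the multiclass case, can be computed as in the following pure-Python sketch. The function names and the toy labels are assumptions made for illustration, and each class is scored one-vs-rest before the per-class values are averaged with equal weight.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, F1-score, and specificity from the four counts."""
    def ratio(num, den):
        return num / den if den else 0.0  # guard against empty denominators

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
    }


def one_vs_rest_counts(y_true, y_pred, label):
    """Count TP, FP, TN, FN when `label` is the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    tn = sum(t != label and p != label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    return tp, fp, tn, fn


def macro_average(y_true, y_pred, labels):
    """Arithmetic mean of each per-class metric, with equal weight per class."""
    per_class = [
        binary_metrics(*one_vs_rest_counts(y_true, y_pred, label))
        for label in labels
    ]
    return {
        name: sum(m[name] for m in per_class) / len(per_class)
        for name in per_class[0]
    }


# Toy three-class example (hypothetical labels, not taken from the datasets).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
macro = macro_average(y_true, y_pred, labels=[0, 1, 2])
```

Because every class contributes equally to the macro average, a rare class with poor recall lowers the reported score as much as a frequent one would, which matters for imbalanced heartbeat datasets.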
5. Results and Discussion
Table 4 summarizes the results of the application of the ConvXGB method to MIT-BIH and PTB datasets in terms of the performance measures previously mentioned.
Figure 6 shows the confusion matrices of the proposed model for both datasets. The diagonal entries reflect the percentages of successfully categorized classes, while entries off the diagonal reflect improper categorization. The x-axis and y-axis represent the predicted labels and actual labels, respectively.
Figure 7 shows the values obtained for AUC, another performance measure used to evaluate the results. Higher values of AUC indicate a better performance at distinguishing between the positive and negative classes.
Figure 7 shows that the AUC for the PTB dataset was 100%. The confusion matrix of the model predictions for the PTB dataset shows the low rates of false negatives and false positives (approaching zero), while the result for the MIT dataset indicates some misclassified samples for the AP and FVN classes.
The prediction time for one sample in each dataset was measured and was similar for the two datasets (approximately 0.6 ms). This low prediction time indicates that the proposed method is suitable for implementation on low-computational-power devices that could be developed and applied in the medical field or used as screening tools.
5.1. Ablation Study
An ablation study was conducted to demonstrate the effectiveness of combining a CNN (for feature extraction) with XGBoost (for classification of the extracted features). We built CNN and XGBoost models separately, with the same architecture as in the ConvXGB model, and the same performance metrics were used. The results are shown in Table 5.
Table 5 shows that the CNN model outperformed the XGBoost model with respect to all performance metrics by at least 2% for the MIT dataset and by approximately 10% for the PTB dataset. The XGBoost model, however, had a far lower training time than the CNN model.
On the other hand, the combination of both CNN and XGBoost (our proposed method) produced a better performance than the best performance achieved by the CNN model alone (approximately 1–2% better, as shown in Table 4) and a significantly lower training time than the XGBoost model. These results demonstrate that the proposed ConvXGB method combines the best of both methods and achieves the best performance with the lowest training time.
5.2. ConvXGB Comparison with Literature
Table 6 presents the performance results obtained with the most recent state-of-the-art models and the ConvXGB model. The results show that the method proposed in this paper has the highest accuracy, precision, and recall for both datasets used. Some studies have only used accuracy to evaluate model performance, whereas, in this work, several performance measures were used to evaluate the different methods more comprehensively.
It is worth mentioning that for most of the methods compared, high computational power was required, while the proposed method does not require much computational power, and its running time is extremely low. The lower values of recall and precision for most of the state-of-the-art methods indicate higher false-positive and false-negative rates than those of the method proposed in this paper.
The main limitation of the method proposed in this paper is that it is based only on datasets of one-lead ECG signals. We plan to test this method on datasets based on multi-lead ECG signals in the future to overcome this limitation.
5.3. Testing of the Method on Another Dataset
To demonstrate the generalizability and robustness of the proposed method, we tested it using another publicly available ECG dataset from the Kaggle website (https://www.kaggle.com/devavratatripathy/ecg-dataset, accessed on 16 June 2022). The dataset contains 2919 normal samples and 2079 MI samples. Each sample represents a complete ECG of a patient with 140 single-lead readings. The results obtained were even better than those of our main experiments, as shown in Table 7.
The confusion matrix in Figure 8 shows the low rates of false positives and false negatives (approaching zero), and Figure 9 shows the ROC curve, with the area under the curve (AUC) approaching 100%.
6. Conclusions
The accurate classification of ECG waves is highly beneficial in preventing and detecting cardiovascular diseases. By integrating medical and contemporary machine learning technologies, deep CNNs have proven to be highly effective in improving the accuracy of cardiovascular disease diagnosis through ECG signal feature extraction. Similarly, the XGBoost algorithm has demonstrated a strong ability in classification. We propose a new model, ConvXGB, that achieves both high computational efficiency and high accuracy by combining the CNN and XGBoost methods. The results of experiments conducted using PhysioNet's MIT-BIH dataset for five distinct arrhythmias and the PTB diagnostics dataset for MI classification show that the proposed hybrid model is superior to both of its component models. The proposed model also outperforms existing state-of-the-art classification methods in terms of accuracy, precision, and recall. The most notable finding of the study is that ConvXGB performs better than either approach used separately. Our proposed method achieved an accuracy of 99.38% in arrhythmia classification, which demonstrates that the ConvXGB approach is highly successful at this task. As a limitation, the experiments were conducted on curated benchmark datasets, which may not be representative of real-world data; we therefore recommend re-training the model when using real-world data.
In future research, this method should be fine-tuned and modified for use in real-time systems to classify heartbeat signals to advise medical experts. In addition, it would be more efficient to use multi-channel signals rather than depending on just one lead’s signal.