1. Introduction
Heart disease has emerged as one of the most pressing health challenges both globally and nationally [1,2]. Cardiovascular ailments account for a significant proportion of mortality rates, emphasizing the persistent and widespread nature of this medical concern. In some urban populations, changing lifestyles and genetic predispositions have contributed to a surge in cardiac-related issues, making them a primary public health focus [3,4]. As these diseases continue to be a major concern, the importance of technological advancements in diagnostic tools becomes even more evident. ECG signals, which depict the electrical activity of the heart over time, have long been a cornerstone in the clinical detection of heart diseases [5,6]. However, the interpretation of ECG signals is not straightforward and requires extensive medical knowledge and experience. Furthermore, in the current age of digital healthcare, vast amounts of ECG data are generated, necessitating automated techniques that can efficiently process and interpret these signals to aid clinicians in their diagnostic processes. In contemporary medical practice, machine learning has emerged as an indispensable tool in the diagnosis of diseases [7,8]. It can process and interpret voluminous medical data swiftly, thereby enhancing diagnostic accuracy and efficiency. It is applied to the analysis of intricate medical images, such as CT scans, MRIs, and X-rays, to identify minute patterns associated with diseases [9]. These capabilities make the early detection of conditions such as cancer, neurodegenerative diseases, and cardiac ailments feasible [10].
While the ECG is a pivotal tool across various fields, it confronts certain inherent challenges that limit its in-depth analysis. For example, the non-stationary nature of the ECG signal [11] means that its statistical features can change over time. Consequently, models trained on data from a specific time frame may struggle to interpret data from a different period, even when sourced from the same individual. This dynamic quality of the ECG poses considerable obstacles to real-world applications. Furthermore, there is significant inter-individual variability in ECG data. These variations, linked to diverse patterns among subjects [12], can severely compromise the performance of models intended for a wider population.
The use of machine learning in genomics enables the recognition of disease-associated genetic patterns and mutations, thereby facilitating personalized treatment approaches [13,14]. A particularly noteworthy application is predictive analytics, which uses machine learning to forecast disease outbreaks and progressions from historical and real-time data, thus contributing significantly to public health management [15]. However, traditional machine learning methods, despite their significant advancements, have certain limitations when applied to ECG signal classification. Firstly, they often require manual feature extraction from ECG signals, which can be time-consuming and may not capture all the nuanced information present in these signals [16]. Secondly, ECG signals exhibit a vast range of morphologies due to patient-to-patient variability, and traditional methods do not always generalize well across diverse datasets. Thirdly, ECG signals are inherently non-linear and non-stationary, and traditional linear models struggle to capture these dynamics effectively. Lastly, the rapid evolution of wearable devices producing continuous ECG recordings demands models that can process large-scale data efficiently, adapt in real time, and make instant predictions, tasks that are often beyond the scope of classical machine learning algorithms. These challenges have catalyzed the need for more advanced methods, such as deep learning architectures, which can automatically extract features, model complex relationships, and scale more effectively with large datasets [17,18]. Such methods are, however, often sensitive to noise and lack robustness, leading to inconsistency in detecting complex patterns in ECG signals [19,20]. Furthermore, they may struggle to capture the intricate temporal dependencies inherent in ECG signals, often leading to significant information loss and, consequently, reduced diagnostic accuracy.
The complexities of these issues have necessitated ongoing and vibrant research efforts, capitalizing on innovative ideas, methodologies, and advancements. In recent years, deep learning models have demonstrated exceptional capabilities in an array of sequence and pattern recognition tasks, positioning them as particularly promising tools for the interpretation of ECG signals. Specifically, LSTM models [21] have proven adept at handling sequential data and capturing long-term dependencies in temporal sequences, while CNN models [22] have shown superior performance in detecting local patterns and learning spatial hierarchies directly from complex data. Although each model class excels in its respective area, applying either one independently to ECG signal classification risks overlooking essential features that the other could capture. Furthermore, the accuracy of many methods is not yet as robust as required, limiting their trustworthiness for unsupervised real-world applications.
Considering these observations, the motivation for this study is rooted deeply in the aspiration to overcome the prevalent limitations of current methodologies. By recognizing that a singular approach often results in missed opportunities for comprehensive signal analysis, we propose an innovative ensemble of LSTM and CNN models tailored for the nuanced classification of ECG signals. This integrated framework not only amalgamates the strengths of both LSTM and CNN models, but also ensures a holistic capture of both temporal dynamics and spatial intricacies inherent in ECG signals. The end goal is a substantial improvement in the accuracy of heartbeat classification, setting a new standard in the field. Beyond the amalgamation of LSTM and CNN models, the study also places a significant emphasis on the quality of input data. Recognizing that the fidelity and reliability of the classification largely hinge on the quality of input signals, we employ advanced signal processing techniques to refine and enhance ECG signal inputs, ensuring that the deep learning model receives data of the highest possible quality. Through these concerted efforts, our study endeavors to pave the way for new research in ECG signal classification, where accuracy and reliability are paramount.
In this paper, we present a unique ensemble classification technique for ECG signals that offers several notable contributions. First, our method introduces a new ensemble framework expressly designed for time series classification, which integrates LSTM, bidirectional LSTM (BiLSTM), and CNN models. This fusion capitalizes on the unique strengths of each model to enhance classification accuracy. Second, our methodology exhibits efficiency with its ability to achieve rapid processing speeds, thus facilitating real-time or near-real-time applications. Third, our ensemble model counteracts the issue of overfitting by exploiting the diversity among individual models, leading to improved robustness and generalization. Furthermore, the method we present is resilient against noise, demonstrating its applicability in real-world ECG signal classification scenarios. We provide a detailed description of our approach and present the outcomes of experiments performed on benchmark ECG datasets. These results highlight the superior performance of our proposed ensemble model compared to traditional machine learning and standalone deep learning models in terms of accuracy, sensitivity, and specificity. By introducing this novel approach, we aspire to contribute significantly to the prompt and effective diagnosis of heart diseases.
3. Proposed Ensemble Technique
The ensemble neural network architecture, which harmoniously orchestrates multiple neural networks, plays a crucial role in bolstering the efficiency of classification systems [46]. Its versatile applications span a myriad of classification tasks, demonstrating its profound significance. This framework integrates the predictions drawn from a collection of individual models, often referred to as base learners or weak classifiers, each trained independently. The collective inference from these diverse models culminates in the final classification verdict, embodying the strength of collaborative decision-making.
Employing LSTM and CNN in the classification of ECG signals presents a plethora of benefits. LSTM, with its distinct capability to identify and encapsulate long-term dependencies in sequential data, is exceptionally suited for interpreting ECG signals, which are inherently time series data. In contrast, CNNs are highly effective at discerning spatial patterns and hierarchies, thereby enabling them to reveal subtle features within ECG data that could indicate specific heart conditions. The amalgamation of LSTM’s expertise in temporal sequence recognition and CNN’s skill in spatial pattern identification offers a robust and highly accurate approach for ECG signal analysis. This combined methodology greatly enhances the precision of cardiac anomaly diagnosis, the personalization of treatments, and overall improvement in patient care within the field of cardiology.
Inspired by these considerations, we propose our ensemble architecture, which strategically leverages the strengths of LSTM and CNN models simultaneously. This novel approach aims to harness the power of both methodologies, optimizing their individual benefits in a cooperative manner. Our proposed approach involves one-hot encoding the outputs from three individual classifiers, integrating them with the raw time series signal as well as transformed signals, and subsequently using them in the final classifier, an LSTM–CNN. The ensemble classifier structure for ECG signal classification is graphically illustrated in Figure 3. The fundamental principle underlying our ensemble architecture is the belief that inherent diversity among individual models substantially enhances the system’s accuracy, resilience, and generalization capabilities. The following discussion expounds upon the benefits of our ensemble methodology.
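To make the fusion step concrete, the following minimal Python sketch shows one plausible way to assemble the final-classifier input; the function name build_fused_input, the broadcasting of each base classifier’s one-hot vote along the time axis, and the prior alignment of the transformed signal to the raw time axis are illustrative assumptions rather than details fixed by our implementation.

```python
import numpy as np

def build_fused_input(raw, transformed, base_preds, n_classes=2):
    """Assemble the final-classifier input: raw signal, transformed signal,
    and one-hot votes of the base classifiers, stacked as channels.

    raw         : (T,) raw ECG samples
    transformed : (T,) transformed signal, assumed aligned to the raw time axis
    base_preds  : (n_models,) integer class predictions of the base models
    """
    one_hot = np.eye(n_classes)[base_preds]                 # (n_models, n_classes)
    votes = np.tile(one_hot.reshape(1, -1), (len(raw), 1))  # repeat votes along time
    return np.concatenate([raw[:, None], transformed[:, None], votes], axis=1)

# Example: three base classifiers voting (0 = normal, 1 = atrial fibrillation)
x = build_fused_input(np.zeros(3000), np.zeros(3000), np.array([0, 1, 0]))
print(x.shape)  # (3000, 8): 1 raw + 1 transformed + 3 models x 2 classes
```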
Our proposed ensemble architecture brings significant advantages to the table when applied to time series classification, especially concerning the analysis of ECG signals. ECG signals, being dynamic and intricate in nature, are a rich source of information about the electrical activity of the heart. Our ensemble structure acts as a powerful tool to tackle the inherent variability and noise that are frequently encountered in ECG data. By combining predictions from a multitude of models, this ensemble approach manages to harness a broader range of patterns and is better equipped to adapt to the diverse characteristics of ECG signals. This results in a notable enhancement in the accuracy of ECG signal classification, which can be invaluable for diagnosing heart-related disorders, refining prosthetic control, and improving the outcomes of rehabilitation programs. Moreover, the robustness of our ensemble architecture against noisy ECG signals ensures a reliable analysis, considerably reducing the impact of signal artifacts on classification results.
It is important to acknowledge certain challenges inherent to the proposed ensemble neural network, which integrates LSTM and CNN architectures. First, while the design captures both spatial and temporal nuances of ECG signals, allowing it to manage data with minor to moderate noise levels, it can falter when confronted with highly noisy data. Such significant noise can mask pivotal features, potentially compromising the model’s ability to discern patterns. To circumvent this limitation, implementing noise reduction techniques before feeding data into the model is recommended [47]. Second, consistent with the nature of ensemble methods, the computational demands of this model exceed those of individual neural networks. Third, due to its intricate architecture, the model’s performance might not meet the expected benchmarks when faced with an exceedingly limited dataset. However, data augmentation presents itself as an effective countermeasure to this challenge.
4. Pre-Classification Signal Processing
The dataset employed for this research, consisting of ECG recordings obtained using the AliveCor device, was generously made available by AliveCor [48]. The training subset consists of 8528 single-lead ECG recordings, each lasting between 9 s and slightly over 60 s, while the test subset houses 3658 ECG recordings of similar lengths. These recordings were sampled at a rate of 300 Hz and were band-pass filtered by the AliveCor device. In Figure 4, we present visual samples drawn from our dataset, capturing the stark differences between atrial fibrillation and normal cardiac signals. Atrial fibrillation, a widely recognized cardiac arrhythmia, exhibits specific characteristics on the ECG, notably rapid and irregular beats. In contrast, the ‘normal’ cardiac signal epitomizes rhythmic and systematic heart activity. However, even this ‘normal’ signal occasionally presents irregular anomalies (shown in Figure 4), complicating the classification process. Such intricacies make classification challenging, if not unfeasible, with rudimentary models. Hence, there is a compelling necessity to employ more robust and powerful classifiers.
To enhance the performance of the classifier, we implement a feature extraction process. The choice of features to extract is guided by the computation of spectrograms, which we later employ as input for our deep learning network. Figure 5 displays the spectrograms of both categories of signals in our dataset.
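As a minimal sketch of this step, the snippet below computes a power spectrogram of a single-lead recording with SciPy; the window length nperseg=128 is an illustrative choice, not the value used in our experiments.

```python
from scipy.signal import spectrogram

FS = 300  # AliveCor recordings are sampled at 300 Hz

def ecg_spectrogram(x, nperseg=128):
    """Power spectrogram of a single-lead ECG trace.
    Returns frequency bins f, window times t, and the power matrix Sxx
    of shape (n_freqs, n_windows)."""
    f, t, Sxx = spectrogram(x, fs=FS, nperseg=nperseg, mode='psd')
    return f, t, Sxx
```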
We then transform the spectrograms into one-dimensional signals using Time–Frequency (TF) moments. Each TF moment gleans specific information from the spectrogram, thus serving as a distinctive one-dimensional feature that can be fed into the LSTM network. For this investigation, we focus on spectral entropy, a measure of the flatness or “spikiness” of a signal’s spectrum [49,50]. A signal with a “spiky” spectrum (analogous to a sum of sinusoids) exhibits low spectral entropy, whereas a signal with a flat spectrum (such as white noise) shows high spectral entropy.
We derive the spectral entropy based on a power spectrogram, utilizing 255 time windows for the computation, similar to the approach adopted for instantaneous frequency estimation. Figure 6 portrays the spectral entropy for each category of signal in our dataset. These refined features and the adopted signal processing methodology significantly contribute to the enhanced classification efficacy, as shown in the next section.
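A minimal implementation of this computation, assuming the power spectrogram has already been segmented into the desired number of time windows (255 in our setting), is:

```python
import numpy as np

def spectral_entropy(Sxx, eps=1e-12):
    """Normalized spectral entropy of each time window of a power spectrogram.
    Sxx has shape (n_freqs, n_windows); values near 1 indicate a flat
    (noise-like) spectrum, values near 0 a spiky (sinusoid-like) one."""
    p = Sxx / (Sxx.sum(axis=0, keepdims=True) + eps)  # per-window spectral PMF
    H = -(p * np.log2(p + eps)).sum(axis=0)           # Shannon entropy per window
    return H / np.log2(Sxx.shape[0])                  # normalize to [0, 1]
```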
5. Numerical Results and Comparison
In this section, we provide a thorough evaluation of our proposed model’s performance, laying out an in-depth comparison against two baseline simulations. The first baseline is a conventional LSTM model that uses raw, unprocessed ECG signals. The primary aim of selecting this model is to assess the inherent capacity of the LSTM to discern and learn from patterns within raw data and to compare this ability with that of our proposed model. The second baseline is another conventional LSTM model; in contrast to the first, it utilizes ECG signals that have already undergone a preprocessing stage, alongside the raw data. The specific preprocessing techniques applied were detailed in Section 4.
5.1. LSTM for Raw Time Series
The LSTM architecture implemented in this scenario starts with a sequence input layer, specifically engineered to handle an input sequence array of one dimension. This input layer is subsequently connected to an LSTM layer comprising 100 hidden units. A dropout layer, with a dropout rate of 0.2, is integrated next to mitigate overfitting. Subsequently, the network includes two fully connected layers, with 20 and 2 neurons, respectively, that classify the learned features. The outputs are then passed through a SoftMax layer that normalizes them into probabilities, before finally reaching a classification layer.
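A minimal Keras sketch of this baseline follows; the ReLU activation on the 20-neuron layer is our assumption, as it is not specified above.

```python
from tensorflow.keras import layers, models

baseline = models.Sequential([
    layers.Input(shape=(None, 1)),           # variable-length, univariate sequence
    layers.LSTM(100),                        # 100 hidden units, last step only
    layers.Dropout(0.2),                     # mitigate overfitting
    layers.Dense(20, activation='relu'),     # activation assumed
    layers.Dense(2, activation='softmax'),   # softmax + classification layer
])
baseline.compile(optimizer='adam',
                 loss='categorical_crossentropy',
                 metrics=['accuracy'])
```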
Figure 7 provides a visual representation of the model’s accuracy and corresponding loss for this particular case. This figure elucidates the trajectory of the accuracy and its associated loss function throughout the training process. It reveals an apparent issue: the model is struggling to extract significant features from the raw signal. This highlights an essential requirement for preprocessing the signal prior to inputting the data into the LSTM network.
Figure 8 presents the classification results obtained from testing data. In the figure presented, the vertical axis, labeled “True Classes”, delineates the actual categories or ground truth of the data samples. In contrast, the horizontal axis, termed “Predicted Class”, captures the classifications as perceived by the model. Together, these axes offer a visual comparison between the model’s predictions and the real labels, enabling an immediate assessment of classification accuracy and areas of potential discrepancy. It is evident that, despite numerous iterations, the LSTM model struggles to classify the raw signals effectively, revealing a lack of comprehension of the underlying function inherent in the data. This is further demonstrated by the fact that both true positives and true negatives account for less than 50% of the results. These observations highlight a pressing need for signal processing before feeding data to the LSTM model.
5.2. LSTM for Preprocessed Time Series
Here, maintaining the previously defined neural network architecture, the preprocessed ECG data (delineated in Section 4) is concatenated with the raw ECG signals to form an augmented dataset. This composite data serves as the input for the LSTM, allowing the model to concurrently leverage the enhanced features of the preprocessed data and the untouched, subtle details present in the raw data.
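A minimal sketch of this augmentation step follows; interpolating the spectral-entropy curve onto the raw time axis with np.interp is our assumption about how the two sampling grids are reconciled.

```python
import numpy as np

FS = 300  # sampling rate of the raw recordings

def augment(raw, entropy, t_entropy):
    """Pair a raw ECG trace with its spectral-entropy curve as a second channel."""
    t_raw = np.arange(len(raw)) / FS                   # time axis of raw samples
    entropy_up = np.interp(t_raw, t_entropy, entropy)  # align to raw time axis
    return np.stack([raw, entropy_up], axis=-1)        # shape (T, 2)
```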
Figure 9 illustrates the evolution of the model’s accuracy and loss function throughout the training phase for this case, which incorporates both raw and preprocessed data. As can be observed, significant shifts in both metrics occur, suggesting the successful adaptation of the model in understanding the underlying function of the data. This is in stark contrast to the previous scenario, evidenced in Figure 7, where the LSTM model struggled with raw data alone. Upon concluding the training phase, the classification efficacy of this case is evaluated on the test data. Figure 10 offers a graphical depiction of these results. Upon comparing these outcomes with those illustrated in Figure 8, a substantial improvement is observed.
5.3. Proposed Approach
In our proposed methodology, we implement three distinct neural networks, each characterized by its own unique design. The outputs of these networks are then amalgamated with both raw and preprocessed data, which together act as the input for a culminating network. The first classifier in our framework utilizes a bidirectional LSTM layer housing 100 hidden units, forwarding only the final sequence output to a two-neuron fully connected layer; its terminal layers are a regularization layer and a classification layer. The second classifier is designed with an LSTM layer containing 80 hidden units, which feeds into a fully connected layer with two neurons. The third classifier integrates an LSTM layer with 80 hidden units and a fully connected layer with two neurons, concluding with a regularization layer followed by its classification layer. It is important to note, however, that the number of layers and neurons in the proposed structure can be adjusted based on the specific application at hand.
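The following Keras sketch mirrors these three base classifiers; realizing the terminal regularization and classification layers as a softmax output head, and the number of input channels N_FEATURES, are interpretive assumptions on our part.

```python
from tensorflow.keras import layers, models

N_FEATURES = 2  # e.g., raw signal + spectral-entropy channel (assumed)

def bilstm_base():
    """Base classifier 1: bidirectional LSTM with 100 hidden units,
    final sequence output only, two-neuron softmax head."""
    return models.Sequential([
        layers.Input(shape=(None, N_FEATURES)),
        layers.Bidirectional(layers.LSTM(100)),
        layers.Dense(2, activation='softmax'),
    ])

def lstm_base():
    """Base classifiers 2 and 3: LSTM with 80 hidden units,
    two-neuron softmax head."""
    return models.Sequential([
        layers.Input(shape=(None, N_FEATURES)),
        layers.LSTM(80),
        layers.Dense(2, activation='softmax'),
    ])

base_models = [bilstm_base(), lstm_base(), lstm_base()]
```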
Our concluding network begins with a sequence input layer tailored to the input dataset, followed by a 1D convolution layer equipped with 96 filters, a max pooling layer with a pool size of 3, and an LSTM layer with 25 hidden units. The architecture is rounded off with a two-neuron fully connected layer and a final classification layer. Regarding training specifics, every network employs the Adam optimizer, with a cap of 30 epochs, a mini-batch size of 150, and an initial learning rate of 0.01. To counteract oversized gradients, a gradient threshold of 1 is set in place.
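A Keras sketch of the concluding network and training configuration follows; the convolution kernel size of 5, the ReLU activation, and the realization of the gradient threshold via clipnorm are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

N_FUSED = 8  # channels of the fused input (raw + transformed + one-hot votes)

final_net = models.Sequential([
    layers.Input(shape=(None, N_FUSED)),
    layers.Conv1D(96, kernel_size=5, padding='same', activation='relu'),  # kernel size assumed
    layers.MaxPooling1D(pool_size=3),
    layers.LSTM(25),                                                      # 25 hidden units
    layers.Dense(2, activation='softmax'),                                # classification head
])

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01, clipnorm=1.0)   # gradient threshold of 1
final_net.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
# final_net.fit(X_fused, y_onehot, epochs=30, batch_size=150)
```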
As displayed in Figure 11, during the training phase, the model achieved convergence within a limited number of steps, marking a significant improvement in learning efficiency compared to the conventional LSTM model (see Figure 7 and Figure 9). The classification results of the proposed LSTM–CNN ensemble model are presented in Figure 12. The ensemble model exceeded the performance of the conventional LSTM model by a substantial margin, confirming its superior capabilities in ECG signal classification.
When juxtaposing Figure 8, Figure 10, and Figure 12, it becomes evident that our proposed approach, combined with preprocessing, induces a substantial improvement in the classification outcomes. The stark contrast in performance is especially noticeable when comparing the average accuracy rates. While Figure 8 and Figure 10 demonstrate the limitations of the raw-data and conventional methods, Figure 12 showcases the efficacy of our method, whose average accuracy reaches a markedly high 95.45%. These results underscore the value of implementing a combined strategy of data preprocessing and advanced network architecture in significantly enhancing the classification accuracy of ECG signals.
5.4. Comparison with the Bidirectional GRU Network Model
In this section, we contrast our approach with a recent method introduced in [51], which employs a deep RNN, particularly leveraging the Gated Recurrent Unit (GRU) in a bidirectional configuration. That study reports that its bidirectional GRU model, a fusion of RNN and GRU in a bidirectional setting, delivers impressive classification accuracy. To ensure a robust comparison, we applied their model to our dataset, ensuring both techniques are evaluated using identically preprocessed data. Figure 13 illustrates the achieved accuracy.
A comparative analysis between these results (Figure 13) and ours, as depicted in Figure 12, distinctly demonstrates the superior performance of our method. This enhancement is attributed to the advantageous pattern recognition capabilities of CNNs in our structure. Furthermore, our approach, owing to its ensemble structure, exhibits resilience against overfitting, a pervasive issue in machine learning. Conversely, the bidirectional GRU model is predisposed to overfitting and necessitates meticulous tuning for real-world applications.
For a thorough understanding of the comparative performance of all the methods under study, we report their sensitivity, specificity, and accuracy, whose formulations are given by:

Sensitivity = TP / (TP + FN),
Specificity = TN / (TN + FP),
Accuracy = (TP + TN) / (TP + TN + FP + FN).
We refer to the cases with atrial fibrillation as ‘positive’, denoted by (P), and to those without this condition (normal cases) as ‘negative’, represented by (N). Consequently, TP denotes the number of True Positives, TN represents the number of True Negatives. Additionally, FN signifies the number of False Negatives, and FP stands for the number of False Positives. These metrics provide different insights into the performance of a binary classifier. For instance, Sensitivity shows how well the model identifies positive cases, while Specificity indicates how well the model identifies negative cases. Accuracy provides an overall measure of how often the model is correct, regardless of the class.
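These metrics can be computed directly from the confusion-matrix counts, as in the short sketch below.

```python
def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # recall on AF (positive) cases
    specificity = tn / (tn + fp)                 # recall on normal (negative) cases
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # overall correctness
    return sensitivity, specificity, accuracy
```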
Table 1 summarizes the results across all methods applied in this study. As detailed in the table, our proposed approach consistently surpasses other methods in performance metrics. Notably, even when juxtaposed with the bidirectional GRU—a sophisticated and advanced model specifically tailored for such datasets—our method is superior.
Looking ahead, there is a pressing need for future research to focus on advanced noise reduction techniques, specifically tailored for ECG data. Such techniques are vital in filtering out extraneous disturbances, which frequently obscure ECG readings and pose challenges in signal interpretation. Beyond noise reduction, the integration of data augmentation methods stands as a promising avenue. By artificially expanding and diversifying the dataset, these methods enable models to train on a broader spectrum of cardiac patterns. This not only bolsters the model’s predictive accuracy, but also enhances its generalization capabilities across varied and unseen ECG patterns. Moreover, considering the interdisciplinary nature of the problem, collaborations between biomedical engineers, data scientists, and cardiologists could yield novel insights and methodologies. By amalgamating these strategies and fostering collaborative efforts, the path is paved toward significantly elevating the reliability and precision of ECG signal classification. Such advancements will undoubtedly result in systems that are both robust and consistently accurate across diverse clinical and real-world settings.
6. Conclusions
An advanced ensemble methodology has been introduced, specifically crafted for ECG signal classification. By integrating the strengths of LSTM and CNN models, a significant enhancement in classification accuracy has been achieved. The architecture and methodology have been scrupulously detailed, and the preprocessing steps implemented to transform raw ECG signals into a classification-ready format have been outlined. Frequency analysis was performed on the raw time series, and spectral entropy was calculated during the preprocessing stage, serving as an integral part of the classification process. In the results section, we compared the LSTM model trained on raw data, the LSTM model trained on preprocessed data, and our proposed ensemble architecture. This comparison underscores the vital role of data preprocessing and the efficacy of the proposed ensemble approach in enhancing model performance. The statistical outcomes have confirmed the superiority of our approach, with the proposed technique consistently achieving an accuracy exceeding 94%. By integrating this ensemble into wearable devices, it can offer real-time cardiac monitoring and instantaneous alerts for abnormal rhythms. As telehealth gains momentum, such a system could bolster remote patient management and diagnostic accuracy. Strategic collaborations with medical device manufacturers and health-tech startups can further drive innovation, while pilot testing in select healthcare facilities would ensure the model’s practical viability. Although our ensemble approach has yielded considerable advancements over the standard LSTM classifier, we acknowledge the perpetual potential for further research and optimization. Given the inherent dynamism and complexity of ECG data, future research should delve into advanced noise reduction techniques to filter out extraneous disturbances that often cloud ECG readings. Additionally, adopting data augmentation methods can enrich the dataset, allowing models to train on varied cardiac patterns, thereby enhancing their generalization capabilities. By combining these strategies, we can significantly elevate the reliability of ECG signal classification, making it more robust and accurate in diverse settings.