1. Introduction
Cardiovascular diseases remain a major threat to human health [1], accounting for 30% of global deaths according to a study by the World Health Organization [2]. Among these, arrhythmias are a common pathophysiological process within the cardiovascular system, characterized by irregularities in the rhythm and rate of heartbeats caused by disruptions in the conduction of cardiac electrical activity [3]. Hence, developing accurate cardiac diagnostics that enable timely medical intervention is crucial to saving lives [4,5].
Currently, electrocardiogram (ECG) analysis stands out as the most direct and effective method for diagnosing cardiac abnormalities [6]. An ECG records the waveform of surface electrical signals, typically composed of P-waves, QRS complexes, and T-waves [7]. As a graphical representation of cardiac electrical activity, the ECG carries a set of important mathematical features spanning both temporal and spatial dimensions. In terms of temporal features, the intervals between P-waves, QRS complexes, and T-waves indicate the duration of the different cardiac phases, aiding the assessment of cardiac rhythm and stability, while the width of the QRS complex is a key indicator of ventricular conduction velocity. Concerning spatial features, the ST segment reflects the cardiac state between contraction and relaxation, and its slope provides information about myocardial ischemia or injury. The electrical axis of the heart describes the direction of cardiac signal propagation, while the amplitude and shape of the T-wave relate to myocardial status. Analyzing temporal and spatial features together forms the basis for detecting and understanding arrhythmias. Changes in electrophysiological characteristics can alter propagation patterns, leading to different types of arrhythmias that manifest as noticeable variations in ECG waveform patterns [8,9]. Although the ECG is a non-invasive and cost-effective tool that is widely used in the clinical diagnosis of heart disease, the increasing volume of data and the complexity of ECG signals have rendered traditional manual analysis inadequate. In this context, deep learning offers a new paradigm for the automatic classification of ECG signals, with the potential to substantially change clinical practice.
With the continuous advancement of artificial intelligence technology, deep learning techniques have been widely applied to the detection and classification of ECG signals, with notable success [10,11]. However, a review of the many studies on ECG signal classification reveals persistent challenges. Notably, existing ECG network models often have large parameter counts, and the resulting model complexity is unfavorable for training and deployment in resource-constrained environments. Furthermore, training these models can be difficult because it requires substantial annotated data and high computational power. Addressing these challenges is crucial to enhancing the performance and generalization ability of deep learning models for ECG signal classification.
Imbalance in ECG datasets poses a common challenge for deep learning in arrhythmia classification. An imbalanced dataset is one in which the number of samples in the arrhythmia categories is significantly lower than that in the normal category. Rajesh K. N. and Dhuli R. [12] proposed a method that combines resampling techniques with an AdaBoost ensemble classifier to address the sparsity of arrhythmia samples in ECG datasets. Resampling was used to balance the distribution of samples across categories, and the AdaBoost ensemble classifier was used to improve classification accuracy, providing an effective strategy for handling imbalanced datasets. Niu et al. [13] explored the use of representative notations and convolutional neural networks from various viewpoints for ECG classification. By introducing representative notations, they sought to enhance the sensitivity to individual differences in ECG signals, potentially improving classification accuracy. Sharma et al. [14] employed a multiresolution wavelet transformation method to accurately detect the starting position of heartbeats and QRS waves in ECG signals. Feature extraction from each wavelet segment was implemented for data augmentation, aiming to enhance the adaptability of the model to changes in QRS waveform morphology.
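As an illustration of class rebalancing in general, and not the specific procedure of any study cited above, the following minimal sketch undersamples classes that exceed a target size and fills smaller classes with SMOTE-style interpolation between a sample and one of its nearest same-class neighbours. The function names `hybrid_resample` and `sq_dist`, the `target` size, and the neighbour count `k` are hypothetical choices made for this example.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def hybrid_resample(X, y, target, k=3, seed=0):
    """Bring every class to `target` samples (illustrative sketch only).

    Classes larger than `target` are randomly undersampled; smaller classes
    receive synthetic samples interpolated between a real sample and one of
    its k nearest neighbours from the same class (the core idea of SMOTE).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(list(features))

    X_out, y_out = [], []
    for label, samples in by_class.items():
        if len(samples) > target:
            # Larger class: keep a random subset without replacement.
            chosen = rng.sample(samples, target)
        else:
            # Smaller class: keep all real samples, then add synthetic ones.
            chosen = list(samples)
            while len(chosen) < target:
                base = rng.choice(samples)
                neighbours = sorted(
                    (s for s in samples if s is not base),
                    key=lambda s: sq_dist(base, s),
                )[:k]
                if not neighbours:
                    # Only one real sample exists: fall back to duplication.
                    chosen.append(list(base))
                    continue
                other = rng.choice(neighbours)
                t = rng.random()  # interpolation factor in [0, 1)
                chosen.append([a + t * (b - a) for a, b in zip(base, other)])
        X_out.extend(chosen)
        y_out.extend([label] * len(chosen))
    return X_out, y_out


# Toy data: 10 "normal" beats (label 0) and 3 "arrhythmia" beats (label 1).
X = [[float(i), 0.0] for i in range(10)] + [[0.0, 10.0], [1.0, 11.0], [2.0, 12.0]]
y = [0] * 10 + [1] * 3
X_bal, y_bal = hybrid_resample(X, y, target=5)
```

In practice a tested library implementation would normally be preferred over hand-written resampling; the sketch is only meant to show how the two directions of resampling combine.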
Current research extensively applies machine learning algorithms to identify arrhythmias from ECG data. Methods include random forests [15], artificial neural networks [16], and support vector machines [17]. However, these traditional machine learning techniques require feature extraction before application, involving the manual extraction of various handcrafted features that influence classification outcomes [18,19]. The manual feature extraction process is time-consuming, underutilizes the underlying information in the database, and is prone to overfitting [20]. In traditional ECG signal classification studies, this step is essential and involves manually extracting signal morphology and designing feature engineering processes. These methods typically rely on features such as R-R interval changes and waveform morphology [21], but their adaptability is limited in the presence of dynamic ECG data and significant noise interference. Traditional machine learning algorithms such as SVM and decision trees [22] perform well on specific training and test sets. However, when faced with a large amount of unseen test data, challenges arise, including the need for manual feature extraction and poor generalization ability. The introduction of deep learning offers new possibilities for addressing these issues. By automatically learning high-level representations of the data, deep learning is expected to enhance the performance and generalization ability of classification algorithms.
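To make the notion of a handcrafted feature concrete, the sketch below derives R-R intervals and a few summary values from R-peak positions. It assumes that the R-peaks have already been detected and are given as sample indices; the function names, the dictionary keys, and the 360 Hz sampling rate in the usage line are assumptions made for this example and are not taken from the cited studies.

```python
def rr_intervals(r_peaks, fs):
    """Successive R-R intervals in seconds from R-peak sample indices."""
    return [(later - earlier) / fs for earlier, later in zip(r_peaks, r_peaks[1:])]


def rr_features(r_peaks, fs):
    """A few simple handcrafted rhythm features (illustrative only)."""
    rr = rr_intervals(r_peaks, fs)
    if not rr:
        raise ValueError("at least two R-peaks are required")
    mean_rr = sum(rr) / len(rr)
    return {
        "rr": rr,
        "mean_rr": mean_rr,
        # Average heart rate in beats per minute.
        "heart_rate_bpm": 60.0 / mean_rr,
        # Beat-to-beat change in R-R interval; a large magnitude suggests
        # an irregular rhythm.
        "rr_diff": [b - a for a, b in zip(rr, rr[1:])],
    }


# Hypothetical R-peak positions (in samples) for a signal sampled at 360 Hz.
features = rr_features([0, 360, 720, 900], fs=360)
```

Features of this kind must be chosen and tuned by hand for each dataset, which is the limitation that motivates learning representations directly from the signal.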
Deep learning has made significant strides in medical image analysis and in medical data analysis more broadly [23]. Several studies have also explored the application of deep learning to the classification of ECG data. Xia et al. proposed a novel wearable ECG classification system using convolutional neural networks and active learning [24]. Hannun et al. used deep neural networks to detect and classify arrhythmias in long-term dynamic ECG at a level comparable to that of cardiac experts [25]. In terms of heartbeat classification, Xiang et al. employed a two-level convolutional neural network and RR interval differences for ECG heartbeat classification [26]. Mathews et al. utilized deep learning for single-lead ECG classification, demonstrating its potential for this task [27]. Furthermore, Saadatnejad et al. introduced an ECG classification method based on long short-term memory (LSTM) networks that is suitable for continuous monitoring on personal wearable devices [28]. Kiranyaz et al. implemented real-time patient-specific ECG classification using one-dimensional convolutional neural networks [29]. Tan et al. enhanced the recognition accuracy of coronary artery disease ECG signals by combining convolutional and long short-term memory networks [30]. He et al. achieved automatic cardiac arrhythmia classification using a combination of deep residual networks and bidirectional LSTM [31]. Rajpurkar et al. demonstrated cardiologist-level arrhythmia detection with convolutional neural networks [32]. Cui et al. proposed a deep learning-based multidimensional feature fusion method for the classification of ECG arrhythmia [33]. Additionally, some studies have focused on the application of deep learning to the automatic interpretation of multiview echocardiograms for congenital heart disease [34]. These studies underscore the broad potential of deep learning in medical image analysis and the classification of ECG data.
Drawing inspiration from the advances in ECG classification outlined above, we developed an ECG classification mechanism that employs multiscale convolution and causal attention. The main contributions of this study are as follows:
We devised a deep learning model that eliminates the need for a separate feature extraction stage. Instead, it uses deep learning throughout to extract robust features from the input ECG signal. This approach permits direct training and classification on the preprocessed ECG signal, thereby reducing the cost of classification.
To address data imbalance, we employed a mixed sampling technique. Majority classes were downsampled, while minority classes were oversampled with SMOTE. This strategy balanced the dataset, narrowing the disparity in sample size across all five heartbeat categories and improving classification accuracy.
We introduced a Multiscale Convolutional Causal Attention network for ECG classification. This network uses multiscale convolution to extract spatial features from the signals, while the causal convolution attention extraction module captures temporal features, yielding precise ECG signal categorization and enhancing classification performance. This design not only improves accuracy but also reduces the model’s complexity.
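The two convolutional ingredients named in the last contribution can be illustrated in isolation. The sketch below is not the proposed network: it uses fixed averaging kernels in place of learned weights and omits the attention mechanism entirely. It only shows what "causal" means (the output at time t depends on samples at or before t) and what "multiscale" means (several kernel sizes applied in parallel). The function names and the kernel sizes 3, 5, and 7 are hypothetical.

```python
def causal_conv1d(x, w):
    """Causal 1D convolution: output[t] depends only on x[t-k+1 .. t].

    The signal is left-padded with k-1 zeros so that the output has the
    same length as the input and never looks at future samples.
    The last weight w[-1] multiplies the current sample x[t].
    """
    k = len(w)
    padded = [0.0] * (k - 1) + list(x)
    return [
        sum(wi * padded[t + i] for i, wi in enumerate(w))
        for t in range(len(x))
    ]


def multiscale_causal(x, kernel_sizes=(3, 5, 7)):
    """Apply one causal averaging kernel per scale and return all branches.

    In a trained model each branch would have learned weights and the
    branch outputs would be concatenated as feature channels.
    """
    return [causal_conv1d(x, [1.0 / k] * k) for k in kernel_sizes]


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
branches = multiscale_causal(signal)  # one output sequence per kernel size
```

Because of the left padding, changing the final sample of `signal` leaves every earlier output unchanged in all branches, which is the property a causal layer relies on when modelling temporal order.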
The remainder of the paper is structured as follows. Section 2 outlines the materials and methods that were utilized and our proposed model. We discuss our experimental setup and results in Section 3 and conclude with the main findings in Section 4. Section 5 summarizes our article.