Article

HRIDM: Hybrid Residual/Inception-Based Deeper Model for Arrhythmia Detection from Large Sets of 12-Lead ECG Recordings

School of Computing, Gachon University, 1342 Seongnam-daero, Sujeong-gu, Seongnam-si 13120, Republic of Korea
* Authors to whom correspondence should be addressed.
Algorithms 2024, 17(8), 364; https://doi.org/10.3390/a17080364
Submission received: 5 July 2024 / Revised: 8 August 2024 / Accepted: 12 August 2024 / Published: 19 August 2024

Abstract

Heart diseases such as cardiovascular disease and myocardial infarction are among the foremost causes of death worldwide. The timely, accurate, and effective prediction of heart diseases is crucial for saving lives. Electrocardiography (ECG) is a primary non-invasive method for identifying cardiac abnormalities. However, manual interpretation of ECG recordings for heart disease diagnosis is a time-consuming and error-prone process. For the accurate and efficient detection of heart diseases from 12-lead ECG data, we propose a hybrid residual/inception-based deeper model (HRIDM). In this study, we utilized ECG datasets from various sources, which together form a large, multi-institutional ECG dataset. The proposed model is trained on 12-lead ECG data from over 10,000 patients. We compared the proposed model with several state-of-the-art (SOTA) models, such as LeNet-5, AlexNet, VGG-16, ResNet-50, Inception, and LSTM, on the same training and test datasets. To demonstrate the computational efficiency of the proposed model, we trained it for only 20 epochs without GPU support and achieved an accuracy of 50.87% on the test dataset for 27 categories of heart abnormalities. Our proposed model outperformed the previous studies that participated in the official PhysioNet/CinC Challenge 2020, placing fourth when compared with the 41 officially ranked teams. The results of this study indicate that the proposed model is a promising new method for predicting heart diseases using 12-lead ECGs.

1. Introduction

Heart diseases are a leading cause of death worldwide. Coronary heart diseases include arrhythmias, prolapsed mitral valves, coronary artery disease, congenital heart disease, congestive heart failure, and many others [1,2]. Various traditional approaches such as blood tests, chest X-rays, and ECGs are used to detect such diseases [3]. ECG is a widely used approach and is recognized as the most effective means of detecting heart problems in the current era [4]. It is a painless procedure to monitor heart health and is used to detect various heart conditions, including arrhythmias and blockages in arteries that cause chest pain or even lead to a heart attack [1,5]. Precise diagnosis of irregular heartbeats through ECG analysis can significantly contribute to the early detection of cardiac illness. Extracting the relevant and significant information from the ECG signals using computer systems poses a considerable challenge. The automatic identification and categorization of cardiac abnormalities has the potential to assist clinicians in diagnosing a growing number of ECGs [6,7]. However, accomplishing this task is highly challenging.
Although various machine learning algorithms have been developed over the past decades for classifying cardiac abnormalities, their testing has often been limited to small or homogeneous datasets [8,9,10,11]. To address this issue, the PhysioNet/CinC Challenge 2020 has provided a substantial number of datasets from diverse sources worldwide. The extensive dataset offers a valuable opportunity to develop automatic systems that are capable of effectively classifying cardiac abnormalities. Previously, several ML techniques have been employed to classify heart diseases based on raw ECG data, reflecting the growing interest in the automated detection of abnormal behavior in ECG signals. Recently, DL has emerged as a valuable tool in biomedical and healthcare systems by learning intricate patterns and features from signal data [11,12,13]. In DL, various deep neural networks such as recurrent neural networks (RNNs) [14], deep residual networks (ResNets) [15], transformers, and attention networks have been developed to classify the diseases. ResNet is one of the popular networks and it handles the complexity of large datasets efficiently with its unique deep layer-based architecture.
Despite the many advancements in the field of machine learning (ML) and DL for ECG signal analysis, existing solutions are limited in their ability to accurately detect cardiac abnormalities, especially across many categories. These limitations highlight the need for more robust and generalized DL models that can effectively handle large and diverse datasets. The challenge lies in developing models that not only improve detection accuracy but also enhance computational efficiency and generalizability. To address the shortcomings of existing methods, this work presents the HRIDM (Hybrid Residual/Inception-based Deeper Model), a novel deep learning model. This work aims to enhance the detection efficiency of heart disease using a large 12-lead ECG dataset through DL neural networks. Specifically, this study aims to:
  • Develop a novel deep learning model for ECG abnormality detection, which could outperform the prior state-of-the-art (SOTA) models, considerably enhancing performance;
  • Utilize the annotated dataset provided by the PhysioNet/CinC Challenge 2020 to accurately classify the 27 different cardiac abnormalities, demonstrating the model’s capability to handle complex and large-scale datasets effectively;
  • Train the proposed model on a large 12-lead ECG dataset from over 10,000 patients, to improve its generalizability and robustness;
  • Illustrate the computational efficiency of the proposed model by achieving high accuracy with limited resources, such as minimal training epochs and no GPU support;
  • Benchmark the HRIDM against several SOTA models, including Inception, LeNet-5, AlexNet, VGG-16, ResNet-50, and LSTM, to validate its performance.

2. Literature Review

Numerous studies have been carried out on the application of ML/DL for the analysis of ECG signals [3,11] and the detection of cardiac arrhythmias [16,17]. In [18], the authors recommended a Residual CNN-GRU-based deep learning model with an attention mechanism for classifying cardiac disorders. In this work, they utilized 24 groups for classification out of 27. The proposed approach attained a test accuracy of 12.2% and obtained 30th position among 41 teams in the official ranking. The authors of [19] introduced a modified ResNet model that includes numerous basic blocks and four modified residual blocks. The updated model is a 1D-CNN that supports attention across feature maps. The authors employed fine-tuning on the pre-trained model, and their proposed algorithm attained a test accuracy of 20.8%. The authors of [20] presented a 1D-CNN with global skip connections to classify ECG signals from 12-lead ECG recordings into numerous classes. The authors also utilized several preprocessing and learning methods, such as a customized loss function, Bayesian threshold optimization, and a dedicated classification layer. Their implemented technique produced an accuracy of 20.2% on the test dataset. The authors of [21] introduced an SE-ResNet model with 34 layers in the DL model to classify cardiac arrhythmias. They assigned different weights to different classes based on their similarity and utilized these weights in metric calculations. The utilized DL model attained a validation accuracy of 65.3% and a test accuracy of 35.9%. In [22], the authors presented a hybrid Recurrent Convolutional Neural Network (CRNN) with 49 1D convolutional layers, along with 16 skip connections and one Bi-LSTM layer, to detect heart abnormalities using 12-lead ECG signals. Utilizing the proposed model with 10-fold cross validation and without preprocessing, the authors achieved 62.3% validation accuracy and 38.2% test accuracy. The authors of [23] utilized an SE-ECGNet to detect arrhythmias from 12-lead ECG recordings. In their model design, they utilized squeeze-and-excitation networks in each model path and an attention mechanism that learns feature weights corresponding to the loss. In this study, the authors achieved 64.0% validation accuracy and 41.1% test accuracy using the proposed model. In [24], the authors presented a deep 1D-CNN model consisting of exponentially dilated causal convolutions in its structure. Their proposed model achieved a challenge score of 0.565 ± 0.005 and an AU-ROC of 0.939 ± 0.004 using 10-fold cross validation techniques. In this work, they achieved 41.7% accuracy on the test dataset utilizing the proposed model. In [25], the authors introduced an 18-layer residual CNN for arrhythmia classification which has four stages for abnormality classification. The authors also utilized preprocessing techniques, 10-fold cross validation, and post-training procedure refinement. The proposed approach achieved a 69.5% validation score and 42% test accuracy. The authors of [26] utilized a hybrid DL model by integrating a CNN with LSTM along with adversarial domain generalization for the detection of arrhythmias from 12-lead ECG signals. In this study, the proposed model obtained 43.7% accuracy on the test dataset. In [27], the authors presented a method for ECG signal classification in which they utilized the scatter transform in combination with deep residual networks (ResNets). In their study, the authors obtained 48% test accuracy utilizing their proposed methodology.
In [28], the authors designed an SE-ResNet-based DL model which is a variant of the ResNet architecture. In their model design, SE blocks are utilized to learn from the first 10 and 30 s segments of ECG signals. The authors also utilized an external open-source dataset for model validation. To correct and verify the output, they developed a rule-based bradycardia model based on clinical knowledge. Utilizing the proposed approach, the authors detected heart arrhythmias from 12-lead ECG recordings and obtained a validation accuracy of 68.2% and a testing accuracy of 51.4%. The authors of [29] introduced a DL method for the classification of arrhythmias utilizing 12-lead ECG signals. In their proposed approach, the authors presented a modified ResNet with SE blocks. Additionally, they applied zero padding to extend the signal to 4096 samples and downsampled them to 257 Hz. Utilizing custom weighted accuracy measure and 5-fold cross validation, they obtained a validation accuracy of 68.4%, and a test accuracy of 52%.
In [30], the authors proposed a novel approach using a Wide and Deep Transformer Neural Network for the detection of cardiac abnormalities utilizing 12-lead ECG recordings. In their methodology, they combine two features: transformer neural network features and random forest handcrafted ECG features. The utilized approach achieved an impressive accuracy of 58.7% on the validation dataset and 53.3% on the test dataset.

3. Materials and Methods

In this paper, we developed a new model (HRIDM) for classifying ECG signal abnormalities that integrates the strength of an inception network with residual blocks. The deep inception network can learn complex features from the dataset, whereas residual blocks improve model accuracy by resolving the problem of vanishing gradients. We validated the proposed model using a dataset of 12-lead ECG signals from patients with a range of cardiac disorders. We evaluated our model’s performance against a variety of SOTA models, including LeNet, AlexNet, VGG, LSTM, ResNet, and Inception. All the models were trained on the PhysioNet/CinC Challenge 2020 dataset and tested using the independent test dataset provided by the organizers. The reported accuracy was achieved through our own testing and validated using the validation techniques provided by the PhysioNet/CinC Challenge 2020 organizers. We observed that our model outperformed DNNs and the models outlined in previous research. The improved outcomes indicate that the proposed model is a novel and promising approach to classifying ECG data in order to identify cardiac anomalies.

3.1. Datasets

The study’s dataset, which includes recordings, diagnostic data, and demographic information, was collected from several open-source databases that are freely available to download (Table 1). To generate this large dataset, five different sources were used. All the datasets contain 12-lead ECG recordings whose sampling frequencies range from 257 Hz to 1 kHz. The datasets also include demographic information such as age, sex, and type of diagnosis. There are 27 ECG classes (diagnoses), which are presented along with their SNOMED CT codes (Systematized Nomenclature of Medicine Clinical Terms). The following subsections detail the specific sources comprising the dataset.
  • CPSC Database (CPSC2018): The initial source of the ECG data is the China Physiological Signal Challenge 2018 [31], which contains 13,256 ECG recordings and 9458 patients;
  • INCART Database: The second source of the ECG data is the 12-lead ECG arrhythmias dataset, which is an open-source, publicly available dataset from the St. Petersburg Institute of Cardiological Technics (INCART), St. Petersburg, Russia [32]. This dataset has only 74 recordings and was contributed by 32 patients;
  • PTB and PTB-XL Database: The third dataset is a combination of two databases (PTB and PTB-XL), which contains 22,353 ECG recordings and was contributed by 19,175 patients;
  • Georgia 12-lead ECG Challenge (G12EC) Database: The fourth ECG dataset is also a 12-lead ECG dataset, which was made available by Emory University, Atlanta, Georgia, USA [32]. This dataset was collected from 15,742 patients and contains 20,678 ECG recordings;
  • Undisclosed: This dataset was only used for testing model performance in the challenge. The source of the dataset is an undisclosed American institution, and this dataset is completely different from the other datasets. It has never been posted or disclosed publicly and will not be disclosed in the future either. It contains 10,000 recordings, the number of patients is unknown, and no training or validation sets are drawn from this dataset.
Detailed information about the datasets is provided in Appendix A, specifically in Table A2. This table includes the number of recordings, mean duration of recordings, mean age of patients, sex of patients, and the sampling frequency for each dataset included in the PhysioNet/CinC Challenge 2020 dataset [33].
The ECG dataset used in this study was obtained from the PhysioNet/CinC Challenge 2020 database [32,33]. The dataset consists of 66,361 ECG recordings, of which 43,101 are for training, 6630 for validation, and 16,630 for testing. To train the model efficiently and to prevent data leakage, we split the 43,101 training recordings (excluding the designated validation set of 6630 recordings) into our own training and validation subsets in a 90:10 ratio, resulting in 38,790 training samples and 4311 validation samples. This approach allowed for robust model training and hyperparameter tuning without compromising evaluation integrity. For testing, we used an undisclosed hidden dataset of 10,000 recordings, which has different sampling frequencies.
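As an illustration, a minimal Python sketch of this 90:10 split is given below (a hedged example: the record identifiers and labels are placeholder stand-ins, and the fixed random seed is our own assumption, not a detail reported in the study).

from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the 43,101 training record identifiers and labels.
record_names = [f"rec_{i:05d}" for i in range(43101)]
labels = [i % 27 for i in range(43101)]  # placeholder class indices

train_ids, val_ids, y_train, y_val = train_test_split(
    record_names, labels,
    test_size=0.10,   # 90:10 split -> 38,790 training / 4311 validation samples
    shuffle=True,
    random_state=42,  # fixed seed for reproducibility (our assumption)
)
print(len(train_ids), len(val_ids))  # 38790 4311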
Figure 1 shows the distribution of ECG signal lengths in the dataset. The figure shows that 95% of the ECG signals in the dataset have a length of 5000 samples. The remaining 5% of the ECG signals have lengths that range from 5500 to 115,200 samples.
Figure 2 visualizes the distribution of cardiac abnormalities in the utilized datasets. The vertical axis represents the number of abnormalities, and the horizontal axis lists the names of the 27 classes in abbreviated form. The abbreviations used, corresponding to the Diagnosis and SNOMED CT codes, are provided in Appendix A, Table A1. It is worth noting that “Sinus Rhythm (SNR)” appears as the most frequent abnormality in the graph.

3.2. Data Preprocessing

Our machine learning strategy for classifying ECG signals with the hybrid residual/inception-based deeper model (HRIDM) relies on several preprocessing steps. Our ECG dataset from PhysioNet is large and diverse, with recordings of different lengths and sizes (42,720). Hence, preprocessing is essential to prepare the data for useful analysis. We use Algorithm 1’s multi-step preprocessing approach to make sure the data are standardized and appropriate for use in machine learning models. In order to provide effective training, this iterative method serves as a generator function, continuously producing batches of features and labels.
Algorithm 1: Data preprocessing algorithm utilized for ECG data.
Initialization: { gen_x: generator for features, gen_y: generator for labels, X_train: training records, y_train: training labels }
Input: { gen_x, gen_y, X_train, y_train }
Output: { X_batch, y_batch }
# Step 1: Initialize Parameters and Data Structures
Set batch_size
Create order_array
# Step 2: Generate Batches
While True:
   Initialize empty arrays for batch_features, batch_labels
   For each batch:
      For i in range(batch_size):
         batch_features[i] = next(gen_x)
         batch_labels[i] = next(gen_y)
      X_normalized = (batch_features − mean(batch_features)) / std(batch_features)
      Yield (X_normalized, batch_labels)
# Step 3: Shuffle Labels
While True:
   For i in order_array:
      Yield shuffled labels: y_shuffled = y_train[i]
# Step 4: Preprocess Features
While True:
   For i in order_array:
      Load and preprocess feature data:
         data, header_data = load_data(X_train[i])
         X_train_padded = pad_sequences(data, maxlen=5000, truncating='post', padding='post')
         X_train_reshaped = X_train_padded.reshape(5000, 12)
         X_train_reshaped_normalized = (X_train_reshaped − mean(X_train_reshaped)) / std(X_train_reshaped)
         Yield (X_train_reshaped_normalized)
The basic idea behind Algorithm 1 is the designation of a batch size, which establishes how many data points are processed simultaneously during training. Additionally, in order to prevent the model gaining biases from the original recording sequence, we shuffle the order of the training data points. Several generator functions retrieve features and labels for each data point inside each batch, most frequently by gaining access to external data sources. On the obtained features, we apply normalization to compensate for possible differences in signal strength between recordings. By scaling the features to a particular range (often between 0 and 1) based on the mean and standard deviation of the current batch, this normalization makes sure that each feature contributes equally during the training process [34]. Another loop shuffles the labels in the training set while batch generation is taking place. By preventing the model from picking up possible dependencies based on the initial label order, this step eventually enhances the model’s capacity to generalize to unseen inputs.
Preparing each individual feature is an additional vital component of preprocessing; this is achieved via an additional loop that iterates over the shuffled order array. Here, we use the current index in the order array to access a particular training data point, which is a raw ECG signal [35,36]. Using zeros as padding ensures that all of the input data points have the same format. This is required if the duration of the recovered ECG signal is less than the required input size, which is usually 5000 samples. The data are then rearranged into a two-dimensional structure with 12 columns (representing the 12 ECG leads) and 5000 rows (representing time samples) after padding. Through this reshaping, the one-dimensional data are effectively transformed into a format that allows the machine learning model to consider each ECG lead as an independent channel.
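A minimal Python sketch of this per-recording preprocessing is shown below (a hedged illustration: load_recording is a hypothetical loader standing in for the WFDB-based file reading, and the target shape of 5000 × 12 follows the description above).

from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_LEN, N_LEADS = 5000, 12  # target length (samples) and number of ECG leads

def preprocess_recording(signal):
    """Pad/truncate a (leads x samples) ECG array and normalize it."""
    # Pad or truncate every lead to MAX_LEN samples (zero post-padding).
    padded = pad_sequences(signal, maxlen=MAX_LEN, dtype="float32",
                           padding="post", truncating="post")
    # Rearrange to (time, leads) so each lead is treated as a separate channel.
    reshaped = padded.T.reshape(MAX_LEN, N_LEADS)
    # Normalize with the recording's own mean and standard deviation.
    return (reshaped - reshaped.mean()) / (reshaped.std() + 1e-8)

def feature_generator(record_ids, order_array, load_recording):
    """Yield preprocessed features in shuffled order (Algorithm 1, Step 4)."""
    while True:
        for i in order_array:
            data = load_recording(record_ids[i])  # hypothetical loader -> (12, n_samples) array
            yield preprocess_recording(data)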
In the last stage of preprocessing, we have utilized the ECG data to normalize the whole training set using the pre-calculated mean $\bar{X}$ and standard deviation $\sigma$. The normalization of the ECG data is given by Equation (1) [37].
X_{\mathrm{normalized}} = \frac{\mathrm{batch\_features} - \bar{X}_{\mathrm{batch\_features}}}{\sigma_{\mathrm{batch\_features}}}   (1)
As a result, the model learns more robustly, and all characteristics are normalized on an identical scale across the training phase [38,39]. We produce batches of preprocessed features and shuffled labels continually by executing these processes recursively within the generator function. The machine learning model is now able to train on the information we have provided for accurate ECG signal classification in medical applications with ease due to our carefully developed preprocessing steps, which successfully tackles the issues posed by our sizable and varied ECG dataset. The preprocessing technique utilized in this work is presented in the form of a flow chart in Figure 3, and Figure 4 presents the segmented and preprocessed 12-lead ECG signals.

3.3. SOTA Models

The SOTA models utilized to validate the proposed methodologies and the proposed model (HRIDM) are LeNet-5, AlexNet, VGG16, ResNet50, Inception, and LSTM. These are prominent and commonly utilized models for various tasks, especially signal and image classification.
LeNet-5 was the first basic convolutional neural network (CNN) model, introduced in 1998 by Yann LeCun et al. [40]. It consists of seven layers: three convolution (Conv) layers, two pooling layers (average pooling), and two fully connected (FC) layers, along with sigmoid or tanh activation functions. It was the first CNN model successfully trained on the MNIST dataset for a digit recognition task. However, because its structure contains few layers, it is not suitable for more complex tasks.
AlexNet was introduced in 2012 by Alex Krizhevsky et al. [41]; it popularized the ReLU activation function and used dropout layers to overcome overfitting [42]. AlexNet consists of eight layers: five Conv layers, of which the first, second, and fifth are followed by max-pooling layers, and three fully connected layers. All layers use ReLU activations except the output layer, which uses a softmax activation function. To capture hierarchies in the data, this model makes use of the filters’ depth and stride. However, because it has an extensive number of parameters, its computational cost is high.
VGG16 is a prominent deep CNN model introduced in 2014 by the Visual Geometry Group [43] at Oxford University. It consists of 16 layers: 13 Convolutional layers and three fully connected layers. VGG16 utilizes small 3 × 3 filters in Conv layers throughout its structure, and max-pooling layers are applied after some of the Conv layers to downsample the feature maps. Due to its deeper architecture, it is very effective in capturing fine details of the features and capable of performing more complex tasks effectively. Its specialty lies in its simple and uniform structure, which is easy to design and extend. However, its limitation is the slow training time due to large parameters and depth.
The ResNet50 model, developed in 2015 by Kaiming He et al. [44], introduced the concept of the residual block; it contains 50 layers and is capable of resolving the vanishing gradient problem in deeper networks. The residual block is mathematically defined as $y = f(x) + x$, where $f(x)$ is the CNN output within the block and $x$ is the input. This model introduced the concept of skip connections, which allow the gradient to pass directly through these connections. It also introduced a bottleneck design in the structure of its residual blocks, consisting of 1 × 1, 3 × 3, and 1 × 1 Conv layers. The 50 layers in the ResNet50 model with residual blocks enable the capture of more complex patterns. However, its limitation is the high computational cost due to its complex and deep structure.
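As an illustration of the residual principle $y = f(x) + x$ in the 1D setting used in this paper, the following is a minimal Keras sketch of a bottleneck-style residual block (our own illustration, not the ResNet50 reference implementation).

from tensorflow.keras import layers

def bottleneck_residual_block_1d(x, filters=256):
    # Minimal 1D bottleneck residual block: output = f(x) + projected skip.
    shortcut = layers.Conv1D(filters, 1, padding="same")(x)  # match channel count
    y = layers.Conv1D(filters // 4, 1, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv1D(filters // 4, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv1D(filters, 1, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])  # residual addition: f(x) + x
    return layers.ReLU()(y)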
The Inception model is a deep CNN that introduced the concept of inception modules to enhance the efficiency and accuracy of DL models. Inception v3 was introduced in 2015 by Szegedy et al. [45] and uses parallel Conv layers within the same module to capture fine details at different levels. Inception v3 has 48 layers with inception modules, which consist of multiple parallel Conv layers with different filter sizes (1 × 1, 3 × 3, 5 × 5) along with max-pooling layers. It is capable of capturing very complex patterns with a smaller number of parameters compared to similar DL models.
LSTM (Long Short-Term Memory) networks, introduced in 1997 by Hochreiter and Schmidhuber [46], are especially designed for sequential data patterns. LSTM was introduced to resolve the problem of short-term memory by incorporating gates and states. An LSTM network consists of many cells, including a cell state ($c_t$) and a hidden state ($h_t$), as well as gates such as the input gate $i_t$, forget gate $f_t$, and output gate $o_t$. LSTM has the ability to capture long-term dependencies from sequential data, making it highly suitable for time series tasks and language modeling.

3.4. Proposed Model (HRIDM)

The aim of this research was to determine the most effective algorithm on the utilized dataset. Figure 5 depicts the proposed model and the comprehensive methodology employed. The proposed HRIDM consists of three main sections. The first section serves as the input layer, incorporating multiple convolutional, residual, and inception blocks to extract the primary and fine features from the data, and is also responsible for producing the output. We integrated residual blocks with inception blocks in our proposed model because this combination leverages the strengths of both types of blocks, enhancing the overall performance of the model for arrhythmia detection in 12-lead ECG recordings. The residual blocks address the vanishing gradient problem and enable deeper training through skip connections, which allows the model to learn complex features effectively, whereas the inception blocks capture multiscale fine-grained features through parallel Conv blocks with varying filter sizes. The fusion of both techniques gives the DL model a powerful structure, enabling it to learn diverse features efficiently and effectively and enhancing its ability to discriminate between different arrhythmia types, resulting in improved accuracy and robustness compared to other models.
The second section is connected with the first and further refines the extracted features, subsequently concatenating them. The third section is connected with both preceding sections (first and second) and combines the concatenated features. The output of the final section is given to the dense layer of the first section to produce the desired output. A detailed description of each section and its constituent blocks follows. The proposed model consists of the following layers:
  • 1D convolutional (Conv) layers: In our proposed model, we have utilized multiple 1D-Convolution (Conv) layers for extracting high-level features from the provided dataset. The first 1D-Conv layer employs 512 kernels, each of size 5 × 5, to learn informative patterns from the input data. The second 1D-Conv layer includes 256 kernels of size 3 × 3, further refining the extracted features. Following each 1D-Conv layer, we incorporate batch normalization (batchNorm) to improve training stability and accelerate convergence. Further, ReLU activation functions are included to introduce non-linearity and improve the model’s ability to learn complex relationships within the data. The convolutional layer computes the output $\mathrm{Output}_{i,j,k}$ at spatial position $(i, j)$ and output channel $k$ as given in Equation (2):
    \mathrm{Output}_{i,j,k} = \sum_{f,g,h} \mathrm{Filter}_{f,g,h} \times \mathrm{Input}_{i+f,\, j+g,\, h} + \mathrm{Bias}_k   (2)
  • 1D max-pooling (MaxPool) layer: This layer is utilized to downsample the data while preserving prominent features. The 1D-MaxPool layer employs a filter of size 3 × 3 with stride 2, and it computes the output $\mathrm{Output}_i$ at position $i$ as given in Equation (3):
    \mathrm{Output}_i = \max\left(\mathrm{Input}_{i \times \mathrm{stride} \,:\, i \times \mathrm{stride} + \mathrm{pool\_size} - 1}\right)   (3)
  • Residual block: This block is used to address the vanishing gradient problem and facilitate weight transfer. The residual block consists of three stacks, each comprising a 1D-Conv layer, batchNorm, and a Leaky ReLU activation function with an alpha value of $1 \times 10^{-2}$, which are given by Equations (4)–(6), respectively (a minimal code sketch of this block, together with the inception block described below, is provided at the end of this subsection):
    Y_{\mathrm{conv}} = \sigma(W \times X + b)   (4)
    Y_{\mathrm{bn}} = \gamma \frac{Y_{\mathrm{conv}} - \mu_B}{\sigma_B} + \beta   (5)
    Y_{\mathrm{out}} = \max(\alpha Y_{\mathrm{bn}}, Y_{\mathrm{bn}})   (6)
    where $W$ is the filter weights, $X$ is the input data, $b$ is the bias term, and $\sigma$ is the activation function (Leaky ReLU); $\gamma$ and $\beta$ are the scaling and shifting parameters, $\mu_B$ and $\sigma_B$ are the batch mean and standard deviation, and $\alpha$ is the leakiness factor (0.01 in this case).
The convolutional layer sizes for each stack are 128, 128, and 256, respectively, with a kernel size of 1 × 1, as shown by Equations (7)–(9) for each stack, respectively.
(1) First Stack:
Y_{\mathrm{stack1}} = \max\!\left(\alpha\left(\gamma_1 \frac{\sigma(W_1 \times X + b_1) - \mu_{B1}}{\sigma_{B1}} + \beta_1\right),\; \gamma_1 \frac{\sigma(W_1 \times X + b_1) - \mu_{B1}}{\sigma_{B1}} + \beta_1\right)   (7)
(2) Second Stack:
Y_{\mathrm{stack2}} = \max\!\left(\alpha\left(\gamma_2 \frac{\sigma(W_2 \times Y_1 + b_2) - \mu_{B2}}{\sigma_{B2}} + \beta_2\right),\; \gamma_2 \frac{\sigma(W_2 \times Y_1 + b_2) - \mu_{B2}}{\sigma_{B2}} + \beta_2\right)   (8)
(3) Third Stack:
Y_{\mathrm{stack3}} = \max\!\left(\alpha\left(\gamma_3 \frac{\sigma(W_3 \times Y_2 + b_3) - \mu_{B3}}{\sigma_{B3}} + \beta_3\right),\; \gamma_3 \frac{\sigma(W_3 \times Y_2 + b_3) - \mu_{B3}}{\sigma_{B3}} + \beta_3\right)   (9)
To maintain weight preservation, an additional convolutional layer with 256 filters and batch normalization is incorporated into the skip connection, linking it with the output of the third stack in the residual block, as shown by Equations (10) and (11):
\mathrm{skip} = \mathrm{Conv1D}_{256}(X)   (10)
F(X) = Y_{\mathrm{stack3}} + \mathrm{Conv1D}_{256}(X)   (11)
  • Inception block: This block is used to extract further low-dimensional features. The inception block involves stacks of 1D convolutional layers, followed by batch normalization and Leaky ReLU activation with an alpha value of $1 \times 10^{-2}$. Each stack utilizes 64 filters, with kernel sizes of 1, 3, and 5.
(4) Kernel Size 1:
Y_{\mathrm{out},1} = \max\!\left(\alpha\left(\gamma_1^1 \frac{\sigma(W_1^1 \times X + b_1^1) - \mu_{B1}^1}{\sigma_{B1}^1} + \beta_1^1\right),\; \sigma(W_1^1 \times X + b_1^1)\right)   (12)
(5) Kernel Size 3:
Y_{\mathrm{out},3} = \max\!\left(\alpha\left(\gamma_3^1 \frac{\sigma(W_3^1 \times X + b_3^1) - \mu_{B3}^1}{\sigma_{B3}^1} + \beta_3^1\right),\; \sigma(W_3^1 \times X + b_3^1)\right)   (13)
(6) Kernel Size 5:
Y_{\mathrm{out},5} = \max\!\left(\alpha\left(\gamma_5^1 \frac{\sigma(W_5^1 \times X + b_5^1) - \mu_{B5}^1}{\sigma_{B5}^1} + \beta_5^1\right),\; \sigma(W_5^1 \times X + b_5^1)\right)   (14)
Equations (12)–(14) illustrate how each stack within the inception block handles the input data $X$. The max operation integrates the Leaky ReLU output with a scaled and shifted variant to ensure non-linearity and the extraction of features across various receptive fields (kernel sizes). The second and third sections contain almost identical layers, a repetitive structure of convolution, batch normalization, and Leaky ReLU, to progressively extract increasingly detailed and refined features. The second section concatenates the features extracted from its blocks with those from the first section. The third section follows a similar set of layers but incorporates skip connections to facilitate the flow of information across layers.
  • Convolutional blocks: These blocks are used to capture complex patterns within the data. Each convolutional block consists of a 1D-Conv layer, batchNorm, and Leaky ReLU activation with an alpha value of $1 \times 10^{-2}$. The first convolutional block uses 128 filters, a filter size of 5 × 5, and a stride of 1 × 1, complemented by instance normalization and parametric ReLU activation. A dropout layer with a rate of 20% and a 1D max-pooling layer with a filter size of 2 × 2 were added. The second convolutional block is similar to the first, except for the filter count in the convolutional layer: it employs 256 filters of size 11 × 11. The third convolutional block omits the 1D pooling layer and utilizes a Conv layer with 512 filters of size 21 × 21;
  • 1D global average pooling (Global Avg. Pool) layer: This layer is utilized mainly for reducing the dimensionality of the feature data, as presented by Equation (15).
Y_{\mathrm{global\_avg\_pool}} = \frac{1}{N} \sum_{i=1}^{N} X_i   (15)
where the number of features is represented by $N$ and $X_i$ are the input features;
  • Dense layer: This layer is used to classify the data. The dense layer has 27 neurons, and each neuron is activated using a softmax function, with the output probabilities given by Equation (16):
Z_k = \sum_{i=1}^{N} W_{ki} X_i + b_k, \qquad Y_k = \frac{e^{Z_k}}{\sum_{j=1}^{27} e^{Z_j}} \quad \text{for } k = 1, 2, \ldots, 27   (16)
where $Z_k$ is the class logit, $W_{ki}$ is the weight from input $i$ to output $k$, $X_i$ is the input, $b_k$ is the bias for class $k$, and $Y_k$ is the softmax output, ensuring that the predicted probabilities sum to 1.
The proposed (HRIDM) model is an effective method for the classification of time series data. The utilized model is capable of extracting high-level features from the data, and it is able to capture complex patterns within the data. The model is also able to generalize well to new data.
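As referenced above, the following is a minimal Keras sketch of the residual and inception blocks described by Equations (7)–(14) (our own illustration, not the authors’ released code). It uses the stated filter counts (128, 128, and 256 for the residual stacks with a Conv1D(256) skip; 64 filters per inception branch with kernel sizes 1, 3, and 5), a Leaky ReLU alpha of 0.01, and assumes the three inception branches are concatenated along the channel axis, as is typical for inception modules.

from tensorflow.keras import layers

ALPHA = 0.01  # Leaky ReLU leakiness factor used throughout the model

def conv_bn_lrelu(x, filters, kernel_size):
    # Conv1D -> BatchNorm -> Leaky ReLU stack (Equations (4)-(6)).
    x = layers.Conv1D(filters, kernel_size, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU(alpha=ALPHA)(x)

def residual_block(x):
    # Three stacks with 128, 128, and 256 filters (kernel size 1), Equations (7)-(9).
    y = conv_bn_lrelu(x, 128, 1)
    y = conv_bn_lrelu(y, 128, 1)
    y = conv_bn_lrelu(y, 256, 1)
    # Conv1D(256) + BatchNorm on the skip connection, Equations (10)-(11).
    skip = layers.Conv1D(256, 1, padding="same")(x)
    skip = layers.BatchNormalization()(skip)
    return layers.Add()([y, skip])

def inception_block(x):
    # Parallel branches with 64 filters and kernel sizes 1, 3, 5 (Equations (12)-(14)),
    # concatenated along the channel axis (our assumption).
    branches = [conv_bn_lrelu(x, 64, k) for k in (1, 3, 5)]
    return layers.Concatenate()(branches)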

3.5. Activation Functions

The activation functions also have a very essential responsibility in designing deep learning models. The selection of the activation function depends upon the types of input data and the category of classifications. In this work, we utilized Leaky ReLU and ReLU activation functions, and we compared these with the most commonly used activation functions, sigmoid and tanh, as visualized in Figure 6.
  • Leaky ReLU: We employed the Leaky ReLU activation function for the present study, which offers strong benefits for ECG classification. It is important to analyze several activation functions that are particular to our task and dataset. Negative values in ECG signals are frequently indicative of certain types of cardiac activity. Leaky ReLU ensures that neurons continue to contribute to learning features from the data by keeping them from going into inactive states as a result of these negative inputs. Leaky ReLU is computationally more efficient than tanh and sigmoid, which is advantageous for training deeper and bigger neural networks using ECG data [47,48]. Leaky ReLU, in contrast to ReLU, keeps a little non-zero gradient for negative inputs. This feature might be useful in applications where it is important to detect even minute deviations from normal cardiac rhythm in order to capture minor changes in ECG patterns. Leaky ReLU can be mathematically expressed by Equation (17) [48]:
f(x) = \begin{cases} x, & x \ge 0 \\ \alpha x, & \text{otherwise} \end{cases}   (17)
  • ReLU (Rectified Linear Unit): This is simple and effective in terms of computing; it promotes sparsity by generating a large number of zero outputs. In comparison with sigmoid and tanh, it allows models to converge more quickly during training. But it has the “Dying ReLU” issue, which prevents learning by allowing neurons with negative inputs to be stuck at zero indefinitely. The mathematical definition of the ReLU activation function is given by Equation (18) [48,49]:
f(x) = \begin{cases} x, & x \ge 0 \\ 0, & x < 0 \end{cases}   (18)
  • Sigmoid: The sigmoid function reduces input values to the range [0, 1]. It is especially effective for binary classification problems that require probabilistic outputs. The function has a smooth gradient over its full range, allowing for effective gradient-based optimization during training. The sigmoid activation may be mathematically described as illustrated in Equation (19) [49]:
f(x) = \frac{1}{1 + e^{-x}}   (19)
  • Tanh (Hyperbolic Tangent): The tanh function converts input values to the range [−1, 1]. Similar to the sigmoid function, its output is zero-centered, which can help neural networks converge. The tanh activation function is mathematically stated as follows in Equation (20) [48]:
f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}   (20)
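As a quick reference, the four activation functions above can be sketched in NumPy as follows (a minimal illustration; the alpha value mirrors the $1 \times 10^{-2}$ used in the model).

import numpy as np

def leaky_relu(x, alpha=0.01):   # Equation (17)
    return np.where(x >= 0, x, alpha * x)

def relu(x):                     # Equation (18)
    return np.maximum(x, 0.0)

def sigmoid(x):                  # Equation (19)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):                     # Equation (20)
    return np.tanh(x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(leaky_relu(x))  # -> [-0.02, -0.005, 0.0, 1.5]
print(relu(x))        # -> [0.0, 0.0, 0.0, 1.5]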

3.6. Evaluation Metrics

This section presents an overview of our proposed model and approach for identifying anomalies in ECG data. To understand the functioning of our model, we extract characteristics and employ several metrics that provide insights into its performance. In this research, we utilized the following metrics:
\mathrm{Accuracy\ (Acc)} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{TP + TN}{\text{Total no. of ECG samples}}
\mathrm{Precision\ (Pre)} = \frac{TP}{TP + FP}
\mathrm{Recall\ (Rec)} = \frac{TP}{TP + FN}
F1\text{-}Score = \frac{2 \times Pre \times Rec}{Pre + Rec}
The Area Under the Receiver Operating Characteristic Curve (AUC) measure is used to assess the model’s performance graphically across different classification thresholds. A higher AUC score (closer to 1) denotes better performance. The ROC curve plots recall (the true positive rate) against the false positive rate (1 − specificity) at different threshold values. Compared with a single criterion (such as accuracy), AUC offers a more thorough assessment.
The total number of ECG samples in the test data is equal to the sum of the numbers of positive and negative samples ($TP + TN + FP + FN$). The confusion matrix is the foundation for assessing classification models. For a specific task, this matrix carefully counts the number of accurate identifications (TN, true negatives; TP, true positives) and inaccurate detections (FN, false negatives; FP, false positives). When missing positive cases has significant consequences, the confusion matrix becomes very important [50]. A high recall value implies that, even at the cost of a higher number of false positives, the model reduces the possibility of missing significant instances. This trade-off is important, especially when the cost of missing a positive instance is greater than the cost of incorrectly flagging a negative one [38,51].
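A minimal sketch of how these metrics can be computed with scikit-learn (listed among the libraries used in Section 4) is shown below; the arrays are placeholder binary predictions for a single abnormality class, not results from this study, and in the 27-class setting the per-class scores would be averaged (e.g., with average="macro").

from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

# Placeholder labels/scores for one abnormality class (illustration only).
y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred  = [1, 0, 0, 1, 0, 1, 1, 0]
y_score = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]

print("Acc:", accuracy_score(y_true, y_pred))
print("Pre:", precision_score(y_true, y_pred))
print("Rec:", recall_score(y_true, y_pred))
print("F1 :", f1_score(y_true, y_pred))
print("AUC:", roc_auc_score(y_true, y_score))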

4. Results

This section provides a detailed outline of the experiments performed and their corresponding outcomes. The experimental setup involved implementing the experiments primarily in Python, with most of the computations performed on a computer system with 16 GB of RAM, utilizing a Tesla T4 GPU. To optimize model performance, we utilized a dataset comprising 43,101 training samples. This dataset was then divided into 90% training (38,790 samples) and 10% validation (4311 samples) sets to facilitate model training and evaluation. For testing, two categories of datasets were provided: 6630 known and disclosed records, and 10,000 unknown and undisclosed records, totaling 16,630. However, for this work, we utilized the undisclosed hidden dataset of 10,000 recordings provided by the CinC 2020 organizers to ensure unbiased testing of the proposed model. Throughout the experiments, a number of libraries were used: for data visualization, Matplotlib, Seaborn, and Ecg-plot; for data processing, NumPy and Pandas; and for modeling, TensorFlow and Keras. For the assessment of the models, Sklearn was used, along with other libraries such as SciPy and WFDB. The following hyperparameters were used for training all the DL models, including the proposed model: an Adam optimizer, a batch size of 32, a min_delta of 0.0001, a dropout rate of 0.2, and a filter size of 5 × 5. A learning rate decay mechanism was used as a callback function depending on the AUC score during model training. The learning rate was decayed by multiplying it by 0.1 in the optimizer whenever the AUC score did not improve over an epoch, suggesting a lack of convergence.
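A hedged Keras sketch of such an AUC-based learning rate decay callback is given below (our reconstruction of the described behaviour, not the authors’ code; the monitored metric name "val_auc" and the choice of ReduceLROnPlateau are assumptions).

import tensorflow as tf

# Reduce the learning rate by a factor of 0.1 when the validation AUC
# stops improving by at least min_delta, mirroring the described callback.
lr_decay = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_auc",   # assumed metric name from tf.keras.metrics.AUC(name="auc")
    mode="max",          # a higher AUC is better
    factor=0.1,          # multiply the learning rate by 0.1
    patience=1,          # react if there is no improvement for one epoch
    min_delta=0.0001,
)

optimizer = tf.keras.optimizers.Adam()
# model.compile(optimizer=optimizer, loss="binary_crossentropy",
#               metrics=[tf.keras.metrics.AUC(name="auc")])
# model.fit(train_gen, validation_data=val_gen, epochs=20,
#           batch_size=32, callbacks=[lr_decay])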
We showcased the experimental outcomes derived from the models employed in this investigation. LeNet, AlexNet, VGG-16, ResNet-50, Inception, and LSTM are among the models. To ensure a fair comparison with the proposed HRIDM model, all SOTA models were trained from scratch on the same dataset rather than utilizing pre-trained weights. This approach allowed for a direct evaluation of model performance under identical conditions. The experimental setup previously described was used for training and evaluating these models on the dataset. The outcomes offer valuable perspectives on how well each model performs in appropriately classifying cardiac disorders. To evaluate the models’ performance, evaluation criteria including accuracy, precision, recall, and F1-score were used. To further illustrate the trade-off between the true positive rate and false positive rate for various categorization levels, ROC curves were calculated. The experimental results provide insight into how well each model detects and classifies different heart problems. In order to satisfy the research objectives, we analyzed each model’s performance, compared the outcomes, and expressed the advantages and disadvantages of each.

4.1. LeNet-5

The first model we trained on our training and validation datasets was LeNet-5. Figure 7 displays the model’s performance as training history curves. Figure 7a illustrates the accuracy, precision, and AUC metrics for both the training and validation datasets, and Figure 7b depicts the loss curve of the model. The training curves show that the LeNet-5 model obtained an accuracy of roughly 70% on the training data but only 60% on the validation data. A similar tendency may be seen in the AUC score. On both datasets, the precision score remained stable at roughly 50%. We confined the training iterations to 20 epochs in order to observe the model’s learning curve. This confined training undoubtedly led to the model’s underfitting, as seen in the declining precision near the end of the epochs. Furthermore, the loss curve exhibits a rather smooth fall, with values of about 20% for both training and validation data. While a falling loss curve is ideal, the absence of a considerable decrease in this case implies that the model is not efficiently reducing the cost function.

4.2. AlexNet

In our second experiment, we examined the AlexNet model’s training history with a particular emphasis on AUC, accuracy, loss, and precision (Figure 8). In comparison to the LeNet-5 model, AlexNet performed noticeably better. Each epoch exhibited a steady decline in the training and validation loss curves, suggesting good convergence towards the minimal loss value. Furthermore, the model consistently produced a precision of around 60% for both sets of data, and it demonstrated a stable and impressive accuracy of about 80% on both training and validation sets. Promising AUC values of around 70% during training are also shown in Figure 8a.
The training loss curve in Figure 8b, on the other hand, does not demonstrate the same smooth drop as the validation loss curve, indicating probable overfitting. This mismatch suggests that the model is remembering the training data rather than generalizing effectively to unknown data. Furthermore, the dataset’s high volatility and imbalanced classes continue to be a concern. Addressing these data restrictions may improve the AlexNet model’s performance and capacity to handle classification challenges successfully.

4.3. VGG-16

In our third experiment, we trained and evaluated the VGG-16 model on both the training and validation datasets. The model’s accuracy, precision, AUC, and loss training history curves are displayed in Figure 9. With accuracy and AUC averaging 80% throughout the training process, VGG-16 produced very smooth training curves. The precision curve, however, stayed mostly stable. Despite the lack of gain in precision, VGG-16 outperformed the preceding models, exhibiting consistent and noteworthy accuracy of over 80% for both the training and validation data. These figures demonstrate that both accuracy and AUC were around 80% throughout training (Figure 9a).
While the loss curve (Figure 9b) shows a smooth decline for both training and validation data, more research into potential overfitting is needed. Similar to the preceding models, the existence of high data volatility and class imbalance is likely to impede optimal performance. Addressing these data concerns might improve the VGG-16 model’s ability to handle classification tasks.

4.4. ResNet-50

In this subsection, we examine the accuracy and loss observations during the training of the ResNet-50 model, as depicted in Figure 10. The loss plot reveals that the ResNet model exhibited better performance compared to previous approaches when applied to validation data. However, it is important to note that the model faced challenges in minimizing the parameters from the start, resulting in slow learning. This was likely due to the large size of the model, which had 23.5 million parameters. The dataset contained 27 classes, with one particular class having a disproportionately large amount of training data. This imbalance led to misclassifications, as the model erroneously classified each validation data point as belonging to that specific class. Nevertheless, the training plot demonstrates that the model made progress over the course of several epochs.
Despite these constraints, the model demonstrated progress over the epochs during training, achieving an accuracy of around 85% and an AUC of around 81%. Nonetheless, precision fluctuated and stayed at a low level (around 50%). This shows that, perhaps as a result of the data variation, the model had difficulty correctly predicting some classes. These metrics suggest that the model was able to learn to distinguish between different classes to some extent. However, there is still room for improvement, as the model’s performance suffered due to the significant variance present in the data. Overall, the model exhibited misclassifications, highlighting the need for further refinement.

4.5. Inception Network

In the fifth experiment, we trained the Inception model and plotted the training history in terms of accuracy, precision, AUC, and loss, as shown in Figure 11. These plots provide a comprehensive overview of the recorded training metrics for each epoch. The accuracy plot reveals that the model exhibited commendable performance during both the training and validation phases, with accuracy scores surpassing 95%. Notably, its precision on the validation and training sets fluctuated due to the influence of the learning rate. However, it is worth mentioning that the model’s training process was not optimal, as indicated by the similar performance observed on the validation set. This phenomenon can be attributed to the excessive layering of the architecture, leading to a diminished number of features in the output feature matrix. Consequently, despite employing learning decay, there was a spike in the loss for the validation data, indicating a lack of convergence compared to the gradual convergence observed for the training curves (Figure 11a).
While the precision of validation reached 74% and validation AUC exceeded 70%, these measures did not show significant improvements on the training data, adding to the likelihood of overfitting. The positive aspect is that the loss curves (Figure 11b) for both training and validation data converged smoothly, confirming the model’s overall learning capability. In this case, the precision score on the training dataset showed a good score. These metrics suggest that the model was able to learn to distinguish between different classes to some extent. However, there is still room for improvement, as the model’s performance suffered due to the significant variance present in the data. Overall, the model exhibited misclassifications, highlighting the need for further refinement.

4.6. LSTM

Our sixth experiment explored the Long Short-Term Memory (LSTM) model, which is ideal for dealing with sequential data. Figure 12 depicts the training curves for both the training and validation datasets. While LSTMs excel with sequential data, our model’s performance deteriorated owing to dataset limitations. The considerable volatility in data distribution, as well as the imbalanced class representation, caused challenges.
The model most likely ignored other classes in favor of predicting the one that predominated in the imbalanced dataset. This emphasizes how crucial it is to deal with data imbalances prior to training subsequent models. The data variance prevented the cost function from fully converging, even if the graphs showed that it was heading in the direction of the minimum. Comparing this to earlier versions, the performance was not as good. While the training accuracy increased to 70–75%, other parameters, such as precision and AUC, continued to decline, averaging 58–64% (Figure 12a). There were notable fluctuations (20–30%) in the loss values as well (Figure 12b). The difficulties in using LSTMs on imbalanced, highly variable datasets are highlighted by these findings. Although the LSTM model works well with sequential data, our dataset’s constraints made it less effective in this experiment. Improving the effectiveness of LSTMs in this particular situation may require addressing data imbalance and maybe investigating methods to handle data volatility.

4.7. Proposed Model (HRIDM)

In the final experiment, our proposed HRIDM model was trained for 20 epochs, similar to the other models. As seen in Figure 13, the model showed consistently high accuracy, with both training and validation accuracy settling between 95 and 97%. This demonstrates high learning capacity, as the model accurately caught the patterns in the training data and generalized well to previously encountered data in the validation set. The precision graphs support the model’s learning progress. Training precision was roughly 80%, whereas validation precision was around 70% (Figure 13a). These findings indicate the HRIDM model’s capacity to achieve excellent accuracy and precision on both training and validation datasets.
Figure 13b depicts the decreasing trend in both training and validation loss across the training procedure. This represents a satisfactory convergence to the ideal solution (global minimum). Over the course of 20 epochs, the training loss dropped from above 0.15 to less than 0.090. The validation loss followed a similar pattern, beginning over 0.14 and decreasing to less than 0.085. Lower loss values suggest that the model is better able to predict the target variable (abnormality in ECG data). Finally, the AUC values for both training and validation data remained around 70%, with an increasing trend noted near the end of training. This indicates the model’s efficacy in terms of accuracy, recall (as represented by AUC), and overall classification ability. These combined findings demonstrate the HRIDM model’s effectiveness. The model improved its ability to detect anomalies in the big ECG dataset by including both inception and residual attributes in the model’s structure.
After training the models, we tested them to find their performance on the unseen dataset, shown in Table 2. This section provides a comprehensive analysis of the results obtained from all the implemented models, including the proposed model. As shown in Figure 14, the proposed model achieved significantly higher accuracy than previous models, achieving a test score of 50.87%. The Inception network emerged as the second-best model, with a test score of 40.6%, outperforming most of the models mentioned in the literature. On the other hand, ResNet-50, VGG-16, and AlexNet demonstrated comparable performance, with test scores ranging from 33.8% to 35.1%. However, LeNet-5 and LSTM did not meet the performance standards. This is likely due to the highly imbalanced class distribution in the data, making it difficult for these models to learn complex patterns.
To provide a visual representation of the results, Figure 15 presents the confusion matrices. The X-axis of each matrix represents the actual values, while the Y-axis represents the predicted values of our model. The obtained challenge score for our model, using the evaluation matrix of the challenge site, was 0.50897. The confusion matrices demonstrate the model’s attempt to classify each class label despite the highly imbalanced data. Among SOTA models, our proposed model outperformed others, exhibiting favorable performance compared to previous approaches mentioned in the literature review.
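For reference, a minimal sketch of how such a confusion matrix can be produced with scikit-learn is shown below (placeholder labels for a three-class toy example, not the study’s predictions; in practice the 27 SNOMED CT class abbreviations would be passed as labels).

from sklearn.metrics import confusion_matrix

# Placeholder ground truth and predictions (class abbreviations are illustrative).
labels = ["SNR", "AF", "PVC"]
y_true = ["SNR", "AF", "SNR", "PVC", "AF", "SNR"]
y_pred = ["SNR", "SNR", "SNR", "PVC", "AF", "AF"]

cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)  # rows correspond to actual classes, columns to predicted classes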

5. Discussion

In this section, we investigated various research papers published on the CinC 2020: Program website [52] and IEEE Xplore and utilized the PhysioNet/CinC Challenge 2020 dataset to effectively classify abnormalities in ECG signal data (Table 3). The objective was to develop a model capable of learning complex patterns within ECG signals and accurately distinguishing between 27 different abnormalities.
Among all the state-of-the-art techniques, the highest test score obtained was 53.3%, achieved by utilizing the Wide and Deep Transformer neural network model [30]. In this work, the authors utilized an extensive feature engineering approach (Handcrafted ECG Features) along with preprocessing techniques (finite impulse response bandpass filter). The second highest official ranking, with a 52.0% test score, was obtained using Modified ResNet with a Squeeze-and-Excitation Layer (MResNet-SE). This approach utilized zero padding to extend the signal sample to 4096 samples and down sampling techniques to sample the signal to 257 Hz [29]. In this study, the authors did not utilize any preprocessing or feature extraction techniques, but they used additional approaches such as a constrained grid-search for addressing data imbalance problems and a bespoke weighted accuracy metric for evaluating model performance.
Another study [28] also utilized SE-Resnet and acquired a test score of 51.4% on the undisclosed test dataset. In this work, the authors used squeeze-and-excitation blocks to extract the fine details for 10 s to 30 s segments of the ECG signal and validated model performance using an external open-source dataset. The authors also used wavelet-based denoising techniques to remove the noise from the ECG dataset before training, but no additional feature extraction techniques were utilized. All other methods had lower test scores compared to our proposed model. Among them, ref. [18] had the lowest score with 12.2%, but due to the correct evaluation strategy, they secured a place in the official ranking. In this work, the authors used Polyphase Filter Resampling techniques for preprocessing of the ECG data. No additional feature extraction techniques were used. With this approach, they classified into 24 categories out of 27 and utilized 75 epochs for training.
Among all the reviewed studies, most authors [18,20,24,25,26] utilized CNNs in their model design, either directly or indirectly. The second most utilized model was ResNet, as employed by many authors [19,21,27,28,29] in their methodology, directly or indirectly. The analysis concluded that all the top four studies, including ours, which secured the first four places, utilized a residual structure in the model design. Based on the observation of preprocessing techniques such as denoising or data augmentation, it can be concluded that the studies [19,20,21,22,29] did not utilize any of these techniques for preprocessing of the ECG data. Among these studies, only [29] achieved a better score than the proposed technique. While studies [21,23,26,30] employed additional feature extraction techniques to enhance ECG abnormality classification, only [30] achieved a better score compared to our proposed technique. Additionally, only studies [19,20,22,29] did not utilize any preprocessing or feature extraction techniques, although they employed some additional approaches for result enhancement. Among them, only study [29] achieved a better score than our proposed methodology.
Although our proposed method achieved fourth place with a 50.87% test score as compared with the official rankings, its strength lies in its efficiency and simplicity. Unlike other studies, we did not employ any feature extraction, denoising, additional strategies, or data augmentation in our proposed methodology. Due to the integration of residual and inception blocks in our model architecture, its performance is exceptionally good without relying on common preprocessing steps.
Among the state-of-the-art classifiers, the highest accuracy, 40.6%, was achieved by the Inception network; overall, however, these established architectures did not yield satisfactory performance. The LSTM and LeNet-5 models struggled to classify beyond a single class, a limitation attributable to the imbalanced dataset and the large variance in the data distribution: these shallower networks with fewer parameters failed to converge and could not identify the complex patterns within the data. Deeper architectures such as ResNet-50, VGG-16, and AlexNet managed to classify multiple classes but still fell short of the desired results.
In contrast, our proposed hybrid residual/inception-based deeper model (HRIDM) outperformed all the aforementioned architectures, while its training time remained comparable to that of the Inception network. Compared with the previous research discussed in the literature review, our model surpassed most existing approaches; only two models achieved a noticeably higher test score. One of them employed a transformer, a considerably larger model that would require substantially more training time. The other proposed a hybrid network combining an SE-ResNet with a fully connected network that incorporated age and gender as additional input features, which were later fused for classification.
Overall, our proposed approach demonstrated strong performance relative to previous research and state-of-the-art architectures, achieving a test score of 50.87%.

6. Conclusions

In this research, we addressed the crucial medical challenge of identifying heart diseases from 12-lead ECG data. We presented a novel deep learning model (HRIDM) that integrates two key components: residual blocks and inception blocks. The work utilized the official PhysioNet/CinC Challenge 2020 dataset, which includes over 41,000 training samples covering 27 categories of ECG abnormalities, and we carefully tuned the hyperparameters of each block to obtain the best possible results. The inception blocks improved performance by allowing the network to identify complex patterns in the input, while the residual blocks reduced the impact of the vanishing gradient problem. Our model outperformed most previous investigations, achieving an accuracy of 50.87% on the test dataset, and we validated and compared its outcomes against SOTA models and techniques. These findings open up new avenues for heart disease diagnosis research and demonstrate the promise of deep learning models in the field of cardiology.

7. Future Work

There are several avenues for further exploration and extension of this research. First, data augmentation techniques can be applied to address the issue of imbalanced datasets; augmenting the existing data would yield a more balanced representation of each class and reduce the likelihood of misclassification.
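A minimal sketch of such augmentation is shown below; the transforms (amplitude scaling, additive noise, circular time shift) and their ranges are assumptions chosen for illustration, and the oversampling loop treats each record as single-label for simplicity.

```python
# A minimal sketch of ECG data augmentation for rebalancing classes; the transforms
# and their ranges are assumptions, and records are treated as single-label for simplicity.
import numpy as np

def augment_ecg(sig, rng):
    """sig: array of shape (n_samples, n_leads), e.g., (5000, 12)."""
    out = sig * rng.uniform(0.9, 1.1)                          # random amplitude scaling
    out = out + rng.normal(0.0, 0.01 * out.std(), out.shape)   # small additive Gaussian noise
    shift = int(rng.integers(-100, 101))                       # random circular time shift
    return np.roll(out, shift, axis=0)

def oversample_minority(x, y, target_count, rng):
    """Append augmented copies of minority-class records until each class has target_count samples."""
    new_x, new_y = list(x), list(y)
    for cls in np.unique(y):
        idx = np.where(y == cls)[0]
        for _ in range(max(0, target_count - len(idx))):
            new_x.append(augment_ecg(x[rng.choice(idx)], rng))
            new_y.append(cls)
    return np.stack(new_x), np.array(new_y)

# Example usage with a toy batch of 10 s, 12-lead records and three classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(30, 5000, 12))
y = rng.integers(0, 3, size=30)
x_bal, y_bal = oversample_minority(x, y, target_count=20, rng=rng)
```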
Additionally, incorporating demographic features such as age and sex into the model architecture could lead to a hybrid network that leverages these features alongside the ECG data for more accurate classification.
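A minimal sketch of this hybrid design, assuming TensorFlow/Keras, is given below: a small 1D-CNN branch for the 12-lead ECG and a dense branch for the demographic inputs, merged before a 27-way multi-label output. All layer sizes are illustrative assumptions rather than a proposed configuration.

```python
# Hedged sketch of an ECG + demographics hybrid network; layer sizes are illustrative.
import tensorflow as tf
from tensorflow.keras import layers, Model

ecg_in = layers.Input(shape=(5000, 12), name="ecg")        # 10 s of 12-lead ECG at 500 Hz
demo_in = layers.Input(shape=(2,), name="demographics")    # e.g., normalized age and encoded sex

x = layers.Conv1D(32, 7, activation="relu", padding="same")(ecg_in)
x = layers.MaxPooling1D(4)(x)
x = layers.Conv1D(64, 5, activation="relu", padding="same")(x)
x = layers.GlobalAveragePooling1D()(x)

d = layers.Dense(8, activation="relu")(demo_in)

merged = layers.Concatenate()([x, d])
out = layers.Dense(27, activation="sigmoid", name="abnormalities")(merged)  # 27 abnormality classes

hybrid = Model(inputs=[ecg_in, demo_in], outputs=out)
hybrid.compile(optimizer="adam",
               loss="binary_crossentropy",
               metrics=[tf.keras.metrics.AUC(multi_label=True)])
```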

Author Contributions

Methodology, S.A.M.; conceptualization, S.A.M. and H.M.R.; software, H.M.R.; visualization, H.M.R.; writing—original draft preparation, S.A.M., H.M.R. and J.Y.; validation, J.Y. and S.A.M.; writing—review and editing, J.Y., H.M.R. and S.A.M.; supervision, J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

No funding was received for this work.

Data Availability Statement

The dataset utilized in this work is freely available on the official PhysioNet/CinC Challenge 2020 website: https://physionet.org/content/challenge-2020/1.0.2/ (accessed on 29 July 2022).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

This appendix contains two tables: Table A1, which provides the abbreviations used in Figure 2, and Table A2, which offers a detailed description of the dataset.
Table A1. The abbreviations corresponding to the diagnoses and SNOMED CT codes used in the utilized dataset.

Diagnosis | SNOMED CT Code | Abbreviation
1st degree AV block | 270492004 | IAVB
Atrial fibrillation | 164889003 | AF
Atrial flutter | 164890007 | AFL
Bradycardia | 426627000 | Brady
Complete right bundle branch block | 713427006 | CRBBB
Incomplete right bundle branch block | 713426002 | IRBBB
Left anterior fascicular block | 445118002 | LAnFB
Left axis deviation | 39732003 | LAD
Left bundle branch block | 164909002 | LBBB
Low QRS voltages | 251146004 | LQRSV
Nonspecific intraventricular conduction disorder | 698252002 | NSIVCB
Pacing rhythm | 10370003 | PR
Premature atrial contraction | 284470004 | PAC
Premature ventricular contractions | 427172004 | PVC
Prolonged PR interval | 164947007 | LPR
Prolonged QT interval | 111975006 | LQT
Q wave abnormal | 164917005 | QAb
Right axis deviation | 47665007 | RAD
Right bundle branch block | 59118001 | RBBB
Sinus arrhythmia | 427393009 | SA
Sinus bradycardia | 426177001 | SB
Sinus rhythm | 426783006 | NSR
Sinus tachycardia | 427084000 | STach
Supraventricular premature beats | 63593006 | SVPB
T wave abnormal | 164934002 | TAb
T wave inversion | 59931005 | TInv
Ventricular premature beats | 17338001 | VPB
Table A2. Number of recordings, mean duration of recordings, mean age of patients in recordings, sex of patients in recordings, and sample frequency of recordings for each dataset.

Dataset | Number of Recordings | Mean Duration (s) | Mean Age (Years) | Sex (Male/Female) | Sample Frequency (Hz)
CPSC (all data) | 13,256 | 16.2 | 61.1 | 53%/47% | 500
CPSC Training | 6877 | 15.9 | 60.2 | 54%/46% | 500
CPSC-Extra Training | 3453 | 15.9 | 63.7 | 53%/46% | 500
Hidden CPSC | 2926 | 17.4 | 60.4 | 52%/48% | 500
INCART | 72 | 1800 | 56 | 54%/46% | 257
PTB | 516 | 110.8 | 56.3 | 73%/27% | 1000
PTB-XL | 21,837 | 10 | 59.8 | 52%/48% | 500
G12EC (all data) | 20,678 | 10 | 60.5 | 54%/46% | 500
G12EC Training | 10,344 | 10 | 60.5 | 54%/46% | 500
Hidden G12EC | 10,344 | 10 | 60.5 | 54%/46% | 500
Undisclosed | 10,000 | 10 | 63 | 53%/47% | 300

References

  1. Dziadosz, D.; Daniłowicz-Szymanowicz, L.; Wejner-Mik, P.; Budnik, M.; Brzezińska, B.; Duchnowski, P.; Golińska-Grzybała, K.; Jaworski, K.; Jedliński, I.; Kamela, M.; et al. What Do We Know So Far About Ventricular Arrhythmias and Sudden Cardiac Death Prediction in the Mitral Valve Prolapse Population? Could Biomarkers Help Us Predict Their Occurrence? Curr. Cardiol. Rep. 2024, 26, 245–268. [Google Scholar] [CrossRef]
  2. Santangelo, G.; Bursi, F.; Faggiano, A.; Moscardelli, S.; Simeoli, P.S.; Guazzi, M.; Lorusso, R.; Carugo, S.; Faggiano, P. The Global Burden of Valvular Heart Disease: From Clinical Epidemiology to Management. J. Clin. Med. 2023, 12, 2178. [Google Scholar] [CrossRef] [PubMed]
  3. Kim, S.Y.; Lee, J.-P.; Shin, W.-R.; Oh, I.-H.; Ahn, J.-Y.; Kim, Y.-H. Cardiac biomarkers and detection methods for myocardial infarction. Mol. Cell. Toxicol. 2022, 18, 443–455. [Google Scholar] [CrossRef]
  4. Rai, H.M.; Chatterjee, K.; Dashkevych, S. The prediction of cardiac abnormality and enhancement in minority class accuracy from imbalanced ECG signals using modified deep neural network models. Comput. Biol. Med. 2022, 150, 106142. [Google Scholar] [CrossRef]
  5. Kim, M.-G.; Choi, C.; Pan, S.B. Ensemble Networks for User Recognition in Various Situations Based on Electrocardiogram. IEEE Access 2020, 8, 36527–36535. [Google Scholar] [CrossRef]
  6. Gong, Z.; Tang, Z.; Qin, Z.; Su, X.; Choi, C. Electrocardiogram identification based on data generative network and non-fiducial data processing. Comput. Biol. Med. 2024, 173, 108333. [Google Scholar] [CrossRef]
  7. Rahman, A.-U.; Asif, R.N.; Sultan, K.; Alsaif, S.A.; Abbas, S.; Khan, M.A.; Mosavi, A. ECG Classification for Detecting ECG Arrhythmia Empowered with Deep Learning Approaches. Comput. Intell. Neurosci. 2022, 2022, 6852845. [Google Scholar] [CrossRef]
  8. Choi, G.; Ziyang, G.; Wu, J.; Esposito, C.; Choi, C. Multi-modal Biometrics Based Implicit Driver Identification System Using Multi-TF Images of ECG and EMG. Comput. Biol. Med. 2023, 159, 106851. [Google Scholar] [CrossRef]
  9. Zeng, Y.; Zhan, G. Extracting cervical spine popping sound during neck movement and analyzing its frequency using wavelet transform. Comput. Biol. Med. 2022, 141, 105126. [Google Scholar] [CrossRef] [PubMed]
  10. Asif, R.N.; Abbas, S.; Khan, M.A.; Rahman, A.U.; Sultan, K.; Mahmud, M.; Mosavi, A. Development and Validation of Embedded Device for Electrocardiogram Arrhythmia Empowered with Transfer Learning. Comput. Intell. Neurosci. 2022, 2022, 5054641. [Google Scholar] [CrossRef]
  11. Asif, R.N.; Ditta, A.; Alquhayz, H.; Abbas, S.; Khan, M.A.; Ghazal, T.M.; Lee, S.-W. Detecting Electrocardiogram Arrhythmia Empowered with Weighted Federated Learning. IEEE Access 2024, 12, 1909–1926. [Google Scholar] [CrossRef]
  12. Kim, H.J.; Lim, J.S. Study on a Biometric Authentication Model based on ECG using a Fuzzy Neural Network. IOP Conf. Ser. Mater. Sci. Eng. 2018, 317, 012030. [Google Scholar] [CrossRef]
  13. Kim, Y.; Choi, G.; Choi, C. One-Dimensional Shallow Neural Network Using Non-Fiducial Based Segmented Electrocardiogram for User Identification System. IEEE Access 2023, 11, 102483–102491. [Google Scholar] [CrossRef]
  14. Islam, M.S.; Hasan, K.F.; Sultana, S.; Uddin, S.; Quinn, J.M.; Moni, M.A. HARDC: A novel ECG-based heartbeat classification method to detect arrhythmia using hierarchical attention based dual structured RNN with dilated CNN. Neural Netw. 2023, 162, 271–287. [Google Scholar] [CrossRef] [PubMed]
  15. Hammad, M.; Pławiak, P.; Wang, K.; Acharya, U.R. ResNet-Attention model for human authentication using ECG signals. Expert Syst. 2021, 38, e12547. [Google Scholar] [CrossRef]
  16. Rai, H.M.; Chatterjee, K. Hybrid CNN-LSTM deep learning model and ensemble technique for automatic detection of myocardial infarction using big ECG data. Appl. Intell. 2022, 52, 5366–5384. [Google Scholar] [CrossRef]
  17. Rai, H.M.; Chatterjee, K. 2D MRI image analysis and brain tumor detection using deep learning CNN model LeU-Net. Multimedia Tools Appl. 2021, 80, 36111–36141. [Google Scholar] [CrossRef]
  18. Nejedly, P.; Ivora, A.; Viscor, I.; Halamek, J.; Jurak, P.; Plesinger, F. Utilization of Residual CNN-GRU with Attention Mechanism for Classification of 12-lead ECG. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020. [Google Scholar] [CrossRef]
  19. Yang, S.; Xiang, H.; Kong, Q.; Wang, C. Multi-label Classification of Electrocardiogram with Modified Residual Networks. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020. [Google Scholar] [CrossRef]
  20. Vicar, T.; Hejc, J.; Novotna, P.; Ronzhina, M.; Janousek, O. ECG Abnormalities Recognition Using Convolutional Network with Global Skip Connections and Custom Loss Function. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020. [Google Scholar] [CrossRef]
  21. Jia, W.; Xu, X.; Xu, X.; Sun, Y.; Liu, X. Automatic Detection and Classification of 12-lead ECGs Using a Deep Neural Network. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020. [Google Scholar] [CrossRef]
  22. Fayyazifar, N.; Ahderom, S.; Suter, D.; Maiorana, A.; Dwivedi, G. Impact of Neural Architecture Design on Cardiac Abnormality Classification Using 12-lead ECG Signals. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020. [Google Scholar] [CrossRef]
  23. Chen, J.; Chen, T.; Xiao, B.; Bi, X.; Wang, Y.; Li, W.; Duan, H.; Zhang, J.; Ma, X. SE-ECGNet: Multi-scale SE-Net for Multi-lead ECG Data. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020. [Google Scholar] [CrossRef]
  24. Bos, M.; van de Leur, R.; Vranken, J.; Gupta, D.; van der Harst, P.; Doevendans, P.; van Es, R. Automated Comprehensive Interpretation of 12-lead Electrocardiograms Using Pre-trained Exponentially Dilated Causal Convolutional Neural Networks. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020. [Google Scholar] [CrossRef]
  25. Min, S.; Choi, H.-S.; Han, H.; Seo, M.; Kim, J.-K.; Park, J.; Jung, S.; Oh, I.-Y.; Lee, B.; Yoon, S. Bag of Tricks for Electrocardiogram Classification with Deep Neural Networks. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020. [Google Scholar] [CrossRef]
  26. Hasani, H.; Bitarafan, A.; Soleymani, M. Classification of 12-lead ECG Signals with Adversarial Multi-Source Domain Generalization. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020. [Google Scholar] [CrossRef]
  27. Oppelt, M.; Riehl, M.; Kemeth, F.; Steffan, J. Combining Scatter Transform and Deep Neural Networks for Multilabel ECG Signal Classification. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020. [Google Scholar] [CrossRef]
  28. Zhu, Z.; Wang, H.; Zhao, T.; Guo, Y.; Xu, Z.; Liu, Z.; Liu, S.; Lan, X.; Sun, X.; Feng, M. Classification of Cardiac Abnormalities from ECG Signals Using SE-ResNet. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020. [Google Scholar] [CrossRef]
  29. Zhao, Z.; Fang, H.; Relton, S.; Yan, R.; Liu, Y.; Li, Z.; Qin, J.; Wong, D. Adaptive lead weighted ResNet trained with different duration signals for classifying 12-lead ECGs. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020. [Google Scholar] [CrossRef]
  30. Natarajan, A.; Chang, Y.; Mariani, S.; Rahman, A.; Boverman, G.; Vij, S.; Rubin, J. A Wide and Deep Transformer Neural Network for 12-Lead ECG Classification. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020. [Google Scholar] [CrossRef]
  31. Liu, F.; Liu, C.; Zhao, L.; Zhang, X.; Wu, X.; Xu, X.; Liu, Y.; Ma, C.; Wei, S.; He, Z.; et al. An Open Access Database for Evaluating the Algorithms of Electrocardiogram Rhythm and Morphology Abnormality Detection. J. Med. Imaging Health Inform. 2018, 8, 1368–1373. [Google Scholar] [CrossRef]
  32. Goldberger, A.L.; Amaral, L.A.N.; Glass, L.; Hausdorff, J.M.; Ivanov, P.C.; Mark, R.G.; Mietus, J.E.; Moody, G.B.; Peng, C.-K.; Stanley, H.E. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 2000, 101, e215–e220. [Google Scholar] [CrossRef]
  33. Alday, E.A.P.; Gu, A.; Shah, A.J.; Robichaux, C.; Wong, A.-K.I.; Liu, C.; Liu, F.; Rad, A.B.; Elola, A.; Seyedi, S.; et al. Classification of 12-lead ECGs: The PhysioNet/Computing in Cardiology Challenge 2020. Physiol. Meas. 2020, 41, 124003. [Google Scholar] [CrossRef]
  34. Rai, H.M.; Chatterjee, K. A Novel Adaptive Feature Extraction for Detection of Cardiac Arrhythmias Using Hybrid Technique MRDWT & MPNN Classifier from ECG Big Data. Big Data Res. 2018, 12, 13–22. [Google Scholar] [CrossRef]
  35. Jahmunah, V.; Ng, E.Y.K.; Tan, R.S.; Oh, S.L.; Acharya, U.R. Uncertainty quantification in DenseNet model using myocardial infarction ECG signals. Comput. Methods Programs Biomed. 2023, 229, 107308. [Google Scholar] [CrossRef]
  36. Barua, P.D.; Aydemir, E.; Dogan, S.; Kobat, M.A.; Demir, F.B.; Baygin, M.; Tuncer, T.; Oh, S.L.; Tan, R.-S.; Acharya, U.R. Multilevel hybrid accurate handcrafted model for myocardial infarction classification using ECG signals. Int. J. Mach. Learn. Cybern. 2023, 14, 1651–1668. [Google Scholar] [CrossRef]
  37. Al-Jibreen, A.; Al-Ahmadi, S.; Islam, S.; Artoli, A.M. Person identification with arrhythmic ECG signals using deep convolution neural network. Sci. Rep. 2024, 14, 4431. [Google Scholar] [CrossRef] [PubMed]
  38. Baumgartner, M.; Veeranki, S.P.K.; Hayn, D.; Schreier, G. Introduction and Comparison of Novel Decentral Learning Schemes with Multiple Data Pools for Privacy-Preserving ECG Classification. J. Healthc. Inform. Res. 2023, 7, 291–312. [Google Scholar] [CrossRef]
  39. Janbhasha, S.; Bhavanam, S.N.; Harshita, K. GAN-Based Data Imbalance Techniques for ECG Synthesis to Enhance Classification Using Deep Learning Techniques and Evaluation. In Proceedings of the 2023 3rd International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT), Bhilai, India, 5–6 January 2023; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2023. [Google Scholar] [CrossRef]
  40. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  41. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems; Pereira, F., Burges, C.J., Bottou, L., Weinberger, K.Q., Eds.; Curran Associates Inc.: Red Hook, NY, USA, 2012; Available online: https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf (accessed on 27 July 2023).
  42. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  43. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  44. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
  45. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar] [CrossRef]
  46. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  47. Maniatopoulos, A.; Mitianoudis, N. Learnable Leaky ReLU (LeLeLU): An Alternative Accuracy-Optimized Activation Function. Information 2021, 12, 513. [Google Scholar] [CrossRef]
  48. Dubey, A.K.; Jain, V. Comparative Study of Convolution Neural Network’s Relu and Leaky-Relu Activation Functions. In Applications of Computing, Automation and Wireless Systems in Electrical Engineering: Proceedings of MARC 2018; Springer: Singapore, 2019; pp. 873–880. [Google Scholar] [CrossRef]
  49. Dubey, S.R.; Singh, S.K.; Chaudhuri, B.B. Activation functions in deep learning: A comprehensive survey and benchmark. Neurocomputing 2022, 503, 92–108. [Google Scholar] [CrossRef]
  50. Rai, H.M.; Yoo, J.; Dashkevych, S. Two-headed UNetEfficientNets for parallel execution of segmentation and classification of brain tumors: Incorporating postprocessing techniques with connected component labelling. J. Cancer Res. Clin. Oncol. 2024, 150, 220. [Google Scholar] [CrossRef] [PubMed]
  51. Ye, X.; Huang, Y.; Lu, Q. Automatic Multichannel Electrocardiogram Record Classification Using XGBoost Fusion Model. Front. Physiol. 2022, 13, 840011. [Google Scholar] [CrossRef] [PubMed]
  52. CinC 2020: Program. Available online: https://www.cinc.org/archives/2020/ (accessed on 29 July 2022).
Figure 1. Data distribution based on signal length.
Figure 2. Distribution of ECG signals across 27 diagnosis categories.
Figure 3. The flowchart of the utilized preprocessing technique.
Figure 4. Segmented and preprocessed 12-lead ECG signals.
Figure 5. Building blocks of the proposed hybrid residual/inception-based deeper model (HRIDM).
Figure 6. Comparison of output responses for ReLU, Leaky ReLU, sigmoid, and tanh activation functions.
Figure 7. Training history curve of the LeNet-5 model on training and validation data. (a) Accuracy, precision, and AUC. (b) Loss.
Figure 8. Training history curve of the AlexNet model on training and validation data. (a) Accuracy, precision, and AUC. (b) Loss.
Figure 9. Training history curve of the VGG-16 model on training and validation data. (a) Accuracy, precision, and AUC. (b) Loss.
Figure 10. Training history curve of the ResNet-50 model on training and validation data. (a) Accuracy, precision, and AUC. (b) Loss.
Figure 11. Training history curve of the Inception network on training and validation data. (a) Accuracy, precision, and AUC. (b) Loss.
Figure 12. Training history curve of the LSTM model on training and validation data. (a) Accuracy, precision, and AUC. (b) Loss.
Figure 13. Training history curve of the proposed model on training and validation data. (a) Accuracy, precision, and AUC. (b) Loss.
Figure 14. Accuracy comparison of SOTA vs. proposed model on the test dataset.
Figure 15. Confusion matrix of the proposed model with SNOMED CT classes on the test dataset.
Table 1. Dataset description.

Database | Total Recordings | Recordings in Test Set | Recordings in Validation Set | Recordings in Training Set | Total Patients
CPSC | 13,256 | 1463 | 1463 | 10,330 | 9458
INCART | 74 | 0 | 0 | 74 | 32
PTB | 22,353 | 0 | 0 | 22,353 | 19,175
G12EC | 20,678 | 5167 | 5167 | 10,344 | 15,742
Undisclosed | 10,000 | 10,000 | 0 | 0 | Unknown
Total | 66,361 | 16,630 | 6630 | 43,101 | Unknown
Table 2. Testing accuracy achieved by multiple SOTA models.

Model | Test Score
LeNet-5 | 15.7%
AlexNet | 33.8%
VGG16 | 34.9%
ResNet50 | 35.1%
Inception | 40.6%
LSTM | 17.9%
Proposed (HRIDM) | 50.87%
Table 3. Comparative analysis of the proposed HRIDM model with the state-of-the-art methods for 12-lead ECG abnormality detection on the PhysioNet/CinC Challenge 2020 dataset.

Reference | Model | Noise Reduction or Augmentation | Feature Extraction | Additional Approach | Test Score
[18] | Residual CNN-GRU with attention mechanism (RCNN-GRU-AM) | Polyphase filter resampling | NA | 24 types, 75 epochs | 12.2%
[19] | Improved residual network (iResNet) | NA | NA | Ensemble method | 20.8%
[20] | Modified ResNet-type convolutional neural network (MR-CNN) | NA | NA | Bayesian threshold optimization | 20.2%
[21] | SE-ResNet34 | NA | 1D-CNN | Evaluation metrics based on weights assigned to different classes | 35.9%
[22] | Hand-designed recurrent convolutional neural network (RCNN) | NA | NA | Utilized a second neural model | 38.2%
[23] | Squeeze-and-excitation network for ECG classification (SE-ECGNet) | Data augmentation | Multi-scale feature extraction | NA | 41.1%
[24] | Exponentially dilated causal convolutional neural network (ED-CCNN) | Pre-trained on a physician-annotated dataset of 254,044 12-lead ECGs | NA | Ensemble method | 41.7%
[25] | 18-layer residual convolutional neural network (ResNet-18) | Notch filters | NA | Post-training refinements | 42.0%
[26] | Adversarial multi-source domain generalization (AMDG) | Augmentation techniques | CNN and LSTM | Adversarial domain generalization | 43.7%
[27] | Deep residual neural network with scatter transform (DResNet-ST) | Augmentation | NA | 24 types, trainable layers between scatter transforms | 48.0%
[28] | Ensembled SE-ResNet model | Wavelet denoising | NA | External open-source data, rule-based bradycardia model | 51.4%
[29] | Modified ResNet with a squeeze-and-excitation layer (MResNet-SE) | NA | NA | Constrained grid search, bespoke weighted accuracy metric | 52.0%
[30] | Wide and Deep Transformer neural network (WDTNN) | Finite impulse response bandpass filter | Handcrafted ECG features | Combination of features | 53.3%
Proposed | HRIDM | NA | NA | NA | 50.87%