1. Introduction
Atrial fibrillation (AF) is a major concern among irregular heart rhythms, especially as a person approaches the age of 65. The heart is a pump whose efficient operation is regulated by an internal pacemaker that controls the heartbeat. The electrical impulse usually starts from the sinoatrial node [
1,
2,
3] and moves from the atria to the ventricles, causing regular rhythmic contractions of the chambers. AF goes unrecognised in some patients because they are unaware of it, while other patients may notice the erratic behaviour of the heartbeat and find the sensation uncomfortable. An irregular heartbeat can signal a risk of stroke, which may result in prolonged illness and, ultimately, heart failure [
4,
5,
6,
7,
8]. AF detection methods mainly focus on RR intervals, short-term analysis of heart rate variability, and sequential examination for the presence of the P-wave [
9]. Current studies mostly focus on feature extraction techniques; therefore, the chosen features have a significant effect on the final outcome of the models. Recent years have seen major improvements in the availability of electrocardiogram (ECG) data and advances in deep neural networks (DNNs), in particular algorithms such as long short-term memory (LSTM) and convolutional neural networks (CNN). These algorithms can be trained directly on large-scale datasets of ECG signals and their RR intervals, and the performance of ECG models has improved dramatically [
10]. It is worth mentioning that, in the past decade, deep learning (DL) classifiers have been successfully applied in different fields, such as speech recognition, image classification [
11], and many other domains such as natural language processing [
12]. However, DL algorithms have not been widely applied to AF detection. A few studies have applied the aforementioned classifiers to AF prediction, but their reported performance was unsatisfactory compared with that achieved in image classification and image recognition [
13]. To the best of our knowledge, studies that detect AF using DL classifiers remain scarce.
To overcome these issues, we build six models based on feature-based approaches and DL approaches, including support vector machine (SVM), multilayer perceptron (MLP), CNN, and LSTM. The feature-based models are trained on manually extracted features, whereas the DL methods are trained on raw data without any feature engineering. To the best of our knowledge, this is the only work that exploits automated feature engineering with DL for automatic AF detection and classification. Additionally, the DL classifiers are compared with shallow learning classifiers.
In summary, the paper reports three major contributions that are outlined below:
We developed novel deep learning architectures based on a convolutional neural network (CNN) and long short-term memory (LSTM) to automatically detect AF. In addition, an in-depth comparison was carried out with state-of-the-art approaches as well as baseline models, such as ResNet and Convolutional LSTM.
We carried out a comparative analysis of the proposed approach on two widely used online benchmark datasets.
Unlike traditional machine learning algorithms, deep learning methods integrate feature extraction into the model, so handcrafted features are not needed. In addition, these methods can effectively mine different types of data sources and have good generalization ability, allowing the computer to automatically learn and extract the relevant features for a given problem. We developed an end-to-end approach based on deep learning that does not require feature selection or feature extraction techniques.
Additionally, we developed a novel framework that detects AF from raw ECG signals instead of other ECG features.
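For contrast with the end-to-end approach, the following minimal sketch shows what handcrafted features of the kind used by feature-based AF detectors can look like. It assumes that R-peak positions have already been detected and are given as sample indices; the function names, the choice of features (mean RR, SDNN, RMSSD), and the example peak positions are illustrative assumptions and not the feature set used in this paper.

```python
import math


def rr_intervals(r_peaks, fs):
    """Return RR intervals in seconds from R-peak sample indices."""
    return [(b - a) / fs for a, b in zip(r_peaks, r_peaks[1:])]


def hrv_features(rr):
    """Compute simple heart rate variability statistics from RR intervals.

    Requires at least three RR intervals (seconds).
    """
    n = len(rr)
    mean_rr = sum(rr) / n
    # SDNN: sample standard deviation of the RR intervals
    sdnn = math.sqrt(sum((x - mean_rr) ** 2 for x in rr) / (n - 1))
    # RMSSD: root mean square of successive RR differences
    diffs = [b - a for a, b in zip(rr, rr[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"mean_rr": mean_rr, "sdnn": sdnn, "rmssd": rmssd}


# Hypothetical R-peak positions (sample indices) for a 300 Hz recording
regular = hrv_features(rr_intervals([0, 300, 600, 900, 1200], fs=300))
irregular = hrv_features(rr_intervals([0, 240, 600, 810, 1200], fs=300))
```

An irregular rhythm yields larger SDNN and RMSSD values than a regular one, which is the kind of signal a feature-based classifier relies on; the DL models instead receive the raw waveform.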
This paper is structured as follows: in
Section 2, related work on machine learning (ML) and DL classifiers for AF detection is presented. In
Section 3, our novel approach for automated AF detection using ML and DL is described. In
Section 4, the experimental results are presented.
Section 5 provides the discussion of the experimental results. Lastly, conclusions are drawn in
Section 6 with future recommendations.
4. Experimental Results
In this section, we explain the experimental setup, followed by the results and discussions. To evaluate the performance of the proposed approach, different evaluation metrics are used, including accuracy, precision, recall, and f-measure [
54].
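\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
\[ \text{Precision} = \frac{TP}{TP + FP} \]
\[ \text{Recall} = \frac{TP}{TP + FN} \]
\[ \text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]
where TP denotes true positive, TN denotes true negative, FP is false positive, and FN represents false negative, respectively.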
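As a minimal sketch, the four evaluation metrics can be computed from the confusion counts using their standard definitions in terms of true positives, true negatives, false positives, and false negatives. The function name and the example counts below are hypothetical and are not results from this paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and f-measure from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts for a binary AF vs. non-AF decision
metrics = classification_metrics(tp=80, tn=90, fp=10, fn=20)
```

For a four-class problem, the same function can be applied per class in a one-vs-rest fashion and the results averaged; whether the paper does this is not stated here.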
PhysioNet Dataset: To assess the performance of our proposed approach, we used the open labelled dataset of the 2017 PhysioNet challenge, which consists of 8528 single-channel ECG waveforms donated by AliveCor. The recordings last 30 s on average, with the shortest waveform lasting 9 s and the longest 61 s. The records were manually labelled into four classes: normal rhythm, AF rhythm, other rhythms, and noisy recordings. The sampling frequency of the recordings is 300 Hz.
MIT-BIH Atrial Fibrillation Dataset: The dataset consists of 23 ECG recordings of human subjects with AF arrhythmia. Each ECG recording is sampled at 250 Hz with 12-bit resolution over a range of ±10 millivolts. In this study, we used four segments and labelled each of them based on a threshold parameter.
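The text does not specify how the threshold parameter is applied, so the following is only one plausible reading, sketched under stated assumptions: a recording comes with a per-sample binary AF annotation, it is split into equal-length segments, and a segment is labelled AF when the fraction of AF-annotated samples reaches the threshold. The function name, the 0.5 threshold, and the toy annotation are all hypothetical.

```python
def label_segments(af_mask, n_segments=4, threshold=0.5):
    """Split a per-sample AF annotation into equal segments and label each one.

    af_mask: sequence of 0/1 values, 1 where the sample is annotated as AF.
    Trailing samples that do not fill a whole segment are dropped.
    """
    seg_len = len(af_mask) // n_segments
    if seg_len == 0:
        raise ValueError("recording is shorter than the number of segments")
    labels = []
    for i in range(n_segments):
        segment = af_mask[i * seg_len:(i + 1) * seg_len]
        af_fraction = sum(segment) / len(segment)
        labels.append("AF" if af_fraction >= threshold else "non-AF")
    return labels


fs = 250  # sampling frequency of the MIT-BIH AF recordings, in Hz
# Toy 8 s annotation: AF between second 2 and second 6
af_mask = [0] * (2 * fs) + [1] * (4 * fs) + [0] * (2 * fs)
labels = label_segments(af_mask, n_segments=4, threshold=0.5)
```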
To evaluate the performance of the approach, the dataset is divided into 70% for training, 20% for testing, and 10% for validation. The generalisation of the model has been evaluated using its performance on 48 short-term recordings. The dataset covers four different subjects, which makes it more likely that the model learns the relevant features.
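The 70/20/10 division can be sketched as a shuffled index split. This is a minimal illustration only: the function name, the fixed seed, and the use of a plain (non-stratified) shuffle are assumptions, since the text does not say how the records were assigned to each subset.

```python
import random


def split_indices(n, train=0.7, test=0.2, seed=0):
    """Shuffle record indices and split them into train, test and validation sets.

    The validation set receives whatever remains after the train and test shares.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(n * train)
    n_test = int(n * test)
    train_idx = indices[:n_train]
    test_idx = indices[n_train:n_train + n_test]
    val_idx = indices[n_train + n_test:]
    return train_idx, test_idx, val_idx


# 8528 is the number of recordings in the 2017 PhysioNet challenge dataset
train_idx, test_idx, val_idx = split_indices(8528)
```

With imbalanced classes, a stratified split that preserves the class proportions in each subset is a common refinement.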
Table 2 summarises the parameters used to train the ML and DL models.
Table 3 summarises the results of the ML and DL classifiers on the PhysioNet dataset. As shown in the table, the DL algorithms, such as CNN and LSTM, achieved better performance than the ML classifiers.
Table 4 summarises the results of the ML and DL classifiers on the MIT-BIH Atrial Fibrillation dataset. As shown in the table, the DL algorithms, such as CNN and LSTM, again achieved better performance than the ML classifiers.
Table 5 summarises the results of the ML and DL classifiers in identifying AF. The comparative experimental results show that the deep learning algorithms, such as LSTM and CNN, achieved better performance than the traditional ML classifiers. The proposed CNN architecture does not require any feature engineering, unlike the machine learning classifiers, and it outperforms machine learning algorithms such as MLP and logistic regression. Because it learns features directly from the data, the proposed CNN is more accurate than MLP and logistic regression, which improves AF detection. The CNN results demonstrate its capability to learn features that identify AF, allowing it to outperform the other ML algorithms. To put the current results in context, Convolutional LSTM and ResNet were added as baselines. Both baselines achieved lower performance than the proposed CNN and LSTM architectures. However, the training time for Convolutional LSTM is lower than the training time for CNN and LSTM.
Table 3,
Table 4,
Table 5 and
Table 6 have been updated. The major drawback of ResNet is that it takes a long time to train. In addition, the Convolutional LSTM requires substantial resources and training data to be trained on the AF dataset; therefore, it obtained lower performance than the proposed CNN and LSTM architectures.
Table 5 presents the summary of results for the ML and deep learning classifiers in identifying AF using the MIT-BIH Atrial Fibrillation dataset. The initial experimental results demonstrate that the deep learning algorithms, such as LSTM and CNN, achieved better performance than the traditional machine learning classifiers.
The proposed DL approaches, including CNN and LSTM, achieved better performance than traditional machine learning classifiers, such as SVM, MLP, logistic regression, and XGBoost, as shown in
Table 6. The overall accuracy of LSTM is 87.5%. The main reason why the deep learning models achieved better performance than the machine learning models is the feature extraction ability of DL classifiers. However, the main problem of deep learning classifiers is their computation speed. The TensorFlow Python package is used to train the models. The models were trained for two hundred epochs with backpropagation, using the Adam optimizer to minimise the categorical cross-entropy loss function. To find the best accuracy for the CNN, different numbers of convolutional layers were evaluated. The 2017 PhysioNet challenge dataset has been evaluated with a different number of convolutional layers, as shown in
Table 7. The initial experimental results show that performance improves as the number of convolutional layers increases up to four, and decreases gradually once the number of convolutional layers reaches five. Therefore, we have chosen four convolutional layers. It is worth mentioning that, as the number of convolutional layers increases, the model takes more time to train. According to the results shown in
Table 7 and
Table 8, the precision, recall, and f-measure with four layers are better than with the other layer counts. In addition, the Cohen's kappa values were above 0.8, which indicates strong agreement between the predictions of the proposed system and the reference labels.
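Cohen's kappa corrects the observed agreement p_o between predictions and reference labels for the agreement p_e expected by chance, kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that computation; the function name and the toy labels are hypothetical, and the handling of the degenerate single-class case is a convention chosen here.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa between reference labels and predicted labels."""
    n = len(y_true)
    # Observed agreement: fraction of records where both labels match
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: sum over classes of the product of marginal frequencies
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        # Degenerate case (one class only): treated as perfect agreement here
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy labels: AF, normal (N) and other (O)
y_true = ["AF", "AF", "N", "N", "N", "O"]
y_pred = ["AF", "N", "N", "N", "N", "O"]
kappa = cohens_kappa(y_true, y_pred)
```

Values between 0.81 and 1.00 are conventionally read as almost perfect agreement, which is the range reported above.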
Different numbers of LSTM layers were evaluated to find the model architecture that detects AF with the best accuracy. As shown in
Table 9, the 2017 PhysioNet challenge dataset has been evaluated with a different number of LSTM layers. The experimental results reveal that increasing the number of LSTM layers increases the training time, while models with fewer layers are faster to train. As the number of layers increases from one to two, the performance of the model improves rapidly. However, as the number of layers increases from two to three, the accuracy decreases gradually. The two-layer LSTM also achieved better performance in terms of the other evaluation metrics, such as precision, recall, and f-measure. In addition,
Table 10 reports the evaluation of different numbers of LSTM layers on the MIT-BIH Atrial Fibrillation dataset. As shown, the two-layer LSTM achieved better performance than the other configurations.
Comparison with State-of-the-Art Approaches
Table 11 compares our proposed approach with other state-of-the-art approaches. As shown, our proposed approach achieved better performance in terms of accuracy, precision, recall, and f-measure.