2.1. ECG Dataset
The proposed diagnostic tool uses a recent public dataset [61] that includes images of ECG records for patients with COVID-19 and other cardiac problems. To the best of our knowledge, this is the first and only public dataset of ECG records for COVID-19. The dataset contains 1937 ECG images across distinct categories. It consists of 250 scans of cases with the novel coronavirus, 300 trace records of cases with a present or former myocardial infarction (MI), 548 ECG records of irregular heartbeats, and 859 normal images without any heart complications, as shown in Table 1. Data were acquired using a 12-lead system with a sampling frequency of 500 Hz through an EDAN SE-3 series 3-channel electrocardiograph.
Table 1 also lists the number of images used for the training and validation sets of the proposed tool. The dimensions of the images varied from 952 × 1232 to 2213 × 1572. The x-scale is 25 mm/s, and the y-scale is 10 mm/mV. Six ECG electrodes were placed on the chest, representing the six precordial leads. Another three electrodes were placed on the two arms and left leg, from which the six limb leads are obtained: augmented voltage right (AVR), augmented voltage left (AVL), augmented voltage foot (AVF), and Leads I, II, and III. The images of the dataset were evaluated by medical professionals using a telehealth ECG diagnostic scheme. This evaluation was carried out under the supervision of expert cardiologists with long experience in ECG annotation and interpretation. These medical experts removed all uncertain, ambiguous, and misleading images from the dataset.
At the binary classification level (normal versus COVID-19), 250 normal and 250 COVID-19 records were used. At the multiclass classification level, a total of 750 images were employed: 250 for cardiac complications, 250 for normal cases, and 250 for COVID-19 cases. The ECG dataset is imbalanced (the number of images per class is not equal), which can bias the classification process. To avoid this bias, an equal number of images was selected for each class to train the classification models. A sample ECG trace record for a COVID-19 patient is shown in Figure 1.
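The class balancing described above can be illustrated with a minimal sketch. The paper does not state how the 250 images per class were chosen, so the random selection, the seed, the helper name `balanced_subset`, and the label strings below are assumptions made only for illustration.

```python
import random
from collections import defaultdict


def balanced_subset(labels, per_class, seed=0):
    """Return sorted indices holding `per_class` randomly chosen items per class.

    Random selection is an assumption; the paper only says that an equal
    number of images per class was used.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for label, indices in by_class.items():
        if len(indices) < per_class:
            raise ValueError(f"class {label!r} has only {len(indices)} items")
        chosen.extend(rng.sample(indices, per_class))
    return sorted(chosen)


# Class sizes reported in the text for the binary level (COVID-19 vs. normal).
labels = ["covid"] * 250 + ["normal"] * 859
subset = balanced_subset(labels, per_class=250)
counts = {c: sum(labels[i] == c for i in subset) for c in set(labels)}
print(counts)
```

Each class contributes exactly 250 indices, so the binary subset holds 500 images, matching the figures given in the text.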
2.2. Proposed Tool
The proposed automated tool consists of four steps: ECG trace image preprocessing, deep feature extraction and feature incorporation, hybrid feature selection, and classification. The proposed method uses ten DL approaches. Figure 2 shows a diagram of the steps of the proposed diagnostic tool.
DL is an emerging technology that has been widely employed in several fields. DL approaches are a recent class of machine learning (ML) methods. They comprise numerous architectures; however, convolutional neural networks (CNNs) are the architectures most widely used for medical images [69]. Therefore, the proposed diagnostic tool utilizes ten CNNs of various architectures. These networks include InceptionResNet, ResNet-18, ResNet-50, ShuffleNet, Inception V3, MobileNet, Xception, DarkNet-19, DarkNet-53, and DenseNet-201.
Inception V3 was proposed by Google in 2016 [70]. It is a newer version of GoogleNet [71] with some modifications. It was designed to run well with reduced memory requirements and computational cost. Its principal component is the inception unit, which merges numerous filters into a single filter structure and thereby lowers the number of parameters. To widen the flow of information through the network, the Inception module considers the depth as well as the width of the layers during the construction of the network [72].
ResNet is a time-efficient CNN that gained popularity due to its novel structure, introduced by He et al. in 2015 [73]. ResNet relies on the residual block, which embeds shortcut connections in the interior layers of a standard CNN so that the signal can skip several convolution layers. These shortcuts speed up and ease the convergence of the CNN despite the large number of convolution layers.
Xception is a newer version of the Inception network introduced in 2017 [74]. In Xception, the inception layers consist of depthwise convolution layers, each followed by a pointwise convolution layer. The Xception structure involves two convolutional layers, then several depthwise separable convolution layers, followed by standard convolutional and fully connected layers. The Xception module is more robust and powerful than the Inception module and can model cross-channel and spatial correlations as fully decoupled [75].
Inception-ResNet-V2 combines the residual network architecture with the inception module [76]. It has a number of filters of various dimensions that are merged with residual connections. The main advantage of this fused architecture is that it enhances the performance of the network and its pace of convergence.
DenseNet was created by Huang et al. [77] in 2017, who extended the idea of shorter connections between layers near the input and output layers. The key building block of this network is the ‘dense block’. The first major difference between the residual block and the dense block is that the latter connects every layer to every other layer having the same input resolution, whereas the former creates shortcut links only among adjacent layers. Second, each layer of DenseNet concatenates the earlier outputs, whereas ResNet sums them. DenseNet-201, which contains 201 layers, was used in this article.
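The contrast between summation in ResNet and concatenation in DenseNet can be made concrete with a toy sketch on plain Python lists. The function names and the example vectors are hypothetical; real networks apply these operations to multi-channel feature maps rather than short vectors.

```python
def residual_merge(x, fx):
    """ResNet-style shortcut: element-wise sum of the input and the block output."""
    if len(x) != len(fx):
        raise ValueError("summation requires equal lengths")
    return [a + b for a, b in zip(x, fx)]


def dense_merge(previous_outputs):
    """DenseNet-style merge: concatenate all earlier feature vectors."""
    merged = []
    for features in previous_outputs:
        merged.extend(features)
    return merged


x = [1.0, 2.0]    # toy input features
fx = [0.5, -1.0]  # toy output of a convolution block

summed = residual_merge(x, fx)   # length stays 2
stacked = dense_merge([x, fx])   # length grows to 4
print(summed, stacked)
```

Summation keeps the feature dimension fixed, while concatenation makes it grow with every layer in the block.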
ShuffleNet is an efficient CNN designed by Zhang et al. in 2018 [78]. ShuffleNet was initially developed for settings with low computational capability. It contains two key operations known as pointwise group convolution and channel shuffle. The first uses grouped convolution layers of dimension 1 × 1 to reduce the computational cost while attaining adequate precision. The second helps information flow across feature channels by allowing each group of convolutions to take input data from distinct groups, so that the input and output channels are related.
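A minimal sketch of the channel shuffle operation follows, assuming the usual reshape, transpose, and flatten formulation. Channels are represented here by integer indices in a flat list, which is a simplification chosen for illustration.

```python
def channel_shuffle(channels, groups):
    """Interleave channels so that each group receives channels from every group.

    Equivalent to viewing the list as a (groups, per_group) grid,
    transposing it, and flattening it again.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


# Six channels in two groups: [0, 1, 2] and [3, 4, 5].
shuffled = channel_shuffle(list(range(6)), groups=2)
print(shuffled)  # [0, 3, 1, 4, 2, 5]
```

After the shuffle, the first group of the next layer sees channels 0, 3, and 1, which come from both of the original groups.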
DarkNet is a DL architecture designed by the authors of [79]. It serves as the backbone of the YOLO-V2 detector. DarkNet uses filters of dimension 3 × 3 and doubles the number of channels after every pooling phase. It employs a pooling stage to perform detection and classification as well as 1 × 1 filters to compress the feature representation between 3 × 3 convolutions. DarkNet-19 involves 19 convolutional layers, whereas DarkNet-53 contains 53 convolutional layers.
MobileNet is a lightweight and time-efficient DL architecture that was originally designed in [80]. It decreases the complexity of the model by lowering the number of parameters while maintaining acceptable performance. It does so with depthwise separable convolutions, which consist of depthwise and pointwise convolutional layers of dimensions 3 × 3 and 1 × 1, respectively. MobileNet has 53 deep layers.
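The parameter saving from pairing a 3 × 3 depthwise layer with a 1 × 1 pointwise layer can be checked with simple arithmetic. The channel counts below (32 in, 64 out) are arbitrary illustrative values, not taken from the paper, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a k x k depthwise layer plus a 1 x 1 pointwise layer."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 filters that mix the channels
    return depthwise + pointwise


standard = standard_conv_params(3, 32, 64)
separable = separable_conv_params(3, 32, 64)
print(standard, separable)  # 18432 2336
```

For this example layer, the separable form needs roughly one eighth of the weights of the standard convolution.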
2.2.1. ECG Image Preprocessing
Initially, the ECG images are resized to match the input layer dimensions of each CNN model. Then, the ECG records are augmented to increase the number of records available in the dataset and to reduce the likelihood of overfitting that could occur with small data. The augmentation methods included in the proposed diagnostic tool are flipping in both the x- and y-orientations and translation in both the x- and y-directions, where the translation distance is picked randomly within the range (−30, 30). Scaling is also applied to the image in the x- and y-directions, with a scale factor chosen randomly from the range (0.9, 1.1).
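The augmentation settings can be sketched as a parameter sampler. The ranges come from the text; the uniform distributions, the 0.5 flip probability, the independent draws per axis, and the function name `sample_augmentation` are assumptions, since the paper does not specify them. The sketch only draws the parameters and does not transform an image.

```python
import random


def sample_augmentation(rng):
    """Draw one random set of augmentation parameters.

    Ranges follow the text; distributions and flip probability are assumed.
    """
    return {
        "flip_x": rng.random() < 0.5,       # assumed 50% chance of a flip
        "flip_y": rng.random() < 0.5,
        "shift_x": rng.uniform(-30, 30),    # translation range from the text
        "shift_y": rng.uniform(-30, 30),
        "scale_x": rng.uniform(0.9, 1.1),   # scale range from the text
        "scale_y": rng.uniform(0.9, 1.1),
    }


rng = random.Random(42)
params = [sample_augmentation(rng) for _ in range(1000)]
print(params[0])
```

In practice, one such parameter set would be drawn for each image in each training pass and handed to an image transformation routine.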
Table 2 lists the input layer dimensions of each CNN model and the length of the extracted feature vectors. It shows that the number of features extracted from the last fully connected layer of each CNN is 2 for the binary classification level and 3 for the multiclass classification level.
2.2.2. Deep Features Extraction and Feature Incorporation
Some complications may occur while CNNs are being trained, including convergence problems and overfitting. These issues require adjusting a few parameters in the CNNs to guarantee that the weights of the CNN layers are updated at the same rate during the training process. Transfer learning (TL) is a method that can solve this problem. TL reuses a CNN that was previously trained on a huge dataset such as ImageNet for another classification problem [81]. In other words, TL uses a pretrained CNN that has learned feature representations from a large dataset to solve another classification problem involving a small dataset (similar to the dataset used in this paper). This process can enhance detection accuracy if used for comparable problems [81]. For that reason, this paper used ten pretrained CNNs. Before retraining the ten CNNs, the size of their output layers was changed to 3 or 2, which equals the number of classes at the multiclass and binary classification levels of the proposed diagnostic tool, respectively. In other words, the DL models were retrained for the new classification task. After the retraining process was finished, deep features were extracted from the last fully connected layers of the ten retrained CNNs. The number of features extracted from each CNN was 2 at the binary classification level and 3 at the multiclass classification level. Afterward, the proposed tool concatenated the deep features extracted from the ten DL models to form one feature vector consisting of 20 and 30 features for the binary and multiclass classification levels, respectively.
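The feature incorporation step amounts to concatenating the per-network outputs. A minimal sketch, using placeholder zeros in place of real fully connected layer activations and a hypothetical helper name, confirms the resulting vector lengths.

```python
def concatenate_features(per_model_features):
    """Join the feature vectors of all models into one fused vector."""
    fused = []
    for features in per_model_features:
        fused.extend(features)
    return fused


NUM_MODELS = 10  # ten CNNs, as stated in the text

# Placeholder activations: 2 per model (binary), 3 per model (multiclass).
binary_vector = concatenate_features([[0.0] * 2 for _ in range(NUM_MODELS)])
multiclass_vector = concatenate_features([[0.0] * 3 for _ in range(NUM_MODELS)])
print(len(binary_vector), len(multiclass_vector))  # 20 30
```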
2.2.3. Hybrid Feature Selection
Feature selection (FS) is an essential step for selecting the most valuable features available in the feature space to reduce its dimension, which correspondingly boosts the diagnostic accuracy and avoids overfitting [82,83]. FS methods can be divided into three categories: filter, wrapper, and hybrid [84]. Hybrid FS merges filter and wrapper methods and thus combines the benefits of both FS types [84]. Therefore, a hybrid FS approach was presented and employed in this study.
The hybrid FS step presented in the diagnostic tool combines the chi-squared test filter FS approach with a wrapper FS approach based on three search strategies. The chi-squared test is a well-known and commonly used FS method [85]. It attempts to determine the significant features $t_k$ that best differentiate positive and negative sets of instances of class $c_i$. The chi-squared test score is calculated using Equation (1).

$$\chi^2(t_k, c_i) = \frac{N\,(AD - CB)^2}{(A + C)(B + D)(A + B)(C + D)} \qquad (1)$$

where $N$ is the total number of ECG records (samples in a dataset); $A$ is the number of samples in class $c_i$ that contain the feature $t_k$; $B$ is the number of samples that contain the feature $t_k$ in other classes; $C$ is the number of samples in class $c_i$ that do not contain the feature $t_k$; and $D$ is the number of samples that do not contain the feature $t_k$ in other classes.
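As a worked illustration of the score, the sketch below evaluates the standard contingency-table form of the chi-squared statistic, which is assumed here to correspond to Equation (1), from the four counts A, B, C, and D defined above. The counts in the example are made up for illustration and are not taken from the dataset.

```python
def chi_squared(a, b, c, d):
    """Chi-squared score of a feature for a class from a 2 x 2 contingency table.

    a: samples in the class that contain the feature
    b: samples in other classes that contain the feature
    c: samples in the class that do not contain the feature
    d: samples in other classes that do not contain the feature
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # degenerate table: the feature carries no information
    return n * (a * d - c * b) ** 2 / denominator


informative = chi_squared(40, 10, 10, 40)    # feature mostly tied to the class
uninformative = chi_squared(25, 25, 25, 25)  # feature independent of the class
print(informative, uninformative)  # 36.0 0.0
```

A larger score indicates a stronger dependence between the feature and the class, so features would be ranked in descending order of this value.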
The hybrid FS method initially ranks the deep features extracted from the ten CNN models using the chi-squared test filter FS. Then, it employs this ranking to guide the three feature search strategies within the wrapper FS approach. These three search strategies are backward, forward, and bidirectional. The backward approach starts with all features in the feature space and then iteratively discards features of lower rank. Conversely, the forward approach begins with the single highest-ranked feature and then adds the following features one by one. The bidirectional approach alternates between the forward and backward strategies. Note that for all three strategies, only the features that improve the classification results are kept, while the others are deleted.
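The ranking-guided forward and backward searches can be sketched as follows. This is one plausible reading of the description, not the authors' implementation: the acceptance rules, the toy scoring function `toy_accuracy`, and the feature names are assumptions, and the bidirectional variant is omitted. In the actual tool, the evaluation would be the cross-validated accuracy of a classifier.

```python
def forward_search(ranked, evaluate):
    """Add features in rank order, keeping each one only if the score improves."""
    selected, best = [], float("-inf")
    for feature in ranked:
        score = evaluate(selected + [feature])
        if score > best:
            selected.append(feature)
            best = score
    return selected, best


def backward_search(ranked, evaluate):
    """Start with all features and drop low-ranked ones that do not help."""
    selected = list(ranked)
    best = evaluate(selected)
    for feature in reversed(ranked):  # lowest-ranked feature first
        trial = [f for f in selected if f != feature]
        if trial and evaluate(trial) >= best:
            selected, best = trial, evaluate(trial)
    return selected, best


# Hypothetical scoring: two useful features, every other feature costs 1 point.
GAINS = {"f3": 30, "f1": 20}


def toy_accuracy(subset):
    return 50 + sum(GAINS.get(f, -1) for f in subset)


ranked = ["f3", "f1", "f7", "f2"]  # assumed chi-squared ranking, best first
forward_result = forward_search(ranked, toy_accuracy)
backward_result = backward_search(ranked, toy_accuracy)
print(forward_result, backward_result)
```

On this toy problem both strategies end with the same two features, although in general they can select different subsets.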
2.2.4. Classification
The classification phase was performed in two schemes. The first scheme was an end-to-end deep learning classification with ten CNNs: InceptionResNet, ResNet-18, ResNet-50, ShuffleNet, Inception V3, MobileNet, Xception, DarkNet-19, DarkNet-53, and DenseNet-201. The second scheme used several machine learning classifiers trained with deep features extracted from the last fully connected layers of the ten CNNs. These classifiers were a support vector machine (SVM), random forest (RF), K-nearest neighbor (KNN), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and decision tree (DT). The classification step included two levels: binary and multiclass. At the binary level, classifiers were used to distinguish COVID-19 patients from normal patients. At the multiclass level, images were classified as normal, COVID-19, or cardiac complications. The 10-fold cross-validation method was used to validate the results. The classifiers were run 10 times, and the average classification performance over all these runs is reported in the results section. Classification was carried out in two phases. Phase I used the deep features extracted from the ten CNN models to train the classifiers. Phase II employed the hybrid FS approach to select the features used to train the classifiers.
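The 10-fold cross-validation procedure can be illustrated with a short index-splitting sketch. The shuffling, the seed, and the helper name `k_fold_indices` are assumptions; the paper's own tooling (Weka and MATLAB) provides this functionality internally.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 500 records, as at the binary classification level.
splits = list(k_fold_indices(500, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 450 50
```

Every record appears in exactly one test fold, so each sample is used once for testing and nine times for training.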
LDA is a popular machine learning technique used for both classification and feature reduction. It searches for the linear combinations of features that best explain the data. LDA separates the classes of data using hyperplanes. These hyperplanes are obtained by looking for the projection of the data points that minimizes the variance within each class and maximizes the distance between the classes.
K-NN is a commonly used classifier in the field of machine learning due to its simplicity and effectiveness even with noisy data. Although it is simple, it can reach good classification accuracy in medical applications. It assigns to every instance in the test data the most common label among its k nearest neighbors in the training data. The neighbors are chosen according to the distance between the instance being classified and the instances in the training data. This distance indicates how close the test instance is to each training instance. The distance used in our approach was the Euclidean distance, and the number of neighbors (k) was equal to 1 and 5 for the binary and multiclass classification levels, respectively, with equal distance weights.
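A minimal K-NN sketch with Euclidean distance and equal vote weights is shown below. The two-dimensional toy points and labels are invented for illustration; in the proposed tool the inputs would be the deep feature vectors.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Return the majority label among the k training points nearest to `query`."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])  # equal weights
    return votes.most_common(1)[0][0]


# Toy training data: two well-separated clusters.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["normal"] * 3 + ["covid"] * 3

near_origin = knn_predict(train_x, train_y, (0.2, 0.1), k=1)
near_far = knn_predict(train_x, train_y, (5.2, 5.1), k=5)
print(near_origin, near_far)  # normal covid
```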
Decision Trees are well-known machine learning classifiers that are widely used in medical applications for several reasons. They are capable of visualizing interactions between extracted features. This visualization enables a doctor to easily understand how the classifier reaches its decision. The DT classifier partitions instances of data according to conditions. The DT has a tree structure that starts at a root node; its leaves represent class labels, and its branch nodes represent the extracted features and the conditions that lead to each class label. The nodes of a tree are connected by arcs that represent the conditions on the features. The tree is divided into branches and leaves based on a metric such as information gain, gain ratio, or the Gini index. The maximum number of splits in this study was 100, and the splitting criterion was the Gini diversity index.
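The Gini diversity index used as the splitting criterion can be computed as in the following sketch. The function names and toy labels are illustrative; the weighted form shown for a candidate split is the usual definition and is assumed here.

```python
from collections import Counter


def gini(labels):
    """Gini diversity index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted Gini index of the two child nodes of a candidate split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


mixed = gini(["a", "a", "b", "b"])           # maximally mixed for two classes
pure = gini(["a", "a", "a"])                 # a pure node
perfect = split_gini(["a", "a"], ["b", "b"]) # split that separates the classes
print(mixed, pure, perfect)  # 0.5 0.0 0.0
```

A tree-growing procedure would pick, at each node, the candidate split with the lowest weighted index.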
Random Forest is an ensemble classifier that consists of multiple decision trees. RF uses the divide-and-conquer (DAC) approach to perform classification. The DAC method divides the input feature space into several partitions depending on a goodness metric. Subsequently, the classification outputs of all trees are averaged to produce a final decision. The gain ratio metric was used in the proposed tool, and the number of trees was 100.
SVM is a robust machine learning classifier. It transforms linearly or nonlinearly separable input data points into a new domain in which the classes can be separated easily. A hyperplane is employed to separate the classes of input data to facilitate classification. A kernel function measures the similarity between input vectors in the new higher-dimensional feature space. The linear kernel function was employed.
For retraining the CNNs for end-to-end classification, the learning rate, number of epochs, and mini-batch size were set to 0.0003, 10, and 4, respectively, whereas the validation frequency was set to 87 and 131 for the binary and multiclass classification levels, respectively. The ten CNNs were trained with the stochastic gradient descent with momentum (SGDM) algorithm. The other hyperparameters were kept unchanged. The proposed diagnostic tool was implemented using the Weka Data Mining Tool [86] and MATLAB R2020a.
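For reference, a single update of stochastic gradient descent with momentum can be sketched as follows. The learning rate of 0.0003 comes from the text; the momentum value of 0.9 is an assumption, since the paper only states that the other hyperparameters were left unchanged, and the weights and gradients are toy values.

```python
def sgdm_step(weights, grads, velocity, lr=0.0003, momentum=0.9):
    """One SGD-with-momentum update.

    lr follows the text; momentum=0.9 is an assumed value.
    """
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


weights, velocity = [1.0], [0.0]
weights, velocity = sgdm_step(weights, [2.0], velocity)  # first step
weights, velocity = sgdm_step(weights, [2.0], velocity)  # momentum adds to it
print(weights, velocity)
```

With a constant gradient, the second step is larger than the first because the velocity term carries over part of the previous update.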