1. Introduction
Novel coronavirus disease (COVID-19) is a contagious disease caused by the newly identified Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). It was first treated as an outbreak and was declared a pandemic by the World Health Organization (WHO) on 11 March 2020 [1]. Most people infected with COVID-19 experience mild to moderate respiratory illness and recover without special treatment. In contrast, the elderly and those with underlying medical problems such as chronic respiratory disease, cancer, cardiovascular disease, and diabetes are more likely to develop serious, life-threatening illness.
For appropriate quarantine and treatment, screening large numbers of suspected cases is a priority in controlling the spread of COVID-19. Although the clinical symptoms of SARS, MERS, and COVID-19 appear similar, differential diagnoses have been reported to date [2,3]. The diagnosis of COVID-19 relies on criteria such as clinical symptoms, epidemiological history, and positive X-ray or Computed Tomography (CT) chest images, as well as positive pathogenic testing. The clinical characteristics of COVID-19 include respiratory symptoms, fever, cough, dyspnea, and pneumonia [4,5,6], which are nonspecific and may be confused with other diseases. The definitive test for COVID-19 is the Real-Time Reverse Transcription Polymerase Chain Reaction (RT-PCR) test, which is believed to be highly specific but may produce false negatives at rates reported to be as high as 60–71%, which is a real clinical problem [7,8,9]. Because of these false negative results, complementary practices such as CT and X-ray imaging are combined with RT-PCR to achieve a more accurate diagnosis in clinical practice [10]. Thus, laboratory and imaging features in combination with clinical tests are required for a complete clinical characterization of this disease. Clinical findings, laboratory examination, and radiological imaging features of COVID-19-positive patients are also of great importance for reliable evaluation and diagnosis. In the Diagnosis and Treatment Protocol for Novel Coronavirus Pneumonia (Trial Version 6) published by the National Health Commission of the People’s Republic of China, definitive diagnosis based on chest radiological features has been reported to play an important role in the treatment of patients with suspected COVID-19 infection [11,12,13,14,15,16,17,18,19,20,21].
Makris et al. [22] evaluated nine common Convolutional Neural Networks (CNNs) for the classification of X-ray images recorded from patients with COVID-19, patients with pneumonia, and healthy individuals. Their results emphasized that CNNs can detect respiratory diseases with high accuracy (specifically, VGG16 and VGG19 achieved 95% accuracy), although they need a large number of sample images [22]. In another study, the authors proposed nCOVnet, a deep neural network-based method for fast screening of COVID-19 by analyzing patients' X-rays [23]. Zebin et al. [24] experimented with convolutional network architectures using VGG-16, ResNet50, and EfficientNetB0 backbones pre-trained on the ImageNet dataset for detecting COVID-19 on chest X-ray images. These three backbones achieved accuracies of 90%, 94.3%, and 96.8%, respectively [24]. In another study based on machine learning methods, new Fractional Multichannel Exponent Moments (FrMEMs) are used as a feature extractor. The process is parallelized with a multi-core computational framework, and a modified Manta-Ray Foraging Optimization based on differential evolution is used to optimize the feature selection process. The proposed method is evaluated on two COVID-19 X-ray datasets and achieved accuracy rates of 96.09% and 98.09% for the first and second datasets, respectively [25].
In the work of Azemin et al. [26], a ResNet-101 CNN architecture, a prominent deep learning technique trained on millions of images, is used to detect and classify abnormalities in X-ray images. The AUC, sensitivity, specificity, and accuracy of the presented model were 82%, 77.3%, 71.8%, and 71.9%, respectively [26]. Rajaraman et al. proposed iteratively pruned deep learning model ensembles to detect COVID-19 on chest X-rays [27]. In the work of Sahlol et al. [28], an enhanced hybrid recognition approach is proposed that combines a CNN with the swarm-based Marine Predators Algorithm to select the most significant features and classify them. An automated Siamese neural network-based pulmonary disease score is introduced for COVID-19 prediction in a clinical study [29].
In the work of Sitaula and Hossain [30], a new deep learning framework based on an attention module with VGG-16 is proposed. The attention module extracts the spatial relationship between the regions of interest (ROIs) in chest X-ray (CXR) images. Four layers of VGG-16 are then used in addition to the attention module. Sitaula and Aryal [31] propose a new Bag of Deep Visual Words (BoVW) technique over deep features. In this work, the feature map normalization step is replaced by a deep feature normalization step applied to the raw feature maps. This step proves to be very significant for distinguishing between COVID-19 and pneumonia. Furthermore, in the work of Sitaula et al. [32], the workflow applies BoVW with VGG-16. The extracted features are fed to an SVM, which provided suitable classification accuracy.
In the work of Shorfuzzaman et al. [33], a new CNN-based deep learning fusion method applying the transfer learning concept is presented. By providing a fusion model that can also efficiently identify the areas of X-ray images related to the disease, the study is expected to help clinicians automate the process of COVID-19 detection. The proposed method achieves 95.49% accuracy with high sensitivity and specificity. In the work of Hasan et al. [34], machine learning tools are applied to perform one-hot encoding. Furthermore, several deep learning components such as CNN, VGG16, 2D average pooling, dropout, flatten, dense, and input layers are used to build a detection model. The proposed model achieved 91.69% COVID-19 detection accuracy. Moreover, several other studies [35,36,37] use various machine learning, deep learning, and image processing techniques to detect COVID-19, which demonstrates the trend and usability of such approaches in assisting the medical community.
Considering the literature, the purpose of our study is to evaluate the diagnostic performance of a CNN-based ALO system with various classification methods (Softmax, Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Naïve Bayes (NB), and Decision Tree (DT)) for detecting COVID-19 from X-ray images, the early-stage radiological imaging used before CT imaging is needed. Our motivation for comparing various classification methods is to find the classification algorithm that provides the best results and to combine its classification strength with CNN features in the deep learning steps [38]. The results reported in this work show that this approach can be used to extract high-level features from X-ray images with deep learning techniques for COVID-19 diagnosis. To the best of our knowledge, this is the first study on an optimized classification process for COVID-19 and healthy instances from chest X-ray images, and it will inform new studies on AI-based diagnosis systems. In this study, the Naïve Bayes classifier and the Ant Lion Optimization (ALO) algorithm are combined with a CNN, an approach not used before. This optimized deep learning application for COVID-19 recognition is validated on two datasets and compared with related studies in the literature, and it showed remarkable results.
2. Materials and Methods
2.1. Convolutional Neural Network (CNN)
A CNN consists of convolutional layers, pooling layers, and fully connected layers between the input and output layers. It builds on three ideas: shared weights, local receptive fields, and spatial or temporal subsampling. In the first stage, neurons use local receptive fields to extract initial visual features such as points and edges. The extracted features are passed to deeper intermediate layers, which extract high-level features such as corners and circles. Finally, the fully connected layer tries to predict the labels of the data using the high-level features extracted in the previous layers. The classifier is located between the output layer and the output result.
2.1.1. Convolutional Layers
Convolutional layers extract features by convolving several filter masks with the input feature map, that is, the output image of the previous layer [39]. Each feature matrix (map) consists of two-dimensional weights. $X_m^{l}$ represents the $m$th feature map of layer $l$, and $W_{nm}^{l}$ represents the weight filter connecting to the $n$th feature matrix (map) of the input layer; $W_{nm}^{l}$ is convolved with the input features to produce the output feature. The mathematical model of the output feature in layer $l$ is formulated in Equation (1):

$$X_m^{l} = f\left(\sum_{n} X_n^{l-1} * W_{nm}^{l}\right) \qquad (1)$$

where ∗ denotes the convolution operation and $f$ represents the activation function, which can be the hyperbolic tangent (tanh), sigmoid, or Rectified Linear Unit (ReLU). The activation function can be chosen according to the data type.
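As a rough illustration of the convolutional step, the following Python sketch slides a single filter over a small input map and applies ReLU. The function names, the "valid" (no padding) window, and the toy input are assumptions made for illustration only; like most deep learning libraries, the sketch computes a cross-correlation (the filter is not flipped).

```python
def relu(value):
    """Rectified Linear Unit: pass positive values, clamp the rest to zero."""
    return value if value > 0.0 else 0.0


def conv2d_valid(feature_map, kernel):
    """Slide `kernel` over `feature_map` without padding and apply ReLU."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(feature_map) - k_rows + 1
    out_cols = len(feature_map[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            acc = 0.0
            for i in range(k_rows):
                for j in range(k_cols):
                    acc += feature_map[r + i][c + j] * kernel[i][j]
            row.append(relu(acc))
        output.append(row)
    return output


# Toy input (hypothetical): dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical filter that responds to a dark-to-bright vertical edge.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]
response = conv2d_valid(image, edge_kernel)
```

The response is largest in the column where the window straddles the edge, which is the sense in which early layers extract features such as edges.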
2.1.2. Pooling Layer
The pooling operation downsizes the feature maps and makes the output invariant to shifts and distortions. Reducing the size of the feature map reduces the computational time of the whole network and keeps only the predominant features. Downsizing the feature map also helps prevent overfitting and reduces the number of parameters that must be trained. Mathematically, the pooling layer can be represented as shown in Equation (2):

$$X_m^{l} = \text{down}\left(X_m^{l-1}\right) \qquad (2)$$

where down(⋅) is a type of pooling operation. In this paper, the pooling stage uses max pooling, which selects the largest value in each pooling window, whereas average pooling takes the mean of the window.
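The difference between the two pooling variants can be seen in a short Python sketch. The function name, the 2 × 2 non-overlapping window, and the toy feature map are assumptions for illustration, not the configuration used in the paper.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Downsample with non-overlapping size x size windows (max or average)."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


# Hypothetical 4 x 4 feature map.
fmap = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 5, 7],
    [1, 1, 3, 9],
]
max_pooled = pool2d(fmap, mode="max")
avg_pooled = pool2d(fmap, mode="avg")
```

Both variants shrink the 4 × 4 map to 2 × 2; max pooling keeps only the strongest activation in each window.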
2.1.3. Fully Connected (FC) Layer
In the FC layer, every neuron is fully connected to the activations of the previous layer. The output of the layer is calculated as a matrix multiplication followed by a bias offset. The output of the FC layer is therefore a vector that represents the high-level features of the input data. The number of neurons in the last FC layer is equal to the number of classes (labels) in the classification problem. The mathematical model of this step is presented in Equation (3):

$$y = w^{\top} x + b \qquad (3)$$

where $y$ represents the labels of the data, $x$ indicates the $K$-dimensional feature vector, $w$ indicates the parameters of the weight vector, and $b$ is the bias offset.
2.2. Classifiers
2.2.1. Softmax Classifier
The Softmax function is a core element of deep learning classification tasks. It is an activation function that converts numbers (logits) into probabilities that sum to one, so its output is a vector that represents the probability distribution over a list of potential outcomes. It is suited to multi-class classification rather than only binary classification. The mathematical model of Softmax can be seen in Equation (4):

$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, \ldots, K \qquad (4)$$

where $z$ is the vector of logits and $K$ is the number of classes. For example, the logits $z$ = [2.0, 1.0, 0.1] are converted into the probabilities [0.7, 0.2, 0.1] (rounded), which add up to 1.0.
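The worked example for Equation (4) can be checked with a few lines of Python. This is a minimal sketch; subtracting the largest logit before exponentiating is a common numerical-stability step added here as an assumption and does not change the result.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to one."""
    shift = max(logits)  # stability trick; cancels out in the ratio
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
rounded = [round(p, 1) for p in probs]
```

Before rounding, the probabilities are approximately 0.659, 0.242, and 0.099.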
2.2.2. Support Vector Machines (SVM)
The SVM technique finds a hyperplane in an N-dimensional space, where N is the number of features, that distinctly separates the data points. It seeks the hyperplane with the largest margin between the two classes. The support vectors are the data points closest to the separating hyperplane; these points lie on the margin boundary, as represented in Figure 1, with + indicating data points of type 1 and − indicating data points of type −1.
2.2.3. K-Nearest Neighbors (KNN)
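KNN uses a distance function to find the nearest neighbors of a case and to identify the class most common among them; the case is then assigned to that class. The distance function can be Euclidean, Manhattan, or Minkowski, as seen in Equations (5)–(7):

$$d_{\text{Euclidean}}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (5)$$

$$d_{\text{Manhattan}}(x, y) = \sum_{i=1}^{n} |x_i - y_i| \qquad (6)$$

$$d_{\text{Minkowski}}(x, y) = \left(\sum_{i=1}^{n} |x_i - y_i|^{q}\right)^{1/q} \qquad (7)$$

where $x$ and $y$ are feature vectors with $n$ components and $q \geq 1$ is the order of the Minkowski distance.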
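The three distance functions and the majority vote can be sketched in Python as follows. The function names, the value k = 3, and the toy two-dimensional points are hypothetical and chosen only to make the example self-contained; the features in the paper are high-dimensional CNN outputs.

```python
from collections import Counter


def minkowski(x, y, q):
    """Minkowski distance of order q between two equal-length vectors."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)


def euclidean(x, y):
    return minkowski(x, y, 2)  # Minkowski with q = 2


def manhattan(x, y):
    return minkowski(x, y, 1)  # Minkowski with q = 1


def knn_predict(train_points, train_labels, query, k=3, distance=euclidean):
    """Assign `query` to the most common class among its k nearest neighbors."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data with two well-separated classes.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
labels = ["normal", "normal", "normal", "covid", "covid"]
prediction = knn_predict(points, labels, (0.5, 0.5))
```

Euclidean and Manhattan distance are the special cases q = 2 and q = 1 of the Minkowski distance, which is why the sketch defines them through it.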
2.2.4. Naïve Bayes (NB)
NB is a classification algorithm for binary and multi-class classification problems that is based on Bayes’ theorem. Rather than calculating the joint likelihood of all attribute values, P (d1, d2, d3|h), the attributes are assumed to be conditionally independent given the class. Consequently, the likelihood is calculated as P (d1|h) ∗ P (d2|h) and so on. Bayes’ theorem is given in Equation (8):

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \qquad (8)$$

where $P(A \mid B)$ is the posterior probability ($A$ is the class, Normal/Abnormal, and $B$ is the predictor), $P(B \mid A)$ is the likelihood of the predictor given the class, $P(A)$ is the class prior probability, and $P(B)$ is the predictor prior probability.
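A small worked example may make the independence assumption concrete. The sketch below multiplies per-attribute likelihoods and normalizes by the evidence, as in Equation (8); the function name and all probabilities are hypothetical values chosen for illustration, not estimates from the paper's data.

```python
def naive_bayes_posterior(priors, likelihoods, observed):
    """Posterior P(class | observed attributes) under conditional independence.

    priors:      {class: P(class)}
    likelihoods: {class: {attribute: P(attribute | class)}}
    observed:    list of attribute names that were observed
    """
    joint = {}
    for cls, prior in priors.items():
        prob = prior
        for attribute in observed:
            prob *= likelihoods[cls][attribute]  # P(d1|h) * P(d2|h) * ...
        joint[cls] = prob
    evidence = sum(joint.values())  # P(B), the predictor prior
    return {cls: prob / evidence for cls, prob in joint.items()}


# Hypothetical numbers for two attributes d1 and d2.
priors = {"abnormal": 0.5, "normal": 0.5}
likelihoods = {
    "abnormal": {"d1": 0.8, "d2": 0.6},
    "normal": {"d1": 0.2, "d2": 0.4},
}
posterior = naive_bayes_posterior(priors, likelihoods, ["d1", "d2"])
```

With these numbers the joint terms are 0.24 and 0.04, so the posterior for the abnormal class is 0.24 / 0.28, roughly 0.857.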
2.2.5. Decision Trees (DT)
The DT approach classifies instances by sorting them hierarchically from the root to a leaf node. Each internal node contains a test on an attribute of the instance, and each branch descending from a node carries one possible value of that attribute. The tree is built using information theory, as seen in Figure 2.
2.3. Ant Lion Optimization (ALO) Algorithm
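The ALO algorithm simulates the interaction between antlions and ants in the trap. The interaction is modelled as ants moving over the search space while antlions hunt them and become fitter using traps. Because ants move randomly in nature while searching for food, their movement is represented mathematically by the random walk shown in Equation (9):

$$X(t) = \left[0,\ \text{cumsum}(2r(t_1) - 1),\ \text{cumsum}(2r(t_2) - 1),\ \ldots,\ \text{cumsum}(2r(t_n) - 1)\right] \qquad (9)$$

where cumsum represents the cumulative sum, $n$ indicates the maximum number of iterations, $t$ represents the random walk steps, and $r(t)$ represents the stochastic function [41]. This stochastic function is formulated in Equation (10):

$$r(t) = \begin{cases} 1, & \text{if } rand > 0.5 \\ 0, & \text{if } rand \leq 0.5 \end{cases} \qquad (10)$$

where the step of the random walk (iteration) is represented by $t$ and $rand$ is a random number generated in the range [0, 1] with uniform distribution. During optimization, the positions of the ants are saved and used as shown in the matrix in Equation (11):

$$M_{Ant} = \begin{bmatrix} A_{1,1} & A_{1,2} & \cdots & A_{1,d} \\ A_{2,1} & A_{2,2} & \cdots & A_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n,1} & A_{n,2} & \cdots & A_{n,d} \end{bmatrix} \qquad (11)$$

Each ant position is saved in the matrix $M_{Ant}$. $A_{i,j}$ refers to the value of the $j$th variable of the $i$th ant. The number of ants is represented by $n$ and the number of variables is represented by $d$. The fitness matrix for the ants is displayed in Equation (12):

$$M_{OA} = \begin{bmatrix} f([A_{1,1}, A_{1,2}, \ldots, A_{1,d}]) \\ f([A_{2,1}, A_{2,2}, \ldots, A_{2,d}]) \\ \vdots \\ f([A_{n,1}, A_{n,2}, \ldots, A_{n,d}]) \end{bmatrix} \qquad (12)$$

where each ant’s fitness is saved in the matrix $M_{OA}$, $A_{i,j}$ indicates the value of the $j$th dimension of the $i$th ant, the number of ants is represented by $n$, and the objective function is represented by $f$.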
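In addition to the ants, the antlions hide somewhere in the search space [42]. The matrices in Equations (13) and (14) are used to save their positions and fitness values:

$$M_{Antlion} = \begin{bmatrix} AL_{1,1} & AL_{1,2} & \cdots & AL_{1,d} \\ AL_{2,1} & AL_{2,2} & \cdots & AL_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ AL_{n,1} & AL_{n,2} & \cdots & AL_{n,d} \end{bmatrix} \qquad (13)$$

Each antlion position is saved in the $M_{Antlion}$ matrix, $AL_{i,j}$ indicates the value of the $j$th dimension of the $i$th antlion, the number of antlions is represented by $n$, and the number of variables is represented by $d$.

$$M_{OAL} = \begin{bmatrix} f([AL_{1,1}, AL_{1,2}, \ldots, AL_{1,d}]) \\ f([AL_{2,1}, AL_{2,2}, \ldots, AL_{2,d}]) \\ \vdots \\ f([AL_{n,1}, AL_{n,2}, \ldots, AL_{n,d}]) \end{bmatrix} \qquad (14)$$

All antlion fitness values are saved in the matrix $M_{OAL}$, $AL_{i,j}$ indicates the value of the $j$th dimension of the $i$th antlion, the number of antlions is represented by $n$, and the objective function is represented by $f$.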
At each step of optimization, the ants’ positions are updated using the random walk in Equation (9). This equation cannot be applied directly to update the ants’ positions, because each variable of the search space has its own bounded range. The min-max normalization in Equation (15) is therefore applied to keep the random walks within the range of the search space [42]:

$$X_i^{t} = \frac{(X_i^{t} - a_i)\,(d_i^{t} - c_i^{t})}{b_i - a_i} + c_i^{t} \qquad (15)$$

The minimum value of the random walk for the $i$th variable is represented by $a_i$ and the maximum value of the random walk for the $i$th variable is represented by $b_i$. Furthermore, the minimum value of the $i$th variable at the $t$th iteration is represented by $c_i^{t}$, and the maximum value of the $i$th variable at the $t$th iteration is represented by $d_i^{t}$.
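The random walk of Equations (9) and (10) and the rescaling of Equation (15) can be sketched together in Python. The function names, the seed, and the bounds of −2 and 3 are assumptions chosen for illustration; in the full algorithm the bounds would come from the trap around the selected antlion.

```python
import random


def random_walk(n_steps, rng):
    """Cumulative sum of +1/-1 steps, starting at 0 (Equations (9) and (10))."""
    walk = [0.0]
    for _ in range(n_steps):
        r = 1.0 if rng.random() > 0.5 else 0.0  # stochastic function r(t)
        walk.append(walk[-1] + (2.0 * r - 1.0))  # step is 2 r(t) - 1
    return walk


def normalize_walk(walk, lower, upper):
    """Min-max rescale the walk into [lower, upper] (Equation (15))."""
    a, b = min(walk), max(walk)  # a_i and b_i: extremes of the walk
    if a == b:  # degenerate walk; cannot occur for n_steps >= 1
        return [lower for _ in walk]
    return [(x - a) * (upper - lower) / (b - a) + lower for x in walk]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
walk = random_walk(100, rng)
scaled = normalize_walk(walk, lower=-2.0, upper=3.0)
```

After normalization, the smallest point of the walk maps to the lower bound and the largest to the upper bound, so every position stays inside the permitted range of the variable.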
As presented above, antlion traps affect the ants’ random walks. This assumption is represented mathematically in Equations (16) and (17):

$$c_i^{t} = \text{Antlion}_j^{t} + c^{t} \qquad (16)$$

$$d_i^{t} = \text{Antlion}_j^{t} + d^{t} \qquad (17)$$

The minimum values of all variables at the $t$th iteration are represented by $c^{t}$ and the maximum values of all variables at the $t$th iteration are represented by $d^{t}$. The minimum value for the $i$th ant is represented by $c_i^{t}$, the maximum value for the $i$th ant is represented by $d_i^{t}$, and the position of the selected $j$th antlion at the $t$th iteration is represented by $\text{Antlion}_j^{t}$.
A roulette wheel is applied to model the hunting capability of antlions. During optimization, the ALO algorithm uses a roulette wheel operator to select antlions based on their fitness. This technique gives fitter antlions a higher chance of catching ants [43].
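Roulette wheel selection can be sketched as follows. The function name and the toy fitness values are hypothetical, and the sketch assumes non-negative scores where a larger value means a fitter antlion; for a minimized objective such as an error rate, the values would first need to be transformed (for example, inverted), which is not shown here.

```python
import random


def roulette_wheel_select(fitness_values, rng):
    """Pick an index with probability proportional to its fitness value."""
    total = sum(fitness_values)
    pick = rng.random() * total  # a point on the wheel
    cumulative = 0.0
    for index, fitness in enumerate(fitness_values):
        cumulative += fitness
        if pick < cumulative:
            return index
    return len(fitness_values) - 1  # guard against rounding at the edge


rng = random.Random(42)  # fixed seed for reproducibility
fitness = [0.1, 0.2, 0.7]  # hypothetical scores; larger means fitter
counts = [0, 0, 0]
for _ in range(10000):
    counts[roulette_wheel_select(fitness, rng)] += 1
```

Over many draws, the fittest antlion is selected most often, but weaker antlions still get an occasional chance, which preserves exploration.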
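Once an ant is in a trap, it slides toward the antlion. This behavior is modelled mathematically by adaptively reducing the radius of the hypersphere of the ant’s random walks, as presented in Equations (18) and (19):

$$c^{t} = \frac{c^{t}}{I} \qquad (18)$$

$$d^{t} = \frac{d^{t}}{I} \qquad (19)$$

in which $I$ is the ratio, $c^{t}$ is the minimum value of all variables at the $t$th iteration, and $d^{t}$ is the vector containing the maximum value of all variables at the $t$th iteration.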
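In the last stage, the prey is caught when an ant becomes fitter than its corresponding antlion. The antlion then updates its location to the latest location of the hunted ant to improve its chance of catching new prey. This action is represented mathematically in Equation (20):

$$\text{Antlion}_j^{t} = \text{Ant}_i^{t} \quad \text{if } \text{Ant}_i^{t} \text{ is fitter than } \text{Antlion}_j^{t} \qquad (20)$$

The current iteration is represented by $t$, the position of the selected $j$th antlion at the $t$th iteration is represented by $\text{Antlion}_j^{t}$, and the position of the $i$th ant at the $t$th iteration is indicated by $\text{Ant}_i^{t}$.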
Finally, one of the important properties of evolutionary algorithms is elitism, which helps them keep the best solutions obtained at any stage of the optimization procedure. In ALO, the best antlion obtained so far, in other words the fittest antlion, is saved as the elite. The influence of the elite on each ant’s position can be represented mathematically as in Equation (21):

$$\text{Ant}_i^{t} = \frac{R_A^{t} + R_E^{t}}{2} \qquad (21)$$

where the random walk around the antlion selected by the roulette wheel at the $t$th iteration is represented by $R_A^{t}$, the random walk around the elite at the $t$th iteration is represented by $R_E^{t}$, and the position of the $i$th ant at the $t$th iteration is represented by $\text{Ant}_i^{t}$ [44].
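The last two steps, averaging the two random walks (Equation (21)) and letting an antlion take over the position of a fitter ant (Equation (20)), are simple enough to sketch directly. The function names and numbers are hypothetical, and the sketch assumes a minimized objective, so "fitter" means a lower cost.

```python
def update_ant_position(walk_around_selected, walk_around_elite):
    """Average the walk around the selected antlion and the walk around the elite."""
    return [(r_a + r_e) / 2.0
            for r_a, r_e in zip(walk_around_selected, walk_around_elite)]


def replace_if_fitter(antlion, antlion_cost, ant, ant_cost):
    """Move the antlion to the ant's position when the ant has a lower cost."""
    if ant_cost < antlion_cost:  # assumption: lower cost means fitter
        return list(ant), ant_cost
    return list(antlion), antlion_cost


# Hypothetical two-dimensional positions and costs.
new_ant = update_ant_position([0.2, 0.8], [0.4, 0.6])
antlion, cost = replace_if_fitter([0.0, 0.0], 0.30, new_ant, 0.10)
```

Because every ant is pulled toward the elite as well as toward a roulette-selected antlion, the best solution found so far influences the whole population.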
2.4. Proposed Method
In this study, the proposed method consists of three stages: data resizing and feature extraction using AlexNet, feature selection using ALO, and classification using the Naïve Bayes classifier.
2.4.1. Data Resizing and Feature Extraction
In the first stage, AlexNet, a pre-trained CNN, is used as a feature extractor. In the last few years, AlexNet has shown high performance [45], and many studies have reported very good image classification results with it [46,47,48,49,50]. After AlexNet was introduced, the Artificial Intelligence (AI) community working on image recognition-based classification focused much more on CNNs and improved on AlexNet’s performance by altering some parameters or increasing the number of layers to make it deeper [46,48,49]. AlexNet combines the benefits of Inception-V4 and ResNet50, which can quickly initialize the network model and maintain good network generalization performance. In experiments, it is apparent that VGGNet has certain disadvantages. One issue is that it is slow to train. Furthermore, the weights of the network architecture are quite large in terms of disk space and bandwidth. AlexNet is therefore better suited for most image classification problems [51], which is why it is used in this study.
Each X-ray image provided to the system is resized automatically to 227 × 227 × 3 to optimize the input and make it symmetrical. Symmetry is a vital concept for neural networks. Although it is possible to pass the data as asymmetrical input, an adapter may be needed to convert it to symmetrical input for optimization. AlexNet is trained to classify images into 1000 classes (for the inner stages) and consists of several layer types: convolutional, pooling, and fully connected layers, as shown in Figure 3. AlexNet consists of five convolutional layers and three fully connected layers. The ReLU activation function is applied in all layers of the network. The three fully connected layers, which are layers 6, 7, and 8, consist of 4096, 4096, and 1000 neurons, respectively. In this study, the features are extracted from layer 6, which yields 4096 features.
2.4.2. Feature Selection
In the second stage, ALO is applied as the feature selection method to reduce the number of features taken from the output of the fully connected layers. Feature selection aims to maximize the classification accuracy while reducing the number of features to a minimum. We therefore formulate the feature selection problem as the objective function presented in Equation (22):

$$\text{Fitness} = \alpha \cdot \text{ErrorRate} + \beta \cdot \frac{|S|}{|T|} \qquad (22)$$

where ErrorRate represents the error rate of the classification model (NB, SVM, DT, SoftMax, KNN), $|S|$ represents the number of selected features, and $|T|$ represents the total number of features. The two parameters α and β weight the importance of classification performance and subset length, respectively, with α ∈ [0, 1] and β = (1 − α). Feature selection offers several advantages when applied to a problem:
Decrease Overfitting: Fewer redundant features mean less chance of making decisions based on noise.
Enhance Accuracy: Fewer misleading features mean an increase in model accuracy.
Decrease Training Time: Fewer features mean that the classifiers train faster.
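The trade-off encoded by the objective in Equation (22) can be illustrated with a short Python sketch. The function name, the binary-mask encoding of a feature subset, and the value α = 0.99 are assumptions for illustration; the value of α used in the study is not stated here.

```python
def feature_selection_fitness(error_rate, mask, alpha=0.99):
    """Weighted sum of classifier error and the fraction of features kept.

    error_rate: classification error obtained with the selected features
    mask:       sequence of 0/1 flags, 1 meaning the feature is selected
    alpha:      weight of the error term (assumed value); beta = 1 - alpha
    """
    beta = 1.0 - alpha
    n_selected = sum(1 for flag in mask if flag)
    n_total = len(mask)
    return alpha * error_rate + beta * (n_selected / n_total)


# Two hypothetical subsets with the same error but different sizes.
mask_small = [1, 0, 0, 1, 0, 0, 0, 0]  # 2 of 8 features
mask_large = [1, 1, 1, 1, 1, 1, 0, 0]  # 6 of 8 features
score_small = feature_selection_fitness(0.05, mask_small)
score_large = feature_selection_fitness(0.05, mask_large)
```

Lower values are better: when two subsets give the same error rate, the smaller subset receives the lower (better) score, so the optimizer is nudged toward compact feature sets.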
2.4.3. Classifiers
In the last stage, the classifiers (Softmax, SVM, KNN, multiclass Naïve Bayes, and DT) are trained in a supervised manner to classify the features extracted by AlexNet in the previous stage. The features from layer 7 are passed by the fully connected layer to the classifiers. The proposed architecture is visualized in Figure 4.
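A confusion matrix was used to measure the classification performance of the proposed application. A confusion matrix contains information about the predicted and actual classifications produced by an algorithm, and the performance of such algorithms is generally measured using its entries. Figure 5 shows the confusion matrix for a two-class classifier.
From the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) in the confusion matrix, the accuracy, precision, F1 score, sensitivity, and specificity criteria can be calculated through Equations (23)–(27), respectively:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (23)$$

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (24)$$

$$\text{F1 score} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \qquad (25)$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (26)$$

$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (27)$$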
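These five criteria follow directly from the four cells of a two-class confusion matrix, as the following Python sketch shows. The function name and the counts are hypothetical and are not results from this study; the sketch does not guard against empty classes (zero denominators).

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard metrics from the cells of a 2 x 2 confusion matrix."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Hypothetical counts for 100 test images.
metrics = binary_metrics(tp=45, fp=5, fn=3, tn=47)
```

With these counts, accuracy is 0.92, precision is 0.90, and sensitivity is 0.9375, which shows how a classifier can score differently on each criterion.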
3. Results
In this study, MATLAB2018 was used to execute our proposed methodology for COVID-19 detection. The analysis was run on a PC with an Intel Core i7-670 @ 2.60 GHz CPU and 8 GB of RAM. A random sampling technique is applied to evaluate the proposed method; the goal of random sampling is to avoid overfitting. Furthermore, the experiment is repeated 5 times and the average values are reported for each case. A t-test and its p-value are calculated to compare the results obtained before and after applying ALO. In our test environment, ALO uses 45 search agents and 300 iterations as parameters. Two well-known datasets were used to test the performance of the proposed method, as shown in Table 1. Several metrics, such as accuracy, precision, and F1 score, are calculated for each classifier. Comparative results are shown in Table 2.
Then, our framework applies ALO to select features from the output of the CNN, which reduces the number of features to be classified. This phase helps reduce the computational time and the classification complexity. Our novel CNN-ALO-classifier framework produced better results than the same classifiers without ALO. The results are presented in Table 3.
A two-tailed paired t-test is applied to the two matched groups with a diagnosis of COVID-19, and the p-value is calculated as 0.031011, which is below the standard level of significance (p < 0.05). Therefore, a statistically significant difference between using and not using ALO is noted on the dataset from [53].
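For readers unfamiliar with the paired t-test, the statistic can be computed from the per-run differences as sketched below. The function name and the accuracy values are hypothetical and are not the measurements behind the reported p-value; the p-value itself would then be read from Student's t distribution with n − 1 degrees of freedom, which the sketch does not do.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t_value = mean_diff / (sd_diff / math.sqrt(n))
    return t_value, n - 1


# Hypothetical accuracies from five repeated runs (not the paper's data).
without_alo = [0.90, 0.92, 0.88, 0.91, 0.89]
with_alo = [0.95, 0.96, 0.94, 0.97, 0.93]
t_value, dof = paired_t_statistic(without_alo, with_alo)
```

Pairing the runs removes run-to-run variation that affects both configurations equally, so the test looks only at the consistent difference between them.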
Furthermore, the variation of the mean squared error of the classifiers with the number of epochs is presented in Figure 6.
In addition, the variation of accuracy with the number of epochs is provided in Figure 7. Experimental results show that the NB classifier produced the best results compared to the other approaches (SVM, SoftMax, KNN, DT). Furthermore, the execution times of these classifiers were also recorded and are visualized in Figure 8.
The COVID-19 public dataset in [54] is also used to validate the proposed method. This dataset consists of 460 COVID-19, 1266 normal, and 3418 pneumonia X-ray images for training and 116 COVID-19, 314 normal, and 855 pneumonia X-ray images for testing. The main critical issue in this dataset is that it contains pneumonia that is caused by bacterial infection rather than COVID-19. Figure 9 illustrates images that are randomly selected from the class samples.
In our first iteration, we applied our CNN + NB method directly to the unbalanced dataset. The results are shown in Table 4.
Furthermore, our method based on CNN + ALO + NB is applied to the same dataset. The results are presented in Table 5. Our method achieved an overall accuracy of 98.9% and detected COVID-19 cases with 100% accuracy. This means that the proposed CNN + ALO + NB method detected the COVID-19 cases in this dataset without any misclassified instances and was not negatively affected by the low number of COVID-19 instances.
A two-tailed paired t-test is applied to the two matched groups with a diagnosis of COVID-19, and the p-value is calculated as 0.041011, which is below the standard level of significance (p < 0.05). Consequently, a statistically significant difference between using and not using ALO is noted on the dataset from [54].
4. Discussion
RT-PCR tests and viral nucleic acid testing serve as the gold standard methods for the diagnosis of COVID-19. However, the false negative results reported in early studies may hinder the prevention and control of the outbreak, especially since these tests play an important reference role [55,56]. Clinical tests, laboratory results, image findings, and other epidemiological factors must therefore be carefully examined for the full characterization and correct diagnosis of COVID-19.
In routine practice, radiologists, who may have varying experience in interpreting chest X-ray imaging, examine chest X-ray images and decide on positive or negative findings by consensus. The radiologists also classify the chest X-rays as positive or negative for COVID-19. Accurate assessment is often based on education and experience, but it can be subjective at times. Less experienced radiologists can produce results with sufficient specificity but low sensitivity when differentiating COVID-19 from viral pneumonia on chest X-rays or CT. This reflects the difficulty of making reproducible radiological evaluations for accurate diagnosis and classification given the urgency, patient burden, and hospital facilities during the COVID-19 outbreak. Since radiology involves visual perception as well as decision making under uncertainty, mistakes are inevitable, especially under such constrained conditions [57,58]. These facts underline the need for rapid and accurate detection and differentiation methods that can be used in the local hospitals and clinics responsible for the diagnosis of COVID-19 and the management of patients.
The deep learning approach has proven its potential for different classification tasks, with the best results on varying image datasets. This data-driven approach allows for more abstract feature information [59,60,61,62]. While various deep learning architectures have been researched to address different tasks, the most common type of deep learning architecture in medical imaging today is the CNN. Thus, in this study, we proposed a CNN-based model to classify COVID-19 from chest X-ray images using transfer learning. Transfer learning, that is, reusing networks pre-trained on other datasets, is often used when data are rare or scarce, without the need for a data augmentation process [15]. We adopted the transfer learning approach and used the AlexNet architecture, trained on the dataset of COVID-19 patients and healthy subjects, to extract the features. These features are passed to the classifiers of the respective models, and the results are compared across the classifiers. The promising results of these classifiers are evaluated and presented in terms of accuracy, precision, and F1-score. The NB classifier combined with the Ant Lion Optimization algorithm and CNN produced the best results, with 98.31% accuracy, 100% precision, and 98.25% F1-score, and with the lowest execution time.
Table 6 and Table 7 present the comparison between our method based on CNN + ALO + NB and several significant works dealing with the detection of COVID-19 using state-of-the-art approaches. Table 6 contains the works that are tested with the dataset in [54], and Table 7 includes the works that are tested with the dataset in [53].
Table 6 and Table 7 show that the proposed CNN + ALO + NB method produced better results than several studies in the literature. Furthermore, the proposed ALO algorithm is compared with three well-known algorithms: PSO, GA, and Bat. Figure 10 shows this comparison, in which ALO produces better results than the other algorithms.
The main reason for this superiority is the use of ALO as a feature selector after the feature extraction stage with the CNN and before NB. ALO contributed substantially to the system performance for the following reasons:
Random choice of antlions and the use of a roulette wheel ensure exploration of the search space.
Random walks of ants around the antlions additionally emphasize exploration of the search space around the antlions.
Local optima are escaped with the help of the roulette wheel and the random walks.
ALO approximates the global optimum by avoiding local optima in the population of search agents.
The ALO algorithm is flexible and appropriate for solving various problems, as it has a small number of adaptive parameters to fine-tune.
PSO tends to become trapped in local optima in high-dimensional spaces and has a low convergence rate in the iterative process. This causes problems for feature selection, especially from complex data such as COVID-19 X-ray images.
GA is computationally expensive, so a GA implementation requires a large amount of optimization. Moreover, designing an objective function and getting the representation and operators right can be difficult.
There are some limitations in our study. Successful deep learning models such as AlexNet need to be trained with more image data. However, COVID-19 data were scarce at the time of this study because of a shortage of laboratory records during the outbreak. In addition, X-ray data of MERS, SARS, and other related syndromes are not included in this study. To acquire a more comprehensive understanding of COVID-19, it would be appropriate to include a larger dataset from a wide geographic scope. Additionally, a deep learning approach that integrates radiology image features with RT-PCR results may enable more effective screening and treatment of COVID-19.