3.4.1. Design of Detection Model
This section introduces the neural network model designed in this paper, which is named MHSA-ResNet according to its characteristics and is used to detect COVID-19 in X-ray images. The model is composed of convolutional layers, pooling layers, bottleneck layers, and fully connected layers. The convolution layers of the residual network are divided into several groups to facilitate the calculation of the residual values. In the bottleneck layer, the 1 × 1 convolution kernel is replaced by a multi-head self-attention module to facilitate feature extraction, shorten the training time, and improve the operating efficiency of the neural network. This section also describes the model training process in detail. A schematic diagram of the MHSA-ResNet neural network is shown in Figure 7.
Convolution layer: The convolution layer performs a convolution operation on the input data and extracts useful information from it. Its parameters include the size of the convolution kernel, the stride (the step size of each movement of the kernel), and other settings.
Pooling layer: The pooling layer performs a nonlinear operation that summarizes a window of values in the input matrix and returns a single value. Its parameters include the kernel size, which specifies the length and width of the pooling window; the stride, which represents the distance the pooling window moves at each step (in the model, a stride of 2 means the window moves two pixels after each pooling operation); and the pooling type, which is either maximum pooling or average pooling. Maximum pooling retains the maximum value in the pooling window matrix, whereas average pooling retains the average of all values in the window.
Fully connected layer: The num_output parameter represents the number of neurons in the fully connected layer, and the activation function parameter specifies its activation function. The ReLU activation function is used in fully connected layers 1 and 2, and the Softmax activation function is used in the last fully connected layer. L2 regularization, i.e., a weight-decay penalty based on the Euclidean norm, is applied in the fully connected layers: a penalty term proportional to the sum of squares of the weights is added to the objective function so that the weight parameters are drawn closer to the origin. A small illustrative sketch of the pooling and fully connected operations is given after these layer descriptions.
Bottleneck layer: The bottleneck layer has the structure of a bottleneck block. Specifically, it uses convolution blocks placed between two convolution layers of different sizes and different channel numbers in the convolutional neural network.
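As a small illustrative sketch of the pooling and fully connected layers described above (the kernel size, layer widths, and number of output classes below are assumptions for demonstration only; the actual values follow Table 2):

import torch
import torch.nn as nn
import torch.optim as optim

# Pooling: maximum vs. average pooling with a stride of 2 (kernel size here is illustrative)
x = torch.randn(1, 64, 56, 56)                                # a 64-channel feature map
max_pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)   # keeps the largest value in each window
avg_pool = nn.AvgPool2d(kernel_size=3, stride=2, padding=1)   # averages the values in each window
print(max_pool(x).shape, avg_pool(x).shape)                   # stride 2 halves the spatial size: [1, 64, 28, 28]

# Fully connected head: ReLU in layers 1 and 2, Softmax in the last layer (widths are hypothetical)
fc_head = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 28 * 28, 512), nn.ReLU(),   # fully connected layer 1
    nn.Linear(512, 128), nn.ReLU(),            # fully connected layer 2
    nn.Linear(128, 2), nn.Softmax(dim=1),      # last fully connected layer (2 output classes assumed)
)
probs = fc_head(max_pool(x))                   # -> shape [1, 2], class probabilities

# L2 regularization: weight_decay adds the squared-Euclidean-norm penalty on the weights to the objective
optimizer = optim.SGD(fc_head.parameters(), lr=1e-3, weight_decay=1e-4)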
The training process is as follows: in the first convolution layer, the convolution kernel size is set to 7 × 7 and the stride is 2, so the spatial size of the output image is half that of the input. The feature map produced by this convolution layer is fed into a 3 × 3 maximum pooling layer whose stride is also 2. The same convolution operation is used in the second convolutional layer, but the bottleneck structure is added in this layer. The bottleneck structure is composed of three convolution blocks: a 1 × 1 convolution block is first applied to change the number of channels, a 3 × 3 convolution block then performs the convolution operation, and finally a 1 × 1 convolution block restores the number of channels. There are multiple bottleneck blocks in the residual network, which complete the task of calculating the residual values. The same operation is performed in the third convolutional layer, differing only in the number of bottleneck blocks and the number of channels. The multi-head self-attention module is added to convolutional layers 4 and 5. The specific model architecture is shown in Table 2.
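A minimal PyTorch sketch of the standard bottleneck block described above (illustrative channel numbers; this is not the paper's released code):

import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Standard ResNet-style bottleneck: 1x1 reduce -> 3x3 conv -> 1x1 restore, with a residual shortcut."""
    def __init__(self, in_channels, mid_channels, out_channels, stride=1):
        super().__init__()
        self.reduce = nn.Sequential(                      # 1x1 block: change the number of channels
            nn.Conv2d(in_channels, mid_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True))
        self.conv = nn.Sequential(                        # 3x3 block: perform the main convolution
            nn.Conv2d(mid_channels, mid_channels, kernel_size=3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True))
        self.restore = nn.Sequential(                     # 1x1 block: restore the number of channels
            nn.Conv2d(mid_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels))
        # Projection shortcut when the shape changes, identity otherwise
        self.shortcut = (nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(out_channels))
            if stride != 1 or in_channels != out_channels else nn.Identity())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.restore(self.conv(self.reduce(x)))
        return self.relu(out + self.shortcut(x))          # residual connection

# Example: one block from a hypothetical conv2-style stage
block = Bottleneck(64, 64, 256)
y = block(torch.randn(1, 64, 56, 56))   # -> shape [1, 256, 56, 56]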
According to the model listed in Table 2, the residual network is divided into five convolutional layers. Apart from the first convolutional layer, the remaining four convolutional layers calculate the residual values through bottleneck blocks, and they do not differ significantly in parameters such as the convolution kernel size, the kernel stride, and the pooling mode. Compared with ResNet50, ResNet152 deepens the third and fourth convolutional layers by increasing the number of bottleneck blocks in those stages (from 4 to 8 and from 6 to 36 blocks, respectively), while the number of output channels and the size of the output images remain unchanged.
The MHSA-ResNet model designed in this paper is based on ResNet152; a multi-head self-attention module is added to the bottleneck blocks of the last two convolutional layers so as to reduce the amount of computation and improve the training accuracy of the neural network model without changing the number of channels or the size of any convolutional layer.
3.4.2. Multi-Head Self-Attention Module Design
The attention mechanism adopted in this paper is the multi-head self-attention mechanism [30], which combines the characteristics of self-attention and multi-head attention; its structure is shown in Figure 8. ⨁ denotes the element-wise sum of matrices, and ⨂ denotes matrix multiplication. When attention is applied to the 2D feature map of an image, the height and width of the image features are used to calculate the range of the segmented receptive field and thus obtain the relative position codes R_h and R_w. The relative position code on the left side of the figure is the matrix computed from R_h and R_w. The attention calculation is then completed by combining this relative position matrix with the query matrix and the key matrix. The multi-head self-attention formula is as follows:
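A reconstruction of this formula, following the relative-position self-attention of [30] (written in single-head form; the multi-head version applies the same computation over several heads in parallel and concatenates the results), is

\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( Q K^{\top} + Q R^{\top} \right) V, \qquad R = R_h \mathbin{⨁} R_w ,
\]

where Q, K, and V denote the query, key, and value matrices obtained from the input feature map, and ⨁ is the element-wise sum shown in Figure 8.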
R in the formula is the relative position matrix obtained by the element-wise addition of the width and height position codes of the 2D feature map [31]. Using the relative position matrix to help recognize attention features in the image can effectively improve the efficiency of attention, and the positional coding gives the model the ability to capture sequence order. In the multi-head self-attention, each matrix is produced by pointwise (1 × 1) convolution to ensure the recognition accuracy of the attention.
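A minimal PyTorch sketch of such a module, under the assumptions that q, k, and v are produced by pointwise (1 × 1) convolutions and that R_h and R_w are learned per-axis position codes (the class name MHSA2d and all sizes are illustrative, not taken from the paper):

import torch
import torch.nn as nn

class MHSA2d(nn.Module):
    """2D multi-head self-attention with learned relative position codes (illustrative sketch)."""
    def __init__(self, channels, height, width, num_heads=4):
        super().__init__()
        assert channels % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = channels // num_heads
        # q, k, v are produced by pointwise (1 x 1) convolutions
        self.q_conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.k_conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.v_conv = nn.Conv2d(channels, channels, kernel_size=1)
        # learned relative position codes R_h and R_w along the feature-map height and width
        self.rel_h = nn.Parameter(torch.randn(1, num_heads, self.head_dim, height, 1) * 0.02)
        self.rel_w = nn.Parameter(torch.randn(1, num_heads, self.head_dim, 1, width) * 0.02)

    def forward(self, x):
        b, c, h, w = x.shape                       # h and w must match the sizes given at construction
        def split_heads(t):                        # (B, C, H, W) -> (B, heads, head_dim, H*W)
            return t.view(b, self.num_heads, self.head_dim, h * w)
        q = split_heads(self.q_conv(x))
        k = split_heads(self.k_conv(x))
        v = split_heads(self.v_conv(x))
        # content-content term: q^T k
        content = torch.matmul(q.transpose(-2, -1), k)
        # content-position term: q^T (R_h + R_w)
        r = (self.rel_h + self.rel_w).view(1, self.num_heads, self.head_dim, h * w)
        position = torch.matmul(q.transpose(-2, -1), r)
        attn = torch.softmax(content + position, dim=-1)
        out = torch.matmul(v, attn.transpose(-2, -1))   # weighted sum of values
        return out.view(b, c, h, w)

# Example on a 14 x 14 feature map with 512 channels (illustrative sizes)
attn = MHSA2d(channels=512, height=14, width=14)
y = attn(torch.randn(1, 512, 14, 14))   # -> shape [1, 512, 14, 14]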
In order to fuse the multi-head self-attention module with the bottleneck layer of the residual network, the MHSA attention module is designed into the bottleneck layer. In the design of the MHSA module, the feature map must be generated according to the length and width of the image, and the channel dimension and other features are also used as parameters of the feature map when calculating the position matrix. To combine the MHSA module with the residual network, this paper embeds the MHSA module in the bottleneck layers of the last two layers of the residual neural network, completing the design of the neural network.
Figure 9 shows the structure of the bottleneck layer.
The calculation of the residual value needs to go through the convolution layer, and the tasks of modifying the number of channels and the shape of the image are completed by the convolution blocks in the convolution layer. In the middle is the attention-based MHSA bottleneck block. After feature extraction through the attention mechanism, the last convolution block is used to restore the number of channels. The output of the bottleneck block after this last operation is denoted F(x). At the same time, the part of the input that is not processed by the convolution blocks is denoted x, and the sum of the two is F(x) + x. After the activation function and the regularization operation, we obtain the computation of the residual shortcut.
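Continuing the sketches above (and reusing the illustrative MHSA2d class), the MHSA bottleneck block with its residual shortcut ReLU(F(x) + x) could be assembled roughly as follows; this follows the structure described for Figure 9 and is not the paper's exact implementation:

import torch
import torch.nn as nn

class MHSABottleneck(nn.Module):
    """Bottleneck block whose middle convolution is replaced by multi-head self-attention (sketch)."""
    def __init__(self, in_channels, mid_channels, out_channels, height, width, num_heads=4):
        super().__init__()
        self.reduce = nn.Sequential(                      # 1x1 block: change the number of channels
            nn.Conv2d(in_channels, mid_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True))
        self.mhsa = MHSA2d(mid_channels, height, width, num_heads)   # attention-based feature extraction
        self.restore = nn.Sequential(                     # 1x1 block: restore the number of channels
            nn.Conv2d(mid_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels))
        self.shortcut = (nn.Identity() if in_channels == out_channels else
                         nn.Sequential(nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
                                       nn.BatchNorm2d(out_channels)))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        fx = self.restore(self.mhsa(self.reduce(x)))      # F(x): reduce -> MHSA -> restore
        return self.relu(fx + self.shortcut(x))           # residual shortcut: ReLU(F(x) + x)

# Illustrative sizes for a late stage of the network
block = MHSABottleneck(1024, 512, 2048, height=14, width=14)
y = block(torch.randn(1, 1024, 14, 14))   # -> shape [1, 2048, 14, 14]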
3.4.3. Evaluating the Complexity of the Detection Model
In order to verify the effectiveness of the neural network detection model designed in this paper, this section evaluates the complexity of the neural network through experiments. The complexity of a model is generally evaluated by its computational quantity and its parameter quantity. The algorithm complexity is expressed as follows:
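Written in a standard form that is consistent with the symbol definitions below (this is a reconstruction of the usual expressions rather than necessarily the paper's exact notation), the two measures are

\[
\text{Time} \sim O\!\left( \sum_{l=1}^{L} M_l^{2} \cdot K_l^{2} \cdot C_{l-1} \cdot C_{l} \right) \quad (5)
\]
\[
\text{Space} \sim O\!\left( \sum_{l=1}^{L} K_l^{2} \cdot C_{l-1} \cdot C_{l} \;+\; \sum_{l=1}^{L} M_l^{2} \cdot C_{l} \right) \quad (6)
\]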
Formula (5) describes the computational time complexity (computational quantity), and Formula (6) describes the computational space complexity (parameter quantity), where K represents the size of the convolution kernel, C represents the number of channels, L represents the number of layers in the neural network, and M represents the size of the output feature map. The time complexity and space complexity of the neural network calculated by these formulas still need to be assessed with relevant indicators. The evaluation indicators include:
Params: The number of parameters of the model refers to the total number of parameters to be trained in the neural network; it directly determines the size of the model and also affects the memory usage during inference. The unit is generally M (millions of parameters).
FLOPs: Floating-point operations (FLOPs) refers to the number of floating-point operations, which measures the time complexity of a network model and is expressed here in giga floating-point operations (GFLOPs).
In the convolution layer, because the weights in the convolution kernel are shared, the calculation formula for the number of parameters and the calculation formula for the FLOPs are as follows:
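Stated as a reconstruction in a common form (with K_h and K_w the kernel height and width, C_in and C_out the numbers of input and output channels, H_out and W_out the output feature-map size, and the bias term included):

\[
\mathrm{Params}_{\mathrm{conv}} = \left( K_h \times K_w \times C_{\mathrm{in}} + 1 \right) \times C_{\mathrm{out}}, \qquad
\mathrm{FLOPs}_{\mathrm{conv}} = \mathrm{Params}_{\mathrm{conv}} \times H_{\mathrm{out}} \times W_{\mathrm{out}}
\]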
The number of parameters in the convolution layer is calculated by multiplying the numbers of input and output channels by the height and width of the convolution kernel (plus the bias terms). When computing the FLOPs, the height, width, and number of channels of the output feature map are also needed to complete the calculation.
In the fully connected layer, because there is no weight sharing, the FLOPs of the layer are equal to the number of parameters in the layer. The fully connected layer mainly performs the addition and multiplication operations in each neuron. The calculation formulas for the number of parameters and the FLOPs are as follows:
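In the same reconstructed notation (N_in input features and N_out output neurons, with one bias per output neuron):

\[
\mathrm{Params}_{\mathrm{fc}} = \left( N_{\mathrm{in}} + 1 \right) \times N_{\mathrm{out}}, \qquad
\mathrm{FLOPs}_{\mathrm{fc}} = \mathrm{Params}_{\mathrm{fc}}
\]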
Using the above formulas, the number of parameters of each module of the neural network can be counted with the tensorboard package. Table 3 lists the parameter counts of the neural network designed in this paper.
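As a minimal alternative sketch of such parameter counting (a plain PyTorch loop rather than the tensorboard-based workflow used in the paper; the torchvision ResNet152 baseline is only an illustration):

import torch.nn as nn
from torchvision import models   # used only to provide a ResNet152 baseline for illustration

def count_parameters(model: nn.Module) -> float:
    """Return the number of trainable parameters in millions (M)."""
    total = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total / 1e6

print(f"ResNet152 baseline: {count_parameters(models.resnet152()):.2f} M parameters")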
The experiments show that, through the attention mechanism, the MHSA-ResNet neural network designed in this paper reduces the number of parameters that must be computed and increases the operating speed of the network. This demonstrates the optimizing effect of the attention mechanism on the neural network and makes the recognition of COVID-19 images in this paper more efficient.