Article

Integrated Generative Adversarial Networks and Deep Convolutional Neural Networks for Image Data Classification: A Case Study for COVID-19

by Ku Muhammad Naim Ku Khalif 1,2, Woo Chaw Seng 3, Alexander Gegov 4,5,*, Ahmad Syafadhli Abu Bakar 6,7 and Nur Adibah Shahrul 8

1 Centre for Mathematical Sciences, Universiti Malaysia Pahang Al-Sultan Abdullah, Kuantan 23600, Malaysia
2 Centre of Excellence for Artificial Intelligence & Data Science, Universiti Malaysia Pahang Al-Sultan Abdullah, Kuantan 23600, Malaysia
3 Department of Artificial Intelligence, Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur 50603, Malaysia
4 School of Computing, Faculty of Technology, University of Portsmouth, Portsmouth PO1 3HE, UK
5 English Faculty of Engineering, Technical University of Sofia, 1756 Sofia, Bulgaria
6 Mathematics Division, Centre for Foundation Studies in Science, University of Malaya, Kuala Lumpur 50603, Malaysia
7 Centre of Research for Computational Sciences and Informatics in Biology, Bioindustry, Environment, Agriculture and Healthcare (CRYSTAL), University of Malaya, Kuala Lumpur 50603, Malaysia
8 Negeri Sembilan State Health Department, Ministry of Health Malaysia, Seremban 70300, Malaysia
* Author to whom correspondence should be addressed.
Information 2024, 15(1), 58; https://doi.org/10.3390/info15010058
Submission received: 22 December 2023 / Revised: 5 January 2024 / Accepted: 12 January 2024 / Published: 18 January 2024
(This article belongs to the Special Issue Feature Papers in Information in 2023)

Abstract:
Convolutional Neural Networks (CNNs) are widely used in automated image classification systems, as they can leverage the spatial and temporal correlations inherent in a dataset. This study delves into the use of cutting-edge deep learning for precise image data classification, focusing on overcoming the difficulties brought on by the COVID-19 pandemic. To improve the accuracy and robustness of COVID-19 image classification, the study introduces a novel methodology that combines the strengths of Deep Convolutional Neural Networks (DCNNs) and Generative Adversarial Networks (GANs). The proposed approach helps to mitigate the lack of labelled coronavirus (COVID-19) images, a standard limitation in related research, and improves the model’s ability to distinguish between COVID-19-related patterns and healthy lung images. The study presents a thorough case study based on a sizable dataset of chest X-ray images covering COVID-19 cases, other respiratory conditions, and healthy lung conditions. The integrated model outperforms conventional DCNN-based techniques in terms of classification accuracy after being trained on this dataset. To address the issue of an unbalanced dataset, the GAN produces synthetic images, and deep features are extracted from every image. The study also provides a thorough understanding of the model’s performance in real-world scenarios through a meticulous evaluation using a variety of metrics, including accuracy, precision, recall, and F1-score.

1. Introduction

In the realm of computer vision, one of the most sought-after capabilities is the effective classification of image data. As the availability of image-capture devices and the increasing adoption of digital platforms drive the exponential growth of digital data, there is an urgent need for robust and sophisticated models to make sense of this massive influx of visual information. Deep learning has been suggested for image classification due to its capacity to provide a more profound comprehension of how a subject responds to specific visual stimuli. According to [1], deep learning-based techniques have produced impressive results in recent years. The efficacy of a classification system is contingent upon the quality of the features extracted from an image: the accuracy of the results increases proportionally with the quality of the extracted features. Despite the significant advancements demonstrated by numerous deep learning-based approaches to image classification, extracting all crucial information from images remains a challenge for these methods, which decreases the overall classification accuracy. Computer vision (CV) tasks can be generically classed as image classification, object identification and recognition from images, and image segmentation. Image classification is one of the most prevalent CV problems. Such tasks are frequently encountered, particularly in the medical field, and are typically framed as supervised learning, where a group of features X (typically extracted from the image) is used to forecast a specific outcome y, or label. Before the use of deep learning became widespread in 2012, common machine learning models included support vector machines, random forests, and artificial neural networks, and these traditional approaches were the most common way of handling computer vision tasks, including classification, object detection, and tracking [2].
Deep Convolutional Neural Networks (DCNNs) are advanced deep learning models used for image and video processing. Convolutional layers, unlike fully connected layers, only link neurons to a limited input region. This localised filtering method creates a feature map from the input data. The max-pooling layer, which takes the maximum value from a segment of the image, is often used after these convolutional layers to down-sample the data’s spatial dimensions. DCNNs support image categorization, object detection, semantic segmentation, face recognition, and artistic style transfer. In 2014, Oxford University’s Visual Geometry Group (VGG) created VGGNet [3]. VGGNet’s architectural simplicity is beneficial, although it had three times more parameters than AlexNet. The VGG architecture underpins advanced object identification models, and on many tasks and datasets outside ImageNet, VGG outperforms baselines; it remains one of the most popular image recognition architectures. AlexNet, VGG, ResNet, and Inception are among the DCNN designs that have improved efficiency and performance using new structures and concepts. Respiratory symptoms are among the most common symptoms in patients battling COVID-19, and they may be detected via chest X-ray imaging. Additionally, a condition with modest symptoms may be diagnosed using chest CT scans. Normally, detection is accomplished by analysing signal data [4]. Existing deep learning models require a large number of training parameters, which not only increases the computational complexity of classification but also leads to over-fitting issues due to the scarcity of COVID-19 X-ray images [5]. Recent years have shown that deep learning models are a promising tool in the field of medicine for the diagnosis of pathologies, including lung pathologies, and they have also shown highly promising results in the diagnosis of other medical disorders. Many deep learning models and techniques have been developed and presented to diagnose the presence of COVID-19 and pneumonia in chest X-ray and computerised tomography (CT) scan images.
The Generative Adversarial Network (GAN) is a type of generative model which uses deep learning techniques based on convolutional neural networks. Generative modelling is an unsupervised learning task that involves automatically detecting and learning patterns in input data such that the model can be used to produce new instances that might plausibly have been drawn from the original dataset. GAN models were proposed by [6] and have been widely used in the area of image processing for translating an input picture into its matching output image. They pit the Generator against the Discriminator. Starting from random noise, the Generator iteratively creates real-looking data to trick the Discriminator. Conversely, the Discriminator must distinguish real data from fakes. GANs are a delicate dance in which the Generator improves its counterfeits and the Discriminator improves its discernment. As training advances, the Generator should create data so realistic that the Discriminator can no longer distinguish them from real samples. GANs can generate lifelike representations of faces and objects, transfer styles between images, enhance datasets (especially when authentic samples are scarce), improve image resolution, and even create molecular designs for potential medications. GANs have drawbacks, however. Mode collapse, when the Generator produces limited or repeating outputs, can be problematic. Training can be unpredictable due to the dynamic tension between the two networks, requiring careful balancing. Traditional GANs also have trouble precisely guiding the Generator’s output. However, variants such as DCGANs and WGANs have improved stability, quality, and controllability, stretching GANs’ limits.
The coronavirus (COVID-19) outbreak is a severe respiratory disease which was first identified in Wuhan, China, at the end of 2019. In this situation, the main priority is creating faster and more efficient diagnostic approaches to reduce the transmission of a severe acute respiratory syndrome such as COVID-19. According to the World Health Organization (WHO), just under 9.5 million new cases and over 41,000 new deaths were reported in the final week of 2021, and as of 2 January 2022, a total of nearly 289 million cases and just over 5.4 million deaths had been reported globally. We now have several vaccine options, but it would take a long time for vaccines to reach every region of the world. As a result, visual markers can be employed as an alternative way to quickly screen infected individuals. The typical common symptom of the virus is a lung infection, for which chest radiography pictures, such as X-ray and computed tomography (CT) images, are extensively used as a visual signal [7]. Reverse Transcription Polymerase Chain Reaction (RT-PCR) is frequently utilised for COVID-19 detection. Expert laboratory personnel and testing equipment are necessary for testing, and the time and costs associated with testing a sample vary from two hours to several days. In addition, RT-PCR produces inaccurate results in some circumstances due to its high false-negative rate (39–60 per cent) [8]. New variants of the SARS-CoV-2 virus have made it more difficult to identify using current diagnostic methods. Conventionally, radiologists interpret chest X-ray images to discover visual patterns which can confirm a COVID-19 infection. While this approach has become more accurate with time, it still exposes medical staff to risk, and it is also more expensive because diagnostic test kits are required for each patient. In comparison, medical imaging procedures such as X-rays and CT scans, which are considerably quicker, safer, and more widely available, can be employed for screening. For COVID-19 screening, X-ray imaging is preferable over CT scanning since it is more widely available and less expensive [9]. Still, manually diagnosing the virus using X-ray images can be time-consuming, and with little or no prior experience it can lead to inaccuracies and human errors. As a result, there is a strong need to broadly automate such operations and to make them available to everyone so that diagnosis may become more efficient, accurate, and rapid.
To address the aforementioned challenges, researchers and practitioners have developed automated detection systems for coronavirus infections using artificial intelligence (AI) techniques [10]. In the past decade, the combination of AI with medical imaging has aided several industries, including healthcare, in diagnosing and treating a variety of conditions. Deep Neural Networks [11] have lately been utilised in the healthcare sector, and the Convolutional Neural Network (CNN) is a well-known Deep Neural Network. Deep learning models have been successfully applied in a variety of fields, including medical data segmentation, classification, and lesion identification [12]. Respiratory symptoms are among the most common COVID-19 symptoms, and they may be detected through chest X-ray imaging; a condition with modest symptoms may also be diagnosed using chest CT scans. Typically, detection is accomplished by analysing indicator data [4]. For a CNN to detect a coronavirus infection from chest X-ray images, massive amounts of training data are required. However, adequate chest X-ray image datasets with equal numbers of COVID-19 and normal chest images are unavailable, and this absence of supervised data may contribute to the class imbalance issue [13]. As a result, there is a strong need to broadly automate such operations and make them available to everyone so that diagnosis may become more efficient, accurate, and rapid. Imaging is the fastest and most accurate way to detect COVID-19. Researchers use X-ray images for COVID-19 detection because of their benefits: low cost and wide availability are their main advantages over other imaging methods. Furthermore, X-ray imaging uses less radiation than CT scan imaging, it detects lung cancer and cardiac diseases, and X-ray images are widely used, especially in poor countries. CT scans nonetheless provide more accurate diagnoses than X-rays [14], but they are expensive and expose patients to more radiation. Both CT and X-ray images are popular for COVID-19 identification. In X-rays, ground-glass opacification is seen in the upper right lung; in CT scans, ground-glass areas in the lower lung and halo signs and consolidation areas in the lower lobes are used [15,16,17,18]. X-ray and CT imaging features of COVID-19 and non-COVID cases are shown in Figure 1 [19].
A single deep learning model may not classify X-ray images well. Machine learning and ensemble learning improve classification and regression predictions, and researchers combine deep learning models to improve results, but having too few datasets, especially images, can cause overfitting. To extract patterns, training requires many parameters, which is computationally expensive and requires tricky hyperparameter tuning [5]. In disease diagnosis, hyperspectral imaging (HSI) captures and processes broad electromagnetic spectrum data. Hyperspectral cameras can capture images in dozens or hundreds of spectral bands, unlike traditional cameras that only capture red, green, and blue; each band represents one wavelength range of the electromagnetic spectrum [20]. Deep Convolutional Neural Networks (DCNNs) and Generative Adversarial Networks (GANs) have emerged as two of the most prominent and transformative deep learning techniques for image processing tasks. While DCNNs have been widely used for image classification due to their ability to learn the spatial hierarchies of features automatically and adaptively, GANs have gained popularity due to their ability to generate new data samples that are coherent and often indistinguishable from real data. The combination of these two robust architectures offers a novel method for improving the performance and robustness of image classification tasks. An integrated model that leverages the strengths of both GANs and DCNNs has the potential to overcome the challenges encountered when using them separately. GANs, for example, can be used to augment the training dataset, addressing issues such as data scarcity or class imbalance, while DCNNs can use this enriched dataset to perform more accurate classifications. Many practitioners and researchers combine several deep learning models for the COVID-19 X-ray image problem to obtain better results and reduce variance: the authors of [21] proposed a DCGAN-based CNN model that generates synthetic CXR images using different datasets as references; the authors of [5] examined a novel attention-based deep learning model using an attention module with VGG16 in the context of computer-aided diagnosis (CAD); the authors of [9] developed automatic detection of COVID-19 cases using a combination of GAN and Deep Convolutional Neural Network models; and the authors of [22] improved COVID-19 detection using GAN-based data augmentation and a novel QuNet-based classification. This study delves into the integration of GANs and DCNNs to determine whether such a fusion can improve image data classification performance, both qualitatively and quantitatively. The investigation of methodologies, challenges, and breakthroughs in this domain provides a comprehensive understanding of the benefits and drawbacks of combining these two cutting-edge technologies. In this study, the authors propose the development of GAN and VGG16 models to generate synthetic datasets of COVID-19 disease and analyse the X-ray images, respectively.
This section provides some background on the emergence of deep learning approaches related to COVID-19 issues. In Section 2, a review of the theoretical preliminaries of GAN and VGG16 is discussed, including the confusion matrix for the evaluation of the development of deep learning models. Section 3 illustrates the development of the proposed integrated Generative Adversarial Networks and Deep Convolutional Neural Networks for image data classification. Section 4 discusses the application of the proposed integrated deep learning model on the COVID-19 image dataset and presents an evaluative analysis, with some important remarks being discussed. Finally, Section 5 concludes this paper.

2. Preliminaries

2.1. Generative Adversarial Networks

According to the authors of [21], two neural networks, a generator (G) and a discriminator (D), are trained under a generative model as follows.
  • A generator (G) is a network that uses random noise, Z, to generate images. Gaussian noise, representing a random point in the latent space, serves as the generator’s input data.
  • The discriminator (D) determines whether the image in question comes from the real or the synthetic distribution. It receives the input image x and yields the output D(x), which represents the probability that x belongs to the real distribution. If the output of the discriminator is one, the image is deemed real; otherwise, the image is deemed synthetic.
The min–max equation of an adversarial network can be denoted as:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
D is trained using the log function, where $D(x) = 1$ indicates a real sample. Following the min–max game, the discriminator maximises and the generator minimises the objective through $D(G(z))$. The generator creates noisy images during the initial training progress; after 50 epochs, they resemble the original images. Afterward, a more complex deep convolutional GAN network is used to improve the GAN’s performance.
A standard layout of the GAN architecture is illustrated in Figure 2, where the input to the generator is a random sample from the latent space z. Next, a sample from the real distribution and the generator’s output G(z) are fed into the discriminator. Each sample is given a value by the discriminator based on its assessment of whether it is real (1) or synthetic (0). The performance of the two models is then examined using these two outputs, and the generator is trained to minimise the function $\log(1 - D(G(z)))$. Thus, the generator is trained to generate images that the discriminator is unable to recognise as artificial (i.e., $D(G(z)) \approx 1$). In addition to the generator’s training, the discriminator is trained to maximise the function $\log(D(x)) + \log(1 - D(G(z)))$, which maximises the probability of the discriminator correctly identifying both the synthetic samples $D(G(z))$ and the real samples $D(x)$. Figure 2 depicts the generic architecture of the Deep Convolutional GAN.
The training of Generative Adversarial Networks is illustrated in Algorithm 1 using a minibatch version of stochastic gradient descent. One example of a hyperparameter is the number of steps applied to the discriminator, which is denoted by k. The option with k = 1 is chosen because it has the lowest cost.
Algorithm 1 Minibatch stochastic gradient descent training of GANs
for number of training iterations do
  (# part 1: update the discriminator)
  for k steps do
    • Sample a minibatch of m noise samples $\{z^{(1)}, \ldots, z^{(m)}\}$ from the noise prior $p_g(z)$.
    • Sample a minibatch of m examples $\{x^{(1)}, \ldots, x^{(m)}\}$ from the data-generating distribution $p_{data}(x)$.
    • Update the discriminator by ascending its stochastic gradient:
      $\nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \left[ \log D(x^{(i)}) + \log\left(1 - D(G(z^{(i)}))\right) \right]$
  end for
  (# part 2: update the generator)
  • Sample a minibatch of m noise samples $\{z^{(1)}, \ldots, z^{(m)}\}$ from the noise prior $p_g(z)$.
  • Update the generator by descending its stochastic gradient:
    $\nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D(G(z^{(i)}))\right)$
end for
The gradient-based updates can use any standard gradient-based learning rule.
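To make Algorithm 1 concrete, the following is a minimal sketch of one training step in TensorFlow/Keras (with k = 1, as chosen above). The Adam optimizer with a 0.0002 learning rate and binary cross-entropy mirror the settings reported in Section 4; the function and variable names are illustrative rather than the authors' code, and the generator update uses the common non-saturating variant (maximising log D(G(z))) instead of literally descending log(1 − D(G(z))).

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()
d_opt = tf.keras.optimizers.Adam(2e-4)   # learning rate assumed from Section 4
g_opt = tf.keras.optimizers.Adam(2e-4)

@tf.function
def gan_train_step(generator, discriminator, real_images, latent_dim=100):
    m = tf.shape(real_images)[0]
    # Part 1: update the discriminator on a minibatch of real and generated samples.
    z = tf.random.normal([m, latent_dim])                  # noise from the prior p_g(z)
    with tf.GradientTape() as tape:
        fake_images = generator(z, training=True)
        d_real = discriminator(real_images, training=True)
        d_fake = discriminator(fake_images, training=True)
        # Ascending log D(x) + log(1 - D(G(z))) equals descending this cross-entropy.
        d_loss = bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
    d_grads = tape.gradient(d_loss, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(d_grads, discriminator.trainable_variables))
    # Part 2: update the generator (non-saturating form of the log(1 - D(G(z))) update).
    z = tf.random.normal([m, latent_dim])
    with tf.GradientTape() as tape:
        d_fake = discriminator(generator(z, training=True), training=True)
        g_loss = bce(tf.ones_like(d_fake), d_fake)
    g_grads = tape.gradient(g_loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(g_grads, generator.trainable_variables))
    return d_loss, g_loss
```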

2.2. VGG16 Convolutional Neural Network

A Convolutional Neural Network’s architecture has three fundamental layers: a convolutional layer, a pooling layer, and a fully connected layer with a rectified linear activation function.
1. Convolutional layer
Convolution is performed by sliding the convolution kernel over the X-ray image input matrix. To form a local connection network, only one neuron in the local window is connected to a neuron in the convolutional layer. Each connection learns a weight and an overall bias. The convolution operation can be expressed as:
$$a_{ij} = \varphi\left(b_i + \sum_{k=1}^{3} w_{ik}\, x_{j+k-1}\right) = \varphi\left(b_i + \mathbf{w}_i^T \mathbf{x}_j\right)$$
where $a_{ij}$ is the activation or output of the j-th neuron of the i-th filter in the hidden layer, $\varphi$ is the neural activation function, $b_i$ is the shared overall bias of filter i, $\mathbf{w}_i = [w_{i1}\ w_{i2}\ w_{i3}]^T$ is the shared weight vector, and $\mathbf{x}_j = [x_j\ x_{j+1}\ x_{j+2}]^T$. The output of this layer is called a feature map, which contains information about the input; it is the input sequence filtered with the learned weights. When multiple localised features need to be extracted, multiple convolution kernels are required to compute additional feature maps (a small numeric sketch of this operation follows this list).
2. Pooling layer
This layer performs sub-sampling to simplify and summarise the feature map. Max-pooling takes only the maximum number from each pooling window, reducing the feature map’s size and the computational cost while retaining the characteristic features of the X-ray images.
3. Fully connected layer
This layer is also known as a softmax layer (softmax activation function) since the output is binary. In this proposed research, the number of categories to be classified is two (either COVID-19 or normal); therefore, the output is a vector with two output neurons.
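As a worked illustration of the convolution operation above, the short NumPy sketch below slides a size-3 kernel over a 1-D strip of pixel intensities with a ReLU activation; all numbers are made up for demonstration.

```python
import numpy as np

relu = lambda v: np.maximum(v, 0.0)        # the activation function φ

x = np.array([0.2, 0.5, 0.1, 0.9, 0.4])    # a 1-D slice of pixel intensities (illustrative)
w = np.array([0.3, -0.6, 0.8])             # shared weights w_i of one size-3 kernel
b = 0.1                                    # shared bias b_i

# a_ij = φ(b_i + Σ_k w_ik · x_{j+k-1}): slide the kernel over every valid window
feature_map = np.array([relu(b + w @ x[j:j + 3]) for j in range(len(x) - 2)])
print(feature_map)                         # the resulting 1-D feature map
```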
VGG16 is a convolutional neural network model for large-scale image recognition. The model achieves 92.7% top-five test accuracy on ImageNet, a dataset of over 14 million images belonging to 1000 classes [3]. The VGG16 architecture is a Convolutional Neural Network (CNN) architecture which won the ILSVRC (ImageNet) competition in 2014 and is considered to be one of the most outstanding vision model architectures to date [5].
In the VGG16 model’s training phase, performance is obtained by implementing fine-tuning. During fine-tuning, the authors adjust the deeper layers of the model, as the initial layers capture universal features, such as edges and textures, that are common across most image-recognition tasks. Fine-tuned hyperparameters include the learning rate, batch size, and number of epochs. A smaller learning rate is usually preferred so that only small updates are made to the weights, ensuring the pre-learned features are not drastically altered. The convolutional features improve the generalization performance of the model. In this phase, the VGG16 CNN uses Max-Pooling and the ReLU activation function: all the hidden layers use ReLU activation, and the last Dense layer uses Softmax activation. Max-Pooling is performed over a 4 × 4-pixel window with a stride of two. VGG16 has five convolutional blocks and three fully connected layers. Finally, all images are classified using binary or multiclass classification.
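The fine-tuning strategy described here could look like the following Keras sketch: the ImageNet-pretrained VGG16 base is loaded, its early blocks are frozen, and a small classification head is trained with a low learning rate. The number of unfrozen layers, the head width, and the learning rate are assumptions for illustration, not the paper's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Pretrained convolutional base; the top (fully connected) layers are replaced.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
for layer in base.layers[:-4]:           # freeze all but the last block (assumed choice)
    layer.trainable = False

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),    # head width is illustrative
    layers.Dropout(0.5),
    layers.Dense(2, activation="softmax"),   # binary output: COVID-19 vs. normal
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),  # small LR preserves pre-learned features
              loss="sparse_categorical_crossentropy",    # integer class labels assumed
              metrics=["accuracy"])
```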

2.3. Confusion Matrix

The confusion matrix is a table used to evaluate the performance of a classification method: a graphical representation and summary of how a classification algorithm performs. The proposed model and the established one were evaluated by computing four quantitative measures: accuracy, precision, recall, and the area under the curve (AUC). Compared to other performance metrics, accuracy, precision, and recall provide sufficient information to validate the deep learning model effectively [21].
The developed models are evaluated by accuracy, precision, recall, and F1-Score. Accuracy, precision, recall, and F1 score are the performance metrics of an algorithm, which are calculated based on the True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). The formulation of the evaluation index parameters is as follows [24]:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
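These four indices follow directly from the confusion-matrix counts; a minimal sketch (with hypothetical counts) is shown below.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts, for illustration only.
print(classification_metrics(tp=14, tn=13, fp=1, fn=1))
```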

3. Proposed Methodology

A new deep learning model has been innovatively designed by merging a Generative Adversarial Network (GAN) and the VGG16 Convolutional Neural Network (CNN). This model adeptly handles imbalances in datasets. GANs, falling under unsupervised learning, can discern patterns in data and create synthetic outputs resembling the original data. They use a two-pronged approach: the generator creates new data while the discriminator identifies whether the data are real or generated. This expands and diversifies the dataset for better generalization. Meanwhile, CNNs are feed-forward networks that process images as tensors in an array. The highlighted VGG16 CNN model, with its layered structure, translates basic data into high-end features. This CNN model is trained using the GAN’s synthetic images. The combined GAN-CNN model shows potential as an effective tool for COVID-19 diagnosis.
Figure 3 illustrates a flowchart of the structure of this research. Altogether, four main phases are presented in this study; each phase is briefly described below.

3.1. Phase 1: Data Acquisition and Preprocessing

The X-ray image dataset was gathered from a real patient dataset made available at https://www.kaggle.com/alifrahman/covid19-chest-xray-image-dataset (accessed on 12 December 2023). This dataset contains two folders, COVID and normal; combined and modified versions are available at https://www.kaggle.com/bachrr/covid-chest-xray (accessed on 12 December 2023) for COVID-19 chest X-rays and https://www.kaggle.com/alifrahman/chestxraydataset (accessed on 12 December 2023) for the chest X-ray dataset. The two folders represent patients who became infected with COVID-19 and those who did not, containing 69 COVID and 25 normal chest X-ray images, respectively. The main motivation behind creating this dataset was to identify COVID-19-affected patients efficiently. Once the datasets have been gathered in one folder, the images from the two folders are imported into a Jupyter Notebook for the analysis process. In dealing with unstructured image data, related libraries such as OpenCV, TensorFlow, and Keras are imported.

3.1.1. Re-Scale the Images

Since the images in the dataset may come from different sources, the image acquisition parameters differ, and each image has a different pixel size. Therefore, there are considerable variations in the intensity and size of the images. Thus, we resized all the images to 224 × 224 pixels.
Image normalization
Some of the images in the chest X-ray image dataset may come from different acquisition devices whose parameters differ, so the pixel intensity of each image may vary considerably. Therefore, we normalised the intensity values of all images to the range [0, 1]. The benefit of normalization is that the model is less sensitive to small changes in weights and is easier to optimize.
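A minimal OpenCV sketch of the two steps above (resizing to a common shape and scaling intensities to [0, 1]) might look as follows; the 224 × 224 target size comes from the text, while the function name and everything else is illustrative.

```python
import cv2
import numpy as np

def load_and_preprocess(path, size=(224, 224)):
    """Read a chest X-ray, resize it to a common shape, and normalise intensities to [0, 1]."""
    img = cv2.imread(path)                  # BGR uint8 image; None if the path is invalid
    img = cv2.resize(img, size)             # unify pixel dimensions across acquisition sources
    return img.astype(np.float32) / 255.0   # intensity normalisation
```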

3.1.2. Data Partitioning

As the model’s network deepens, the number of parameters to be learned also increases, which easily leads to overfitting. In this case, to mitigate the overfitting problem caused by the small number of training images, we split the data into training and testing sets of 80% and 20%, respectively.
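The 80/20 split could be implemented with scikit-learn as in the sketch below; the placeholder arrays, the stratification, and the fixed random seed are assumptions, not details from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(94, 224, 224, 3).astype("float32")  # placeholder for 94 preprocessed images
y = np.array([1] * 69 + [0] * 25)                      # 69 COVID, 25 normal (assumed label coding)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)  # 80% train / 20% test
```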

3.2. Phase 2: Apply the Generative Adversarial Network (GAN)

GANs generate synthetic images through the architectures of two main components: a generator and a discriminator. The GAN is designed as an unsupervised learning tool that learns the distribution of data classes. It has better data distribution modelling and can train any generator network; here, a CNN is used as the generator. The primary objective of the discriminator is to determine whether a given sample comes from the synthetic or the actual distribution, scoring samples according to a probability value. The discriminator in the GAN learns to differentiate actual images from synthetically generated images, which enhances the ability of the generator to learn about the actual images. In the training process, the generator creates images from a random noise input, while the discriminator is trained to distinguish real images from the fake images produced by the generator. Adversarial training means that the generator and discriminator are trained simultaneously in a competitive manner. Through this process, the generator learns to produce more convincing images, and training continues until a point of convergence, where the generator produces images that the discriminator can no longer easily distinguish from real images.
The quality of these images can be evaluated through various methods, including visual inspection and quantitative metrics such as the Inception Score (IS), which measures how diverse (in terms of classes) the generated images are and how confidently a separate model (such as Inception) can classify them, and the Fréchet Inception Distance (FID), which compares the distribution of features in real images to that in synthetic images. A lower FID indicates closer resemblance to real images. These evaluations are crucial in determining the effectiveness and applicability of GANs in different domains, particularly in fields requiring high-fidelity image generation.
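For reference, the FID between two sets of Inception feature vectors reduces to a closed-form expression over their means and covariances; the sketch below is a generic implementation of that formula, not the evaluation code used in this study.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_inception_distance(feat_real, feat_fake):
    """FID between two feature matrices (rows = samples, columns = Inception features)."""
    mu1, mu2 = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    c1 = np.cov(feat_real, rowvar=False)
    c2 = np.cov(feat_fake, rowvar=False)
    covmean = sqrtm(c1 @ c2)                  # matrix square root of the covariance product
    if np.iscomplexobj(covmean):              # drop tiny imaginary parts from numerical error
        covmean = covmean.real
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(c1 + c2 - 2.0 * covmean))
```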

3.3. Phase 3: Train the VGG16 Convolutional Neural Network

During the training phase of the VGG16 model, the fine-tuning technique is employed to enhance the model’s performance. The utilisation of convolutional features has been shown to enhance the overall generalisation performance of the model. During this stage, the VGG16 convolutional neural network (CNN) employs Max-Pooling and the rectified linear unit (ReLU) activation function. The ReLU activation function is utilised in all of the hidden layers, whereas the Softmax activation function is employed in the final Dense layer. Max-Pooling is conducted using a 4 × 4 pixel window and a stride of 2. The VGG16 architecture consists of five convolutional blocks and three fully connected layers. Ultimately, all of the X-ray images are categorised as either instances of COVID-19 or normal cases. By simplifying the model’s architecture, incorporating dropout and batch normalization, using pre-trained layers, and adopting a training process that includes data augmentation, early stopping, regularization, fine-tuning with a lower learning rate, and cross-validation, the risk of overfitting can be significantly minimized. These strategies are particularly crucial when working with small datasets, as they help to ensure that the model learns generalizable patterns rather than memorizing the specific details of the limited data available.
The computational complexity of a modified VGG16 model is determined by factors such as its architectural depth, the degree of fine-tuning, the use of data augmentation techniques, and the selected regularisation methods. Through strategic simplification of the model, optimisation of the training process, and efficient utilisation of computing resources, this complexity can be effectively managed. The objective is to achieve a balance where the model remains computationally feasible while maintaining its capacity to effectively learn from small datasets and exhibit strong generalisation to new data.

3.4. Phase 4: Model Performance Comparison

This phase evaluates the GAN and VGG16 deep learning models using the confusion matrix. Accuracy is the simplest classification metric. A confusion matrix, or error matrix, illustrates the number of correct and incorrect model predictions relative to the test set classifications and the types of errors. A confusion matrix with True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) counts summarises the results.
For a well-fitted model, the most crucial element, FN, should be the smallest and TP the largest. The classification model’s accuracy, recall, precision, and F1-score are assessed, and comparing these values among classification models determines the most suitable one for deployment to the end user. The area under the curve (AUC) metric is used for binary classification: the AUC is the likelihood that the classifier scores a random positive example higher than a random negative example. It ranges over [0, 1], and model performance improves with higher values.
Enhancing the interpretability of deep learning models is crucial in numerous medical imaging applications. Selvaraju et al. [25] developed the Gradient Weighted Class Activation Mapping (Grad-CAM) technique, which offers a clear and interpretable representation of deep learning models. The Grad-CAM technique generates visual explanations for neural networks, providing insights into the model’s behaviour during detection or prediction tasks. A Grad-CAM-based colour visualisation approach [26] is used to accurately interpret the detection of radiology images and determine the subsequent course of action.
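A compact Grad-CAM sketch for a Keras model is given below, following the gradient-weighted activation idea of [25]; it assumes a VGG16-style network whose last convolutional layer is reachable by name (here `block5_conv3`, the default name in Keras' VGG16), which may differ from the authors' implementation.

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_layer="block5_conv3", class_index=None):
    """Return a [0, 1] heatmap of the regions driving the model's prediction."""
    grad_model = tf.keras.models.Model(
        model.inputs, [model.get_layer(last_conv_layer).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        if class_index is None:
            class_index = tf.argmax(preds[0])           # explain the predicted class
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)         # d(score) / d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))         # global-average-pool the gradients
    cam = tf.nn.relu(tf.reduce_sum(
        weights[:, tf.newaxis, tf.newaxis, :] * conv_out, axis=-1))
    cam = cam[0] / (tf.reduce_max(cam) + 1e-8)           # normalise to [0, 1]
    return cam.numpy()
```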

4. Results and Discussion

This section discusses and interprets the results of developing an integrated GAN and VGG16 and its application in X-ray images for COVID-19 detection. As mentioned earlier, the details of those points are extensively discussed in this section.
In data preparation, the input data, which currently have a range of [0, 1], are scaled to a range of [−1, 1]. The purpose of this rescaling is that the tanh activation function in the generator output often produces better results with inputs in this range. Using a sigmoid activation function in the generator output is also prominent and would not require rescaling the images further, since a sigmoid mathematically maps input values to an output range between 0 and 1. Rescaling in terms of resolution typically involves resizing the image and changing its width and height, which is a different process not directly related to the activation functions in neural networks. The resolution used in this study is set to 64 × 64 pixels. This prevents the model from learning noise and specific orientations, making it more robust. One-hot encoding is performed on the labels during the normalisation process in order to create the label (dependent) attribute for labelling images as either COVID or normal. One-hot encoding is the representation of categorical variables as binary vectors, while label encoding converts labels or words into numeric form. Then, the data are partitioned into training and testing sets of 80 per cent and 20 per cent, respectively. The implementation of the scaling (augmentation) and splitting processes is crucial for assessing how well the model generalizes to new, unseen data, which is a key indicator of whether overfitting has occurred.
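The [0, 1] to [−1, 1] rescaling and the one-hot encoding of the labels amount to two lines; the placeholder arrays below stand in for the actual preprocessed data.

```python
import numpy as np
from tensorflow.keras.utils import to_categorical

X = np.random.rand(94, 64, 64, 3).astype("float32")  # placeholder images, already in [0, 1]
y = np.array([1] * 69 + [0] * 25)                    # 1 = COVID, 0 = normal (assumed coding)

X_scaled = X * 2.0 - 1.0                   # map [0, 1] to [-1, 1] for a tanh generator output
y_onehot = to_categorical(y, num_classes=2)  # one-hot encode the binary labels
```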
Once the data preparation has been completed, the development of the GAN model starts with the generator. The standard VGG16 architecture is designed for three-channel RGB images; however, it can be modified to accept one-channel images by adjusting the input layer, a modification sometimes made when dealing exclusively with grayscale images. For the COVID dataset, the author starts with a 100-node latent vector, connects it to a Dense layer of 4416 nodes, and reshapes the result to 8 × 8 × 69. The data are then up-sampled to an output size of 64 × 64 by passing them through Transposed Convolutions. The number of filters is limited from 207 to just 3, representing the colour channels, and ordinary convolution is employed in the output layer. The process is similar for the normal dataset: it starts with a 100-node latent vector, connects it to a Dense layer of 1600 nodes, reshapes the result to 8 × 8 × 25, up-samples the data to an output size of 64 × 64 via Transposed Convolutions, and limits the number of filters from 75 to 3. The generator and discriminator are then combined, with the weights of the discriminator not being trained, for both datasets. They are compiled with the loss function binary_crossentropy and the Adam optimizer with a learning rate of 0.0002. The combination of the generator and discriminator models becomes the trainable GAN depicted in Figure 4.
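A hedged Keras sketch of the COVID-branch generator described above (100-node latent vector, Dense layer of 4416 = 8 × 8 × 69 nodes, transposed convolutions up to 64 × 64, and 207 filters narrowed to 3 via an ordinary output convolution with tanh) follows; the kernel sizes, activations, and number of up-sampling stages are assumptions.

```python
from tensorflow.keras import layers, models

def build_covid_generator(latent_dim=100):
    """100-d noise -> 8x8x69 feature maps -> 64x64x3 synthetic X-ray (sketch)."""
    return models.Sequential([
        layers.Input(shape=(latent_dim,)),
        layers.Dense(8 * 8 * 69),                                    # 4416 nodes
        layers.LeakyReLU(0.2),
        layers.Reshape((8, 8, 69)),
        layers.Conv2DTranspose(207, 4, strides=2, padding="same"),   # 8x8 -> 16x16
        layers.LeakyReLU(0.2),
        layers.Conv2DTranspose(207, 4, strides=2, padding="same"),   # 16x16 -> 32x32
        layers.LeakyReLU(0.2),
        layers.Conv2DTranspose(207, 4, strides=2, padding="same"),   # 32x32 -> 64x64
        layers.LeakyReLU(0.2),
        layers.Conv2D(3, 3, padding="same", activation="tanh"),      # ordinary conv, 3 channels
    ])
```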
In contrast to the generator model, the discriminator model performs the opposite function: a 64 × 64 image is passed through many convolutional layers to produce a real-or-fake binary classification output. The two models are then combined to generate a Deep Convolutional GAN. The author makes the discriminator model non-trainable within the combined model, because the discriminator is trained separately using a combination of real and generated data.
To assist with the sampling and generation of data for the two models, the author develops three straightforward functions: the first takes real images as samples from the training data, the second draws random vectors from the latent space, and the third feeds latent variables into the generator model to produce generated fake examples. Fake (generated) images are created once the models have been trained on the COVID and normal datasets, as shown in Figure 5 and Figure 6, and they are saved in a folder created for further computational processing.
The GAN models we developed are evaluated using accuracy and other measurements from the confusion matrix. Figure 5 and Figure 6 show some fake (generated) images created after training the model for 2000 epochs on the COVID and normal datasets.
The authors conducted a comparison study to predict COVID-19 using the established VGG16 CNN model and the integrated VGG16 CNN and deep convolutional GAN (VGG16 CNN + GAN) model. The integrated VGG16 CNN + GAN combines the real data with the synthetic (generated) data from the GAN phase. In the original dataset, the COVID and normal datasets have 69 and 25 images, respectively; after the synthetic (generated) datasets were produced by the deep convolutional GAN model, the COVID and normal datasets grew to 117 and 108 images, respectively. Creating the synthetic images thereby resolved the imbalance issue for the classification model. From these inputs, the VGG16 CNN + GAN model is developed with the same hyperparameter tuning settings as the original VGG16 CNN: input images of 224 × 224 pixels, a Max-Pooling size of 2 × 2, and stride = 2. Figure 7 presents the images predicted by the proposed VGG16 CNN + GAN on the training data.
While running the code, the validation loss is lowest when the epoch reaches 10, and training is stopped at this point. The author set early stopping during training to prevent the deterioration of the model’s generalization performance caused by continued training. Primarily, this choice is justified if the model shows early convergence, achieving satisfactory performance within the first 10 epochs with no significant improvement in key metrics such as accuracy or loss thereafter. GANs and CNNs may exhibit rapid convergence, particularly when the model designs are optimised and the dataset is highly compatible with the objective. If the models exhibit satisfactory performance within 10 epochs, this serves as a legitimate justification for implementing early stopping. Early stopping is crucial in preventing overfitting, where the model performs well on training data but poorly on unseen data, a common risk in prolonged training. Additionally, stopping at 10 epochs can be a strategic choice for resource optimization, saving computational time and power, which is particularly important in resource-intensive deep learning tasks.
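In Keras, such early stopping is usually expressed as a callback rather than a hard-coded epoch count; the sketch below (reusing the `model` and split arrays from the earlier sketches; the patience and batch size are assumptions) stops when the validation loss stops improving and restores the best weights.

```python
import tensorflow as tf

# `model`, `X_train`, and `y_train` are assumed from the earlier sketches.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)

history = model.fit(X_train, y_train, validation_split=0.1,
                    epochs=50, batch_size=16, callbacks=[early_stop])
```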
Table 1 depicts the evaluation results using the confusion matrix for the proposed integrated deep learning model (VGG16 CNN + GAN) and the established VGG16 CNN. The accuracy, sensitivity, specificity, and F1-score for the prediction of the test set by the proposed integrated deep learning model are 0.9655, 1, 0.8750, and 0.96, respectively. The COVID-19 samples from the enlarged chest X-ray dataset produced by the GAN were fed into the proposed deep learning model, with the same settings and hidden layers as the original VGG16 CNN. Additionally, the author selected each model’s best performance and then assessed both models against the test set. The proposed VGG16 CNN + GAN model presents better results, as shown in Table 1, compared to the VGG16 CNN, which achieved accuracy, sensitivity, specificity, and F1-scores of 0.9474, 0.9286, 1, and 0.95, respectively. This shows that the proposed integrated deep learning model is promising. From the results gathered, the proposed integrated deep learning model (VGG16 CNN + GAN) solves problems such as lack of data and uneven distribution by adding more, and more diverse, data to the dataset. This makes the model more accurate, applicable, and robust. The method helps make diagnoses more accurate, generalises to new cases, can be scaled up, and does not cost a lot of money, which makes it especially useful in medical research and diagnosis.
The model is additionally assessed on test sets that were not exposed to the model during the training process. The proposed VGG16 CNN + GAN, with 224 × 224 pixels, a Max-Pooling size of 4 × 4, and stride = 2, provides an AUC (Area Under the Curve) for COVID-19 of 0.98, indicating that the model can accurately differentiate between COVID-19 and normal instances. For the established model, the AUC value is 0.95. This demonstrates that the proposed model exhibits greater consistency and resilience across all classes, even in the presence of a non-uniform sample distribution. Figure 8 depicts the AUC scores for both models.
Figure 9 depicts the application of Grad-CAM on a trained model to visualise the impact of COVID-19 on an infected individual. The original COVID-19 image (a) and the class activation map for COVID-19 (b) are shown to highlight the areas of interest in our model’s prediction. These areas are represented by high-intensity visuals in blue and green. The utilisation of Grad-CAM in this study amplifies the interpretability and explanatory capacity of the proposed deep learning model.
The results indicate that a patient diagnosed with COVID-19 is more likely to receive a False Positive result when tested using the proposed model. Hence, to achieve precise identification of COVID-19 cases with improved recall, it is recommended to train the model using radiology images that exhibit symptoms of COVID-19. This will enable the accurate identification of COVID-19 patients who were previously misdiagnosed as False Positives, leading to an impartial identification of COVID-19 cases in a live setting.
The authors were also concerned about the generalisation issue. Generalisation refers to the ability of the model to adapt and respond appropriately to novel, previously unobserved data drawn from the same distribution as that used to develop the model. In other words, generalisation evaluates how effectively a model can take in new data and produce accurate predictions after being trained on a training set. The success of a model depends on how effectively it can generalise. A model cannot generalise if it is fitted to the training data too thoroughly; when presented with new data in such circumstances, it will ultimately make incorrect predictions. Even though the model is capable of producing accurate predictions for the training dataset, this would render it useless. This is called overfitting. Since the proposed integrated VGG16 CNN and deep convolutional GAN provides almost 97 per cent accuracy, overfitting may be an issue here. Several measures can reduce overfitting and make the model more robust, such as adding more data, using data augmentation, using architectures that generalize well, adding regularization (mostly dropout; L1/L2 regularization is also possible), and reducing architecture complexity.

5. Conclusions

The World Health Organization has classified COVID-19 as a global pandemic, and with the infection rate rising quickly globally, a reliable disease detection system is required. In this study, a modified and extended VGG16 CNN model with a GAN is developed to propose an efficient diagnostic method for identifying and distinguishing COVID-19 cases in chest X-rays. In developing the GAN model, the author designed the discriminator to determine whether a given sample comes from the synthetic or the actual distribution, scoring samples according to a probability value. The GAN has better data distribution modelling and can train any generator network; here, a CNN is used as the generator. The discriminator in the GAN learned to differentiate the actual images from the synthetically generated images, which enhanced the ability of the generator to learn about the actual images. In generating synthetic datasets, simulation is, besides GANs, another good approach to enlarging the dataset [27,28].
The development of the VGG16 CNN used the real and synthetic (generated) datasets from the GAN and implemented two different hyperparameter tuning settings: VGG16 CNN with 150 × 159 pixels, Max-Pooling size 4 × 4, and no stride, and VGG16 CNN with 224 × 224 pixels, Max-Pooling size 4 × 4, and stride = 2. The proposed integrated deep learning model is compared with the established VGG16 model. The best tuning achieved an accuracy of 96.55%. Compared with the established VGG16 CNN model, which was the starting point of this work, not only has the diagnostic accuracy of the proposed (VGG16 CNN + GAN) model been improved, but several other evaluation indicators have also improved significantly, to nearly perfect values. As mentioned in the previous section, overfitting must be considered and reduced to make sure the model is reliable and robust in predicting COVID-19 cases from X-ray images.
The effective deep learning models that were used to identify COVID-19 in X-ray images suggest that deep learning still has a lot of untapped potential and may be able to contribute more to the fight against this epidemic. However, this study has certain drawbacks; in particular, there is undoubtedly still potential for improvement, which may be achieved through additional procedures such as expanding the number of images and putting preprocessing methods (such as data augmentation and/or image enhancement) into practice for classification problems. A more thorough analysis necessitates more patient data, particularly COVID-19 data. In normal practice, efficient deep learning models necessitate training on datasets comprising large numbers of images, a task that poses a significant challenge in the medical sector. Additionally, it is possible that training deep neural networks on a small dataset will lead to overfitting and prevent generalisation. Deep transfer learning can be used in conjunction with visual ablation studies to significantly increase the ability to detect COVID-19 symptoms in chest X-ray images. The outcomes of the proposed integrated deep learning model are promising, and the pattern demonstrates the value of the VGG16 CNN + GAN architecture. While the proposed study does not provide the ideal automated system for the detection of the novel COVID-19, the model can be tested on larger datasets to ensure that the methodology would work as intended in real-world situations.
In this research study involving GANs and the VGG16 model, several limitations are present. One is limited data: limited or undiversified data may hinder GAN and VGG16 learning, and the GAN’s synthetic images may not accurately represent real-world situations, affecting model accuracy. Another limitation is that advanced models require computing power that some researchers cannot afford or access. Overfitting is a prominent issue in developing machine learning and deep learning models, in which a model may perform well on training data but poorly on new data. Finally, there are challenges in validating a synthetic dataset: it can be tough to judge how good the synthetic data are, and the metrics used to evaluate the model might not fully capture its effectiveness in real-world scenarios.
To broaden the utility of the proposed model beyond COVID-19 diagnosis, several key areas for future work can be considered. First, the model could be adapted to diagnose lung cancer and pneumonia from relevant case studies, which would require retraining the model with diverse datasets relevant to these conditions. Second, adding multiclass classification would make the model more useful for diagnosing multiple diseases from medical images. Third, integrating the model with electronic health records (EHRs) could improve diagnostics by combining image analysis and patient history. The model’s versatility could also be increased by adding cross-modality image analysis, such as magnetic resonance imaging (MRI) and ultrasound.

Author Contributions

Main text, K.M.N.K.K., W.C.S. and A.G.; GAN analysis, K.M.N.K.K., A.G. and A.S.A.B.; VGG16 analysis, K.M.N.K.K., W.C.S. and A.G.; COVID-19 analysis, N.A.S.; supervision, W.C.S. and A.G.; funding acquisition, K.M.N.K.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Universiti Malaysia Pahang Al-Sultan Abdullah under UMPSA Internal Research Grants (RDU) No. RDU230375 and UMPSA Postgraduate Research Grants Scheme (PGRS) No. PGRS220301.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors would like to thank Universiti Malaysia Pahang Al-Sultan Abdullah for laboratory facilities as well as additional financial support under the UMPSA Internal Research Grants (RDU) No. RDU230375 and UMPSA Postgraduate Research Grants Scheme (PGRS) No. PGRS220301.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

CNNs: Convolutional Neural Networks
DCNNs: Deep Convolutional Neural Networks
VGG: Visual Geometry Group
GANs: Generative Adversarial Networks
CV: Computer Vision
CT: Computerised Tomography
RT-PCR: Reverse Transcription Polymerase Chain Reaction
CXR: Chest X-ray
ReLU: Rectified Linear Unit
TP: True Positive
TN: True Negative
FP: False Positive
FN: False Negative
EHR: Electronic Health Records
MRI: Magnetic Resonance Imaging
HSI: Hyperspectral Imaging
IS: Inception Score

References

  1. Bansal, M.; Kumar, M.; Sachdeva, M.; Mittal, A. Transfer learning for image classification using VGG19: Caltech-101 image data set. J. Ambient. Intell. Humaniz. Comput. 2023, 14, 3609–3620. [Google Scholar] [CrossRef]
  2. Elyan, E.; Vuttipittayamongkol, P.; Johnston, P.; Martin, K.; McPherson, K.; Moreno-García, C.F.; Jayne, C.; Sarker, M.K. Computer vision and machine learning for medical image analysis: Recent advances, challenges, and way forward. Artif. Intell. Surg. 2022, 2, 24–45. [Google Scholar] [CrossRef]
  3. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015—Conference Track Proceedings, San Diego, CA, USA, 7–9 May 2015; pp. 1–14. [Google Scholar]
  4. Ahmed, H.M.; Abdullah, B.W. Overview of deep learning models for identification of COVID-19. Mater. Today Proc. 2021, online ahead of print. [Google Scholar] [CrossRef]
  5. Sitaula, C.; Hossain, M.B. Attention-based VGG-16 model for COVID-19 chest X-ray image classification. Appl. Intell. 2021, 51, 2850–2863. [Google Scholar] [CrossRef] [PubMed]
  6. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680. [Google Scholar] [CrossRef]
  7. Chung, M.; Bernheim, A.; Mei, X.; Zhang, N.; Huang, M.; Zeng, X.; Cui, J.; Xu, W.; Yang, Y.; Fayad, Z.A.; et al. CT imaging features of 2019 novel coronavirus (2019-NCoV). Radiology 2020, 295, 202–207. [Google Scholar] [CrossRef] [PubMed]
  8. Narin, A.; Kaya, C.; Pamuk, Z. Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks. Pattern Anal. Appl. 2021, 24, 1207–1220. [Google Scholar] [CrossRef]
  9. Bhattacharyya, A.; Bhaik, D.; Kumar, S.; Thakur, P.; Sharma, R.; Pachori, R.B. A deep learning based approach for automatic detection of COVID-19 cases using chest X-ray images. Biomed. Signal Process. Control. 2022, 71, 103182. [Google Scholar] [CrossRef]
  10. DeGrave, A.J.; Janizek, J.D.; Lee, S.I. AI for radiographic COVID-19 detection selects shortcuts over signal. Nat. Mach. Intell. 2021, 3, 610–619. [Google Scholar] [CrossRef]
  11. Wang, L.; Lin, Z.Q.; Wong, A. COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. Sci. Rep. 2020, 10, 19549. [Google Scholar] [CrossRef]
  12. Mahmood, S.; Saud, L. An Efficient Approach for Detecting and Classifying Moving Vehicles in a Video Based Monitoring System. Eng. Technol. J. 2020, 38, 832–845. [Google Scholar] [CrossRef]
  13. Zhao, D.; Zhu, D.; Lu, J.; Luo, Y. Synthetic Medical Images Using F&BGAN for Improved Lung Nodules Classification. Symmetry 2018, 10, 519. [Google Scholar] [CrossRef]
  14. Ai, T.; Yang, Z.; Hou, H.; Zhan, C.; Chen, C.; Lv, W.; Tao, Q.; Sun, Z.; Xia, L. Correlation of Chest CT and RT-PCR Testing for Coronavirus Disease 2019 (COVID-19) in China: A Report of 1014 Cases. Radiology 2020, 296, E32–E40. [Google Scholar] [CrossRef] [PubMed]
  15. Jin, C.; Chen, W.; Cao, Y.; Xu, Z.; Tan, Z.; Zhang, X.; Deng, L.; Zheng, C.; Zhou, J.; Shi, H.; et al. Development and evaluation of an artificial intelligence system for COVID-19 diagnosis. Nat. Commun. 2020, 11, 5088. [Google Scholar] [CrossRef]
  16. Li, X.; Zeng, X.; Liu, B.; Yu, Y. COVID-19 infection presenting with ct halo sign. Radiol. Cardiothorac. Imaging 2020, 2, 230022. [Google Scholar] [CrossRef] [PubMed]
  17. Zhao, Y.; Han, R.; Rao, Y. A new feature pyramid network for object detection. In Proceedings of the 2019 International Conference on Virtual Reality and Intelligent Systems, ICVRIS 2019, Jishou, China, 14–15 September 2019; pp. 428–431. [Google Scholar] [CrossRef]
  18. Jiang, N.; Cao, Y.; Alwalid, O.; Gu, J.; Fan, Y.; Zheng, C. Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: A descriptive study. Lancet Infect. Dis. 2020, 20, 425–434. [Google Scholar] [CrossRef]
  19. Mukherjee, H.; Ghosh, S.; Dhar, A.; Obaidullah, S.M.; Santosh, K.C.; Roy, K. Deep neural network to detect COVID-19: One architecture for both CT Scans and Chest X-rays. Appl. Intell. 2021, 51, 2777–2789. [Google Scholar] [CrossRef] [PubMed]
  20. Li, Y.; Shen, F.; Hu, L.; Lang, Z.; Liu, Q.; Cai, F.; Fu, L. A Stare-Down Video-Rate High-Throughput Hyperspectral Imaging System and Its Applications in Biological Sample Sensing. IEEE Sens. J. 2023, 23, 23629–23637. [Google Scholar] [CrossRef]
  21. Sharmila, V.J.; Florinabel, D.J. Deep Learning Algorithm for COVID-19 Classification Using Chest X-ray Images. Comput. Math. Methods Med. 2021, 2021, 9269173. [Google Scholar] [CrossRef]
  22. Asghar, U.; Arif, M.; Ejaz, K.; Vicoveanu, D.; Izdrui, D.; Geman, O. An Improved COVID-19 Detection using GAN-Based Data Augmentation and Novel QuNet-Based Classification. Biomed. Res. Int. 2022, 2022, 8925930. [Google Scholar] [CrossRef]
  23. Vint, D.; Anderson, M.; Yang, Y.; Ilioudis, C.; Di Caterina, G.; Clemente, C. Automatic target recognition for low resolution foliage penetrating sar images using cnns and gans. Remote Sens. 2021, 13, 596. [Google Scholar] [CrossRef]
  24. Ji, D.; Zhang, Z.; Zhao, Y.; Zhao, Q. Research on Classification of COVID-19 Chest X-ray Image Modal Feature Fusion Based on Deep Learning. J. Healthc. Eng. 2021, 2021, 6799202. [Google Scholar] [CrossRef] [PubMed]
  25. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. Int. J. Comput. Vis. 2020, 128, 336–359. [Google Scholar] [CrossRef]
  26. Panwar, H.; Gupta, P.K.; Khubeb, M.; Morales-menendez, R. A deep learning and grad-CAM based color visualization approach for fast detection of COVID-19 cases using chest X-ray and CT-Scan images. Chaos Solitons Fractals 2020, 140, 110190. [Google Scholar] [CrossRef]
  27. Rutkowski, D.R.; Roldán-Alzate, A.; Johnson, K.M. Enhancement of cerebrovascular 4D flow MRI velocity fields using machine learning and computational fluid dynamics simulation data. Sci. Rep. 2021, 11, 10240. [Google Scholar] [CrossRef] [PubMed]
  28. Hu, P.; Cai, C.; Yi, H.; Zhao, J.; Feng, Y.; Wang, Q. Aiding Airway Obstruction Diagnosis with Computational Fluid Dynamics and Convolutional Neural Network: A New Perspective and Numerical Case Study. J. Fluids Eng. Trans. ASME 2022, 144, 081206. [Google Scholar] [CrossRef]
Figure 1. Images for chest X-ray and CT images for COVID-19 [19].
Figure 2. The general architecture of deep convolutional GAN [23].
Figure 3. Flowchart of the proposed research.
Figure 4. Trainable GAN model diagram.
Figure 5. GAN model being trained for COVID dataset.
Figure 6. GAN model being trained for normal dataset.
Figure 7. The proposed VGG16 CNN-GAN model is being trained for COVID dataset.
Figure 8. AUC scores for both models.
Figure 9. Visualisation of original COVID-19 (a) and a class activation map of COVID-19 (b).
Table 1. Evaluation metrics results for accuracy, sensitivity, specificity, and F1-score.

Method | Accuracy | Sensitivity | Specificity | F1-Score
Proposed VGG16 CNN + GAN (224 × 224 pixel, Max-Pooling size 4 × 4, stride = 2) | 0.9655 | 1 | 0.8750 | 0.96
VGG16 CNN (224 × 224 pixel, Max-Pooling size 4 × 4, stride = 2) | 0.9474 | 0.9286 | 1 | 0.95