Article

Evaluating Deep Learning Resilience in Retinal Fundus Classification with Generative Adversarial Networks Generated Images

by Marcello Di Giammarco 1,2,*, Antonella Santone 3, Mario Cesarelli 4, Fabio Martinelli 1 and Francesco Mercaldo 1,2,*
1 Institute for Informatics and Telematics (IIT), National Research Council of Italy (CNR), 56124 Pisa, Italy
2 Department of Information Engineering, University of Pisa, 56124 Pisa, Italy
3 Department of Medicine and Health Sciences “Vincenzo Tiberio”, University of Molise, 86100 Campobasso, Italy
4 Department of Engineering, University of Sannio, 82100 Benevento, Italy
* Authors to whom correspondence should be addressed.
Electronics 2024, 13(13), 2631; https://doi.org/10.3390/electronics13132631
Submission received: 3 June 2024 / Revised: 26 June 2024 / Accepted: 3 July 2024 / Published: 4 July 2024
(This article belongs to the Special Issue Human-Computer Interactions in E-health)

Abstract

The evaluation of Generative Adversarial Networks in the medical domain has shown significant potential for various applications, including adversarial machine learning on medical imaging. This study specifically focuses on assessing the resilience of Convolutional Neural Networks in differentiating between real and Generative Adversarial Network-generated retinal images. The main contributions of this research include the training and testing of Convolutional Neural Networks to evaluate their ability to distinguish real images from synthetic ones. By identifying the networks with optimal performances, the study supports the development of better models for diagnostic classification, enhancing generalization and resilience to adversarial images. Overall, the aim of the study is to demonstrate that the application of Generative Adversarial Networks can improve the resilience of the tested networks, resulting in better classifiers for retinal images. In particular, a network designed by the authors, i.e., Standard_CNN, reports the best performance, with an accuracy equal to 1.

1. Introduction

In recent years, Generative Adversarial Networks (GANs) have emerged as a potent instrument within the domain of medical imaging and healthcare data analysis. GANs, introduced by Goodfellow et al. in 2014 [1], have transformed the generative modeling field by setting two neural networks, namely the generator and the discriminator, in opposition within a game-theoretic framework. This adversarial training mechanism empowers GANs to grasp intricate data distributions and produce authentic samples that mirror the training data.
Within the medical domain, GANs exhibit substantial potential for a variety of applications, encompassing medical image synthesis [2,3], data augmentation [4,5], anomaly detection [6,7], and disease diagnosis [8,9]. Through the utilization of extensive datasets of medical images, GANs can generate synthetic images with diverse anatomical variances, thereby facilitating the augmentation of training data for resilient deep learning models. Furthermore, GANs facilitate the generation of realistic pathological images, contributing to the study of rare diseases and abnormalities. Moreover, GANs have played a crucial role in tasks related to medical image reconstruction and enhancement. Employing methodologies like image-to-image translation [10], GANs can transform substandard or noisy medical images into high-fidelity, artifact-free representations, consequently enhancing the precision of diagnostic procedures and treatment strategies. GANs have exhibited potential in expediting cross-modality medical image synthesis [11], whereby images from one modality (e.g., MRI) are transmuted into another modality (e.g., CT scans). This capability carries substantial implications for multimodal medical image analysis and fusion, thereby enabling comprehensive evaluations of patient health status through complementary imaging modalities. Models that perform well after adversarial machine learning is applied are better able to distinguish images with perturbations, such as those introduced by data poisoning attacks [12,13] aimed at misclassifying biomedical images, which can lead to diagnosis errors and worsen patients' health through incorrect therapeutic procedures.
Taking into account these GAN applications, this paper focuses on evaluating the resilience of Convolutional Neural Networks (CNNs) in the classification of real retinal images versus GAN-generated images. In other words, this paper's motivation concerns evaluating whether networks can distinguish real images from synthetic ones, and thus understanding whether this approach could be exploited for malicious purposes, for instance, to alter the diagnostic process.
The main contributions of the paper are reported below:
  • Training and testing of seven CNNs (one designed and developed by the authors) to evaluate their ability to distinguish real and GAN-generated images;
  • Identification of the networks that provide optimal performances, which guarantee better models for further diagnostic classification, with better generalization and good resilience to adversarial images;
  • An experimental analysis conducted considering three input image sizes, starting from 28 × 28 pixel retinal images, going through 64 × 64 and finally 128 × 128 pixels obtained by resizing, from a validated and certified public dataset. In this way, the analysis was extended to different study cases, enhancing the initial variability.
The paper proceeds as follows: in the next section, we provide a state-of-the-art overview related to the adoption of GANs for medical image analysis; Section 3 presents the proposed method; the experimental analysis is shown in Section 4; finally, in the last section, conclusions and future research plans are drawn.

2. Related Work

In this section, a literature review on GAN applications in medical contexts is provided. Starting from the survey in [14], a general overview of Generative Adversarial Networks in ophthalmology was presented. That paper identifies various applications of GANs in ophthalmology, including segmentation, data augmentation, denoising, domain transfer, super-resolution, post-intervention prediction, and feature extraction. It emphasizes that the adoption of GANs in ophthalmology is still in the early stage of clinical validation compared to deep learning classification techniques, but proper selection of the GAN technique and statistical modeling of ocular imaging can greatly improve image analysis performance.
Another interesting work is that of Zekuan Yu and his colleagues [15]. They introduce a new preprocessing pipeline called multiple channels–multiple landmarks (MCML) for synthesizing color fundus images from a combination of vessel tree, optic disc, and optic cup images. The authors design a new Pix2pix structure with a ResU-net generator, which achieves superior PSNR and SSIM performance compared to other GANs, demonstrating that high-resolution paired images are beneficial for improving the performance of each GAN.
The paper of Chen et al. [16] reports that the majority of experts (59%) were unable to discern real fundus images from synthetic ones generated by a GAN. The GAN used in the study was trained on RetCam images from North American infants screened for retinopathy of prematurity (ROP). The study suggests that synthetic fundus images generated by GANs have potential applications in deep learning data augmentation, medical education, and addressing privacy concerns.
Costa et al. [17] proposed another way of performing retinal picture synthesis. They trained an adversarial technique on vascular networks and their accompanying retinal fundus pictures. In other words, they study how the vascular trees map into the retinal fundus. The primary shortcoming of their strategy is the reliance on an independent algorithm to segment the vessels. In a later study, Costa et al. [18] improved on this prior work by training an autoencoder on the original vessel trees rather than learning a transformation between them and the accompanying retinal picture. The synthetic vessel trees are then sent into the retinal image synthesizer. Although the latter technique presented by the authors is a significant advance over their prior work, both systems rely on how successfully the independent method extracts the vessels. The quality of the segmented vessel tree influences the synthetic vessel trees and, ultimately, the final retinal picture.
Another interesting work is in [19]. The authors' approach consists of training on a significantly larger dataset (86,926 retinal images) than prior works, leading to higher-quality synthetic images. The proposed SS-DCGAN classifier was tested on a dataset split into 70% training and 30% testing, demonstrating superior performance compared to existing methods such as those by Chen et al. [20] and Alghamdi et al. [21], despite having a simpler architecture (4 layers compared to 6 and 10 layers, respectively). These contributions highlight the efficacy of using GANs and semi-supervised learning for both generating high-quality medical images and enhancing the performance of medical image classifiers, particularly in the context of glaucoma assessment.
The paper [22] explored the use of DCGANs for generating synthetic Magnetic Resonance Imaging (MRI) data, particularly focusing on brain tumor images. The study emphasizes the role of synthetic data generation in addressing the problem of limited medical datasets. The training process, illustrated through loss function graphs at various epochs (1K, 5K, 10K, 20K), shows a stable convergence between the generator and discriminator models. This stability is crucial for the generation of high-quality synthetic images.
In [23], the authors introduce a defensive model against the adversarial speckle-noise attack and a feature fusion strategy for preserving correct labeling in the classification of retinal fundus images for diabetic retinopathy recognition. The proposed defensive model is robust and achieves 99% accuracy in classifying retinal fundus images.
Looking at other application fields, such as brain tumor detection in MRI, Shin et al. [24] utilized a supervised GAN, with and without data augmentation, to generate synthetic MR tumor images from the respective segmentation masks, using the BRATS 2015 dataset for 200 epochs on NVIDIA DGX systems. The researchers observed improved Dice score evaluations with the inclusion of synthetic data.
Concerning other GAN-related works, in the paper of Kwon et al. [25], the authors proposed new CAPTCHA image generation systems based on GAN techniques. To evaluate the performance of the proposed schemes, they used a script aimed at breaking Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) controls as a CAPTCHA solver. Then, the authors compared the resistance of the original source images and the generated CAPTCHA images against the CAPTCHA solver. The results show that the proposed schemes improve the resistance to the CAPTCHA solver by over 67.1% and 89.8%, depending on the scheme.
Regarding possible attacks using GANs, in [26] Jekyll was presented: a deep generative model framework that can manipulate biomedical images to indicate an attacker-chosen disease condition while preserving the patient's identity. The framework successfully generates “fake” medical images that can mislead both medical professionals and algorithmic detection schemes. The authors demonstrate successful attacks using Jekyll on X-rays and retinal fundus images, which are commonly used for medical diagnostics.
The adoption of GANs is also emerging in contexts other than the medical one, for instance, in cybersecurity. As a matter of fact, the research topic related to the use of GANs for malware generation has recently been expanding. Some notable advancements in this field include the following: (i) the creation of new GAN architectures that are more effective at generating realistic malware samples; (ii) the development of innovative training techniques that enable GANs to be trained on larger and more complex datasets of malware samples; and (iii) the development of new methods for evaluating the quality of generated malware samples.
For instance, Nguyen et al. [27] explore the possibility of considering a GAN for the generation of malicious software, while Renjith et al. [28] presented a method designed to generate feature vectors for creating evasive Android malware and subsequently modifying the malware accordingly. Their proposal offers a twofold contribution: firstly, it can be used to create datasets for validating detectors of GAN-based malware, and secondly, it can augment training and testing datasets to enhance the robustness of malware classifiers. Authors in [29,30] consider the possibility of training a GAN to generate images related to Android malware, with the aim to understand whether a supervised machine learning classifier is able to distinguish between images obtained from real-world applications and GAN-generated ones.
Nagaraju et al. [31] proposed a method focused on generating counterfeit malware images using GANs and assessing the efficacy of various techniques for classifying these generated images. Their findings showed that the resulting multiclass classification problem presents challenges, but they achieved compelling results when limiting the problem to distinguishing between real and fake samples. The primary conclusion from the paper is that while the GAN-generated images may closely resemble authentic malware images, they do not reach the level of deep fake malware images from a deep learning perspective.
Researchers in [32] introduced a GAN named MalGAN, aimed at generating adversarial malware applications. In this setup, a neural network-based detector was used to fit the black-box detector, while a generator was trained to produce adversarial examples capable of deceiving the substitute detector.
Yuan and colleagues [33] designed and developed GAPGAN, a GAN specifically intended to generate adversarial padding bytes. In their attack framework, they converted the input discrete malware binaries into a continuous space and then input them into the generator of GAPGAN to produce adversarial payloads. By appending these payloads to the original binaries, they created adversarial samples that maintain their functionality.

3. The Method

In this section, we describe the entire procedure for evaluating the resilience of CNNs when GANs are applied in the retinal image classification context.

3.1. GAN: Operating Principles

The proposed method is focused on the GAN, so a block schematization of how the GAN works is reported in Figure 1.
The primary operational concept of a GAN is centered on a competitive learning framework involving two neural networks: the generator and the discriminator. The generator accepts random noise (latent vectors) as input and produces synthetic data samples, learning to map the latent space to the data space to create data resembling the actual data distribution. Initially, the generated samples are of low quality, but with training, the generator enhances its capacity to generate more realistic samples. The discriminator functions as a binary classifier that distinguishes between real data samples and the GAN-generated samples produced by the generator, learning to assign high probabilities to real samples and low probabilities to fake ones. At the outset, the discriminator's performance may be random, yet with training, it becomes proficient in discriminating real from fake samples.
Throughout the training process, both the generator and discriminator are concurrently trained in a competitive manner. The primary goal of the generator is to create samples that are indistinguishable from real data, aiming to deceive the discriminator. In contrast, the discriminator aims to accurately classify real and fake samples. As the training advances, both networks progressively improve: the generator becomes better at generating authentic-looking samples, while the discriminator enhances its ability to differentiate original samples from synthetic ones. The interaction between the generator and discriminator is adversarial: the generator attempts to minimize the discriminator's capability to distinguish real and synthetic samples by producing increasingly authentic samples, while the discriminator endeavors to optimize its capacity to discern real from fake samples. This adversarial procedure leads to a Nash equilibrium, where the generator produces samples that are statistically similar to real data, and the discriminator struggles to reliably differentiate between original and GAN-generated samples. Ideally, the GAN reaches convergence when the generator generates samples that are indistinguishable from real data, and the discriminator cannot distinguish real and synthetic samples better than chance. Achieving convergence can be complex and necessitates meticulous adjustment of hyperparameters, network structures, and training methodologies.
In essence, the central operational concept of a GAN lies in the adversarial training dynamics between the generator and discriminator, culminating in the production of high-quality synthetic data. The generator loss is a measure of how well the generator is performing at generating realistic data; typically, this loss is computed based on the discrepancy between the generated data and the real data. The discriminator loss, instead, measures how well the discriminator can differentiate real data from generated data, and is computed based on the discriminator's ability to correctly classify samples as real or fake [30].
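For reference, the adversarial dynamics described above correspond to the standard minimax objective introduced by Goodfellow et al. [1]:

\min_{G} \max_{D} \; V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]

where G is the generator, D is the discriminator, z is the latent noise vector and p_data is the distribution of real images. The generator and discriminator losses mentioned above are the two sides of this objective, with the discriminator loss typically implemented as a binary cross-entropy over real and generated samples.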
After this general background, in our work we take into account a specific type of GAN, called DCGAN (Deep Convolutional Generative Adversarial Network [34]). The operating principles are the same as those already illustrated, but the DCGAN represents an extension of the GAN concept that incorporates deep convolutional neural networks. DCGANs have been widely used for tasks such as image generation, super-resolution, and style transfer due to their ability to produce high-quality images compared to earlier GAN architectures. In a DCGAN, the discriminator's output consists of a loss indicating its ability to distinguish real images from those generated by the generator (discriminator loss) and a loss reflecting how well the generator can fool the discriminator (generator loss). During backpropagation, the discriminator's weights are updated to minimize the discriminator loss, improving its accuracy, while the generator's weights are updated to minimize the generator loss, enhancing its capability to produce realistic images. The choice of the DCGAN architecture significantly enhances the quality of generated images through its use of convolutional layers, hierarchical feature learning, and architectural enhancements such as batch normalization and Leaky ReLU. This high image quality, in turn, impacts the CNNs' resilience by providing a challenging and rich training environment. CNNs must develop sophisticated feature extraction and discrimination techniques to differentiate between real and DCGAN-generated images, leading to improved robustness and generalization capabilities.
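To make these training dynamics concrete, the following is a minimal, self-contained sketch of a DCGAN training step in the style of [34]. It is only an illustration under assumptions (TensorFlow/Keras as the framework, a 64 × 64 × 3 image size, the layer widths, latent dimension and Adam optimizers), since the paper does not report its implementation details.

import tensorflow as tf
from tensorflow.keras import layers

LATENT_DIM = 100          # assumed latent vector size
IMG_SHAPE = (64, 64, 3)   # assumed: the 64 x 64 study case
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def build_generator():
    # Noise vector -> 64 x 64 x 3 image via transposed convolutions (DCGAN style).
    return tf.keras.Sequential([
        tf.keras.Input(shape=(LATENT_DIM,)),
        layers.Dense(8 * 8 * 128),
        layers.Reshape((8, 8, 128)),
        layers.Conv2DTranspose(128, 4, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Conv2DTranspose(64, 4, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="tanh"),
    ])

def build_discriminator():
    # Image -> single real/fake logit.
    return tf.keras.Sequential([
        tf.keras.Input(shape=IMG_SHAPE),
        layers.Conv2D(64, 4, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Conv2D(128, 4, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(1),
    ])

generator, discriminator = build_generator(), build_discriminator()
g_opt, d_opt = tf.keras.optimizers.Adam(1e-4), tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(real_images):
    # real_images are assumed to be scaled to [-1, 1] to match the tanh output.
    noise = tf.random.normal([tf.shape(real_images)[0], LATENT_DIM])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fake_images, training=True)
        # Discriminator loss: label real samples 1 and generated samples 0.
        d_loss = (cross_entropy(tf.ones_like(real_logits), real_logits)
                  + cross_entropy(tf.zeros_like(fake_logits), fake_logits))
        # Generator loss: fool the discriminator into labelling fakes as real.
        g_loss = cross_entropy(tf.ones_like(fake_logits), fake_logits)
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    return g_loss, d_loss

Calling train_step on each batch of real retinal images for the chosen number of epochs reproduces the alternating update of the two losses described above.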

3.2. The Approach

In this subsection, we illustrate our GAN-based approach to CNN evaluation in the case of fundus retinal images. A possible schematization of the proposed method, with its main steps, is shown in Figure 2.
As shown in Figure 2, the starting point is the datasets. The two datasets are downloaded in the npz file format from the MedMNIST website (https://medmnist.com/, (accessed on 2 July 2024)). These datasets, i.e., retinalmnist and retinalmnist_64, consist of retinal bioimages obtained with a fundus camera, with dimensions of 28 × 28 and 64 × 64 pixels, respectively. After extraction, the considered set (the train_test.npy file) contains 1080 images for each dataset. The second phase includes several pre-processing steps for the preparation of the sample sets for the GAN application.
After 50 training epochs for fake image generation, the generated images closely resemble the real ones, without any visible trace of the initial random noise. At the end of the adversarial image generation, the images are transferred into the correct folder in Google Drive. For each generation epoch, 1000 images are obtained, starting from the first epoch, in which the distortion is completely superimposed on the input image, and ending in the final epoch, in which the distortion is imperceptible to physicians. The core of our research thus concerns the outcome of the classification performed by the main DL networks, i.e., the ability of these networks to distinguish real and GAN-generated images. In other words, as the final step, the result evaluation, in terms of metrics (accuracy, precision, recall, and loss), reports the resilience after the GAN application in medical imaging. The DL networks taken into account are the following: ResNet50 [35], DenseNet [36], VGG19 [37], Standard_CNN [38,39], Inception-V3 [40], EfficientNet [41] and MobileNet [42]. Figure 2 also covers the study case of the resized dataset: the networks are evaluated also on a 128 × 128 retinal dataset, resized from the original dataset with a Python script. This step provides better discrimination by considering the input image size as a discriminant feature. This operating choice stems from the fact that the previous datasets, i.e., 28 × 28 and 64 × 64 pixels, are small for qualitative evaluation by physicians; the 128 × 128 version, on the other hand, represents a more realistic instance. The Standard_CNN is the CNN designed and developed by the authors; this network was successfully used in [38,39] for diabetic retinopathy diagnosis with retinal fundus images. Table 1 shows the layer architecture of the network.
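As an illustration of these pre-processing steps, the sketch below loads a MedMNIST-style npz archive and resizes the retinal images to 128 × 128 pixels. It is only a sketch under assumptions: the array key ("train_images"), the file names and the use of Pillow for resizing are not reported in the paper.

import numpy as np
from PIL import Image

# Assumption: the npz archive exposes the images under a "train_images" key;
# adjust the key and the path to the actual retinalmnist / retinalmnist_64 files.
archive = np.load("retinalmnist_64.npz")
images = archive["train_images"].astype(np.uint8)   # e.g. shape (N, 64, 64, 3)

# Resize every image to the 128 x 128 study case (default Pillow resampling).
resized = np.stack([
    np.array(Image.fromarray(img).resize((128, 128)))
    for img in images
])
np.save("train_test_128.npy", resized)
print(resized.shape)   # e.g. (1080, 128, 128, 3)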
The Standard_CNN model, developed by the authors, comprises 14 layers, including:
  • Conv2D: this layer performs spatial convolution on 2D input data using learnable filters to extract features relevant to the task, such as image processing;
  • MaxPooling2D: this layer is a fundamental component in CNNs, primarily used for down-sampling feature maps. It operates by sliding a window (often referred to as a “kernel” or “pooling window”) over the input feature map and selecting the maximum value within each window. This maximum value becomes the output for that particular region, thus summarizing the presence of certain features in that region;
  • Flatten: its primary function is to convert the input data into a one-dimensional array, also known as a vector. This transformation is crucial for connecting the output of one layer to the input of another layer with a different shape, such as when transitioning from convolutional layers to fully connected layers;
  • Dropout: this layer aims to prevent overfitting by randomly deactivating neurons during training. It randomly sets input units to 0 with a frequency given by the rate parameter at each step during training time. Inputs not set to 0 are scaled up by 1/(1 − rate) such that the sum over all inputs is unchanged;
  • Dense: this layer, also known as a fully connected layer, connects every neuron in one layer to every neuron in the next layer. Dense layers in CNNs are used towards the end of the network architecture to transform the high-level features extracted by convolutional and pooling layers into predictions or decisions. They provide a way for the network to learn complex patterns and relationships in the data, making them a critical component of CNN architectures.
These layers collectively form a comprehensive architecture for tasks like image processing and classification, leveraging convolutional, pooling, and fully connected layers efficiently.
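Based on the layer sequence reported in Table 1, a minimal Keras sketch of the Standard_CNN architecture could look as follows; the 3 × 3 kernels and 2 × 2 pooling follow from the output shapes and parameter counts in Table 1, while the framework, activation functions and dropout rates are assumptions, since they are not given there.

import tensorflow as tf
from tensorflow.keras import layers

def build_standard_cnn(input_shape=(256, 256, 3), num_classes=2, dropout_rate=0.5):
    # 14-layer sequence following Table 1: three Conv2D/MaxPooling2D blocks,
    # Flatten, then Dense layers interleaved with Dropout.
    return tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dropout(dropout_rate),
        layers.Dense(512, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(256, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_standard_cnn()
model.summary()   # the output shapes and parameter counts should match Table 1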

4. Experimental Analysis

In the experimental analysis section, the cited networks are trained and tested in a binary classification that aims to discriminate real retinal images from fake retinal images generated through GANs.
After the GAN image generation, both classes, i.e., original and GAN, are split at an 80-10-10 rate into training, validation, and testing sets, respectively. Figure 3 shows samples for the three binary classification cases. The datasets are trained and tested on the seven networks listed above with the following optimal hyper-parameter combination: 50 epochs, a batch size of 6, a learning rate of 0.0001, and the image dimension as input size. This combination was derived from several tests and resulted in the best average combination for all the networks.
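A minimal sketch of this training configuration (50 epochs, batch size 6, learning rate 0.0001) is reported below, assuming a Keras pipeline in which the real and GAN-generated images have already been exported to class sub-folders under train/, val/ and test/ directories according to the 80-10-10 split; the directory layout, loss function and optimizer are assumptions.

import tensorflow as tf

IMG_SIZE = (64, 64)     # or (28, 28) / (128, 128) for the other study cases
BATCH_SIZE = 6
EPOCHS = 50
LEARNING_RATE = 1e-4

def load_split(split):
    # Assumption: each split folder contains two class sub-folders, e.g. "original/" and "gan/".
    return tf.keras.utils.image_dataset_from_directory(
        f"dataset/{split}", image_size=IMG_SIZE, batch_size=BATCH_SIZE,
        label_mode="categorical")

train_ds, val_ds, test_ds = (load_split(s) for s in ("train", "val", "test"))

# build_standard_cnn is the hypothetical helper from the previous sketch;
# any of the other six networks could be plugged in here instead.
model = build_standard_cnn(input_shape=IMG_SIZE + (3,))
model.compile(optimizer=tf.keras.optimizers.Adam(LEARNING_RATE),
              loss="categorical_crossentropy",
              metrics=["accuracy",
                       tf.keras.metrics.Precision(name="precision"),
                       tf.keras.metrics.Recall(name="recall")])
model.fit(train_ds, validation_data=val_ds, epochs=EPOCHS)
print(model.evaluate(test_ds))   # loss, accuracy, precision and recall on the test split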
Table 2, Table 3 and Table 4 show the network metric performances for the different input image sizes: 28 × 28, 64 × 64 and 128 × 128.
From the tables, it is possible to observe that the tested networks exhibit two possible outcomes: (1) good resilience, i.e., an optimal capacity to distinguish fake images from real ones, or (2) a failure to discriminate, with all images classified into the same class.
Table 2 shows that the Standard_CNN, MobileNet, DenseNet and ResNet50 models have good performances on all the metrics; instead, VGG19, Inception V3 and EfficientNet cannot distinguish real images from fake ones. In Table 3, in which the input image size is 64 × 64 pixels, the performance behavior is almost the same; only the EfficientNet and Inception V3 models drastically improve their performances. DenseNet, ResNet50 and Inception present the best metrics. Finally, for the resized dataset, shown in Table 4, the best models are MobileNet, Standard_CNN, ResNet50 and DenseNet. The other three models follow the behavior of the first dataset; in fact, the resized dataset is a resized version of the first dataset.
From these results, some considerations are evident:
  • From a general point of view, when the dataset changes to 64 × 64 pixels, the model performances improve in terms of accuracy, precision, and recall;
  • On the other hand, the loss increases; however, this behavior does not represent relevant overfitting, because this value also depends on the training loss, and in all cases the difference is negligible;
  • The main contribution remains the improvement in the resilience of the networks, resulting in models with greater generalization during training; in this case, the GAN application makes the networks better classifiers for this kind of medical image;
  • The VGG19 model is the worst model for this problem: it allocates all images to the original class and completely fails to distinguish real from fake retinal images.
Regarding the Standard_CNN, this network confirms the best performances for all the datasets. The Standard_CNN model guarantees an optimal resilience response to adversarial images, improving the generalization in diabetic retinopathy diagnosis. The optimal performance of the Standard_CNN is confirmed by the epoch accuracy and epoch loss plots shown in Figure 4 and Figure 5. The blue line represents the trends during the training phase, while the dotted red line refers to the trends during the validation phase.
From these trends, it is possible to observe the convergence of the loss, i.e., the loss curve converging to a relatively stable value over the epochs. This suggests that the model has learned the underlying patterns in the data and is not overfitting or underfitting. Both plots also show the alignment of the training and validation curves: ideally, the training and validation curves should follow a similar trend, which indicates that the model is generalizing well to unseen data. The resolution of the images significantly impacts the performance of CNNs in differentiating real from GAN-generated images. Higher-resolution images provide more detailed features, enhancing the CNN's ability to learn and distinguish subtle differences, but at the cost of increased computational demands and potential overfitting. Lower-resolution images simplify the training process and reduce resource requirements but may lack the detail necessary to achieve high accuracy. Balancing resolution with the specific requirements and constraints of the application is key to optimizing CNN performance; in our work, this balance is represented by the 64 × 64 pixel study case. Moreover, the Standard_CNN reports the best speed and the lowest computational cost during the training and testing phases, with only 1.06, 3.49 and 17.32 min for the 28 × 28, 64 × 64 and 128 × 128 input pixel sizes, respectively.

5. Conclusions and Future Work

In this paper, we focus on the adversarial application of GANs in the medical imaging environment, in particular fundus retinal images of the eye. After generating synthetic images based on real ones, using 50 epochs of DCGAN training, we trained and tested CNNs to evaluate their resilience and improve the output models for further classification tasks, such as diabetic retinopathy detection. We observe that networks like the Standard_CNN guarantee the best performances (an accuracy of 1 and a loss of 2.38 × 10⁻⁹) and correctly classify the images, making the model suitable for further analysis in the ophthalmology field. Other networks report a strong decrease in performance, highlighting an incapacity to discriminate the images. Future work could focus on further improving the performance of CNN models in retinal fundus classification with GAN-generated images. This could involve exploring different network architectures, optimizing hyperparameters, and investigating the impact of image size on classification accuracy.

Author Contributions

Conceptualization, M.D.G., A.S., M.C., F.M. (Fabio Martinelli), F.M. (Francesco Mercaldo); methodology, M.D.G., A.S., M.C., F.M. (Fabio Martinelli), F.M. (Francesco Mercaldo); software, M.D.G., F.M. (Francesco Mercaldo); validation, M.D.G., A.S., F.M. (Francesco Mercaldo); formal analysis, M.D.G.; investigation, M.D.G., F.M. (Francesco Mercaldo); data curation, M.D.G., F.M. (Francesco Mercaldo); writing—original draft preparation, M.D.G., F.M. (Francesco Mercaldo); writing—review and editing, M.D.G., A.S., M.C., F.M. (Fabio Martinelli), F.M. (Francesco Mercaldo); visualization, M.D.G., A.S., F.M. (Francesco Mercaldo); supervision, F.M. (Francesco Mercaldo). All authors have read and agreed to the published version of the manuscript.

Funding

This work has been partially supported by EU DUCA, EU CyberSecPro, SYNAPSE, PTR 22-24 P2.01 (Cybersecurity) and SERICS (PE00000014) under the MUR National Recovery and Resilience Plan funded by the EU—NextGenerationEU projects, by MUR - REASONING: foRmal mEthods for computAtional analySis for diagnOsis and progNosis in imagING—PRIN, e-DAI (Digital ecosystem for integrated analysis of heterogeneous health data related to high-impact diseases: innovative model of care and research), Health Operational Plan, FSC 2014-2020, PRIN-MUR-Ministry of Health, the National Plan for NRRP Complementary Investments D^3 4 Health: Digital Driven Diagnostics, prognostics and therapeutics for sustainable Health care, Progetto MolisCTe, Ministero delle Imprese e del Made in Italy, Italy, CUP: D33B22000060001 and FORESEEN: FORmal mEthodS for attack dEtEction in autonomous driviNg systems CUP N.P2022WYAEW.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  2. Suganthi, K. Review of medical image synthesis using GAN techniques. ITM Web Conf. 2021, 37, 01005. [Google Scholar]
  3. Sun, L.; Chen, J.; Xu, Y.; Gong, M.; Yu, K.; Batmanghelich, K. Hierarchical amortized GAN for 3D high resolution medical image synthesis. IEEE J. Biomed. Health Inform. 2022, 26, 3966–3975. [Google Scholar] [CrossRef] [PubMed]
  4. Chlap, P.; Min, H.; Vandenberg, N.; Dowling, J.; Holloway, L.; Haworth, A. A review of medical image data augmentation techniques for deep learning applications. J. Med Imaging Radiat. Oncol. 2021, 65, 545–563. [Google Scholar] [CrossRef] [PubMed]
  5. Farahanipad, F.; Rezaei, M.; Nasr, M.S.; Kamangar, F.; Athitsos, V. A survey on GAN-based data augmentation for hand pose estimation problem. Technologies 2022, 10, 43. [Google Scholar] [CrossRef]
  6. Vyas, B.; Rajendran, R.M. Generative Adversarial Networks for Anomaly Detection in Medical Images. Int. J. Multidiscip. Innov. Res. Methodol. 2023, 2, 52–58. [Google Scholar]
  7. Xia, X.; Pan, X.; Li, N.; He, X.; Ma, L.; Zhang, X.; Ding, N. GAN-based anomaly detection: A review. Neurocomputing 2022, 493, 497–535. [Google Scholar] [CrossRef]
  8. Yu, W.; Lei, B.; Wang, S.; Liu, Y.; Feng, Z.; Hu, Y.; Shen, Y.; Ng, M.K. Morphological feature visualization of Alzheimer’s disease via multidirectional perception GAN. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 4401–4415. [Google Scholar] [CrossRef] [PubMed]
  9. Lamba, S.; Baliyan, A.; Kukreja, V. GAN based image augmentation for increased CNN performance in Paddy leaf disease classification. In Proceedings of the 2022 2nd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), Piscataway, NJ, USA, 28–29 April 2022; pp. 2054–2059. [Google Scholar]
  10. Alotaibi, A. Deep generative adversarial networks for image-to-image translation: A review. Symmetry 2020, 12, 1705. [Google Scholar] [CrossRef]
  11. Yu, B.; Zhou, L.; Wang, L.; Shi, Y.; Fripp, J.; Bourgeat, P. Ea-GANs: Edge-aware generative adversarial networks for cross-modality MR image synthesis. IEEE Trans. Med. Imaging 2019, 38, 1750–1762. [Google Scholar] [CrossRef]
  12. Martinelli, F.; Mercaldo, F.; Di Giammarco, M.; Santone, A. Data Poisoning Attacks over Diabetic Retinopathy Images Classification. In Proceedings of the 2023 IEEE International Conference on Big Data (BigData), Sorrento, Italy, 15–18 December 2023; pp. 3698–3703. [Google Scholar]
  13. Rosenblatt, M.; Scheinost, D. Data poisoning attack and defenses in Connectome-Based predictive models. In Workshop on the Ethical and Philosophical Issues in Medical Imaging; Springer: Cham, Switzerland, 2022; pp. 3–13. [Google Scholar]
  14. You, A.; Kim, J.K.; Ryu, I.H.; Yoo, T.K. Application of generative adversarial networks (GAN) for ophthalmology image domains: A survey. Eye Vis. 2022, 9, 6. [Google Scholar] [CrossRef]
  15. Yu, Z.; Xiang, Q.; Meng, J.; Kou, C.; Ren, Q.; Lu, Y. Retinal image synthesis from multiple-landmarks input with generative adversarial networks. Biomed. Eng. Online 2019, 18, 62. [Google Scholar] [CrossRef] [PubMed]
  16. Chen, J.S.; Coyner, A.S.; Chan, R.P.; Hartnett, M.E.; Moshfeghi, D.M.; Owen, L.A.; Kalpathy-Cramer, J.; Chiang, M.F.; Campbell, J.P. Deepfakes in ophthalmology: Applications and realism of synthetic retinal images from generative adversarial networks. Ophthalmol. Sci. 2021, 1, 100079. [Google Scholar] [CrossRef] [PubMed]
  17. Costa, P.; Galdran, A.; Meyer, M.I.; Niemeijer, M.; Abràmoff, M.; Mendonça, A.M.; Campilho, A. End-to-end adversarial retinal image synthesis. IEEE Trans. Med. Imaging 2017, 37, 781–791. [Google Scholar] [CrossRef] [PubMed]
  18. Costa, P.; Galdran, A.; Meyer, M.I.; Abramoff, M.D.; Niemeijer, M.; Mendonça, A.M.; Campilho, A. Towards adversarial retinal image synthesis. arXiv 2017, arXiv:1701.08974. [Google Scholar]
  19. Diaz-Pinto, A.; Colomer, A.; Naranjo, V.; Morales, S.; Xu, Y.; Frangi, A.F. Retinal image synthesis and semi-supervised learning for glaucoma assessment. IEEE Trans. Med. Imaging 2019, 38, 2211–2218. [Google Scholar] [CrossRef] [PubMed]
  20. Chen, X.; Xu, Y.; Yan, S.; Wong, D.W.K.; Wong, T.Y.; Liu, J. Automatic feature learning for glaucoma detection based on deep learning. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Cham, Switzerland, 2015; pp. 669–677. [Google Scholar]
  21. Alghamdi, H.S.; Tang, H.L.; Waheeb, S.A.; Peto, T. Automatic optic disc abnormality detection in fundus images: A deep learning approach. In Ophthalmic Medical Image Analysis International Workshop; University of Iowa: Iowa, IA, USA, 2016; Volume 3. [Google Scholar]
  22. Divya, S.; Suresh, L.P.; John, A. Medical mr image synthesis using dcgan. In Proceedings of the 2022 First International Conference on Electrical, Electronics, Information and Communication Technologies (ICEEICT), Trichy, India, 16–18 February 2022; pp. 1–4. [Google Scholar]
  23. Lal, S.; Rehman, S.U.; Shah, J.H.; Meraj, T.; Rauf, H.T.; Damaševičius, R.; Mohammed, M.A.; Abdulkareem, K.H. Adversarial attack and defence through adversarial training and feature fusion for diabetic retinopathy recognition. Sensors 2021, 21, 3922. [Google Scholar] [CrossRef] [PubMed]
  24. Shin, H.C.; Tenenholtz, N.A.; Rogers, J.K.; Schwarz, C.G.; Senjem, M.L.; Gunter, J.L.; Andriole, K.P.; Michalski, M. Medical image synthesis for data augmentation and anonymization using generative adversarial networks. In Proceedings of the Simulation and Synthesis in Medical Imaging: Third International Workshop, SASHIMI 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, 16 September 2018; Proceedings 3. Springer: Cham, Switzerland, 2018; pp. 1–11. [Google Scholar]
  25. Kwon, H.; Kim, Y.; Yoon, H.; Choi, D. Captcha image generation systems using generative adversarial networks. IEICE Trans. Inf. Syst. 2018, 101, 543–546. [Google Scholar] [CrossRef]
  26. Mangaokar, N.; Pu, J.; Bhattacharya, P.; Reddy, C.K.; Viswanath, B. Jekyll: Attacking medical image diagnostics using deep generative models. In Proceedings of the 2020 IEEE European Symposium on Security and Privacy (EuroS&P), Genoa, Italy, 7–11 September 2020; pp. 139–157. [Google Scholar]
  27. Nguyen, H.; Di Troia, F.; Ishigaki, G.; Stamp, M. Generative adversarial networks and image-based malware classification. J. Comput. Virol. Hacking Tech. 2023, 19, 579–595. [Google Scholar] [CrossRef]
  28. Renjith, G.; Laudanna, S.; Aji, S.; Visaggio, C.A.; Vinod, P. GANG-MAM: GAN based enGine for Modifying Android Malware. SoftwareX 2022, 18, 100977. [Google Scholar]
  29. Martinelli, F.; Mercaldo, F.; Santone, A. Evaluating the Impact of Generative Adversarial Network in Android Malware Detection. In Proceedings of the ENASE, Angers, France, 28–29 April 2024; pp. 590–597. [Google Scholar]
  30. Mercaldo, F.; Martinelli, F.; Santone, A. Deep Convolutional Generative Adversarial Networks in Image-Based Android Malware Detection. Computers 2024, 13, 154. [Google Scholar] [CrossRef]
  31. Nagaraju, R.; Stamp, M. Auxiliary-classifier GAN for malware analysis. In Artificial Intelligence for Cybersecurity; Springer: Cham, Switzerland, 2022; pp. 27–68. [Google Scholar]
  32. Hu, W.; Tan, Y. Generating adversarial malware examples for black-box attacks based on GAN. In Proceedings of the International Conference on Data Mining and Big Data, Beijing, China, 21–24 November 2022; pp. 409–423. [Google Scholar]
  33. Yuan, J.; Zhou, S.; Lin, L.; Wang, F.; Cui, J. Black-box adversarial attacks against deep learning based malware binaries detection with GAN. In ECAI 2020; IOS Press: Amsterdam, The Netherlands, 2020; pp. 2536–2542. [Google Scholar]
  34. Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv 2015, arXiv:1511.06434. [Google Scholar]
  35. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  36. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
  37. Wen, L.; Li, X.; Li, X.; Gao, L. A new transfer learning based on VGG-19 network for fault diagnosis. In Proceedings of the 2019 IEEE 23rd International Conference on Computer Supported Cooperative Work in Design (CSCWD), Porto, Portugal, 6–8 May 2019; pp. 205–209. [Google Scholar]
  38. Di Giammarco, M.; Iadarola, G.; Martinelli, F.; Mercaldo, F.; Ravelli, F.; Santone, A. Explainable Deep Learning for Alzheimer Disease Classification and Localisation. In Proceedings of the International Conference on Applied Intelligence and Informatics, Reggio Calabria, Italy, 1–3 September 2022. in press. [Google Scholar]
  39. Di Giammarco, M.; Iadarola, G.; Martinelli, F.; Mercaldo, F.; Santone, A. Explainable Retinopathy Diagnosis and Localisation by means of Class Activation Mapping. In Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18–23 July 2022; pp. 1–8. [Google Scholar]
  40. Xia, X.; Xu, C.; Nan, B. Inception-v3 for flower classification. In Proceedings of the 2017 2nd International Conference on Image, Vision and Computing (ICIVC), Chengdu, China, 2–4 June 2017; pp. 783–787. [Google Scholar]
  41. Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
  42. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
Figure 1. Block schematization of GAN.
Figure 2. The workflow of the proposed method.
Figure 3. Retinal image samples.
Figure 4. Epoch accuracy trends for the Standard_CNN model.
Figure 5. Epoch loss trends for the Standard_CNN model.
Table 1. The Standard_CNN architecture.

Layer   Type           Output Shape     Parameters
1       InputLayer     (256, 256, 3)    0
2       Conv2D         (254, 254, 32)   896
3       MaxPooling2D   (127, 127, 32)   0
4       Conv2D         (125, 125, 64)   18,496
5       MaxPooling2D   (62, 62, 64)     0
6       Conv2D         (60, 60, 128)    73,856
7       MaxPooling2D   (30, 30, 128)    0
8       Flatten        (115,200)        0
9       Dropout        (115,200)        0
10      Dense          (512)            58,982,912
11      Dropout        (512)            0
12      Dense          (256)            131,328
13      Dropout        (256)            0
14      Dense          (2)              514
Table 2. Metrics evaluation for tested DL models with image size input 28 × 28.

CNN            Accuracy   Precision   Recall   F-Measure   AUC    Loss
ResNet 50      0.99       0.99        0.99     0.99        0.99   0.04
DenseNet       0.99       0.99        0.99     0.99        0.99   0.01
VGG19          0.5        0.5         0.5      0.5         0.5    0.69
Standard_CNN   1.0        1.0         1.0      1.0         1.0    5.65 × 10⁻⁵
Inception V3   0.5        0.5         0.5      0.5         0.5    0.69
MobileNet      1.0        1.0         1.0      1.0         1.0    0.12
EfficientNet   0.5        0.5         0.5      0.5         0.5    32.1
Table 3. Metrics evaluation for tested DL models with image size input 64 × 64.

CNN            Accuracy   Precision   Recall   F-Measure   AUC    Loss
ResNet 50      0.99       0.99        0.99     0.99        0.99   0.15
DenseNet       0.99       0.99        0.99     0.99        0.99   0.07
VGG19          0.5        0.5         0.5      0.5         0.5    0.69
Standard_CNN   1.0        1.0         1.0      1.0         1.0    2.38 × 10⁻⁹
Inception V3   0.98       0.98        0.98     0.98        0.99   0.07
MobileNet      1.0        1.0         1.0      1.0         1.0    2.65 × 10⁻⁶
EfficientNet   0.99       0.99        0.99     0.99        0.99   0.51
Table 4. Metrics evaluation for tested DL models with image size input 128 × 128.

CNN            Accuracy   Precision   Recall   F-Measure   AUC    Loss
ResNet 50      1.0        1.0         1.0      1.0         1.0    2.326 × 10⁻⁵
DenseNet       0.97       0.97        0.97     0.97        0.97   2.47
VGG19          0.5        0.5         0.5      0.5         0.5    0.69
Standard_CNN   1.0        1.0         1.0      1.0         1.0    0.002
Inception V3   0.89       0.89        0.89     0.89        0.90   0.48
MobileNet      1.0        1.0         1.0      1.0         1.0    0.001
EfficientNet   0.5        0.5         0.5      0.5         0.5    19.21
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
