Article

Optimizing Glaucoma Diagnosis with Deep Learning-Based Segmentation and Classification of Retinal Images

by Nora A. Alkhaldi * and Ruqayyah E. Alabdulathim
Computer Science Department, College of Computer Science and Information Technology, King Faisal University, Al Ahsa 36291, Saudi Arabia
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(17), 7795; https://doi.org/10.3390/app14177795
Submission received: 7 July 2024 / Revised: 26 August 2024 / Accepted: 27 August 2024 / Published: 3 September 2024
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract
Glaucoma, a leading cause of permanent blindness worldwide, necessitates early detection to prevent vision loss, a task that is challenging and time-consuming when performed manually. This study proposes an automatic glaucoma detection method on enhanced retinal images using deep learning. The system analyzes retinal images, generating masks for the optic disc and optic cup, and providing a classification for glaucoma diagnosis. We employ a U-Net architecture with a pretrained residual neural network (ResNet34) for segmentation and an EfficientNetB0 for classification. The proposed framework is tested on publicly available datasets, including ORIGA, REFUGE, RIM-ONE DL, and HRF. Our work evaluated the U-Net model with five pretrained backbones (ResNet34, ResNet50, VGG19, DenseNet121, and EfficientNetB0) and examined preprocessing effects. We optimized model training with limited data using transfer learning and data augmentation techniques. The segmentation model achieves a mean intersection over union (mIoU) value of 0.98. The classification model shows remarkable performance with 99.9% training and 100% testing accuracy on ORIGA, 99.9% training and 99% testing accuracy on RIM-ONE DL, and 98% training and 100% testing accuracy on HRF. The proposed model outperforms related works and demonstrates potential for accurate glaucoma classification and detection tasks.

1. Introduction

Glaucoma, the second most common cause of blindness worldwide, is projected to affect approximately 111.8 million individuals by 2040 [1]. It is an eye disease that damages the optic nerve, often due to elevated intraocular pressure (IOP) within the eye. The damage typically results in the enlargement of the optic disc (OD) and the optic cup (OC), which are specific regions of the retina where the optic nerve exits the eye [2]. If left untreated, glaucoma can lead to irreversible vision loss [3]. Early detection and treatment are crucial to prevent further deterioration and ensure optimal eye care.
The diagnosis of glaucoma is based on various imaging modalities, such as optical coherence tomography (OCT), magnetic resonance imaging (MRI), and fundus images. Fundus images show a clear view of the fovea, OD, macula, OC, veins, arteries, and other parts of the retina. They have demonstrated potential in diagnosing eye diseases [4].
Artificial intelligence (AI) techniques, including deep learning (DL) and convolutional neural networks (CNNs), are being applied to various medical solutions, including medical image analysis. Current computer-aided diagnosis (CAD) systems based on AI can examine clinical images and provide a precise diagnosis comparable to that of experienced ophthalmologists. The existing literature on glaucoma detection encounters multiple challenges, including imbalanced datasets, model complexity, computational efficiency, and clinical integration.
The automation of glaucoma detection would enable early detection and help prevent vision loss. This study aims to investigate the optimal performance of state-of-the-art DL models for automated glaucoma detection from enhanced retinal images. The techniques conduct segmentation and classification tasks that are tested on four distinct public datasets, ORIGA, HRF, RIM-ONE DL, and REFUGE, to assess the model’s capacity to generalize. Medical image segmentation splits an input image into pixel groups to extract information about lesions or organs, known as the region of interest [5,6]. This study explores medical image segmentation using the widely used CNN architecture, U-Net [1], to segment both the OD and OC. Furthermore, medical applications extensively apply transfer learning, a technique that reuses pretrained models trained on large-scale datasets [7]. This approach is particularly beneficial due to the limited availability of datasets and computational resources, allowing developers to save time and modify existing models to suit specific needs [8].
Despite advancements in deep learning algorithms for glaucoma detection, considerable challenges remain, including achieving accurate segmentation, ensuring model generalizability across varied datasets, and tuning data augmentation techniques. Developing automated glaucoma detection systems is needed, with a focus on improving segmentation accuracy and adapting models to diverse clinical settings. Our research evaluated the U-Net model utilizing five pretrained backbones, including ResNet34, ResNet50, VGG19, DenseNet121, and EfficientNetB0, and examined the preprocessing effect. The classification phase assessed the EfficientNetB0 model’s efficiency on diverse datasets across varying batch sizes and epoch counts.
The contributions of our work are outlined as follows:
  • We proposed an integrated model that combines U-Net with ResNet34 for robust segmentation and EfficientNet-B0 for precise classification. While U-Net has been widely used for medical image segmentation and paired with various backbones in other domains, such as tumor detection and organ segmentation, the key innovation of our study lies in tailoring this combination to the specific challenges of glaucoma detection.
  • We addressed the challenge of training CNNs from scratch with limited data and hardware constraints by adopting transfer learning from ImageNet-pretrained models and data augmentation techniques to enhance model performance in glaucoma detection.
  • Our model shows remarkable performance across the ORIGA, RIM-ONE DL, and HRF datasets in terms of accuracy, AUC, precision, recall, F1 score, specificity, and mIoU.
  • Our model demonstrates superior performance compared to the state of the art in detecting glaucoma across various datasets.
  • The implemented method enabled efficient network training, reduced computational time, and enhanced model accuracy, promoting medical field applications and clinical practices for early glaucoma detection.
The remainder of this paper is organized as follows: Section 2 explores the previous research conducted in the field of glaucoma detection from retinal images. Section 3 explains the datasets, the proposed methods, and the measurement parameters used in this work. Section 4 presents the results and discussion. Finally, Section 5 provides the conclusion and future research directions.

2. Literature Review

Recently, owing to advances in computer technology, many deep learning-based algorithms for medical image analysis have been presented and deployed to detect glaucoma. Sudhan et al. [1] proposed a glaucoma detection model using the ORIGA dataset. They applied the U-Net architecture for segmenting the OD and OC regions from the fundus images, and then used a pretrained DenseNet-201 architecture to extract features from the segmented regions. They also used deep convolutional neural networks (DCNNs) to classify the images as glaucomatous or normal, achieving an accuracy rate of 98.82% in training and 96.90% in testing. Nazir et al. [2] developed Mask-RCNN for clustering OD and OC lesions from fundus images using DenseNet-77 as a backbone in the Mask-RCNN architecture. The glaucoma segmentation approach was effective, but improvements were needed, with average precision, F-measure, recall, and IoU values of 0.96, 0.97, 0.96, and 0.97, respectively.
Latif et al. [3] proposed the ODGNet method for OD localization and glaucoma classification. They used a saliency map for OD localization and a shallow CNN to separate OD and non-OD regions. They fed segmented OD regions into transfer learning models like AlexNet, ResNet, and VGGNet for glaucoma diagnosis. The combination improved ODGNet’s performance, achieving 95.75% accuracy. Nawaz et al. [9] developed a network for glaucoma detection that used the OD and OC lesions as indicators. The EfficientNet-B0 model was used to extract features from fundus images, with a unique bidirectional feature module for multiple feature integration, and the EfficientDet-D0 model for simultaneous glaucoma localization and class prediction. Their approach achieved an average accuracy of 97.2% on the ORIGA dataset and 97.96% and 98.21% on the RIM-ONE and high-resolution fundus (HRF) datasets, demonstrating its robustness in classifying glaucoma.
Maheshwari et al. [10] used the local binary pattern (LBP) approach to extract features from fundus images, retrained the AlexNet model for glaucoma diagnosis, and increased the training data size using an LBP-based data augmentation strategy to avoid overfitting. The model achieved 98.90% accuracy on the RIM-ONE fundus image database. Mallick et al. [11] proposed a deep learning framework for detecting glaucoma from fundus images. The framework involves three steps: locating the OD, segmenting the OD and OC, and classifying the glaucoma risk. Using models like U-Net, SegFormer, and MobileNet V2, the framework achieves high dice scores of 0.98 and 0.91 for segmentation and 95.47% accuracy for glaucoma detection. The framework also shows high sensitivity and area under the curve (AUC) scores for glaucoma detection using ResNet-18 and VGG-16 on different datasets.
Manassakorn et al. [12] developed GlauNet, a CNN that consisted of three convolutional layers to extract features and five fully connected layers for classification. The collected dataset comprised OCTA, OD, and OCT testing. The model was trained using a dataset of 258 eyes with glaucoma and 439 eyes without glaucoma, achieving a sensitivity of 88.9% and a specificity of 89.6% with an AUC of 0.89. Yi et al. [13] presented a glaucoma classification framework called MTRA-CNN and conducted four experiments to assess its effectiveness. They determined that ResNet50 was the most effective CNN for diagnosing glaucoma. Additionally, they found that the residual attention (RA) block improved performance, and the multi-scale transfer learning strategy outperformed other approaches. The framework achieved an accuracy of 86.8% in classifying normal eyes and different stages of glaucoma, demonstrating its superiority.
Virbukaitė et al. [14] introduced a method that uses an ensemble CNN to precisely identify glaucoma by segmenting the optic disc (OD) and optic cup (OC). The approach utilizes a modified attention U-Net structure with pretrained ResNet34, Inceptionv3, and DenseNet121 backbones. These backbones were trained on diverse datasets, including REFUGE, Drishti-GS, and RIM-ONE, and were evaluated individually. Their approach achieved better results than individual models in multiple datasets, demonstrating its superior performance. Shyamalee et al. [15] employed U-Net with attention and ResNet50 for segmentation, along with a modified Inception V3 for classification, achieving high accuracy on the RIM-ONE dataset. They utilized gradient-weighted class activation mapping (Grad-CAM) and Grad-CAM++ techniques for enhanced interpretability, producing heatmaps to assist in automated glaucoma detection. The study reported high accuracy, sensitivity, and specificity for glaucoma detection.
The above methods offered a solution to glaucoma detection, but with some limitations. While these deep learning techniques have shown promising results in glaucoma detection, there are still areas for improvement, particularly in terms of segmentation results, generalizability to other datasets, and potential issues with data augmentation strategies. Despite these limitations, this work makes significant contributions to the field of automated glaucoma detection using DL methods.
We proposed an approach that combines the U-Net architecture with pretrained ResNet34 for segmentation tasks and a pretrained EfficientNetB0 model for classification tasks. It segments the OD and OC regions in retinal images, crucial areas for glaucoma detection, then uses these regions for classification by the EfficientNetB0 model. The project involves data preprocessing, OD and OC segmentation, region of interest (ROI) extraction, segmented image postprocessing, and glaucoma presence classification. Utilizing the pretrained EfficientNetB0 model in classification aims to enhance precision and efficiency, potentially accelerating glaucoma detection for early intervention and treatment.

3. Materials and Methods

3.1. Datasets Description

In this study, we use several publicly available ophthalmological imaging datasets for glaucoma screening. The ORIGA dataset [16], consisting of 168 glaucoma patient images and 482 healthy images, offers ground truth annotations for OC and OD segmentation. The HRF dataset [17], which comprises 15 images of glaucoma patients and 15 of healthy individuals, is widely utilized in glaucoma diagnosis. The RIM-ONE DL dataset [18], comprising 172 glaucoma patient images and 313 healthy images, offers ground truth annotations for OD and OC segmentation. Lastly, the REFUGE dataset [19] includes a total of 1200 fundus images, each with ground truth segmentations.
A subset of this dataset, referred to as REFUGE-TS, is used to train a model based on 400 images, with 40 labeled as glaucoma and 360 as normal. The ORIGA, RIM-ONE DL, and REFUGE datasets are used for segmentation, while the ORIGA, RIM-ONE DL, and HRF datasets are utilized for classification. Table 1 enumerates the datasets utilized in our study.

3.2. Proposed Methodology

The proposed system framework, illustrated in Figure 1, comprises multiple stages: data preprocessing, segmentation, ROI extraction, postprocessing of segmented images, and classification. Each stage plays a significant role in the system’s operation, ensuring the accurate and efficient processing of data. The following sections provide detailed information on each stage.

3.2.1. Data Preprocessing

Data preprocessing is an important step to ensure the quality of the images before they are analyzed by the model. In this study, we applied two main preprocessing approaches: data augmentation and filtering. Data augmentation is a technique that prevents overfitting by increasing the number of samples in a dataset [20], and it is especially beneficial for small datasets [21].
Our aim is to utilize data augmentation techniques to enhance the quality and alignment of images and their corresponding masks. This work applied various augmentation methods and evaluated their impact on the image quality and mask alignment. We used rotations, flips, and brightness adjustments for data augmentation. Rotations and flips help the model handle different image orientations, while brightness adjustments enhance its adaptability to varying lighting conditions in retinal images. These techniques collectively bolster the model’s robustness by simulating real-world scenarios and mitigating overfitting. Data imbalance can bias the model towards the majority class, affecting performance. This issue is addressed using under-sampling, which reduces the majority class samples to match the minority class. Table 2 presents the data used in the segmentation phase.
The dataset is enhanced by horizontal flips and rotations of images resized to 128 × 128 from the ORIGA, REFUGE-TS, and RIM-ONE DL datasets. This resizing step ensures consistency in the dimensions of the images and masks, preparing them for subsequent processing. An augmentation pipeline was defined using the Albumentations library [22] to augment images and masks for image segmentation. Techniques like random rotations, brightness/contrast adjustments, and horizontal flips were applied to the ORIGA and REFUGE datasets, as well as the ORIGA and RIM-ONE DL datasets. The augmented images and masks were saved for further processing. Table 3 shows the data counts before and after augmentation and the techniques used.
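As a minimal sketch of such a pipeline, assuming the Albumentations and OpenCV packages (file names, the rotation limit, and probability values are illustrative):

```python
import albumentations as A
import cv2

# Assumed pipeline mirroring the techniques described above: random
# rotation, random brightness/contrast adjustment, and horizontal flip.
augment = A.Compose([
    A.Rotate(limit=30, p=0.5),             # random rotation
    A.RandomBrightnessContrast(p=0.5),     # brightness/contrast adjustment
    A.HorizontalFlip(p=0.5),               # random horizontal flip
])

image = cv2.imread("fundus.png")           # hypothetical file names
mask = cv2.imread("fundus_mask.png", cv2.IMREAD_GRAYSCALE)

# Passing the mask alongside the image keeps the two spatially aligned.
augmented = augment(image=image, mask=mask)
aug_image, aug_mask = augmented["image"], augmented["mask"]
```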
A denoising process was also implemented to enhance diagnostic accuracy by removing noise from images. Noise can degrade image quality and obscure critical details, leading to incorrect diagnoses [20]. The Gaussian and median filters were tested on RGB and grayscale images, including the green channel. The filters were evaluated using the peak signal-to-noise ratio (PSNR), and the best one was selected based on the results. The goal was to enhance image quality for further analysis.
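A minimal sketch of this filter comparison with OpenCV (kernel sizes are assumptions; the filter with the higher PSNR against the original is retained):

```python
import cv2

img = cv2.imread("fundus.png")  # hypothetical input image

# Candidate denoising filters described above.
gaussian = cv2.GaussianBlur(img, (5, 5), 0)
median = cv2.medianBlur(img, 5)

# PSNR of each filtered image relative to the original (see Equation (1)).
print("Gaussian PSNR:", cv2.PSNR(img, gaussian))
print("Median PSNR:", cv2.PSNR(img, median))
```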

3.2.2. Segmentation

Image segmentation is a technique used in various fields, such as medical imaging, to separate an image into distinct regions or objects [1]. This project investigates the use of the U-Net architecture, renowned for its effectiveness in semantic segmentation tasks, with various backbones for image segmentation. Utilizing the Segmentation Models library [23] with user-friendly interfaces, we construct diverse segmentation models supporting various architectures and backbones.
The proposed image segmentation model for this study employs a U-Net architecture with ResNet34, pretrained on ImageNet, as its backbone, as shown in Figure 2. We chose ResNet34 because it consistently outperformed the other backbones in the precise segmentation of structures like the optic disc and optic cup. We utilized SoftMax activation and compiled the model with an Adam optimizer with a learning rate of 0.0001 to optimize its performance. The loss function is a combination of dice loss and categorical focal loss, and the models are evaluated based on the mIoU metric.
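A minimal sketch of this configuration using the Segmentation Models library [23] (arguments beyond those stated above, such as the input shape, are assumptions):

```python
import tensorflow as tf
import segmentation_models as sm

sm.set_framework("tf.keras")

# U-Net with an ImageNet-pretrained ResNet34 encoder and three output
# classes (background, OD, OC), with SoftMax activation as described above.
model = sm.Unet(
    "resnet34",
    encoder_weights="imagenet",
    classes=3,
    activation="softmax",
    input_shape=(128, 128, 3),
)

# Dice loss combined with categorical focal loss, Adam at lr = 0.0001,
# and the IoU score as the evaluation metric.
total_loss = sm.losses.DiceLoss() + sm.losses.CategoricalFocalLoss()
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss=total_loss,
    metrics=[sm.metrics.IOUScore()],
)
```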
This study assessed the performance of the U-Net model with five pretrained backbones, ResNet34, ResNet50, VGG19, DenseNet121, and EfficientNetB0, aiming to identify the most effective architecture. During this phase, we conducted experiments to analyze the impact of various batch sizes on segmentation accuracy, as measured by mIoU. Additionally, we evaluated the performance of the U-Net model by considering various numbers of epochs. The method used here yields a segmentation mask that shows the retinal image’s OC and OD boundaries.
  • Extraction of ROI
This method employs the resulting segmentation mask from the previous step to isolate the ROI in the retinal image and retain only the relevant pixels. The bitwise AND operation keeps the pixels in the original image that match the segmented regions (OC and OD) marked by non-zero pixels in the mask [24]. In this study, we applied the bitwise AND operation between the output mask and the original image to extract the ROI, as shown in the sketch following the postprocessing step below.
  • Postprocessing of Segmented Images
This stage involves cropping and resizing segmented images using the bounding box coordinates of the largest contour [24] to enhance model accuracy. Prior to the classification stage, the image dimensions are modified to match the EfficientNetB0 model input size of 224 × 224. Furthermore, under-sampling and image augmentation techniques are utilized to address class imbalance and enhance the model’s generalization ability.
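A minimal OpenCV sketch covering both the ROI extraction and the cropping steps (the helper name and the binarization step are ours; the paper provides no code):

```python
import cv2
import numpy as np

def extract_and_crop_roi(image, mask, size=224):
    """Isolate the segmented OD/OC region and crop to its bounding box."""
    # Binarize the mask: any non-zero (OD or OC) pixel is kept.
    binary = (mask > 0).astype(np.uint8) * 255

    # Bitwise AND retains original-image pixels wherever the mask is non-zero.
    roi = cv2.bitwise_and(image, image, mask=binary)

    # Bounding box of the largest contour defines the crop region.
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))

    # Resize the crop to the EfficientNetB0 input size (224 x 224).
    return cv2.resize(roi[y:y + h, x:x + w], (size, size))
```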

3.2.3. Classification

The classification stage aims to label images as normal or glaucoma based on the results of the previous stage. Table 4 outlines the data used in the classification stage. The segmented images are subjected to augmentation procedures such as brightness modification, arbitrary rotation, and random horizontal flips. These methods are implemented to increase the size of the dataset and improve the ability of the model to generalize to new or unseen data. Table 5 illustrates the augmentation methods and the total count of training images post-augmentation for the ORIGA, RIM-ONE DL, and HRF datasets.
Training a CNN from scratch requires a large, labeled dataset and powerful GPU hardware, but with limited data, this is impractical. As a solution, we adopt the transfer learning technique, as used by Castiglioni et al., to overcome this limitation [21]. This study aims to classify retinal fundus images for glaucoma disease using a classification network, examining the adoption of a pretrained model based on recent research findings.
This study utilized the pretrained EfficientNetB0 model to enhance accuracy in glaucoma classification, reducing computational time and exploring alternative models like DenseNet121, VGG19, and ResNet50. The images were resized to 224 × 224 pixels to meet the input size of the pretrained models and normalized by dividing pixel values by 255. The proposed model architecture utilizes the pretrained EfficientNetB0 model on the ImageNet dataset, acting as the convolutional base for extracting features from input images, as shown in Figure 3 [25]. EfficientNetB0 was chosen for classification due to its efficient model architecture that balances performance and computational cost, making it suitable for accurate glaucoma classification with limited data. Among the models tested, EfficientNetB0 excelled in detecting glaucoma.
We excluded the final fully connected layer responsible for classification by setting the include_top parameter to False. The model includes a global average pooling 2D layer, a dropout layer with a rate of 0.3 to prevent overfitting, and a fully connected layer with a single unit and a sigmoid activation function, as shown in Figure 4.
The model was trained using the Adam optimizer with a learning rate of 0.0001 and a binary cross-entropy loss function for binary classification. The train_convolution_base parameter determined whether the EfficientNetB0 base model weights were updated during training or frozen. Alternative models used pretrained models as the base, with added batch normalization, dropout, and dense layers. Batch normalization was used for regularization and reducing overfitting, while the dropout layer randomly excluded neurons during training to prevent overfitting.
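A minimal Keras sketch of this classifier, assuming TensorFlow’s bundled ImageNet weights for EfficientNetB0 (the helper name is ours):

```python
import tensorflow as tf

def build_classifier(train_convolution_base=False):
    # ImageNet-pretrained EfficientNetB0 as the convolutional base,
    # with the top classification layer excluded (include_top=False).
    base = tf.keras.applications.EfficientNetB0(
        include_top=False, weights="imagenet", input_shape=(224, 224, 3))
    base.trainable = train_convolution_base  # freeze or fine-tune the base

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dropout(0.3),                    # overfitting control
        tf.keras.layers.Dense(1, activation="sigmoid"),  # glaucoma vs. normal
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```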

3.2.4. Evaluation

Various metrics, such as accuracy, precision, specificity, recall, F1 score, AUC, PSNR, mIoU, and the confusion matrix, are used to assess the performance of the proposed model architecture. The AUC measured the overall performance of the binary classifier, indicating its ability to distinguish between positive and negative classes.
The mathematical formulations for each metric are presented in Equations (1)–(8) [26,27]:
$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{255 \times 255}{\mathrm{MSE}}\right), \tag{1}$$
where MSE represents the cumulative squared error between the two images. It is calculated using the following equation:
$$\mathrm{MSE} = \frac{1}{n} \sum \left(\text{actual value} - \text{predicted value}\right)^{2} \tag{2}$$
$$\mathrm{IoU} = \frac{TP}{TP + FP + FN} \tag{3}$$
$$\mathrm{Accuracy} = \frac{TP + TN}{(TP + FN) + (FP + TN)} \tag{4}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{5}$$
$$\mathrm{Recall\ (Sensitivity)} = \frac{TP}{TP + FN} = \mathrm{TPR}, \tag{6}$$
where TPR is the true positive rate.
$$\mathrm{Specificity} = \frac{TN}{TN + FP} = \mathrm{TNR}, \tag{7}$$
where TNR is the true negative rate.
$$\mathrm{F1\ score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \tag{8}$$
where TP is true positive, TN is true negative, FP is false positive, and FN is false negative.
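For reference, a small helper implementing Equations (3)–(8) directly from confusion-matrix counts (a sketch; the function name is ours):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Equations (3)-(8) from confusion-matrix counts."""
    iou = tp / (tp + fp + fn)
    accuracy = (tp + tn) / ((tp + fn) + (fp + tn))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # sensitivity / TPR
    specificity = tn / (tn + fp)   # TNR
    f1 = 2 * precision * recall / (precision + recall)
    return iou, accuracy, precision, recall, specificity, f1
```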

3.2.5. Experimental Design

This study used eight experimental conditions to analyze segmentation and classify glaucoma. The segmentation experiments started by determining the optimal backbone architecture for the U-Net model, then assessed the influence of different augmentation techniques on segmentation performance using ResNet34, followed by exploring the impact of various batch sizes on segmentation accuracy (mIoU), and finally analyzed the performance of the U-Net model with different numbers of epochs.
The remaining four experiments, conducted in the classification phase, comprised evaluating the efficiency of the EfficientNetB0 model on various datasets (ORIGA, RIM-ONE DL, and HRF) by varying the batch sizes and numbers of epochs. Additionally, the experiments assessed the influence of data augmentation techniques on classification accuracy and examined how varying epochs affected the performance of the EfficientNetB0 model when applied to augmented datasets (ORIGA and RIM-ONE DL).
Overall, this study conducted diverse experiments, evaluating segmentation, backbone architectures, and classification metrics, summarized in Table 6, showcasing comprehensive insights into model performance.
  • Hyperparameter Selection
This section provides an overview of the key hyperparameters considered in our research, including the learning rate, batch size, number of epochs, dropout rates, and optimizer types. These hyperparameters have significant implications for optimizing the performance of the model and attaining accurate outcomes. Table 7 presents a concise overview of the parameters used in the segmentation model, whilst Table 8 summarizes the parameters employed in the classification phase.
This study utilized Python-based models trained on Google Colab Pro on a Windows 10 operating system [28].

4. Results and Discussion

4.1. Data Preprocessing

After applying data augmentation methods to images and their masks, the results are shown in Figure 5. It presents an image and its mask with augmentations. The top row shows the original image and mask, the middle row shows a rotated image and mask, and the bottom row shows an image and mask after a horizontal flip.
This study utilized Gaussian and median filters to improve image quality, with the Gaussian filter showing superior PSNR performance for grayscale and RGB images, underscoring its effectiveness in image denoising tasks.

4.2. U-Net with Pretrained Backbones for Segmentation

During the segmentation phase, we performed four experiments, each consisting of comparable sub-experiments. In sub-experiments (1.1, 2.1, 3.1, and 4.1), we used images and masks from ORIGA and REFUGE, while in sub-experiments (1.2, 2.2, 3.2, and 4.2), we employed images and masks from ORIGA and RIM-ONE DL. To identify the best-performing backbone architecture for the U-Net model, we evaluated it using different datasets. ResNet34 was the primary architecture, but ResNet50, VGG19, DenseNet121, and EfficientNetB0 were also explored.
For these experiments, we utilized images and their masks sourced from ORIGA and REFUGE for Experiment 1.1, and ORIGA and RIM-ONE DL for Experiment 1.2. Images were resized to 128 × 128 pixels and denoised with Gaussian and median filters. Masks were encoded to represent the background, OD, and OC classes. Images and masks were split into training and test sets (80:20) and normalized by scaling pixel values from 0–255 to 0–1. All models were trained for the same number of epochs and batch size. ResNet34 outperformed the other pretrained backbones, achieving the highest mIoU in Experiment 1.1, as indicated in Table 9, and Experiment 1.2, as shown in Table 10.
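A rough sketch of this preparation (file names and the mask encoding are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical preloaded arrays: images (N, 128, 128, 3) uint8, masks
# (N, 128, 128) with integer labels 0 = background, 1 = OD, 2 = OC.
images = np.load("images_128.npy")
masks = np.load("masks_128.npy")

# Scale pixel values from 0-255 to 0-1.
X = images.astype("float32") / 255.0

# One-hot encode the three mask classes for the SoftMax output.
y = np.eye(3, dtype="float32")[masks]

# 80:20 train/test split, as used in the experiments.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```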
Experiments 2.1 and 2.2 evaluated the effect of augmentation techniques on image segmentation using ResNet34 in the U-Net model based on similar datasets. Techniques included random rotations, brightness/contrast adjustments, and horizontal flips. Experiment 2.1 applied these to 832 images and explored the impact of increasing the dataset to 1248 images; mIoU was measured to assess the effectiveness of each technique and the impact of removing the brightness and contrast adjustments.
Experiment 2.1 results are in Table 11. In Experiment 2.2, the same techniques augmented 1360 images, and the impact of increasing to 2040 images was investigated; the measured mIoU, including the effect of excluding the random brightness and contrast adjustments, is presented in Table 12. This comprehensive approach allowed us to determine which augmentation techniques provided the most improvement to the ResNet34-based image segmentation model.
Experiments 3.1 and 3.2 tested batch sizes (8, 16, 32, 64, and 128) for mIoU impact. Batch size 8 yielded the highest mIoU during training. However, when evaluating on the HRF dataset, a batch size of 16 performed better. Table 13 and Table 14 display mIoU values for different batch sizes, underscoring the importance of proper selection.
In Experiments 4.1 and 4.2, training utilized varied epochs: 30, 40, and 50 for Experiment 4.1, extended to 60 for Experiment 4.2. Experiment 4.1 achieved the highest mIoU (0.953595) at 50 epochs, but a comparable mIoU (0.950539) was obtained at 40 epochs. Opting for improved visual results on the HRF dataset, the model trained with 40 epochs was selected. Table 15 presents mIoU values for different epoch counts, while Figure 6 illustrates the training and validation IOU and loss for this model.
Experiment 4.2 achieved its highest mIoU (0.9826489) at 60 epochs, but a comparable mIoU (0.98035127) was attained at 50 epochs. Choosing the latter for improved HRF dataset results, Figure 7 and Table 16 illustrate the training details.
The IOU score measures the overlap between predicted and actual segmentation masks, with a higher score indicating better performance. The loss function quantifies the difference between predicted and actual masks, with lower loss signifying a better fit. The curve monitors the metrics and loss changes over epochs. The model in Figure 6 demonstrates an impressive IOU score of approximately 0.9 on both the training and validation sets, indicating precise image segmentation. Additionally, the model exhibits a low loss value of around 0.1 on both sets, suggesting effective error minimization between predictions and labels. The smooth and rapidly converging curve signifies efficient learning without overfitting or underfitting.
In Figure 7, the model achieves a high IOU score (>0.98) on both training and validation sets, indicating accurate segmentation and good generalization. The model consistently improves over epochs without overfitting or underfitting. The significant reduction in the loss function on both sets suggests effective error minimization and convergence to an optimal solution. Denoising filters showed no performance improvement, but augmentation techniques, larger datasets, and fewer epochs had a notably positive impact on mIoU values in our experiments. Table 17 presents a comparison between the two sub-experiment groups.
Upon comparing the findings, we selected the model from sub-experiments (1.2, 2.2, 3.2, and 4.2) with the highest mIoU (0.98035127) as the final segmentation model. It demonstrated effective segmentation on the HRF dataset, accurately identifying ROIs such as the OC and OD. Figure 8 shows an example of the segmentation results on the HRF dataset using the model trained for 50 epochs.
Following the segmentation process, we executed a bitwise AND operation between the output mask and the original image to isolate the ROI. Cropping was employed to extract a specific ROI from the image. The results of the segmentation and the extracted ROI on the HRF dataset are depicted in Figure 9.
Following the postprocessing step, significant transformations were observed after applying augmentation techniques to the segmented images. Figure 10 demonstrates these results, featuring the augmented images from the ORIGA, RIM-ONE DL, and HRF datasets. Figure 10 is organized as follows: the first column displays the original images, the second column shows the images after rotation, the third column presents the images post-brightness adjustment, and the fourth column exhibits the images after a horizontal flip.

4.3. Pretrained Models for Classification

The first step in the classification process is to evaluate how well the ResNet50, VGG19, DenseNet121, and EfficientNetB0 models perform on the ORIGA, RIM-ONE DL, and HRF datasets with different batch sizes. Several experiments were conducted on the ORIGA [16], HRF [17], and RIM-ONE DL [18] datasets. The ORIGA dataset was trained for 15 epochs with a batch size of 8, with metrics shown in Table 18. The RIM-ONE DL dataset experiment followed a similar approach, with results detailed in Table 19. For the HRF dataset, initially comprising 30 images, models were trained and tested on an augmented dataset; data augmentation generated eight images for each original one. Performance was evaluated based on several metrics provided in Table 20. Among the four pretrained models evaluated for glaucoma detection, EfficientNetB0 outperformed DenseNet121, VGG19, and ResNet50 across all metrics and datasets.
Experiments 5.1, 5.2, and 5.3 evaluated EfficientNetB0 on ORIGA, RIM-ONE DL, and HRF datasets, varying batch sizes from 4 to 64, as indicated in Table 21, Table 22 and Table 23, respectively. This study found optimal performance with batch sizes of 8 for ORIGA and RIM-ONE DL and 4 and 8 for HRF, but performance declined with larger batch sizes. Early stopping was used to prevent overfitting and optimize resource use.
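A minimal sketch of the early-stopping setup in Keras (the patience value and monitored quantity are assumptions; the paper does not state them):

```python
import tensorflow as tf

# Stop training when validation loss stops improving and restore the
# best-performing weights.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)

# Hypothetical usage with the classifier built earlier:
# model.fit(train_images, train_labels, validation_split=0.2,
#           epochs=30, batch_size=8, callbacks=[early_stop])
```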
Based on the preceding results, Experiment 6.1 on ORIGA used a batch size of 8 and epochs of 10, 15, 20, and 30 (Table 24). Experiment 6.2 on RIM-ONE DL used a batch size of 8 and epochs of 7, 11, 15, and 20 (Table 25). Experiment 6.3 on HRF used a batch size of 4 and epochs of 7, 15, and 20 (Table 26). These experiments evaluated the model’s performance under different conditions.
The model was trained on the ORIGA and RIM-ONE DL datasets with a batch size of 8, and on the HRF dataset with a batch size of 4. It performed best at 20 epochs for ORIGA, 15 epochs for RIM-ONE DL, and 7 epochs for HRF. However, for HRF, increasing the epochs to 15 improved performance, but a further increase to 20 epochs decreased performance, indicating no further improvement beyond 15 epochs.
This study used a pretrained EfficientNetB0 model to assess the impact of data augmentation on image classification. Experiments 7.1, 7.2, and 7.3 applied random rotation, brightness adjustment, and horizontal flips to original images from the ORIGA, RIM-ONE DL, and HRF datasets, resulting in eight times more images for each dataset. Performance metrics for the ORIGA and RIM-ONE DL datasets, trained on augmented images with various batch sizes, are presented in Table 27 and Table 28, respectively. Results show that EfficientNetB0 performed optimally on ORIGA with batch size 32 and on RIM-ONE DL with batch size 8. However, overfitting was observed in larger batch sizes, prompting early stopping to improve test accuracy.
Experiments 8.1 and 8.2 assessed the impact of varying epochs on the performance of an EfficientNetB0 model trained on augmented ORIGA and RIM-ONE DL with batch sizes of 32 and 8, respectively. Table 29 and Table 30 illustrate performance metrics for various epochs, providing insights into the model’s classification accuracy under various conditions. The EfficientNetB0 model was tested on ORIGA, RIM-ONE DL, and HRF datasets, showing optimal performance at 15 epochs, but potential overfitting was observed beyond 15 epochs.
Figure 11, Figure 12 and Figure 13 display the corresponding training and validation loss, along with accuracy over 15 epochs for the ORIGA, RIM-ONE DL, and HRF datasets, respectively. In all cases, both curves decreased and converged, indicating effective learning and generalization. High accuracy on both sets suggests reliable real-world predictions. For HRF, the model’s error decreased, and correct predictions increased over time, indicating improvement.
Table 31 summarizes the optimal classification results achieved by the proposed EfficientNetB0 model across the three datasets, highlighting its effectiveness and potential for improved diagnostic accuracy in clinical settings.
Figure 14, Figure 15 and Figure 16 display the true and predicted values on randomly selected images from the test datasets of ORIGA, RIM-ONE DL, and HRF, respectively. Figure 14 pertains to 10 images from ORIGA, Figure 15 corresponds to 10 images from RIM-ONE DL, and Figure 16 relates to 4 images from HRF.
We used Gradio [29], a Python library for building web interfaces for real-time model interaction, to examine our model on unseen images. Figure 17 shows our model successfully identifying glaucoma in a fundus image.
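A minimal sketch of such an interface (the predict wrapper and its preprocessing are our own illustration; `model` is the trained classifier from the classification stage):

```python
import cv2
import gradio as gr
import numpy as np

def predict(image):
    # Preprocessing mirrors the training pipeline: resize to 224 x 224
    # and scale pixel values to 0-1.
    x = cv2.resize(image, (224, 224)).astype("float32") / 255.0
    prob = float(model.predict(np.expand_dims(x, axis=0))[0][0])
    return {"glaucoma": prob, "normal": 1.0 - prob}

# A simple image-in, label-out interface for interactive testing.
gr.Interface(fn=predict, inputs=gr.Image(), outputs=gr.Label()).launch()
```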
In summary, the comprehensive evaluation of image preprocessing and segmentation techniques reveals valuable insights. Augmentation methods and larger datasets significantly enhance model generalization, with the removal of random adjustments further refining segmentation quality. Despite the negligible impact of denoising filters, ResNet34 consistently outperforms other pretrained backbones in segmentation tasks. The strategic extraction of ROI reduces computational load, ensuring efficient resource utilization. Postprocessing techniques contribute to improved compatibility, generalization, and feature extraction. The proposed EfficientNetB0 model showcases exemplary performance across diverse datasets, affirming its efficacy in glaucoma classification and detection. The findings of this study collectively contribute to advancing the understanding and optimization of image processing methodologies in medical imaging applications.
The confusion matrices in Figure 18 reveal the accuracy of the classification model, with only one false negative on the RIM-ONE DL dataset, where a glaucoma case was misclassified as normal. Overall, the EfficientNetB0 model demonstrated exceptional performance across all three datasets, validating its effectiveness in glaucoma detection through high evaluation metric scores.
The ROC curves of the EfficientNetB0 model on the ORIGA, RIM-ONE DL, and HRF datasets confirm its robust glaucoma detection performance, showcasing a steep upward trajectory and exceptional discriminatory power with high sensitivity and low false positive rates, as shown in Figure 19.
Our proposed model, combining a U-Net with ResNet34 encoder for feature extraction and EfficientNet-B0 for classification, outperforms the state-of-the-art methods on the same datasets, showcasing its superior performance in glaucoma detection.
In comparison to [1], which used a pretrained DenseNet-201/U-Net and achieved 98.82% training and 96.90% testing accuracy on ORIGA, our model reached 99.9% training and 100% testing accuracy on the same dataset. For [2], using DenseNet-77/Mask-RCNN, our model achieved 100% accuracy on the ORIGA and HRF datasets compared to their 96.3% average. Ref. [3], utilizing a saliency map, reached 95.75% accuracy on ORIGA, while our model achieved 100%. Lastly, Ref. [9], with EfficientNet-B0/EfficientDet-D0, achieved 97.2%, 98.21%, and 97.96% on the ORIGA, HRF, and RIM-ONE DL datasets, while our model achieved 100%, 100%, and 99% on the respective datasets.
Overall, our proposed model outperforms existing methods across various metrics, suggesting its potential for future applications in glaucoma detection. Further studies could investigate its potential in other domains. Table 32 summarizes the comparison of the proposed model with the related works.

5. Conclusions and Future Work

This project showcases the potential of deep learning, specifically U-Net with the ResNet34 encoder and EfficientNetB0, for automated glaucoma detection from retinal images. The models demonstrated remarkable performance across ORIGA, RIM-ONE DL, and HRF datasets, excelling in accuracy, AUC, precision, recall, F1 score, specificity, and mIoU. This study explores the impact of image preprocessing and postprocessing techniques, revealing the positive influence of augmentation techniques and a larger dataset, while denoising filters showed no enhancement.
This investigation has important practical implications for the medical field. The implementation of automated glaucoma detection can improve early diagnosis, contributing to better patient experiences and reduced healthcare costs. Our model provides a high level of accuracy and reliability, indicating its potential integration into clinical workflows. This integration would support ophthalmologists and reduce the burden of manual image analysis.
Future work could investigate alternative preprocessing methods, test other deep learning architectures, and use more extensive and varied datasets. We also plan to apply our model in private clinics to assess its practical application in real-world settings. Developing a user-friendly software tool for clinical use could further facilitate the practical application of these models. The project advances medical imaging for glaucoma detection, but further research is needed to optimize clinical application.

Author Contributions

Conceptualization, N.A.A.; Data curation, R.E.A.; Formal analysis, N.A.A. and R.E.A.; Funding acquisition, N.A.A.; Investigation, R.E.A.; Methodology, N.A.A. and R.E.A.; Project administration, N.A.A.; Resources, R.E.A.; Software, R.E.A.; Supervision, N.A.A.; Validation, N.A.A. and R.E.A.; Visualization, N.A.A. and R.E.A.; Writing—original draft, N.A.A. and R.E.A.; Writing—review and editing, N.A.A. and R.E.A. All authors have read and agreed to the published version of the manuscript.

Funding

The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research work through the project number INST218.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets that support the findings of this study are openly available in the following repositories: https://www.kaggle.com/datasets/arnavjain1/glaucoma-datasets (accessed on 4 July 2024); https://www.kaggle.com/datasets/chetanpediredla/glaucoma-dataset (accessed on 4 July 2024); https://www.kaggle.com/datasets/dasa7753912/glaucoma-detection (accessed on 4 July 2024). The source code is publicly available at: https://github.com/Ruqayyah-alabdulathim/Optimizing-Glaucoma-Diagnosis-with-Deep-Learning-Based-Segmentation-and-Classification.git (accessed on 26 August 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Sudhan, M.B.; Sinthuja, M.; Pravinth Raja, S.; Amutharaj, J.; Charlyn Pushpa Latha, G.; Sheeba Rachel, S.; Anitha, T.; Rajendran, T.; Waji, Y.A. Segmentation and Classification of Glaucoma Using U-Net with Deep Learning Model. J. Healthc. Eng. 2022, 2022, 1601354. [Google Scholar] [CrossRef] [PubMed]
  2. Nazir, T.; Irtaza, A.; Starovoitov, V. Optic Disc and Optic Cup Segmentation for Glaucoma Detection from Blur Retinal Images Using Improved Mask-RCNN. Int. J. Opt. 2021, 2021, 6641980. [Google Scholar] [CrossRef]
  3. Latif, J.; Tu, S.; Xiao, C.; Ur Rehman, S.; Imran, A.; Latif, Y. ODGNet: A Deep Learning Model for Automated Optic Disc Localization and Glaucoma Classification Using Fundus Images. SN Appl. Sci. 2022, 4, 98. [Google Scholar] [CrossRef]
  4. Veena, H.N.; Muruganandham, A.; Senthil Kumaran, T. A Novel Optic Disc and Optic Cup Segmentation Technique to Diagnose Glaucoma Using Deep Learning Convolutional Neural Network over Retinal Fundus Images. J. King Saud. Univ. Comput. Inf. Sci. 2022, 34, 6187–6198. [Google Scholar] [CrossRef]
  5. Camara, J.; Neto, A.; Pires, I.M.; Villasana, M.V.; Zdravevski, E.; Cunha, A. Literature Review on Artificial Intelligence Methods for Glaucoma Screening, Segmentation, and Classification. J. Imaging 2022, 8, 19. [Google Scholar] [CrossRef] [PubMed]
  6. Hesamian, M.H.; Jia, W.; He, X.; Kennedy, P. Deep Learning Techniques for Medical Image Segmentation: Achievements and Challenges. J. Digit. Imaging 2019, 32, 582–596. [Google Scholar] [CrossRef] [PubMed]
  7. Kim, M.; Han, J.C.; Hyun, S.H.; Janssens, O.; Van Hoecke, S.; Kee, C.; De Neve, W. Medinoid: Computer-Aided Diagnosis and Localization of Glaucoma Using Deep Learning †. Appl. Sci. 2019, 9, 3064. [Google Scholar] [CrossRef]
  8. Norouzifard, M.; Nemati, A.; GholamHosseini, H.; Klette, R.; Nouri-Mahdavi, K.; Yousefi, S. Automated Glaucoma Diagnosis Using Deep and Transfer Learning: Proposal of a System for Clinical Testing. In Proceedings of the 2018 International Conference on Image and Vision Computing (IVCNZ), Auckland, New Zealand, 19–21 November 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–6. [Google Scholar]
  9. Nawaz, M.; Nazir, T.; Javed, A.; Tariq, U.; Yong, H.-S.; Khan, M.A.; Cha, J. An Efficient Deep Learning Approach to Automatic Glaucoma Detection Using Optic Disc and Optic Cup Localization. Sensors 2022, 22, 434. [Google Scholar] [CrossRef] [PubMed]
  10. Maheshwari, S.; Kanhangad, V.; Pachori, R.B. CNN-Based Approach for Glaucoma Diagnosis Using Transfer Learning and LBP-Based Data Augmentation. arXiv 2020, arXiv:2002.08013. [Google Scholar]
  11. Mallick, S.; Saha, N.; Paul, J.; Ganguli, I.; Debnath, S.; Sil, J. An Efficient Deep Learning Framework for Glaucoma Diagnosis Using Convolution Mixed Transformer Network. In Proceedings of the TENCON 2023—2023 IEEE Region 10 Conference (TENCON), Chiang Mai, Thailand, 31 October–3 November 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1111–1116. [Google Scholar]
  12. Manassakorn, A.; Auethavekiat, S.; Sa-Ing, V.; Chansangpetch, S.; Ratanawongphaibul, K.; Uramphorn, N.; Tantisevi, V. GlauNet: Glaucoma Diagnosis for OCTA Imaging Using a New CNN Architecture. IEEE Access 2022, 10, 95613–95622. [Google Scholar] [CrossRef]
  13. Yi, S.; Zhou, L.; Ma, L.; Shao, D. MTRA-CNN: A Multi-Scale Transfer Learning Framework for Glaucoma Classification in Retinal Fundus Images. IEEE Access 2023, 11, 142689–142701. [Google Scholar] [CrossRef]
  14. Virbukaitė, S.; Bernatavičienė, J.; Imbrasienė, D. Glaucoma Identification Using Convolutional Neural Networks Ensemble for Optic Disc and Cup Segmentation. IEEE Access 2024, 12, 82720–82729. [Google Scholar] [CrossRef]
  15. Shyamalee, T.; Meedeniya, D.; Lim, G.; Karunarathne, M. Automated Tool Support for Glaucoma Identification with Explainability Using Fundus Images. IEEE Access 2024, 12, 17290–17307. [Google Scholar] [CrossRef]
  16. Zhang, Z.; Yin, F.S.; Liu, J.; Wong, W.K.; Tan, N.M.; Lee, B.H.; Cheng, J.; Wong, T.Y. ORIGA-light: An Online Retinal Fundus Image Database for Glaucoma Analysis and Research. In Proceedings of the 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology, Buenos Aires, Argentina, 31 August–4 September 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 3065–3068. [Google Scholar]
  17. Budai, A.; Bock, R.; Maier, A.; Hornegger, J.; Michelson, G. Robust Vessel Segmentation in Fundus Images. Int. J. Biomed. Imaging 2013, 2013, 154860. [Google Scholar] [CrossRef] [PubMed]
  18. Fumero Batista, F.J.; Diaz-Aleman, T.; Sigut, J.; Alayon, S.; Arnay, R.; Angel-Pereira, D. RIM-ONE DL: A Unified Retinal Image Database for Assessing Glaucoma Using Deep Learning. Image Anal. Stereol. 2020, 39, 161–167. [Google Scholar] [CrossRef]
  19. Orlando, J.I.; Fu, H.; Barbosa Breda, J.; van Keer, K.; Bathula, D.R.; Diaz-Pinto, A.; Fang, R.; Heng, P.-A.; Kim, J.; Lee, J.; et al. REFUGE Challenge: A Unified Framework for Evaluating Automated Methods for Glaucoma Assessment from Fundus Photographs. Med. Image Anal. 2020, 59, 101570. [Google Scholar] [CrossRef] [PubMed]
  20. Juneja, M.; Thakur, N.; Thakur, S.; Uniyal, A.; Wani, A.; Jindal, P. GC-NET for Classification of Glaucoma in the Retinal Fundus Image. Mach. Vis. Appl. 2020, 31, 38. [Google Scholar] [CrossRef]
  21. Castiglioni, I.; Rundo, L.; Codari, M.; Di Leo, G.; Salvatore, C.; Interlenghi, M.; Gallivanone, F.; Cozzi, A.; D’Amico, N.C.; Sardanelli, F. AI Applications to Medical Images: From Machine Learning to Deep Learning. Phys. Medica 2021, 83, 9–24. [Google Scholar] [CrossRef] [PubMed]
  22. Buslaev, A.; Iglovikov, V.I.; Khvedchenya, E.; Parinov, A.; Druzhinin, M.; Kalinin, A.A. Albumentations: Fast and Flexible Image Augmentations. Information 2020, 11, 125. [Google Scholar] [CrossRef]
  23. Segmentation Models with Pretrained Backbones. Keras and TensorFlow Keras. Available online: https://github.com/qubvel/segmentation_models (accessed on 4 July 2024).
  24. Ansari, S. Building Computer Vision Applications Using Artificial Neural Networks; Apress: Berkeley, CA, USA, 2020; ISBN 978-1-4842-5886-6. [Google Scholar]
  25. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2019, arXiv:1905.11946. [Google Scholar] [CrossRef]
  26. Goutte, C.; Gaussier, E. A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation. In Proceedings of the European Conference on Information Retrieval, Santiago de Compostela, Spain, 21–23 March 2005; Springer: Berlin/Heidelberg, Germany, 2005; pp. 345–359. [Google Scholar]
  27. Bharati, S.; Khan, T.Z.; Podder, P.; Hung, N.Q. A Comparative Analysis of Image Denoising Problem: Noise Models, Denoising Filters and Applications. In Cognitive Internet of Medical Things for Smart Healthcare: Services and Applications; Springer: Berlin/Heidelberg, Germany, 2021; pp. 49–66. [Google Scholar]
  28. Colab Pro. Available online: https://colab.research.google.com/signup (accessed on 4 July 2024).
  29. Gradio. Available online: https://www.gradio.app/ (accessed on 6 May 2024).
Figure 1. Proposed framework of the system.
Figure 2. The proposed image segmentation model for this study.
Figure 3. The structure of baseline EfficientNet-B0.
Figure 4. The proposed classification model for this study.
Figure 5. The results of an image and its mask with augmentations.
Figure 6. The training and validation IOU and loss of the model trained with 40 epochs in Experiment 4.1.
Figure 7. The training and validation IOU and loss of the model trained with 50 epochs in Experiment 4.2.
Figure 8. An example of the segmentation results on the HRF dataset.
Figure 9. The results of the segmentation and the extracted ROI on the HRF dataset.
Figure 10. The results of the augmentation techniques applied to the segmented images.
Figure 11. The training and validation loss, as well as accuracy, over 15 epochs for the ORIGA dataset.
Figure 12. The training and validation loss, as well as accuracy, over 15 epochs for the RIM-ONE DL dataset.
Figure 13. The training and validation loss, as well as accuracy, over 15 epochs for the HRF dataset.
Figure 14. The true and predicted values on 10 randomly selected images from the ORIGA test dataset.
Figure 15. The true and predicted values on 10 randomly selected images from the RIM-ONE DL test dataset.
Figure 16. The true and predicted values on 4 images from the HRF test dataset.
Figure 17. Illustration of one result generated by our model.
Figure 18. The confusion matrices of our proposed EfficientNetB0 model on the three test datasets: (a) ORIGA; (b) RIM-ONE DL; (c) HRF.
Figure 19. The area under the ROC curve (AUROC) of our proposed EfficientNetB0 model on the three test datasets: (a) ORIGA; (b) RIM-ONE DL; (c) HRF.
Table 1. The datasets utilized in our study.

| Dataset | Segmentation/Classification | Glaucoma Images | Normal Images | Total |
|---|---|---|---|---|
| ORIGA [16] | Both | 168 | 482 | 650 |
| HRF [17] | Classification | 15 | 15 | 30 |
| RIM-ONE DL [18] | Both | 172 | 313 | 485 |
| REFUGE [19] | Segmentation | 40 | 360 | 400 |
Table 2. The total quantity of data used in the segmentation phase.

| Dataset | Glaucoma Images/Masks | Normal Images/Masks | Balanced Glaucoma | Balanced Normal | Total |
|---|---|---|---|---|---|
| ORIGA and REFUGE-TS | 208 | 842 | 208 | 208 | 416 |
| ORIGA and RIM-ONE DL | 340 | 795 | 340 | 340 | 680 |
Table 3. Total numbers of data before and after augmentation, along with the augmentation techniques used: random rotation (RR), random brightness (RB), contrast adjustment (CA), and random horizontal flip (RHF).

| Stage | Dataset | Before Augmentation | After Augmentation | Augmentation Technique |
|---|---|---|---|---|
| Segmentation | ORIGA and REFUGE-TS | 416 | 832 | RR, RB, CA, RHF |
| Segmentation | ORIGA and REFUGE-TS | 416 | 1248 | RR, RHF |
| Segmentation | ORIGA and RIM-ONE DL | 680 | 1360 | RR, RB, CA, RHF |
| Segmentation | ORIGA and RIM-ONE DL | 680 | 2040 | RR, RHF |
Table 4. The data utilized in the classification phase.

| Dataset | Glaucoma Images | Normal Images | Balanced Glaucoma | Balanced Normal | Total |
|---|---|---|---|---|---|
| ORIGA [16] | 168 | 482 | 168 | 168 | 336 |
| HRF [17] | 15 | 15 | 15 | 15 | 30 |
| RIM-ONE DL [18] | 172 | 313 | 172 | 172 | 344 |
Table 5. Augmentation methods and post-augmentation training image counts for the ORIGA, RIM-ONE DL, and HRF datasets: random rotation (RR), brightness adjustment (BA), and random horizontal flip (RHF).

| Stage | Dataset | Augmentation Technique | Glaucoma Images after Augmentation | Normal Images after Augmentation |
|---|---|---|---|---|
| Classification | ORIGA [16] | RR, BA, RHF | 1006 | 1006 |
| Classification | HRF [17] | RR, BA, RHF | 87 | 87 |
| Classification | RIM-ONE DL [18] | RR, BA, RHF | 959 | 959 |
Table 6. Comprehensive summary of experimental segmentation and classification analysis.

| Stage | Experiment No. | Objective |
|---|---|---|
| Segmentation | Experiment 1 | To identify the backbone architecture that yielded the highest performance for the U-Net model |
| Segmentation | Experiment 2 | To evaluate the effect of augmentation techniques on image segmentation using ResNet34 in the U-Net model |
| Segmentation | Experiment 3 | To test the impact of batch size on mIoU |
| Segmentation | Experiment 4 | To evaluate the U-Net model with varied numbers of epochs |
| Classification | Experiment 5 | To test EfficientNetB0 on the ORIGA, RIM-ONE DL, and HRF datasets with different batch sizes |
| Classification | Experiment 6 | To evaluate the performance of the model with different numbers of epochs |
| Classification | Experiment 7 | To assess the impact of data augmentation on image classification |
| Classification | Experiment 8 | To assess the impact of varying epochs on the performance of an EfficientNetB0 model trained on augmented ORIGA and RIM-ONE DL |
Table 7. Summary of the parameters used in the segmentation model.

| Parameter | Value |
|---|---|
| Pretrained | Yes (ImageNet) |
| Optimizer | Adam |
| Activation | SoftMax |
| Learning rate | 0.0001 |
| Proposed backbone | ResNet34 |
| Batch sizes | 8, 16, 32, 64, 128 |
| Epochs | 30, 40, 50 |
| Alternative backbones | ResNet50, VGG19, DenseNet121, EfficientNetB0 |
Table 8. Summary of the parameters used in the classification model.

| Parameter | Value |
|---|---|
| Optimizer | Adam |
| Activation | Sigmoid |
| Learning rate | 0.0001 |
| Loss function | Binary cross-entropy |
| Proposed model | EfficientNetB0 |
| Alternative models | DenseNet121, VGG19, ResNet50 |
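A minimal Keras sketch consistent with Table 8; the 224 × 224 input resolution and the global-average-pooling head are assumptions, as the tables do not specify them.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import EfficientNetB0

# ImageNet-pretrained EfficientNetB0 backbone with a single sigmoid output
# for the binary glaucoma/normal decision (Table 8).
base = EfficientNetB0(include_top=False, weights="imagenet",
                      input_shape=(224, 224, 3), pooling="avg")
output = layers.Dense(1, activation="sigmoid")(base.output)
model = Model(base.input, output)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.AUC(name="auc")])
```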
Table 9. The mIoU scores for the pretrained backbones in Experiment 1.1 (ORIGA and REFUGE-TS; 30 epochs; batch size 8; U-Net).

| Backbone | mIoU |
|---|---|
| ResNet34 | 0.8432825 |
| ResNet50 | 0.8392462 |
| VGG19 | 0.281714 |
| EfficientNetB0 | 0.5335277 |
| DenseNet121 | 0.6566112 |
Table 10. The mIoU scores for the pretrained backbones in Experiment 1.2 (ORIGA and RIM-ONE DL; 30 epochs; batch size 8; U-Net).

| Backbone | mIoU |
|---|---|
| ResNet34 | 0.82803315 |
| ResNet50 | 0.82745105 |
| VGG19 | 0.2535684 |
| EfficientNetB0 | 0.27394077 |
| DenseNet121 | 0.28304914 |
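The mIoU values in Tables 9–16 are the per-class intersection over union averaged across classes. A minimal NumPy sketch of this metric, assuming integer label maps for prediction and ground truth (the paper's exact averaging details may differ):

```python
import numpy as np

def mean_iou(y_true, y_pred, num_classes=3):
    """Mean IoU over classes (e.g., background, optic disc, optic cup)."""
    ious = []
    for c in range(num_classes):
        t = (y_true == c)
        p = (y_pred == c)
        union = np.logical_or(t, p).sum()
        if union == 0:
            continue  # class absent in both maps; skip it
        ious.append(np.logical_and(t, p).sum() / union)
    return float(np.mean(ious))
```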
Table 11. The mIoU for different numbers of images and augmentation techniques in Experiment 2.1 (ORIGA and REFUGE-TS; batch size 8; U-Net with ResNet34).

| Augmentation Technique | Number of Images | mIoU |
|---|---|---|
| RR, RB, CA, RHF | 832 | 0.926651 |
| RR, RHF | 832 | 0.929291 |
| RR, RHF | 1248 | 0.948287 |
Table 12. The mIoU for different numbers of images and augmentation techniques in Experiment 2.2 (ORIGA and RIM-ONE DL; batch size 8; U-Net with ResNet34).

| Augmentation Technique | Number of Images | mIoU |
|---|---|---|
| RR, RB, CA, RHF | 1360 | 0.967521 |
| RR, RHF | 1360 | 0.970067 |
| RR, RHF | 2040 | 0.977422 |
Table 13. The mIoU values for the different batch sizes in Experiment 3.1 (ORIGA and REFUGE-TS; 30 epochs; U-Net with ResNet34).

| Batch Size | mIoU |
|---|---|
| 8 | 0.94828725 |
| 16 | 0.9441021 |
| 32 | 0.8463246 |
| 64 | 0.6734596 |
| 128 | 0.51045746 |
Table 14. The mIoU values for the different batch sizes in Experiment 3.2 (ORIGA and RIM-ONE DL; 30 epochs; U-Net with ResNet34).

| Batch Size | mIoU |
|---|---|
| 8 | 0.977422 |
| 16 | 0.976188 |
| 32 | 0.9721312 |
| 64 | 0.6423847 |
| 128 | 0.5387978 |
Table 15. The mIoU values for different numbers of epochs in Experiment 4.1 (ORIGA and REFUGE-TS; batch size 16; U-Net with ResNet34).

| Epochs | mIoU |
|---|---|
| 30 | 0.944102 |
| 40 | 0.950539 |
| 50 | 0.953595 |
Table 16. The mIoU values for different numbers of epochs in Experiment 4.2 (ORIGA and RIM-ONE DL; batch size 16; U-Net with ResNet34).

| Epochs | mIoU |
|---|---|
| 30 | 0.976188 |
| 40 | 0.9792807 |
| 50 | 0.98035127 |
| 60 | 0.9826489 |
Table 17. Comparison between sub-experiments (1.1, 2.1, 3.1, and 4.1) and sub-experiments (1.2, 2.2, 3.2, and 4.2).

| | Sub-Experiments (1.1, 2.1, 3.1, and 4.1) | Sub-Experiments (1.2, 2.2, 3.2, and 4.2) |
|---|---|---|
| Datasets | ORIGA and REFUGE-TS | ORIGA and RIM-ONE DL |
| Pretrained backbone | ResNet34 | ResNet34 |
| Denoising | No enhancement observed | No enhancement observed |
| Augmentation | Positive impact observed | Positive impact observed |
| No. of epochs | 40 | 50 |
| Highest mIoU | 0.950539 | 0.98035127 |
Table 18. Results for each of the pretrained models trained on the ORIGA dataset.

| Performance Metric | EfficientNetB0 | DenseNet121 | VGG19 | ResNet50 |
|---|---|---|---|---|
| Train accuracy | 0.97 | 0.88 | 0.80 | 0.59 |
| Test accuracy | 0.90 | 0.82 | 0.88 | 0.63 |
| AUC | 0.90 | 0.82 | 0.88 | 0.63 |
| Precision | 0.90 | 0.87 | 0.89 | 0.63 |
| Recall | 0.90 | 0.82 | 0.88 | 0.63 |
| F1 score | 0.90 | 0.82 | 0.88 | 0.63 |
| Specificity | 0.92 | 1.00 | 0.96 | 0.62 |
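Tables 18–31 report accuracy, AUC, precision, recall (sensitivity), F1 score, and specificity. A hedged scikit-learn sketch of how such values can be derived from the model's sigmoid outputs; the 0.5 decision threshold is an assumption:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

def classification_report(y_true, y_prob, threshold=0.5):
    """Compute the metrics reported in Tables 18-31 for binary labels."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_prob),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),   # sensitivity
        "f1": f1_score(y_true, y_pred),
        "specificity": tn / (tn + fp),            # true-negative rate
    }
```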
Table 19. Results for each of the pretrained models trained on the RIM-ONE DL dataset.

| Performance Metric | EfficientNetB0 | DenseNet121 | VGG19 | ResNet50 |
|---|---|---|---|---|
| Train accuracy | 0.98 | 0.97 | 0.92 | 0.51 |
| Test accuracy | 0.92 | 0.92 | 0.81 | 0.50 |
| AUC | 0.92 | 0.92 | 0.81 | 0.50 |
| Precision | 0.93 | 0.93 | 0.86 | 0.25 |
| Recall | 0.92 | 0.92 | 0.81 | 0.50 |
| F1 score | 0.92 | 0.92 | 0.80 | 0.33 |
| Specificity | 0.88 | 0.88 | 1.00 | 0.00 |
Table 20. Results for each of the pretrained models trained on the augmented HRF dataset.

| Performance Metric | EfficientNetB0 | DenseNet121 | VGG19 | ResNet50 |
|---|---|---|---|---|
| Train accuracy | 0.98 | 0.97 | 0.90 | 0.59 |
| Test accuracy | 1.00 | 0.62 | 0.38 | 0.50 |
| AUC | 1.00 | 0.63 | 0.38 | 0.50 |
| Precision | 1.00 | 0.62 | 0.37 | 0.50 |
| Recall | 1.00 | 0.62 | 0.38 | 0.50 |
| F1 score | 1.00 | 0.56 | 0.37 | 0.50 |
| Specificity | 1.00 | 1.00 | 0.50 | 0.50 |
Table 21. Performance metric values for different batch sizes in Experiment 5.1 (pretrained EfficientNetB0 on the ORIGA dataset).

| Batch Size | Train Accuracy | Test Accuracy | AUC | Precision | Recall | F1 Score | Specificity |
|---|---|---|---|---|---|---|---|
| 8 | 0.97 | 0.90 | 0.90 | 0.90 | 0.90 | 0.90 | 0.92 |
| 16 | 0.98 | 0.49 | 0.50 | 0.25 | 0.50 | 0.33 | 0.00 |
| 32 | 0.99 | 0.53 | 0.54 | 0.76 | 0.54 | 0.41 | 0.08 |
| 64 | 0.99 | 0.51 | 0.52 | 0.75 | 0.52 | 0.37 | 0.04 |
Table 22. Performance metric values for different batch sizes in Experiment 5.2 (pretrained EfficientNetB0 on the RIM-ONE DL dataset).

| Batch Size | Train Accuracy | Test Accuracy | AUC | Precision | Recall | F1 Score | Specificity |
|---|---|---|---|---|---|---|---|
| 8 | 0.98 | 0.92 | 0.92 | 0.93 | 0.92 | 0.92 | 0.88 |
| 16 | 0.99 | 0.90 | 0.90 | 0.90 | 0.90 | 0.90 | 0.92 |
| 32 | 0.99 | 0.85 | 0.85 | 0.87 | 0.85 | 0.84 | 0.96 |
| 64 | 0.99 | 0.77 | 0.77 | 0.84 | 0.77 | 0.76 | 1.00 |
Table 23. Performance metric values for different batch sizes in Experiment 5.3 (pretrained EfficientNetB0 on the HRF dataset).

| Batch Size | Train Accuracy | Test Accuracy | AUC | Precision | Recall | F1 Score | Specificity |
|---|---|---|---|---|---|---|---|
| 4 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 8 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 16 | 1.00 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 |
| 32 | 1.00 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 |
Table 24. Performance metric values for different numbers of epochs in Experiment 6.1 (pretrained EfficientNetB0 on the ORIGA dataset; batch size 8).

| Epochs | Train Accuracy | Test Accuracy | AUC | Precision | Recall | F1 Score | Specificity |
|---|---|---|---|---|---|---|---|
| 10 | 0.97 | 0.66 | 0.67 | 0.80 | 0.67 | 0.63 | 0.35 |
| 15 | 0.97 | 0.90 | 0.90 | 0.90 | 0.90 | 0.90 | 0.92 |
| 20 | 1.00 | 0.92 | 0.92 | 0.93 | 0.92 | 0.92 | 1.00 |
| 30 | 0.98 | 0.86 | 0.86 | 0.87 | 0.86 | 0.86 | 0.81 |
Table 25. Performance metric values for different numbers of epochs in Experiment 6.2 (pretrained EfficientNetB0 on the RIM-ONE DL dataset; batch size 8).

| Epochs | Train Accuracy | Test Accuracy | AUC | Precision | Recall | F1 Score | Specificity |
|---|---|---|---|---|---|---|---|
| 7 | 0.96 | 0.60 | 0.60 | 0.78 | 0.60 | 0.52 | 0.19 |
| 11 | 0.97 | 0.84 | 0.85 | 0.85 | 0.85 | 0.85 | 0.77 |
| 15 | 0.94 | 0.90 | 0.90 | 0.90 | 0.90 | 0.90 | 0.92 |
| 20 | 0.96 | 0.85 | 0.85 | 0.85 | 0.85 | 0.85 | 0.81 |
Table 26. Performance metric values for different numbers of epochs in Experiment 6.3 (pretrained EfficientNetB0 on the HRF dataset; batch size 4).

| Epochs | Train Accuracy | Test Accuracy | AUC | Precision | Recall | F1 Score | Specificity |
|---|---|---|---|---|---|---|---|
| 7 | 0.94 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 |
| 15 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 20 | 0.98 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 |
Table 27. Performance metric values for different batch sizes on the augmented ORIGA dataset in Experiment 7.1 (pretrained EfficientNetB0; 15 epochs).

| Batch Size | Train Accuracy | Test Accuracy | AUC | Precision | Recall | F1 Score | Specificity |
|---|---|---|---|---|---|---|---|
| 8 | 0.992 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 | 0.97 |
| 16 | 0.995 | 0.89 | 0.89 | 0.89 | 0.89 | 0.89 | 0.94 |
| 32 | 0.999 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 64 | 0.995 | 0.81 | 0.81 | 0.81 | 0.81 | 0.81 | 0.84 |
Table 28. Performance metric values for different batch sizes on the augmented RIM-ONE DL dataset in Experiment 7.2 (pretrained EfficientNetB0; 15 epochs).

| Batch Size | Train Accuracy | Test Accuracy | AUC | Precision | Recall | F1 Score | Specificity |
|---|---|---|---|---|---|---|---|
| 8 | 0.999 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 1.00 |
| 16 | 0.999 | 0.93 | 0.93 | 0.93 | 0.93 | 0.93 | 0.93 |
| 32 | 0.998 | 0.92 | 0.92 | 0.92 | 0.92 | 0.92 | 0.93 |
| 64 | 0.998 | 0.73 | 0.73 | 0.73 | 0.73 | 0.73 | 0.75 |
Table 29. Performance metric values for different numbers of epochs on the augmented ORIGA dataset in Experiment 8.1 (pretrained EfficientNetB0; batch size 32).

| Epochs | Train Accuracy | Test Accuracy | AUC | Precision | Recall | F1 Score | Specificity |
|---|---|---|---|---|---|---|---|
| 10 | 0.995 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.97 |
| 15 | 0.999 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 20 | 0.998 | 0.89 | 0.89 | 0.89 | 0.89 | 0.89 | 0.88 |
Table 30. Performance metric values for different numbers of epochs on the augmented RIM-ONE DL dataset in Experiment 8.2 (pretrained EfficientNetB0; batch size 8).

| Epochs | Train Accuracy | Test Accuracy | AUC | Precision | Recall | F1 Score | Specificity |
|---|---|---|---|---|---|---|---|
| 10 | 0.999 | 0.98 | 0.98 | 0.98 | 0.98 | 0.98 | 0.98 |
| 15 | 0.999 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 1.00 |
| 17 | 0.991 | 0.98 | 0.98 | 0.98 | 0.98 | 0.98 | 0.98 |
Table 31. Summary of the optimal classification results for the proposed EfficientNetB0 model on the ORIGA, HRF, and RIM-ONE DL datasets.

| | ORIGA | RIM-ONE DL | HRF |
|---|---|---|---|
| Epochs | 15 | 15 | 15 |
| Batch size | 32 | 8 | 4 |
| Train accuracy | 0.999 | 0.999 | 0.98 |
| Test accuracy | 1.00 | 0.99 | 1.00 |
| AUC | 1.00 | 0.99 | 1.00 |
| Precision | 1.00 | 0.99 | 1.00 |
| Recall | 1.00 | 0.99 | 1.00 |
| F1 score | 1.00 | 0.99 | 1.00 |
| Specificity | 1.00 | 1.00 | 1.00 |
Table 32. Comparison of the proposed model with the related works.

| Study | Dataset | Accuracy | Sensitivity (Recall) | Specificity | AUC | F-Measure | IoU | Computational Efficiency |
|---|---|---|---|---|---|---|---|---|
| [1] | ORIGA | Train: 98.82%; Test: 96.90% | Train: 98.95%; Test: 97.03% | Train: 98.15%; Test: 96.33% | NA | NA | NA | High processing time and memory usage. |
| [2] | ORIGA; HRF | NA | Avg. 96.3% | NA | NA | 0.97 | 0.97 | Execution time of 1067 s; 6.2 million parameters, making it more memory efficient. |
| [3] | ORIGA | 95.75% | 94.75% | 94.9% | 97.9% | NA | NA | EfficientDet-D0: designed to minimize memory usage while maintaining high accuracy. |
| [9] | ORIGA; HRF; RIM-ONE DL | 97.2%; 98.21%; 97.96% | 97% | NA | 98% | NA | NA | EfficientDet-D0: designed to minimize memory usage while maintaining high accuracy. |
| Our proposed model | ORIGA | Train: 99.9%; Test: 100% | 100% | 100% | 100% | 100% | 98% | Training times: approximately 240 s for ORIGA, 150 s for RIM-ONE DL, and 270 s for HRF. The model is optimized for high efficiency on Google Colab Pro, utilizing a GPU and high RAM, with 4,008,829 trainable parameters. |
| Our proposed model | RIM-ONE DL | Train: 99.9%; Test: 99% | 99% | 100% | 99% | 99% | 98% | As above. |
| Our proposed model | HRF | Train: 98%; Test: 100% | 100% | 100% | 100% | 100% | 98% | As above. |

NA: not available.
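For reference, the trainable-parameter count and training times in Table 32 can be reproduced with standard TensorFlow/Keras calls. The snippet below is a sketch only: it assumes `model` is the compiled EfficientNetB0 from the earlier classification sketch, and `x_train`, `y_train`, and the fit arguments are placeholders.

```python
import time
import tensorflow as tf

# Count trainable parameters (Table 32 reports 4,008,829 for the proposed model).
trainable = sum(int(tf.size(w)) for w in model.trainable_weights)
print(f"Trainable parameters: {trainable:,}")

# Wall-clock training time, measured around a single model.fit call.
start = time.perf_counter()
model.fit(x_train, y_train, batch_size=32, epochs=15)
print(f"Training time: {time.perf_counter() - start:.0f} s")
```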