Article

Optic Disc Segmentation in Human Retina Images Using a Meta Heuristic Optimization Method and Disease Diagnosis with Deep Learning

by Hamida Almeshrky 1,* and Abdulkadir Karacı 2,*

1 Faculty of Engineering and Architecture, Kastamonu University, Kastamonu 37150, Turkey
2 Faculty of Engineering and Natural Sciences, Samsun University, Samsun 55270, Turkey
* Authors to whom correspondence should be addressed.
Appl. Sci. 2024, 14(12), 5103; https://doi.org/10.3390/app14125103
Submission received: 7 May 2024 / Revised: 6 June 2024 / Accepted: 7 June 2024 / Published: 12 June 2024

Abstract

Glaucoma is a common eye disease that damages the optic nerve and leads to loss of vision. The disease shows few symptoms in the early stages, making its identification a complex task. To overcome the challenges associated with this task, this study addresses the localization and segmentation of the optic disc, as well as the classification of glaucoma. For optic disc segmentation, we propose a novel approach based on the Grey Wolf Optimization (GWO) metaheuristic. Two different approaches are used for glaucoma classification: a one-stage approach, in which the whole image is used for classification without cropping, and a two-stage approach. In the two-stage approach, the optic disc region is detected using the You Only Look Once (YOLO) detection algorithm. Once the optic disc region of interest (ROI) is identified, glaucoma classification is performed using pre-trained convolutional neural networks (CNNs) and vision transformers. In addition, both the one-stage and the two-stage approaches are applied in combination with pre-trained CNNs and the Random Forest algorithm. In segmentation, GWO achieved an average sensitivity of 96.04%, a specificity of 99.58%, an accuracy of 99.39%, a Dice coefficient of 94.15%, and a Jaccard index of 90.4% on the Drishti-GS dataset. For classification, the proposed method achieved remarkable results, with test accuracies of 100% and 88.18% for hold-out validation and three-fold cross-validation on the Drishti-GS dataset, and 96.15% and 93.84% for ORIGA with hold-out validation and five-fold cross-validation, respectively. Compared with previous studies, the proposed CNN models outperform them. In addition, the use of the Swin Transformer shows its effectiveness in classifying glaucoma on different subsets of the data.

1. Introduction

Diabetic retinopathy and glaucoma are eye diseases that can cause loss of vision without the patient noticing. More than 150 million people around the world are affected by these conditions [1]. Glaucoma is characterized by progressive damage to the blood vessels and optic nerve of the eye, often associated with high intraocular pressure (IOP). Increased IOP eventually damages the optic nerve fibers, leading to thinning of the retinal nerve fiber layer (RNFL) and a change in the cup-to-disc ratio (CDR). As glaucoma progresses, the optic cup (the center of the optic nerve head (ONH)) may become larger, while the optic disc (the entire optic nerve head) remains relatively stable in size. The region of interest (ROI) can be identified as the area where these changes occur. Figure 1 shows a visual representation of the differences in cup-to-disc size, highlighted in blue, for a healthy and an affected eye, respectively, as observed in the Drishti-GS dataset.
If left untreated, glaucoma can lead to extensive damage to the optic nerve, resulting in loss of vision. The biggest challenge in detecting these eye diseases is the absence of outward symptoms until vision loss has already occurred [2]; there are no warning signs in the early stages. Glaucoma is the second most common cause of vision loss after cataracts. By 2040, an estimated 111.8 million people between the ages of 40 and 80 will have glaucoma [3].
The key to preventing vision loss from glaucoma is regular eye examinations after the age of 40 for early detection [4]. The earlier glaucoma is diagnosed and treated, the higher the patient’s chances of preserving the vision of the affected eye.
The assessment of fundus images of the retina taken with fundus cameras can be a valuable method of detecting glaucoma. The optic disc appears as the brightest yellow region on the fundus image. As mentioned earlier, changes in the shape of the cup region can serve as an early indicator of glaucoma. Traditionally, ophthalmologists manually examined these retinal fundus images to detect signs of glaucoma disease by assessing the ONH. However, this process was tedious and time-consuming.
Automatic optic disc detection and glaucoma classification systems offer ophthalmologists several advantages, including improving the diagnostic process and optimizing therapy. These systems contribute to more efficient management of glaucoma and improve patient care. Many computer-aided diagnostic techniques have been proposed to analyze OCT and fundus images and aid the detection of various eye diseases. However, accurate optic disc (OD) segmentation is challenging due to factors such as poor contrast, low resolution, varying illumination, and noise in fundus images. Furthermore, selecting the best image features and classifiers for glaucoma detection is a significant challenge, because accurate segmentation and glaucoma detection require analyzing features from both the retinal image and the optic disc.
Since glaucoma is closely related to the optic disc and optic nerve head, this ROI can be used for glaucoma assessment. Some computer-aided diagnosis algorithms focus solely on the segmentation of the optic disc, while others combine segmentation for ROI detection with subsequent classification. Two different approaches are deployed in this study. The first is the segmentation of the optic disc using the Grey Wolf Optimizer (GWO) algorithm. The second is classification, for which we deploy both one-stage and two-stage approaches. In the one-stage approach, we use the whole image directly for training and testing. In the two-stage approach, we use the YOLO algorithm to detect the optic disc region and crop it, and then conduct model training and testing on the cropped ROI fundus images. To perform glaucoma classification for both cropped and uncropped images, we apply deep learning techniques, including convolutional neural networks (CNNs), vision transformers (specifically the Swin Transformer (ST)), and a combination of CNNs with Random Forest (RF). The contributions of our work are summarized as follows:
i.
We provide a metaheuristic approach for optic disc segmentation in fundus images based on the Grey Wolf Optimization (GWO) algorithm, which has not previously been deployed in the literature for this purpose. With this approach, we achieved a high performance of 99.39% accuracy in optic disc segmentation;
ii.
Vision transformers, including the Swin Transformer, are robust to variations in image quality and noise, which are common challenges in medical imaging. We therefore present a new application of vision transformers, specifically the state-of-the-art Swin Transformer, to glaucoma image classification, which has received little attention in previous research. With this method, we achieved high classification performance;
iii.
We provide a thorough evaluation and comparison of two-stage and one-stage approaches for glaucoma classification. This evaluation provides valuable guidance for researchers working on glaucoma classification. These two approaches represent different strategies for glaucoma classification. The one-stage approach considers the entire image, potentially capturing broader contextual information but also including irrelevant features. On the other hand, the two-stage approach focuses specifically on the optic disc region, which is directly relevant to glaucoma diagnosis;
iv.
By effectively addressing the problem of class imbalance in ORIGA and Drishti-GS, our experiments achieved good results in glaucoma detection and classification.

2. Related Work

Optic Disc localization is a crucial step in the detection of glaucoma. Therefore, numerous methods have been developed to achieve this. In the following sections, we discuss various image-processing approaches that have been proposed for optic disc localization and glaucoma disease detection.

2.1. Optic Disc Localization

In fundus images, intensity variations can help to localize the optic disc. To take advantage of these variations, many researchers have used metaheuristic algorithms for optic disc detection in retinal fundus images. Pruthi et al. [2] evaluated several metaheuristic algorithms, namely the Ant Colony Optimization, Bacterial Foraging Optimization, Firefly, Cuckoo Search, and Krill Herd algorithms. Kumar et al. [5] used the Jaya algorithm to solve the problem of localizing the optic disc in retinal images and also proposed a preprocessing technique to improve the quality of retinal images. A method proposed by Abed et al. [6] uses the Fish School Search algorithm to detect the optic disc. Abed et al. [7] presented a hybrid nature-inspired swarm intelligence model with a multi-stage preprocessing pipeline that includes background subtraction, median filtering, and mean filtering. Rahebi and Hardalaç [8] used the firefly algorithm, exploiting intensity values for optic disc detection. Abdullah et al. [9] presented a bat optimization algorithm; in their work, grayscale images were used as a preliminary stage to reduce computation time, followed by an ellipse-fitting approach to further improve the accuracy of disc detection. Gui et al. [10] used the Harris corner detection method, which responds to grayscale changes within a sliding window; the algorithm identifies the corner area of the optic disc with the largest changes in grey levels as the location of the optic disc and measures each channel in the ROI to determine the most appropriate one for OD detection. Thakur and Juneja [11] presented an automatic hybridization combining adaptively regularized kernel-based fuzzy c-means and Intuitionistic Fuzzy C-Means (IFCM), followed by a level-set approach. Shaikha and Sallow [12] segmented the optic disc using a histogram template matching and OD-size algorithm: the center of the optic disc was found, the cross-correlation between the original image and the template was determined, and, after finding the rectangle size, a binary mask was created for segmenting the optic disc using the ROI. Abdullah et al. [13] proposed a method for optic disc localization and segmentation using an active contour model based on the FCM clustering algorithm. In the first step, the ROI was extracted and preprocessed (e.g., by selecting the red channel from the RGB image) and the center of the optic disc was localized; then, the active contour was applied to detect the OD edges and perform segmentation.

2.2. Classification of Glaucoma

Sreng et al. [14] proposed an automatic glaucoma detection system based on deep learning (DL) that consists of two stages. In the first stage, DeepLabv3+ and MobileNet are used as feature encoders to segment the OD; in the second stage, three different deep CNNs classify the segmented OD region as normal or glaucoma. Zhen et al. [15] studied and compared several deep learning models (VGG16, VGG19, ResNet, DenseNet, InceptionV3, Inception-ResNet, Xception, and NASNetMobile) and applied preprocessing techniques such as cropping and histogram equalization. Yang et al. [16] used a color mask of fundus images as a segmentation model to crop the optic disc area and then built an attention-based approach to detect glaucoma. Elangovan et al. [17] presented a method for glaucoma detection using a CNN architecture with 18 layers; image resizing and data augmentation were applied in the preprocessing phase. Gómez-Valverde et al. [18] used different pre-trained CNN models, including VGG19, GoogleNet, ResNet50, and DenseNet, and applied preprocessing techniques including optic disc localization and ROI detection. Hemelings et al. [19] presented a novel approach for glaucoma detection that combines deep learning and active learning methods; the approach uses an extracted ROI region, and the model generated interpretable heat maps to assist decision-making for glaucoma classification. Natarajan et al. [20] presented a method for glaucoma detection that includes CLAHE-based preprocessing, followed by ROI segmentation using the green channel data and identification of the OD and OC via a modified kernel fuzzy c-means algorithm. Li et al. [21] presented an attention-based CNN model for glaucoma classification, which includes three stages: identification of the ROI, localization of the pathological area using the attention map, and classification as normal or glaucomatous. A two-stage method for optic disc localization and glaucoma classification from fundus images was developed by Bajwa et al. [22]: in the first stage, the optic disc is localized using a Recurrent Convolutional Neural Network (RCNN); in the second stage, a CNN classifies images as normal or glaucoma. Kamesh et al. [23] presented glaucoma detection using a CNN architecture whose methodology involves ROI selection. Al-Bander et al. [24] used an AlexNet CNN to extract features from raw fundus images; the learned features were then classified into normal and glaucomatous types using a support vector machine (SVM) classifier. Abbas [25] developed and implemented a combination of an unsupervised CNN architecture and a deep belief network model; during the preprocessing phase, the ROI was extracted from the green channel image, and the CNN was then trained on these ROI images. A new DL architecture, the so-called Vision Transformer (ViT), has recently attracted much attention [26]. There are relatively few studies that focus specifically on distinguishing glaucoma from normal eyes using color fundus photographs. Wassel et al. [27] investigated the effectiveness of various ViT-based single models (such as the Swin Transformer, CrossViT, and CaiT) and ViT-based ensemble models using raw color fundus images for glaucoma detection. Hu et al. [28] used GLIM-Net, a glaucoma forecast transformer based on irregularly sampled sequential fundus images, to predict the future onset of glaucoma.

3. Materials and Methods

3.1. Datasets

Two publicly available datasets were used in this study: ORIGA [29] and Drishti-GS [30]. Table 1 contains detailed information on these datasets.

3.2. Methods

In this study, we propose both segmentation and classification approaches. In segmentation, we used the GWO metaheuristic algorithm to segment the optic disc from the retinal fundus images. Two approaches are proposed for classification (Figure 2): one-stage and two-stage classifications.
In the one-stage classification approach, whole images were used to train pre-trained CNN and ST models without any cropping. In the two-stage classification, we used the YOLO detection algorithm. The process comprises two phases. In the first phase, the YOLO detection algorithm is used to locate and crop the optic disc region. In the second phase, the classification is performed by feeding the cropped images of the optical disc into the DenseNet201, VGG19, ResNet50, InceptionV3 CNN, and Swin Transformer models. In the RF model, the features extracted from these pre-trained CNN models are then fed into a Random Forest (RF) classifier for classification. Both the cropped and the uncropped approaches are evaluated using RF classifiers. This comprehensive framework allows us to evaluate the performance of different models and approaches in distinguishing individuals with glaucoma from healthy individuals using retinal fundus images.
The proposed method for optic disc segmentation and glaucoma classification offers several distinctive characteristics compared to previous studies. This study is distinguished by its introduction of the GWO metaheuristic algorithm for optic disc segmentation. Unlike traditional segmentation methods, GWO mimics the hunting behavior of grey wolves to optimize the segmentation threshold, providing robust and accurate results even in challenging retinal images. The use of a state-of-the-art Swin Transformer for glaucoma classification represents an approach not used in previous studies. In addition, this study provides an in-depth evaluation and comparison of one-stage and two-stage approaches for glaucoma classification. The two-stage approach harnesses the complementary strengths of different methodologies, leading to superior performance compared to methods relying solely on traditional algorithms or deep learning models, whereas many previous studies have focused on individual methods. The remainder of this section provides information on GWO, YOLO, and the Swin Transformer.

3.2.1. GWO Meta-Heuristic Optimization Algorithm

The GWO algorithm is a metaheuristic inspired by the social hierarchy and hunting behavior of grey wolves, and it can be considered a robust swarm-based optimizer [31]. Grey wolves prefer to live in a pack, and all members of the pack have specific tasks. Each pack has four levels, modeled as a pyramidal structure: alpha, beta, delta, and omega. The alpha wolf is the leader of the group and makes the decisions. The beta wolf supports the alpha wolf and carries out its orders. The delta wolves have different roles, such as scouts, guardians, elders, and hunters. The omega wolves submit to the higher-ranking wolves and help to maintain order. This hierarchy enables efficient decision-making, coordination, and the well-being of the group. Similarly, the GWO algorithm consists of four main steps:
  • Searching for, approaching, and tracking the prey (exploration);
  • Pursuing, harassing, and encircling the prey until it stops moving;
  • Hunting;
  • Attacking the prey when it is exhausted (exploitation).
In conventional GWO, the encircling behavior is modeled mathematically using Equations (1)–(4):

$\vec{D} = |\vec{C} \cdot \vec{X}_P(t) - \vec{X}(t)|$ (1)

$\vec{X}(t+1) = \vec{X}_P(t) - \vec{A} \cdot \vec{D}$ (2)

where $t$ is the current iteration, $\vec{X}_P$ is the prey position vector, and $\vec{X}$ is the grey wolf position vector. $\vec{A}$ and $\vec{C}$ are coefficient vectors calculated as follows:

$\vec{A} = 2\vec{a} \cdot \vec{r}_1 - \vec{a}$ (3)

$\vec{C} = 2 \cdot \vec{r}_2$ (4)

where $\vec{r}_1$ and $\vec{r}_2$ are random vectors in [0, 1], and $\vec{a}$ is linearly decreased from 2 to 0 over the iterations.

Since the true position of the optimal solution is unknown, $\vec{X}_P$ is estimated from the three best solutions (the alpha, beta, and delta wolves); the remaining omega wolves update their positions during the hunt according to Equation (5):

$\vec{D}_\alpha = |\vec{C}_1 \cdot \vec{X}_\alpha - \vec{X}|, \quad \vec{D}_\beta = |\vec{C}_2 \cdot \vec{X}_\beta - \vec{X}|, \quad \vec{D}_\delta = |\vec{C}_3 \cdot \vec{X}_\delta - \vec{X}|, \quad \vec{X}_1 = \vec{X}_\alpha - \vec{A}_1 \cdot \vec{D}_\alpha, \quad \vec{X}_2 = \vec{X}_\beta - \vec{A}_2 \cdot \vec{D}_\beta, \quad \vec{X}_3 = \vec{X}_\delta - \vec{A}_3 \cdot \vec{D}_\delta$ (5)

The position of each grey wolf is then updated as follows:

$\vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3}$ (6)

The search (exploration) phase is the opposite of the attack (exploitation) phase; both behaviors are governed by the $\vec{A}$ and $\vec{C}$ parameters defined in Equations (3) and (4), respectively.
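To make the update rules in Equations (1)–(6) concrete, the sketch below implements a generic, single-objective GWO in Python. It is illustrative only: the fitness function, bounds, population size, and iteration budget are placeholders rather than the exact settings used in this study.

```python
import numpy as np

def gwo(fitness, dim=1, n_wolves=30, max_iter=100, lb=0.0, ub=255.0):
    """Minimal Grey Wolf Optimizer: maximizes `fitness` over [lb, ub]^dim."""
    X = np.random.uniform(lb, ub, (n_wolves, dim))            # wolf positions
    scores = np.array([fitness(x) for x in X])
    order = np.argsort(scores)[::-1]                           # best first
    alpha, beta, delta = X[order[0]].copy(), X[order[1]].copy(), X[order[2]].copy()

    for t in range(max_iter):
        a = 2 - 2 * t / max_iter                               # linearly decreases 2 -> 0
        for i in range(n_wolves):
            X_new = np.zeros(dim)
            for leader in (alpha, beta, delta):
                r1, r2 = np.random.rand(dim), np.random.rand(dim)
                A, C = 2 * a * r1 - a, 2 * r2                  # Eqs. (3) and (4)
                D = np.abs(C * leader - X[i])                  # Eq. (1)
                X_new += leader - A * D                        # Eqs. (2)/(5)
            X[i] = np.clip(X_new / 3.0, lb, ub)                # Eq. (6)
        scores = np.array([fitness(x) for x in X])
        order = np.argsort(scores)[::-1]
        alpha, beta, delta = X[order[0]].copy(), X[order[1]].copy(), X[order[2]].copy()
    return alpha                                               # best solution found
```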

3.2.2. YOLO

YOLO is a state-of-the-art object detection algorithm developed by Joseph Redmon in 2016. It handles object detection as a classification and regression problem. YOLO’s one-stage detection relies on deep learning. YOLO uses features extracted from an entire image to predict each bounding box. Moreover, it simultaneously predicts all bounding boxes of all classes for a given image [32].
The YOLO model divides the input image into an S × S grid of cells. Each bounding-box prediction is composed of five elements: x, y, w, h, and confidence. The x and y coordinates give the position of the center of the bounding box relative to the grid cell it belongs to, and the width and height are expressed as proportions of the entire image. The confidence score denotes the likelihood that the bounding box contains an object and how accurate the prediction is; in the YOLO algorithm, it is estimated using the Intersection over Union (IoU) between the predicted bounding box and the ground-truth bounding box. The intersection is the area where the two boxes overlap, while the union is the combined area of both boxes. IoU is calculated by dividing the area of the intersection by the area of the union, as shown in Equation (7) [33]:

$IoU = \frac{\text{Area of Intersection}}{\text{Area of Union}}$ (7)

Another metric for evaluating the effectiveness of object detection models is the mean average precision (mAP), given in Equation (8), where $AP_k$ represents the average precision for class $k$ and $n$ is the total number of classes. The mAP metric compares the detected bounding boxes with the ground truth:

$mAP = \frac{1}{n}\sum_{k=1}^{n} AP_k$ (8)
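As an illustration of Equation (7), the IoU of two boxes can be computed directly from their corner coordinates; the sketch below assumes boxes given in (x1, y1, x2, y2) pixel format and is not code from the study.

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)              # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter                            # combined area
    return inter / union if union > 0 else 0.0
```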

3.2.3. Vision Transformer

Transformers were originally developed for machine translation, and their potential in the field of computer vision has since been recognized. The architecture of the transformer differs from that of CNNs: while CNNs consist of several types of layers, including convolutional, pooling, and fully connected layers [34], ViTs comprise a series of transformer blocks, each consisting of a self-attention layer and a feed-forward layer [35]. In addition, the transformer does not need to adapt to the size of the input image in the preprocessing step. Vision transformers rely heavily on attention mechanisms, which allow the model to focus on relevant parts of the input image and effectively capture long-range dependencies. In this study, we apply both the Swin Transformer and CNNs to determine the optimal framework for glaucoma classification.
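For reference, a Swin Transformer backbone can be adapted to the two-class glaucoma/normal problem in a few lines. The sketch below uses the timm library and the swin_tiny_patch4_window7_224 variant purely as an assumed example; it is not necessarily the variant or framework used in this study.

```python
import timm
import torch

# Load an ImageNet-pretrained Swin Transformer and replace its classification
# head with a 2-class head (glaucoma vs. normal). Model variant is an assumption.
model = timm.create_model("swin_tiny_patch4_window7_224",
                          pretrained=True, num_classes=2)

dummy = torch.randn(1, 3, 224, 224)   # one 224 x 224 RGB fundus image (placeholder)
logits = model(dummy)                 # shape: (1, 2)
```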

3.2.4. Performance Metrics

For segmentation tasks, the five metrics used to evaluate performance are the Dice Similarity Coefficient, Jaccard Index, specificity, sensitivity, and accuracy. The formulas for these metrics are as follows:
$Acc = \frac{TP + TN}{TP + TN + FP + FN}$ (9)

$Dice = \frac{2 \times TP}{2 \times TP + FP + FN}$ (10)

$Jaccard\ Index = \frac{TP}{TP + FP + FN}$ (11)

$Specificity = \frac{TN}{TN + FP}$ (12)

$Sensitivity = \frac{TP}{TP + FN}$ (13)
For classification tasks, in addition to accuracy, sensitivity, and specificity, we commonly use F1 score and precision. The formulas for these metrics are as follows:
$Precision = \frac{TP}{TP + FP}$ (14)

$F1\ Score = \frac{2 \times (Precision \times Recall)}{Precision + Recall}$ (15)
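All of the metrics in Equations (9)–(15) follow from the four confusion-matrix counts; the helper below is a small illustrative utility, not code from the study, and does not guard against empty classes.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the metrics in Equations (9)-(15) from confusion-matrix counts."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    dice        = 2 * tp / (2 * tp + fp + fn)
    jaccard     = tp / (tp + fp + fn)
    specificity = tn / (tn + fp)
    sensitivity = tp / (tp + fn)                 # also called recall
    precision   = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return dict(accuracy=accuracy, dice=dice, jaccard=jaccard,
                specificity=specificity, sensitivity=sensitivity,
                precision=precision, f1=f1)
```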

4. Experiments

4.1. Hyperparameter Selection

The models were trained and tested using the hold-out strategy, with 80% of the data used for training and 20% for testing. In addition, we used three-fold cross-validation for DRISHTI-GS and five-fold cross-validation for ORIGA. According to [36], the optimizer is the part of the learning algorithm that specifies how millions or even billions of parameters are updated. Selecting optimization techniques for training is a challenging task [37]; thus, when developing the models, the Adam, SGD, RMSprop, Adamax, and Nadam algorithms were tested. Table 2 displays the hyperparameters with which the best classification performance was obtained. The “Base Model Trainable” parameter applies to the convolutional and pooling layers: when it is set to “False,” these layers are frozen and not trained, and the “imagenet” weights are used exactly as they are. The pre-trained CNN models have this parameter set to “False.” The learning rate is a crucial parameter in model training: an overly high learning rate can cause oscillations during training, while setting it too low may increase the training time. Therefore, various learning rates were tested during model training, and the value that produced the best performance is reported in this study.
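As an illustration of the settings in Table 2 (frozen backbone, "imagenet" weights, categorical cross-entropy), a pre-trained CNN can be configured roughly as follows in Keras; the classification-head size and input resolution are assumptions, not the exact architecture used here.

```python
import tensorflow as tf

# Pre-trained backbone with ImageNet weights; "Base Model Trainable" = False.
base = tf.keras.applications.DenseNet201(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(224, 224, 3))
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(128, activation="relu"),    # assumed head size
    tf.keras.layers.Dense(2, activation="softmax"),   # glaucoma vs. normal
])
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])
```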

4.2. OD Localization Using GWO

The GWO algorithm was used to estimate the threshold value for the gray-level mask, which isolates the OD from the background, the initial stage in the detection of glaucoma. The GWO approach for optic disc detection is shown in Figure 3.
For each image in the dataset, the following preprocessing steps were applied. The original image was read as in Figure 4a and converted to grayscale by extracting the red channel, as in Figure 4b, to enhance the visibility of the optic disc region and distinguish it from the background. A morphological closing operation, shown in Figure 4c, was applied to smooth the boundaries of the optic disc and fill in any gaps or holes. Additionally, a gray mask (Abdullah et al. [13]) that helps to highlight relevant features was applied, and unnecessary details were removed to emphasize the optic disc region while suppressing the background, as shown in Figure 4d. Following the grayscale mask, histogram equalization was applied. Histogram equalization redistributes the intensities of the image to improve the overall contrast; by stretching the intensity values across the entire range, it enhances the visibility of the optic disc region and makes the subsequent analysis by GWO more effective. Next, the GWO algorithm was deployed to optimize the segmentation threshold for the optic disc region. The wolves (search agents) iteratively update their positions based on the intensity values derived from the histogram analysis of the retina image; we used 30 search agents (wolves). After obtaining the optimized segmentation threshold (the prey), post-processing techniques were used to refine the segmented optic disc region. This involves morphological operations that further improve the segmentation result by removing noise and enhancing the shape of the optic disc. Finally, the segmentation result shown in Figure 5a is compared with the ground truth shown in Figure 5b.
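A condensed sketch of this pipeline is given below. It reuses the generic gwo() function sketched in Section 3.2.1 and scores a candidate threshold with an Otsu-style between-class variance; the fitness definition and the morphological kernel size are assumptions, since the paper does not specify them.

```python
import cv2
import numpy as np

def segment_od(image_bgr):
    """Sketch of the GWO-based optic disc segmentation pipeline described above."""
    red = image_bgr[:, :, 2]                                   # red channel as grayscale
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
    closed = cv2.morphologyEx(red, cv2.MORPH_CLOSE, kernel)    # smooth boundaries
    equalized = cv2.equalizeHist(closed)                       # improve contrast

    # Fitness of a candidate threshold: between-class variance (assumed criterion).
    def fitness(thr):
        t = float(thr[0])
        fg, bg = equalized[equalized > t], equalized[equalized <= t]
        if fg.size == 0 or bg.size == 0:
            return 0.0
        w_fg, w_bg = fg.size / equalized.size, bg.size / equalized.size
        return w_fg * w_bg * (fg.mean() - bg.mean()) ** 2

    best_thr = gwo(fitness, dim=1, n_wolves=30, max_iter=50)   # sketch from Section 3.2.1
    mask = (equalized > best_thr[0]).astype(np.uint8) * 255
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)      # remove small noise
    return mask
```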
To evaluate the segmentation, we used various evaluation metrics such as sensitivity, specificity, accuracy, the Jaccard index, and the Dice coefficient. These metrics provide quantitative measures of the performance of the segmentation algorithm compared to the ground truth data.

4.3. YOLO Object Detection

The YOLO detection algorithm was run on the DRISHTI-GS and ORIGA datasets. The purpose of YOLO in this application is to detect the optic disc region in retinal fundus images and crop this region to obtain a new dataset. This approach aims to improve the accuracy of disease classification by focusing on the diseased area using pre-trained CNN models. Before training YOLO, 101 retinal fundus images in this dataset were labeled using the “labelImg” graphical image annotation tool (https://github.com/tzutalin/labelImg: accessed on 10 May 2024). Subsequently, YOLOv7 was trained for 300 epochs on 68 randomly selected labeled images. The testing process was performed on the remaining 33 images. The model’s performance was evaluated based on the mAP and IoU metrics. The performance metrics obtained from the training and testing data are shown in Table 3. When evaluating based on the mean average precision (mAP) metric, bounding boxes above a certain IoU threshold are considered. Taking an IoU threshold of 0.50, the mAP values for the training and testing data are 99.53% and 99.50%, respectively. Both the recall (R) and precision (P) values are 1 for both the training and testing data.
After training, all images were passed through the YOLOv7 model to determine the optic disc region, and these regions were cropped to create a new dataset consisting of cropped optic disc images. An example in which the YOLO model detects the optic disc region in the test data is presented in Figure 6.
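The cropping step itself reduces to slicing the image with the predicted box. The helper below assumes the YOLO output has already been converted to pixel coordinates (x1, y1, x2, y2) and adds a small safety margin, which is an assumption rather than a documented setting.

```python
def crop_roi(image, box, margin=10):
    """Crop the detected optic disc region with a small safety margin (in pixels)."""
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    x1, y1 = max(0, int(x1) - margin), max(0, int(y1) - margin)
    x2, y2 = min(w, int(x2) + margin), min(h, int(y2) + margin)
    return image[y1:y2, x1:x2]
```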

4.4. Glaucoma Classification

For glaucoma classification, both the ROI images (cropped) and the whole images (uncropped) were used as inputs for the proposed RF-based and CNN-based models. To train the RF-based model, deep features were extracted by feeding all retinal images, both those cropped with YOLO and the uncropped ones, to the trained CNN models. The layers preceding the fully connected layer of the trained DenseNet201, ResNet50V2, InceptionV3, and VGG19 models were used to extract features. The number of features obtained from these layers is as follows: 1920 features from the “global_average_pooling2d_12” layer of the DenseNet201 model, 2048 features from the “global_average_pooling2d” layers of the ResNet50V2 and InceptionV3 models, and 25,088 features from the VGG19 model.
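Conceptually, the RF-based pipeline truncates each trained CNN at its pooling layer and uses the pooled activations as a feature vector for a Random Forest. The sketch below assumes Keras models and scikit-learn; the layer name, estimator count, and the variable names in the usage comment are illustrative placeholders.

```python
import tensorflow as tf
from sklearn.ensemble import RandomForestClassifier

def extract_features(cnn_model, images, layer_name="global_average_pooling2d"):
    """Return the pooled activations of a trained CNN as deep feature vectors."""
    extractor = tf.keras.Model(inputs=cnn_model.input,
                               outputs=cnn_model.get_layer(layer_name).output)
    return extractor.predict(images)     # e.g., 1920-D per image for DenseNet201

# Hypothetical usage with preprocessed fundus images and labels:
#   train_feats = extract_features(trained_cnn, X_train)
#   test_feats  = extract_features(trained_cnn, X_test)
#   rf = RandomForestClassifier(n_estimators=200, random_state=0)
#   rf.fit(train_feats, y_train)
#   print("RF accuracy:", rf.score(test_feats, y_test))
```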

5. Results and Discussion

5.1. Segmentation of Optic Disc Area Using GWO

The results of the GWO-based segmentation method are a sensitivity of 96.04%, a specificity of 99.58%, an accuracy of 99.39%, a Dice coefficient of 94.15%, and a Jaccard index of 90.4%. These results show high specificity and accuracy as well as relatively good performance in terms of the Dice coefficient. Figure 7 shows the optic disc segmentation results of the proposed method for healthy and glaucomatous retinal images from the Drishti-GS dataset.

5.2. Glaucoma Classification Results

5.2.1. Classification Results for DRISHTI-GS Dataset

Table 4 shows the results of hold-out validation classification using both one-stage and two-stage approaches with the pre-trained CNN architectures DenseNet201, ResNet50, Vgg19, and InceptionV3, as well as the Swin Transformer, on the DRISHTI-GS dataset. DenseNet201 and Vgg19 exhibit similar results: both models had an accuracy of 90.48%, a sensitivity of 93.33%, and a specificity of 83.33%, i.e., they correctly classified glaucoma cases at a rate of 93.33% and healthy cases at a rate of 83.33%. The ResNet50 model has a similar accuracy level, but comparing models based on accuracy alone may not be appropriate: while ResNet50 correctly classified glaucoma cases with 100% sensitivity, it correctly classified only 66.67% of healthy cases. A similar situation exists with the InceptionV3 model. The Swin Transformer achieved exceptional performance specifically in classifying glaucoma cases, with 100% sensitivity. In the case of uncropped images, the Swin Transformer excelled across all metrics, achieving 100% accuracy.
Using hold-out validation alone is not sufficient for testing model performance. Therefore, we used the k-fold cross-validation method to ensure better data utilization. In this study, the three-fold cross-validation method was implemented, and the classification performances obtained for the YOLO-cropped dataset (two-stage approach) are presented in Table 5. The highest classification performance was achieved with the DenseNet201 and ResNet50 models, with an accuracy of 81.19%. This accuracy value was lower compared to the one obtained from the hold-out validation method. According to the R and S metric values obtained from the overlapped confusion matrix, DenseNet201 correctly classifies glaucoma cases with an accuracy of 92.86% and healthy individuals with an accuracy of 54.84%. For ResNet50, these rates were 94.29% and 51.61%, respectively. These two models exhibited high accuracy in classifying glaucoma cases but low accuracy in classifying healthy individuals.
Furthermore, the uncropped dataset was used to test the performance of these models; the results are presented in Table 6. With an accuracy of 88.18%, the Swin Transformer had the best classification performance, which is consistent with the result obtained with the hold-out method. The Swin Transformer correctly classified glaucoma cases with an accuracy of 94.29% and healthy individuals with an accuracy of 74.19%. Overall, the glaucoma results on the Drishti-GS dataset demonstrate the effectiveness of pre-trained CNN models and, in particular, the Swin Transformer in glaucoma classification using both the hold-out and three-fold methods.
Table 7 presents the hold-out validation classification results for the RF model using cropped images and uncropped images. As shown in Table 7, the RF model fed with features extracted from the DenseNet201 model had the highest accuracy of all models, at 95.24%. It correctly classified glaucoma cases with 100% accuracy and healthy individuals with 83.33% accuracy. When compared to other models, this model provides improvement in correctly classifying glaucoma but does not show improvement in identifying healthy individuals. The highest results obtained with ResNet50+RF and VGG19+RF models had an accuracy of 90.48%, which is the same performance value previously obtained in CNN-based models. However, their results were lower compared to the performance achieved in the uncropped dataset. For most of the CNN architectures, the cropped and uncropped accuracies were relatively close. However, there were some cases where there was a noticeable difference. For example, DenseNet201+RF had a cropped accuracy of 95.24% compared to the uncropped (90.48%). However, the best result for hold-out was obtained by the Swin transformer in all experiments.
The three-fold cross-validation classification results for RF models using cropped datasets are presented in Table 8. The highest classification accuracy was achieved by the RF model trained with features extracted from the ResNet50+RF and DenseNet201+RF models, reaching an accuracy level of 81.22%. These models correctly classified glaucoma cases with an accuracy level of 100% and healthy individuals with an accuracy level of 38.70%.
In addition to the experimental study conducted on the cropped data, it is important to perform similar experiments using the one-stage approach on the raw dataset. Therefore, similar experiments were conducted with the same models, and the classification performances are presented in Table 9. The highest classification performance was achieved by the VGG19+RF model, with an accuracy of 87.14%; this is the second-highest classification performance, after the Swin Transformer. Based on the results obtained from DRISHTI-GS, the Swin Transformer model demonstrated high effectiveness, especially when used on uncropped images (one-stage approach), for glaucoma classification with both the hold-out and three-fold techniques.

5.2.2. Classification Result for ORIGA Dataset

Table 10 shows the hold-out validation classification results using both one-stage and two-stage approaches with the pre-trained CNN architectures: DenseNet201, ResNet50, Vgg19, InceptionV3, and Swin Transformer on the ORIGA dataset. In the cropped dataset, the Swin Transformer model achieved the highest accuracy among the tested models with an accuracy of 96.15%. The Swin-Transformer model also achieved the highest recall, specificity, precision, and F1 score. These results demonstrate the effectiveness of the Swin-Transformer model in glaucoma classification, regardless of whether the dataset is cropped or uncropped.
The classification performances of the models with five-fold cross-validation that we obtained for the cropped dataset are shown in Table 11. The highest classification performance was obtained with the Inception V3 model with an accuracy of 93.53%.
Table 12 shows the classification results achieved for the uncropped dataset using five-fold cross-validation. The results show that the Inception V3 model achieved the highest classification performance, with an accuracy of 93.84%.
The five-fold cross-validation findings show that several models, such as Inception V3, DenseNet201, and the Swin Transformer, perform better on uncropped images than cropped images.
Table 13 shows the outcomes of RF models trained on cropped and uncropped datasets. Based on these findings, the DenseNet201+RF model performed best on both the cropped and uncropped datasets.
Table 14 presents the five-fold cross-validation performance of the DenseNet201+RF, Resnet50+RF, VGG19+RF, and Inception V3+RF models on the cropped dataset. The results show that these models had a similar range of accuracy, between 74% and 76%, with DenseNet201+RF demonstrating relatively high accuracy among all models.
Table 15 presents the performance of the DenseNet201+RF, ResNet50+RF, VGG19+RF, and Inception V3+RF models with five-fold cross-validation on the uncropped dataset. All models had similar accuracy values, ranging from 74% to 74.76%, with the highest result obtained by Inception V3+RF.
The results from the ORIGA dataset for both cropped and uncropped images suggest that using the Swin Transformer with cropped images and hold-out validation can produce better results for glaucoma identification. Furthermore, utilizing a k-fold validation strategy, the Inception V3 model produced better performance results for uncropped datasets.

5.3. Comparison and Discussion with Previous Studies

For segmentation, we compared our proposed optic disc segmentation method with other studies on Drishti-GS, using average sensitivity, specificity, accuracy, DICE score, and Jaccard, and the results are presented in Table 16. According to these results, the proposed model achieved the highest Jaccard, recall, and specificity performance. Also, the accuracy and DICE performance of the proposed approach are quite close to other studies. These findings demonstrate the effectiveness of the proposed methodology in segmenting the OD.
There are previous studies that classify glaucoma based on the datasets used in our study. However, there are differences in the way the authors approached this problem: some studies used different image-splitting rates, while others opted for cross-validation with varying numbers of folds. As shown in Tables 4–15, both k-fold and hold-out validation approaches were used in this study. Table 17 shows that none of the previous studies used both hold-out and k-fold cross-validation for both cropped (two-stage) and uncropped (one-stage) images.
When comparing the results obtained on the Drishti-GS dataset with previous studies, the main achievement of this study is that the Swin Transformer achieved 100% classification accuracy, accurately detecting both normal and glaucoma images; in other words, the Swin Transformer can precisely predict both negative and positive results. It is worth noting that none of the reviewed studies, even those that used hold-out or cross-validation, achieved such a high level on all prominent metrics. These findings highlight our approach and its potential for accurate glaucoma detection on the uncropped Drishti-GS dataset. In three-fold cross-validation, the highest classification accuracy was obtained with the Swin Transformer (Acc = 88.18%); the specificity for classifying healthy images was 74.19%, while the sensitivity for classifying glaucoma images was 94.29%. Although these results demonstrate high accuracy compared with other studies, the accuracy was somewhat lower than that of Alagirisamy [48] and Chaudhary et al. [50]; it should be considered that those studies applied hold-out validation and five-fold cross-validation, respectively. When the hold-out validation results obtained on the ORIGA dataset are compared with previous studies using different models, the Swin Transformer had a significantly higher classification performance, with 96.15% accuracy for cropped images, which indicates the robustness of the Swin Transformer model for glaucoma classification. The Swin Transformer correctly classified glaucoma at a rate of 95.05% (R = 95.05) on the ORIGA dataset, and this rate was 100% (S = 100) for healthy images. These results highlight the ability of the Swin Transformer architecture to accurately detect glaucoma cases. In five-fold cross-validation, the accuracy obtained for the ORIGA dataset using InceptionV3 was 93.84%, which is considerably higher than in previous studies.

6. Conclusions

Optic disc detection is one of the basic steps in the automated diagnosis of glaucoma. This paper presents an automated method for the segmentation of the optic disc and the classification of glaucoma. The proposed segmentation method incorporates grayscale masking [13], mathematical morphology operations (closing and dilation), and the Grey Wolf Optimization (GWO) algorithm for optimal threshold determination. We used the GWO algorithm for the first time for retinal optic disc segmentation on DRISHTI-GS. Four pre-trained CNNs (ResNet50, Inception V3, DenseNet201, and VGG19) and the Swin Transformer were used to classify glaucoma. The proposed models were tested on the challenging ORIGA and DRISHTI-GS datasets, which are imbalanced in terms of the number of images per class. In addition, classification was carried out with a Random Forest model fed with features obtained from the trained CNNs. All classification experiments were conducted on both images cropped using the YOLO algorithm and uncropped images. According to all classification results, the Swin Transformer had the highest overall performance in accurately classifying glaucoma on the ORIGA and DRISHTI-GS datasets. The comparison also reveals that both one-stage and two-stage solutions performed well on the ORIGA dataset, whereas the DRISHTI-GS dataset favored the one-stage approach. Compared to existing techniques in the literature, the models achieved a higher classification accuracy rate on all datasets. Future work should focus on identifying and recognizing other types of glaucoma abnormalities and their analysis. Investigating the effect of cropping on other datasets might also be beneficial.

Author Contributions

Methodology, H.A. and A.K.; Software, H.A. and A.K.; Validation, A.K.; Investigation, A.K.; Writing—original draft H.A.; Writing—review & editing, A.K.; Supervision, A.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sivaprasad, S.; Gupta, B.; Crosby-Nwaobi, R.; Evans, J. Prevalence of diabetic retinopathy in various ethnic groups: A worldwide Perspective. Surv. Ophthalmol. 2012, 57, 347–370. [Google Scholar] [CrossRef] [PubMed]
  2. Pruthi, J.; Arora, S.; Khanna, K. Metaheuristic techniques for detection of optic disc in retinal fundus images. 3D Res. 2018, 9, 47. [Google Scholar] [CrossRef]
  3. Senjam, S.S. Glaucoma blindness—A rapidly emerging non-communicable ocular disease in India: Addressing the issue with advocacy. J. Fam. Med. Prim. Care 2020, 9, 2200–2206. [Google Scholar] [CrossRef]
  4. Yan, M.; Lin, Y.; Peng, X.; Zeng, Z. mixDA: Mixup domain adaptation for glaucoma detection on fundus images. Neural Comput. Appl. 2023, 32, 1–20. [Google Scholar] [CrossRef]
  5. Kumar, B.V.; Zhang, S.; Wu, T.; Prakash, J.; Zhou, L.; Li, K. A novel JAYA algorithm for optic disc localization in eye fundus images. Int. J. Comput. Vis. Robot. 2022, 12, 324–342. [Google Scholar] [CrossRef]
  6. Abed, S.; Al-Oraifan, D.; Safar, A. Optic disc detection using fish school search algorithm based on FPGA. J. Eng. 2019, 7, 161–177. [Google Scholar]
  7. Abed, S.; Al-Roomi, S.A.; Al-Shayeji, M. Effective optic disc detection method based on swarm intelligence techniques and novel pre-processing steps. Appl. Soft Comput. 2016, 49, 146–163. [Google Scholar] [CrossRef]
  8. Rahebi, J.; Hardalaç, F. A new approach to optic disc detection in human retinal images using the firefly algorithm. Med. Biol. Eng. Comput. 2016, 54, 453–461. [Google Scholar] [CrossRef]
  9. Abdullah, A.S.; Özok, Y.E.; Rahebi, J. A novel method for retinal optic disc detection using bat meta-heuristic algorithm. Med. Biol. Eng. Comput. 2018, 56, 2015–2024. [Google Scholar] [CrossRef]
  10. Gui, B.; Shuai, R.J.; Chen, P. Optic disc localization algorithm based on improved corner detection. Procedia Comput. Sci. 2018, 131, 311–319. [Google Scholar] [CrossRef]
  11. Thakur, N.; Juneja, M. Optic disc and optic cup segmentation from retinal images using hybrid approach. Expert Syst. Appl. 2019, 127, 308–322. [Google Scholar] [CrossRef]
  12. Shaikha, H.K.; Sallow, A.B. Optic Disc Detection and Segmentation in Retinal Fundus Image. In Proceedings of the 2019 International Conference on Advanced Science and Engineering (ICOASE), Zakho, Iraq, 2–4 April 2019; pp. 23–28. [Google Scholar]
  13. Abdullah, A.S.; Rahebi, J.; Özok, Y.E.; Aljanabi, M. A new and effective method for human retina optic disc segmentation with fuzzy clustering method based on active contour model. Med. Biol. Eng. Comput. 2019, 58, 25–37. [Google Scholar] [CrossRef]
  14. Sreng, S.; Maneerat, N.; Hamamoto, K.; Win, K.Y. Deep Learning for Optic Disc Segmentation and Glaucoma Diagnosis on Retinal Images. Appl. Sci. 2020, 10, 4916. [Google Scholar] [CrossRef]
  15. Zhen, Y.; Wang, L.; Liu, H.; Zhang, J.; Pu, J. Performance assessment of the deep learning technologies in grading glaucoma severity. arXiv 2018, arXiv:1810.13376. [Google Scholar]
  16. Yang, G.; Li, F.; Ding, D.; Wu, J.; Xu, J. Automatic diagnosis of glaucoma on color fundus images using adaptive mask deep network. In MultiMedia Modeling: MMM 2021, International Conference on Multimedia Modeling, Prague, Czech Republic, 22–24 June 2021; Springer: Cham, Switzerland, 2021; pp. 99–110. [Google Scholar]
  17. Elangovan, P.; Nath, M.K. Glaucoma assessment from color fundus images using convolutional neural network. Int. J. Imaging Syst. Technol. 2021, 31, 955–971. [Google Scholar] [CrossRef]
  18. Gómez-Valverde, J.J.; Antón, A.; Fatti, G.; Liefers, B.; Herranz, A.; Santos, A.; Sánchez, C.I.; Ledesma-Carbayo, M.J. Automatic glaucoma classification using color fundus images based on convolutional neural networks and transfer learning. Br. J. Ophthalmol. 2019, 10, 892–913. [Google Scholar] [CrossRef] [PubMed]
  19. Hemelings, R.; Elen, B.; Barbosa-Breda, J.; Lemmens, S.; Meire, M.; Pourjavan, S.; Stalmans, I. Accurate prediction of glaucoma from colour fundus images with a convolutional neural network that relies on active and transfer learning. Acta Ophthalmol. 2020, 98, 94–100. [Google Scholar] [CrossRef]
  20. Natarajan, D.; Sankaralingam, E.; Balraj, K.; Thangaraj, V. Automated Segmentation Algorithm with Deep Learning Framework for Early Detection of Glaucoma. Concurr. Comput. Pract. Exp. 2021, 33, e6181. [Google Scholar] [CrossRef]
  21. Li, L.; Xu, M.; Wang, X.; Jiang, L.; Liu, H. Attention Based Glaucoma Detection: A Large-Scale Database and CNN Model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 10563–10572. [Google Scholar]
  22. Bajwa, M.N.; Malik, M.I.; Siddiqui, S.A.; Dengel, A.; Shafait, F.; Neumeier, W.; Ahmed, S. Two-stage framework for optic disc localization and glaucoma classification in retinal fundus images using deep learning. BMC Med. Inform. Decis. Mak. 2019, 19, 136. [Google Scholar]
  23. Sonti, K.; Dhuli, R. A new convolution neural network model “KR-NET” for retinal fundus glaucoma classification. Optik 2023, 283, 170861. [Google Scholar] [CrossRef]
  24. Al-Bander, B.; Al-Nuaimy, W.; Al-Taee, M.A.; Zheng, Y. Automated glaucoma diagnosis using deep learning approach. In Proceedings of the 2017 14th International Multi-Conference on Systems, Signals & Devices (SSD), Marrakech, Morocco, 28–31 March 2017; pp. 207–210. [Google Scholar]
  25. Abbas, Q. Glaucoma-Deep: Detection of Glaucoma Eye Disease on Retinal Fundus Images Using Deep Learning. Int. J. Adv. Comput. Sci. Appl. 2017, 8, 41–45. [Google Scholar] [CrossRef]
  26. Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in Vision: A Survey. ACM Comput. Surv. 2021, 54, 1–41. [Google Scholar] [CrossRef]
  27. Wassel, M.; Hamdi, A.M.; Adly, N.; Torki, M. Vision Transformers Based Classification for Glaucomatous Eye Condition. In Proceedings of the 2022 26th International Conference on Pattern Recognition (ICPR), Montreal, QC, Canada, 21–25 August 2022; pp. 5082–5088. [Google Scholar]
  28. Hu, X.; Zhang, L.-X.; Gao, L.; Dai, W.; Han, X.; Lai, Y.-K.; Chen, Y. GLIM-Net: Chronic Glaucoma Forecast Transformer for Irregularly Sampled Sequential Fundus Images. IEEE Trans. Med. Imaging 2023, 42, 1875–1884. [Google Scholar] [CrossRef] [PubMed]
  29. Zhang, Z.; Yin, F.S.; Liu, J.; Wong, W.K.; Tan, N.M.; Lee, B.H.; Cheng, J.; Wong, T.Y. Origa-light: An online retinal fundus image database for glaucoma analysis and research. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Buenos Aires, Argentina, 1–4 September 2010; pp. 3065–3068. [Google Scholar]
  30. Sivaswamy, J.; Krishnadas, S.; Joshi, G.D.; Jain, M.; Tabish, A.U.S. Drishti-gs: Retinal image dataset for optic nerve head (onh) segmentation. In Proceedings of the IEEE International Symposium on Biomedical Imaging, Beijing, China, 29 April–2 May 2014; pp. 53–56. [Google Scholar]
  31. Komaki, G.M.; Kayvanfar, V. Grey wolf optimizer algorithm for the two-stage assembly flow shop scheduling problem with release time. J. Comput. Sci. 2015, 8, 109–120. [Google Scholar] [CrossRef]
  32. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  33. Wu, S.; Yang, J.; Yu, H.; Gou, L.; Li, X. Gaussian Guided IoU: A Better Metric for Balanced Learning on Object Detection. arXiv 2021, arXiv:2103.13613. [Google Scholar]
  34. Singh, P.B.; Singh, P.; Dev, H.; Tiwari, A.; Batra, D.; Chaurasia, B.K. Glaucoma Classification using Light Vision Transformer. In Proceedings of the International Conference on Intelligent Systems and Machine Learning (ICISML), Odisha, India, 27–28 July 2023; pp. 1–12. [Google Scholar]
  35. Lin, T.; Wang, Y.; Liu, X.; Qiu, X. A survey of transformers. AI Open 2022, 3, 111–132. [Google Scholar] [CrossRef]
  36. An, W.; Wang, H.; Sun, Q.; Xu, J.; Dai, Q.; Zhang, L. A PID controller approach for stochastic optimization of deep networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8522–8531. [Google Scholar]
  37. Maas, A.L.; Qi, P.; Xie, Z.; Hannun, A.Y.; Lengerich, C.T.; Jurafsky, D.; Ng, A.Y. Building dnn acoustic models for large vocabulary speech recognition. Comput. Speech Lang. 2017, 41, 195–213. [Google Scholar] [CrossRef]
  38. Gao, J.; Jiang, Y.; Zhang, H.; Wang, F. Joint disc and cup segmentation based on recurrent fully convolutional network. PLoS ONE 2020, 15, e0238983. [Google Scholar] [CrossRef] [PubMed]
  39. Samawi, H.J.; Al-Sultan, A.Y.; Al-Saadi, E.H. Optic disc segmentation in retinal fundus images using morphological techniques and intensity thresholding. In Proceedings of the 2020 International Conference on Computer Science and Software Engineering (CSASE), Duhok, Iraq, 16–18 April 2020; pp. 302–307. [Google Scholar]
  40. Tadisetty, S.; Chodavarapu, R.; Jin, R.; Clements, R.J.; Yu, M. Identifying the Edges of the Optic Cup and the Optic Disc in Glaucoma Patients by Segmentation. Sensors 2023, 23, 4668. [Google Scholar] [CrossRef]
  41. Sevastopolsky, A. Optic disc and cup segmentation methods for glaucoma detection with modification of U-Net convolutional neural network. Pattern Recognit. Image Anal. 2017, 27, 618–624. [Google Scholar] [CrossRef]
  42. Al-Bander, B.; Williams, B.M.; Al-Nuaimy, W.; Al-Taee, M.A.; Pratt, H.; Zheng, Y. Dense Fully Convolutional Segmentation of the Optic Disc and Cup in Colour Fundus for Glaucoma Diagnosis. Symmetry 2018, 10, 87. [Google Scholar] [CrossRef]
  43. Ramani, R.G.; Shanthamalar, J.J. Improved image processing techniques for optic disc segmentation in retinal fundus images. Biomed. Signal Process. Control 2020, 58, 101832. [Google Scholar] [CrossRef]
  44. Elangovan, P.; Nath, M.K. Performance analysis of optimizers for glaucoma diagnosis from fundus images using transfer learning. In Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication: Proceedings of MDCWC; Springer: Singapore, 2021; pp. 507–518. [Google Scholar]
  45. Diaz-Pinto, A.; Morales, S.; Naranjo, V.; Köhler, T.; Mossi, J.M.; Navea, A. CNNs for automatic glaucoma assessment using fundus images: An extensive validation. BioMed. Eng. Online 2019, 18, 29. [Google Scholar] [CrossRef] [PubMed]
  46. Guo, F.; Mai, Y.; Zhao, X.; Duan, X.; Fan, Z.; Zou, B.; Xie, B. Yanbao: A mobile app using the measurement of clinical parameters for glaucoma screening. IEEE Access 2018, 6, 77414–77428. [Google Scholar] [CrossRef]
  47. Chen, X.-W.; Lin, X. Big data deep learning: Challenges and perspectives. IEEE Access 2014, 2, 514–525. [Google Scholar] [CrossRef]
  48. Alagirisamy, M. Micro statistical descriptors for glaucoma diagnosis using neural networks. Int. J. Adv. Signal Image Sci. 2021, 7, 1–10. [Google Scholar] [CrossRef]
  49. Pranathi, K.; Pingili, M.; Mamatha, B. Fundus Image Processing for Glaucoma Diagnosis Using Dynamic Support Vector Machine. In Advances in Cognitive Science and Communications, ICCCE 2023: Proceedings of the International Conference on Communications and Cyber Physical Engineering, Hyderabad, India, 29–30 April 2022; Springer: Singapore, 2018; pp. 551–558. [Google Scholar]
  50. Chaudhary, P.K.; Jain, S.; Damani, T.; Gokharu, S.; Pachori, R.B. Detection of Primary and Secondary Glaucoma Using 2D-FBSE-EWT From Different Fundus Image Modalities. Authorea Prepr. 2023, 14, 1–10. [Google Scholar]
Figure 1. (a) Healthy image (normal optic nerve). (b) Glaucoma image (large optic cup).
Figure 2. Overview of the classification framework.
Figure 3. Flowchart for optic disc detection.
Figure 4. Preprocessing step.
Figure 5. Segmentation result.
Figure 6. Cropping of the optic disc region detected with YOLO.
Figure 7. Optic disc segmentation performance using the GWO metaheuristic algorithm: (a) retinal images; (b) segmentation result; (c) ground truth.
Table 1. List of fundus image datasets used for glaucoma diagnosis.

| Dataset | No. of Images | Segmentation/Classification | Normal | Glaucoma | Input Size | Label Ground Truth | Source |
|---|---|---|---|---|---|---|---|
| DRISHTI-GS | 101 | Both | 31 | 70 | 2896 × 1944 | Labeled | Aravind Eye Hospital in Madurai (India) |
| ORIGA | 650 | Classification | 482 | 168 | 3072 × 2048 | - | Singapore Eye Research Institute |
Table 2. Parameters of pre-trained CNN and ST models.

| Model | Optimization Algorithm | Learning Rate | Batch Size | Epoch | Activation Function | Base Model Trainable | Loss Function |
|---|---|---|---|---|---|---|---|
| Swin Transformer | RMSprop | 0.001 | 32 | 150 | - | - | - |
| DenseNet201 | RMSprop | 0.001 | 32 | 150 | ReLU/Softmax | False | Categorical cross-entropy |
| ResNet50 | Adam | 0.001 | 32 | 150 | ReLU/Softmax | False | Categorical cross-entropy |
| InceptionV3 | SGD | 0.01 | 32 | 150 | ReLU/Softmax | False | Categorical cross-entropy |
| VGG19 | Adamax | 0.001 | 32 | 150 | ReLU/Softmax | False | Categorical cross-entropy |
Table 3. mAP, R, and P performance metrics for the YOLOv7 detection algorithm.

| Training/Testing | mAP@0.5 (%) | P | R |
|---|---|---|---|
| Training | 99.53 | 1 | 1 |
| Test | 99.50 | 1 | 1 |
Table 4. Results of hold-out validation classification of one-stage and two-stage approaches for the DRISHTI-GS dataset.

Cropped dataset (two-stage approach):

| Model | Accuracy (%) | Sensitivity (%) | Specificity (%) | F1 Score (%) | Precision (%) |
|---|---|---|---|---|---|
| DenseNet201 | 90.48 | 93.33 | 83.33 | 90.48 | 90.48 |
| Vgg19 | 90.48 | 93.33 | 83.33 | 90.48 | 90.48 |
| ResNet50 | 90.48 | 100 | 66.67 | 89.82 | 91.60 |
| Swin Transformer | 76.19 | 100 | 16.67 | 69.39 | 82.14 |
| Inception V3 | 80.95 | 93.33 | 50.00 | 79.64 | 80.25 |

Uncropped dataset (one-stage approach):

| Model | Accuracy (%) | Sensitivity (%) | Specificity (%) | F1 Score (%) | Precision (%) |
|---|---|---|---|---|---|
| Swin Transformer | 100 | 100 | 100 | 100 | 100 |
| ResNet50 | 85 | 80 | 86.7 | 81.87 | 83.33 |
| Inception V3 | 80 | 75 | 81.2 | 74.62 | 78.125 |
| DenseNet201 | 75 | 57 | 84.6 | 70.88 | 72.62 |
| Vgg19 | 75 | 66.7 | 76.5 | 71.56 | 63.09 |
Table 5. Three-fold cross-validation classification result for cropped images (two-stage approach) in the DRISHTI-GS dataset.
Fold/Class | TP | TN | FP | FN | R (%) | S (%) | P (%) | F1 (%) | Acc (%)
DenseNet-201
Fold 1 | 21 | 8 | 2 | 3 | 87.50 | 80.00 | 85.84 | 85.49 | 85.29
Fold 2 | 21 | 5 | 6 | 2 | 91.0 | 45.45 | 75.72 | 84.00 | 76.47
Fold 3 | 23 | 4 | 6 | 0 | 100 | 40.00 | 85.58 | 78.97 | 81.82
Overlapped - Glaucoma | 65 | 17 | 14 | 5 | 92.86 | 54.84 | 82.28 | 87.25 | -
Overlapped - Normal | 17 | 65 | 5 | 14 | 54.84 | 92.86 | 77.27 | 64.15 | -
Average | - | - | - | - | 73.85 | 73.85 | 79.78 | 75.7 | 81.19
ResNet50
Overlapped - Glaucoma | 66 | 16 | 15 | 4 | 94.29 | 51.61 | 81.48 | 87.42 | -
Overlapped - Normal | 16 | 66 | 4 | 15 | 51.61 | 94.29 | 62.74 | 62.75 | -
Average | - | - | - | - | 72.95 | 72.95 | 72.11 | 75.09 | 81.19
Inception V3
Overlapped - Glaucoma | 63 | 16 | 15 | 7 | 90.00 | 51.61 | 80.77 | 85.14 | -
Overlapped - Normal | 16 | 63 | 7 | 15 | 51.61 | 90.00 | 69.57 | 59.26 | -
Average | - | - | - | - | 70.81 | 70.81 | 75.17 | 72.20 | 78.28
VGG19
Overlapped - Glaucoma | 64 | 13 | 18 | 6 | 91.43 | 41.94 | 78.05 | 84.21 | -
Overlapped - Normal | 13 | 64 | 6 | 18 | 41.94 | 91.43 | 68.42 | 52 | -
Average | - | - | - | - | 66.69 | 66.69 | 73.24 | 68.11 | 76.20
Swin Transformer
Overlapped - Glaucoma | 66 | 10 | 21 | 4 | 94.29 | 32.26 | 75.87 | 84.08 | -
Overlapped - Normal | 10 | 66 | 4 | 21 | 32.26 | 94.29 | 71.43 | 44.44 | -
Average | - | - | - | - | 63.28 | 63.28 | 73.65 | 64.26 | 75.31
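The per-class metrics in Tables 5-15 follow directly from the tabulated confusion-matrix counts. As a sanity check, the sketch below applies the standard definitions to the overlapped glaucoma row of DenseNet-201 in Table 5; it is an illustration of the formulas, not the study's evaluation code.

```python
def class_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    recall      = tp / (tp + fn)                       # R (sensitivity)
    specificity = tn / (tn + fp)                       # S
    precision   = tp / (tp + fp)                       # P
    f1          = 2 * precision * recall / (precision + recall)
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    return {name: round(100 * value, 2) for name, value in
            [("R", recall), ("S", specificity), ("P", precision),
             ("F1", f1), ("Acc", accuracy)]}

# DenseNet-201, overlapped glaucoma row in Table 5: TP=65, TN=17, FP=14, FN=5
print(class_metrics(65, 17, 14, 5))
# {'R': 92.86, 'S': 54.84, 'P': 82.28, 'F1': 87.25, 'Acc': 81.19}
```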
Table 6. Three-fold cross-validation classification result for uncropped images (one-stage approach) in the DRISHTI-GS dataset.
Fold/Class | TP | TN | FP | FN | R (%) | S (%) | P (%) | F1 (%) | Acc (%)
Swin Transformer
Fold 1 | 21 | 6 | 5 | 2 | 91.30 | 54.55 | 78.90 | 78.42 | 79.41
Fold 2 | 23 | 8 | 2 | 1 | 95.83 | 80 | 91.08 | 91.03 | 91.18
Fold 3 | 22 | 9 | 1 | 1 | 95.65 | 90 | 93.94 | 93.94 | 93.94
Overlapped - Glaucoma | 66 | 23 | 8 | 4 | 94.29 | 74.19 | 89.19 | 91.67 | -
Overlapped - Normal | 23 | 66 | 4 | 8 | 74.19 | 94.29 | 85.19 | 79.31 | -
Average | - | - | - | - | 84.24 | 84.24 | 87.19 | 85.49 | 88.18
Inception V3
Overlapped - Glaucoma | 65 | 19 | 12 | 5 | 92.86 | 61.29 | 84.42 | 88.44 | -
Overlapped - Normal | 19 | 65 | 5 | 12 | 61.29 | 92.86 | 79.17 | 69.09 | -
Average | - | - | - | - | 77.08 | 77.08 | 81.80 | 78.77 | 83.18
DenseNet-201
Overlapped - Glaucoma | 67 | 14 | 17 | 3 | 95.71 | 45.16 | 79.76 | 87.01 | -
Overlapped - Normal | 14 | 67 | 3 | 17 | 45.16 | 95.71 | 82.35 | 58.33 | -
Average | - | - | - | - | 70.44 | 70.44 | 81.06 | 72.67 | 80.21
ResNet50
Overlapped - Glaucoma | 62 | 18 | 13 | 8 | 88.57 | 58.06 | 82.67 | 85.52 | -
Overlapped - Normal | 18 | 62 | 8 | 13 | 58.06 | 88.57 | 69.23 | 63.16 | -
Average | - | - | - | - | 73.32 | 73.32 | 75.95 | 74.34 | 79.26
VGG19
Overlapped - Glaucoma | 59 | 19 | 12 | 11 | 84.29 | 61.29 | 83.10 | 83.69 | -
Overlapped - Normal | 19 | 59 | 11 | 12 | 61.29 | 84.29 | 63.33 | 62.30 | -
Average | - | - | - | - | 72.79 | 72.79 | 73.22 | 72.99 | 77.21
Table 7. Results of hold-out validation classification of the one-stage and two-stage approaches for RF Models in the DRISHTI-GS dataset.
Model | Accuracy (%) | Sensitivity (%) | Specificity (%) | F1_Score (%) | Precision (%)
Cropped Dataset (two-stage approach)
DenseNet201+RF | 95.24 | 100 | 83.33 | 95.10 | 95.54
Resnet50+RF | 90.48 | 100 | 66.67 | 89.82 | 91.60
Vgg19+RF | 90.48 | 100 | 66.67 | 89.82 | 91.60
InceptionV3+RF | 85.71 | 93.33 | 66.67 | 85.30 | 85.36
Uncropped Dataset (one-stage approach)
DenseNet201+RF | 90.48 | 100 | 66.67 | 89.82 | 91.60
Vgg19+RF | 90.48 | 100 | 66.67 | 89.82 | 91.60
Resnet50+RF | 85.71 | 100 | 50.00 | 83.98 | 88.10
InceptionV3+RF | 85.71 | 100 | 50.00 | 83.98 | 88.10
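Tables 7-9 (and 13-15) report the hybrid pipeline in which deep features from a pre-trained CNN are classified with a Random Forest. The sketch below shows one plausible way to implement such a pipeline with Keras and scikit-learn; the pooling layer, image size, preprocessing, and forest size are assumptions rather than the study's exact settings.

```python
import numpy as np
import tensorflow as tf
from sklearn.ensemble import RandomForestClassifier

# Frozen DenseNet201 used purely as a feature extractor (global-average-pooled output)
extractor = tf.keras.applications.DenseNet201(
    include_top=False, weights="imagenet", pooling="avg", input_shape=(224, 224, 3))

def extract_features(images: np.ndarray) -> np.ndarray:
    """images: (N, 224, 224, 3) float array of fundus images or cropped ROIs."""
    x = tf.keras.applications.densenet.preprocess_input(images.copy())
    return extractor.predict(x, verbose=0)

# X_train, y_train, X_test, y_test are assumed to be prepared image arrays and labels
# rf = RandomForestClassifier(n_estimators=100, random_state=42)
# rf.fit(extract_features(X_train), y_train)
# print("accuracy:", rf.score(extract_features(X_test), y_test))
```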
Table 8. The RF Model's three-fold cross-validation classification results for cropped images (two-stage approach) on the DRISHTI-GS dataset.
Fold/Class | TP | TN | FP | FN | R (%) | S (%) | P (%) | F1 (%) | Acc (%)
DenseNet201+RF
Fold 1 | 24 | 5 | 5 | 0 | 100 | 50 | 87.83 | 83.54 | 85.29
Fold 2 | 23 | 2 | 9 | 0 | 100 | 18.18 | 80.97 | 66.53 | 73.53
Fold 3 | 23 | 5 | 5 | 0 | 100 | 50 | 87.55 | 83.07 | 84.85
Overlapped - Glaucoma | 70 | 12 | 19 | 0 | 100 | 38.70 | 78.65 | 88.05 | -
Overlapped - Normal | 12 | 70 | 0 | 19 | 38.70 | 100 | 100 | 55.80 | -
Average | - | - | - | - | 69.35 | 69.35 | 89.33 | 71.93 | 81.22
ResNet50+RF
Overlapped - Glaucoma | 70 | 12 | 19 | 0 | 100 | 38.70 | 78.65 | 88.05 | -
Overlapped - Normal | 12 | 70 | 0 | 19 | 38.70 | 100 | 38.71 | 55.81 | -
Average | - | - | - | - | 69.35 | 69.35 | 58.68 | 71.93 | 81.19
Inception V3+RF
Overlapped - Glaucoma | 70 | 10 | 21 | 0 | 100 | 32.26 | 76.92 | 86.96 | -
Overlapped - Normal | 10 | 70 | 0 | 21 | 32.26 | 100 | 76.92 | 48.78 | -
Average | - | - | - | - | 66.13 | 66.13 | 76.92 | 67.87 | 79.26
VGG19+RF
Overlapped - Glaucoma | 68 | 11 | 20 | 2 | 97.14 | 35.48 | 77.27 | 86.08 | -
Overlapped - Normal | 11 | 68 | 2 | 20 | 35.48 | 97.14 | 84.62 | 50 | -
Average | - | - | - | - | 66.31 | 66.31 | 80.95 | 68.04 | 78.28
Table 9. RF model's three-fold cross-validation classification result for uncropped images (one-stage approach) on the DRISHTI-GS dataset.
Fold/Class | TP | TN | FP | FN | R (%) | S (%) | P (%) | F1 (%) | Acc (%)
VGG19+RF
Fold 1 | 23 | 6 | 5 | 0 | 100 | 54.55 | 87.92 | 83.85 | 85.29
Fold 2 | 24 | 6 | 4 | 0 | 100 | 60 | 89.92 | 87.22 | 88.24
Fold 3 | 23 | 6 | 4 | 0 | 100 | 60 | 89.67 | 86.85 | 87.88
Overlapped - Glaucoma | 70 | 18 | 13 | 0 | 100 | 58.06 | 84.34 | 91.50 | -
Overlapped - Normal | 18 | 70 | 0 | 13 | 58.06 | 100 | 100 | 73.47 | -
Average | - | - | - | - | 79.03 | 79.03 | 92.17 | 82.485 | 87.14
DenseNet201+RF
Overlapped - Glaucoma | 70 | 10 | 21 | 0 | 100 | 32.26 | 76.92 | 86.96 | -
Overlapped - Normal | 10 | 70 | 0 | 21 | 32.26 | 100 | 76.92 | 48.78 | -
Average | - | - | - | - | 77.42 | 77.42 | 91.67 | 80.87 | 86.16
Inception V3+RF
Overlapped - Glaucoma | 69 | 17 | 14 | 1 | 98.57 | 54.84 | 83.13 | 90.20 | -
Overlapped - Normal | 17 | 69 | 1 | 14 | 54.84 | 98.57 | 94.44 | 69.39 | -
Average | - | - | - | - | 76.71 | 76.71 | 88.79 | 79.80 | 85.12
ResNet50+RF
Overlapped - Glaucoma | 67 | 13 | 18 | 3 | 95.71 | 41.94 | 78.82 | 86.45 | -
Overlapped - Normal | 13 | 67 | 3 | 18 | 41.94 | 95.71 | 81.25 | 55.32 | -
Average | - | - | - | - | 68.83 | 68.83 | 80.04 | 70.89 | 79.20
Table 10. Results of hold-out validation classification of one-stage and two-stage approaches for the ORIGA dataset.
Pre-Trained CNN Model | Accuracy (%) | Recall (%) | Specificity (%) | F1_Score (%) | Precision (%)
Cropped (two-stage approach)
DenseNet201 | 69.33 | 72.44 | 52.17 | 80 | 89.32
ResNet50 | 73.85 | 96 | 73.85 | 62.74 | 54.53
Vgg19 | 75.38 | 75.38 | NaN | 84.96 | 100
Inception V3 | 74.62 | 76.03 | 55.56 | 84.79 | 95.83
Swin Transformer | 96.15 | 95.05 | 100 | 97.46 | 100
Uncropped (one-stage approach)
DenseNet201 | 78.15 | 79.17 | 74.19 | 85.20 | 92.23
ResNet50 | 73.85 | 96 | 73.85 | 62.74 | 54.53
Vgg19 | 72.31 | 96.91 | 0 | 83.93 | 74.02
Inception V3 | 69.23 | 100 | 0 | 81.82 | 69.23
Swin Transformer | 86.15 | 88.24 | 78.57 | 90.91 | 93.75
Table 11. Five-fold cross-validation classification result for cropped images (two-stage approach) in the ORIGA dataset.
Fold/Class | TP | TN | FP | FN | R (%) | S (%) | P (%) | F1 (%) | Acc (%)
Inception V3
Fold 1 | 16 | 87 | 10 | 17 | 48.48 | 89.69 | 78.04 | 78.36 | 79.23
Fold 2 | 23 | 94 | 3 | 10 | 69.70 | 96.91 | 89.90 | 89.58 | 90.00
Fold 3 | 33 | 96 | 0 | 1 | 97.06 | 100 | 99.24 | 99.23 | 99.23
Fold 4 | 34 | 95 | 1 | 0 | 100 | 98.96 | 99.25 | 99.23 | 99.23
Fold 5 | 34 | 96 | 0 | 0 | 100 | 100 | 100 | 100 | 100
Overlapped - Glaucoma | 140 | 468 | 14 | 28 | 83.33 | 97.10 | 90.91 | 86.96 | -
Overlapped - Normal | 468 | 140 | 28 | 14 | 97.10 | 83.33 | 94.35 | 95.71 | -
Average | - | - | - | - | 90.21 | 90.21 | 92.63 | 91.33 | 93.53
VGG19
Overlapped - Glaucoma | 135 | 470 | 12 | 13 | 91.22 | 97.51 | 91.84 | 91.53 | -
Overlapped - Normal | 470 | 135 | 13 | 12 | 97.51 | 91.22 | 97.31 | 97.41 | -
Average | - | - | - | - | 94.36 | 94.36 | 94.57 | 94.47 | 93.07
Swin Transformer
Overlapped - Glaucoma | 100 | 414 | 68 | 68 | 59.52 | 85.89 | 59.52 | 59.52 | -
Overlapped - Normal | 414 | 100 | 68 | 68 | 85.89 | 59.52 | 85.89 | 85.89 | -
Average | - | - | - | - | 72.71 | 72.71 | 72.71 | 72.71 | 79.07
DenseNet-201
Overlapped - Glaucoma | 127 | 368 | 114 | 41 | 75.60 | 76.35 | 52.70 | 62.10 | -
Overlapped - Normal | 368 | 127 | 41 | 114 | 76.35 | 75.60 | 89.98 | 82.60 | -
Average | - | - | - | - | 75.97 | 75.97 | 71.34 | 72.35 | 76.15
ResNet50
Overlapped - Glaucoma | 0 | 482 | 0 | 168 | 0 | 100 | NaN | NaN | -
Overlapped - Normal | 482 | 0 | 168 | 0 | 100 | 0 | 74.15 | 85.16 | -
Average | - | - | - | - | 50 | 50 | NaN | NaN | 74.15
Table 12. Five-fold cross-validation classification result for uncropped images (one-stage approach) in the ORIGA dataset.
Fold/Class | TP | TN | FP | FN | R (%) | S (%) | P (%) | F1 (%) | Acc (%)
Inception V3
Fold 1 | 14 | 86 | 11 | 19 | 42.42 | 88.66 | 75.33 | 75.79 | 76.92
Fold 2 | 33 | 97 | 0 | 0 | 100 | 100 | 100 | 100 | 100
Fold 3 | 30 | 95 | 1 | 4 | 88.24 | 98.96 | 96.17 | 96.09 | 96.15
Fold 4 | 31 | 94 | 2 | 3 | 91.18 | 97.92 | 96.13 | 96.14 | 96.15
Fold 5 | 34 | 96 | 0 | 0 | 100 | 100 | 100 | 100 | 100
Overlapped - Glaucoma | 142 | 468 | 14 | 26 | 84.52 | 97.10 | 91.03 | 87.65 | -
Overlapped - Normal | 468 | 142 | 26 | 14 | 97.10 | 84.52 | 94.74 | 95.90 | -
Average | - | - | - | - | 90.81 | 90.81 | 92.88 | 91.77 | 93.84
VGG19
Overlapped - Glaucoma | 136 | 463 | 19 | 32 | 80.95 | 96.06 | 87.82 | 84.21 | -
Overlapped - Normal | 463 | 136 | 32 | 19 | 96.06 | 80.59 | 93.54 | 94.78 | -
Average | - | - | - | - | 88.50 | 88.32 | 90.68 | 89.49 | 92.15
Swin Transformer
Overlapped - Glaucoma | 127 | 442 | 40 | 41 | 75.60 | 91.70 | 76.05 | 75.82 | -
Overlapped - Normal | 442 | 127 | 41 | 40 | 91.70 | 75.60 | 91.51 | 91.61 | -
Average | - | - | - | - | 83.65 | 83.65 | 83.78 | 83.71 | 87.54
DenseNet-201
Overlapped - Glaucoma | 114 | 448 | 34 | 54 | 67.86 | 92.95 | 77.03 | 72.15 | -
Overlapped - Normal | 448 | 114 | 54 | 34 | 92.95 | 67.86 | 89.24 | 91.06 | -
Average | - | - | - | - | 80.40 | 80.40 | 83.13 | 81.60 | 86.46
ResNet50
Overlapped - Glaucoma | 0 | 482 | 0 | 168 | 0 | 100 | NaN | NaN | -
Overlapped - Normal | 482 | 0 | 168 | 0 | 100 | 0 | 74.15 | 85.16 | -
Average | - | - | - | - | 50 | 50 | NaN | NaN | 74.15
Table 13. Results of hold-out validation classification of the one-stage and two-stage approaches for RF Models in the ORIGA dataset.
Model | Accuracy (%) | Sensitivity (%) | Specificity (%) | F1_Score (%) | Precision (%)
Cropped Dataset with YOLO (two-stage)
DenseNet201+RF | 77.69 | 78.15 | 72.73 | 86.51 | 96.88
Resnet50+RF | 73.85 | 73.85 | NaN | 84.96 | 100
Vgg19+RF | 76.92 | 77.50 | 70 | 86.11 | 96.88
InceptionV3+RF | 76.15 | 76.00 | 80 | 85.97 | 98.96
Uncropped Dataset (one-stage)
DenseNet201+RF | 78.46 | 77.87 | 87.50 | 87.16 | 98.96
Resnet50+RF | 73.85 | 73.85 | NaN | 84.96 | 100
Vgg19+RF | 74.62 | 75.20 | 60 | 85.07 | 97.92
InceptionV3+RF | 76.92 | 77.05 | 75 | 86.24 | 97.92
Table 14. RF Models' five-fold validation result for cropped image (two-stage approach) on the ORIGA dataset.
Fold/Class | TP | TN | FP | FN | R (%) | S (%) | P (%) | F1 (%) | Acc (%)
DenseNet201+RF
Fold 1 | 9 | 96 | 1 | 24 | 27.27 | 98.97 | 82.54 | 76.65 | 80.77
Fold 2 | 5 | 93 | 4 | 28 | 15.15 | 95.88 | 71.45 | 69.71 | 75.38
Fold 3 | 10 | 90 | 6 | 24 | 29.41 | 93.75 | 74.65 | 73.76 | 76.92
Fold 4 | 5 | 93 | 3 | 29 | 14.71 | 96.88 | 72.64 | 69.23 | 75.38
Fold 5 | 9 | 90 | 6 | 25 | 26.47 | 93.75 | 73.48 | 72.60 | 76.15
Overlapped - Glaucoma | 38 | 462 | 20 | 130 | 22.62 | 95.85 | 65.52 | 33.63 | -
Overlapped - Normal | 462 | 38 | 130 | 20 | 95.85 | 22.62 | 78.04 | 86.03 | -
Average | - | - | - | - | 59.23 | 59.23 | 71.78 | 59.83 | 76.92
VGG19+RF
Overlapped - Glaucoma | 23 | 472 | 10 | 145 | 13.69 | 97.93 | 69.69 | 22.89 | -
Overlapped - Normal | 472 | 23 | 145 | 10 | 97.93 | 13.69 | 76.49 | 85.89 | -
Average | - | - | - | - | 55.81 | 55.81 | 73.09 | 54.39 | 76.15
Resnet50+RF
Overlapped - Glaucoma | 0 | 482 | 0 | 168 | 0 | 100 | NaN | NaN | -
Overlapped - Normal | 482 | 0 | 168 | 0 | 100 | 0 | 74.15 | 85.15 | -
Average | - | - | - | - | 50 | 50 | NaN | NaN | 74.15
Inception V3+RF
Overlapped - Glaucoma | 10 | 471 | 11 | 158 | 0.05 | 97.72 | 47.61 | 10.58 | -
Overlapped - Normal | 471 | 10 | 158 | 11 | 97.72 | 0.05 | 74.88 | 84.78 | -
Average | - | - | - | - | 48.88 | 48.88 | 61.24 | 47.68 | 74
Table 15. RF Models' five-fold validation result for uncropped image (one-stage approach) on the ORIGA dataset.
Fold/Class | TP | TN | FP | FN | R (%) | S (%) | P (%) | F1 (%) | Acc (%)
Inception V3+RF
Fold 1 | 6 | 96 | 1 | 27 | 18.18 | 98.97 | 79.99 | 72.73 | 78.46
Fold 2 | 7 | 95 | 2 | 26 | 21.21 | 97.94 | 78.33 | 73.49 | 78.46
Fold 3 | 4 | 89 | 7 | 30 | 11.76 | 92.71 | 64.74 | 65.79 | 71.54
Fold 4 | 2 | 91 | 5 | 32 | 0.05 | 94.79 | 62.11 | 63.92 | 71.54
Fold 5 | 5 | 91 | 5 | 29 | 14.71 | 94.79 | 69.08 | 68.17 | 73.85
Overlapped - Glaucoma | 24 | 462 | 20 | 144 | 14.29 | 95.85 | 54.55 | 22.64 | -
Overlapped - Normal | 462 | 24 | 144 | 20 | 95.85 | 14.29 | 76.24 | 84.93 | -
Average | - | - | - | - | 55.07 | 55.07 | 65.39 | 53.78 | 74.76
VGG19+RF
Overlapped - Glaucoma | 23 | 462 | 20 | 145 | 13.69 | 95.85 | 53.48 | 21.8 | -
Overlapped - Normal | 462 | 23 | 145 | 20 | 95.85 | 13.69 | 76.11 | 84.85 | -
Average | - | - | - | - | 54.77 | 54.77 | 64.79 | 53.33 | 74.61
Resnet50+RF
Overlapped - Glaucoma | 0 | 482 | 0 | 168 | 0 | 100 | NaN | NaN | -
Overlapped - Normal | 482 | 0 | 168 | 0 | 100 | 0 | 74.15 | 85.15 | -
Average | - | - | - | - | 50 | 50 | NaN | NaN | 74.15
DenseNet201+RF
Overlapped - Glaucoma | 30 | 451 | 31 | 138 | 17.86 | 93.57 | 49.18 | 26.20 | -
Overlapped - Normal | 451 | 30 | 138 | 31 | 93.57 | 17 | 76.57 | 84.22 | -
Average | - | - | - | - | 55.71 | 55.28 | 62.87 | 55.21 | 74
Table 16. Comparison of different methods for segmentation on the Drishti-GS dataset.
Studies | R (%) | S (%) | Acc (%) | DICE (%) | JACCARD (%)
Gao et al. [38] | 95.78 | 97.83 | 97.64 | - | -
Samawi [39] | - | - | 97.3 | - | -
Tadisetty [40] | - | - | - | 94.3 | 89.3
Sevastopolsky [41] | - | - | - | 85 | -
AL-Bander et al. [42] | - | - | 99.69 | 94.9 | 90.4
Ramani et al. [43] | 95.28 | 99.43 | 99.31 | 88.43 | -
Our Study | 96.04 | 99.58 | 99.39 | 94.15 | 90.40
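For reference, the DICE and JACCARD columns in Table 16 are the usual overlap measures between a predicted optic-disc mask and its ground truth. A minimal sketch, assuming the masks are binary NumPy arrays of the same shape, is:

```python
import numpy as np

def dice_and_jaccard(pred: np.ndarray, gt: np.ndarray):
    """Overlap between a predicted binary mask and the ground-truth mask."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dice = 2.0 * intersection / (pred.sum() + gt.sum())
    jaccard = intersection / union
    return dice, jaccard   # in [0, 1]; multiply by 100 for the percentages in Table 16
```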
Table 17. Comparison of the classification results of the present study with those of previous studies.
Author(s) | Methods | Acc (%) | R (%) | S (%) | P (%) | F1 (%) | Dataset | Split | Cropped ROI
Elangovan et al. [17] | CNN with 18 layers | 86.62 | 92.31 | 48.15 | 92.38 | - | DRISHTI-GS | Hold-out | Not applied
Elangovan et al. [17] | CNN with 18 layers | 78.32 | 58.06 | 92.44 | 84.36 | - | ORIGA | Hold-out | Not applied
Elangovan et al. [44] | VGG-19 | 91.50 | 98.09 | 49.17 | 92.87 | - | DRISHTI-GS | Hold-out | Not applied
Elangovan et al. [44] | ResNet-101 | 80.50 | 68.60 | 88.80 | 81.20 | - | ORIGA | Hold-out | Not applied
Sreng [15] | ShuffleNet (P1) | 86.67 | - | - | - | - | DRISHTI-GS | Hold-out | Segmentation using DeepLabv3+ MDCNN
Sreng [15] | MobileNet (P1) | 81.54 | - | - | - | - | ORIGA | Hold-out | Segmentation using DeepLabv3+ MDCNN
Sreng [15] | Inception + SVM (P2) | 91.53 | - | - | - | - | DRISHTI-GS | Hold-out | Segmentation using DeepLabv3+ MDCNN
Sreng [15] | ResNet + SVM (P2) | 78.97 | - | - | - | - | ORIGA | Hold-out | Segmentation using DeepLabv3+ MDCNN
Sreng [15] | Ensemble of P1 | 85.19 | - | - | - | - | DRISHTI-GS | Hold-out | Segmentation using DeepLabv3+ MDCNN
Sreng [15] | Ensemble of P1 | 88.86 | - | - | - | - | ORIGA | Hold-out | Segmentation using DeepLabv3+ MDCNN
Sreng [15] | Ensemble of P2 | 92.06 | - | - | - | - | DRISHTI-GS | Hold-out | Segmentation using DeepLabv3+ MDCNN
Sreng [15] | Ensemble of P2 | 85.26 | - | - | - | - | ORIGA | Hold-out | Segmentation using DeepLabv3+ MDCNN
Diaz-Pinto et al. [45] | Xception architecture | 75.25 | 74.19 | 71.43 | - | - | DRISHTI-GS | 10-fold | Not applied
Guo et al. [46] | Random forest classifier + SMOTE | 76.9 | 79.9 | 73.8 | - | - | ORIGA | Hold-out | Segmented OD regions using U-Net
Juan et al. [19] | CNN with 16 layers | 85.89 | 98.19 | 26.4 | 84.75 | - | DRISHTI-GS | Hold-out | Not applied
Juan et al. [19] | CNN with 16 layers | 71.88 | 61.62 | 79.3 | 69.44 | - | ORIGA | Hold-out | Not applied
Chen et al. [47] | CNN with 6 layers | 78.02 | 55.66 | 93.56 | 85.8 | - | DRISHTI-GS | Hold-out | Not applied
Chen et al. [47] | CNN with 6 layers | 86.86 | 98.7 | 67.6 | 85.21 | - | ORIGA | Hold-out | Not applied
Alagirisamy [48] | Linear vector quantizer-artificial neural network | 95.05 | 95.71 | 93.55 | - | - | DRISHTI-GS | Hold-out | ROI extraction with micro texture feature extraction
Alagirisamy [48] | Linear vector quantizer-artificial neural network | 85.38 | 83.33 | 86.10 | - | - | ORIGA | Hold-out | ROI extraction with micro texture feature extraction
Pranathi et al. [49] | Dynamic support vector machine | 85.5 | - | - | - | - | ORIGA | 10-fold | Not applied
Chaudhary et al. [50] | Image processing, 2D-FBSE-EWT | 99 | 97 | 100 | - | - | DRISHTI-GS | 5-fold | Cropped, histogram matching method
Chaudhary et al. [50] | Image processing, 2D-FBSE-EWT | 92.3 | 91.9 | 94.8 | - | - | ORIGA | 5-fold | Cropped, histogram matching method
Present study | Swin Transformer | 100 | 100 | 100 | 100 | 100 | DRISHTI-GS | Hold-out | Uncropped
Present study | Swin Transformer | 88.18 | 94.29 | 74.19 | 87.19 | 85.49 | DRISHTI-GS | 3-fold | Uncropped
Present study | Swin Transformer | 96.15 | 95.05 | 100 | 100 | 97.46 | ORIGA | Hold-out | Cropped
Present study | InceptionV3 | 93.84 | 97.10 | 84.52 | 92.88 | 91.77 | ORIGA | 5-fold | Uncropped
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
