
#### *3.2. Proposed Methodology*

An automatic DR classification model was developed using the dataset referenced in this paper; its general process is illustrated in Figure 1. The figure shows two scenarios: case 1, in which a preprocessing step using CLAHE followed by ESRGAN is performed, and case 2, in which neither step is performed. In both scenarios, image augmentation is applied to prevent overfitting. Lastly, the images are fed into the Inception-V3 model for the classification step.

#### 3.2.1. Preprocessing Using CLAHE and ESRGAN

Images of the retinal fundus are often taken at several facilities using various technologies. Consequently, given the high intensity variation in the photographs used by the proposed method, it was crucial to enhance the quality of the DR images and remove various types of noise. All images in case 1 underwent a preliminary preprocessing phase prior to augmentation and training, which involved several stages:


As Figure 3 shows, CLAHE (illustrated in Figure 4) was first used to improve the DR image's fine details, textures, and low contrast by redistributing the input image's lightness values [38]. Using CLAHE, the input image was first sectioned into four small tiles. Each tile underwent histogram equalization with a clip limit, which involved five steps: histogram computation, excess calculation, distribution, redistribution, and scaling and mapping using a cumulative distribution function (CDF). A histogram was calculated for each tile, and bin values above the clip limit were aggregated and spread to other bins. Histogram values were then converted with the CDF to the input image's pixel scale, and each tile was mapped to its CDF values. To boost contrast, bilinear interpolation stitched the tiles together [39]. This technique improved local contrast while also making borders and slopes more apparent. Following this, all photos were scaled to suit the input of the learning model, 224 × 224 × 3. Figure 3 depicts the subsequent application of ESRGAN to the output of the preceding stage. ESRGAN [40] (shown in Figure 5) produces pictures that more closely mimic the sharp edges of image artifacts [41]. To improve performance, ESRGAN adopted the basic architecture of SRResNet, in which Residual-in-Residual Dense Blocks are substituted for the original basic blocks, as shown in Figure 5. Intensity differences between images can be rather large, thus images were normalized so that their intensities fell within the range −1 to 1. This kept the data within acceptable bounds and removed noise. As a result of normalization, the model was less sensitive to variations in weights, making it easier to tune. Since the method shown in Figure 3 improved the image's contrast while simultaneously emphasizing the image's boundaries and arcs, it yielded more accurate findings.
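As a minimal sketch of this stage (assuming OpenCV; the clip limit is an illustrative value, and the 2 × 2 tile grid matches the four tiles described above), the CLAHE, resizing, and normalization steps might look as follows. The ESRGAN generator itself is omitted here and is represented only by the final normalization to [−1, 1] that it expects.

```python
import cv2
import numpy as np

def preprocess_fundus(path, clip_limit=2.0, grid=(2, 2)):
    """CLAHE enhancement, resizing, and [-1, 1] normalization.

    clip_limit is an illustrative value; grid=(2, 2) sections the
    image into the four tiles described in the text.
    """
    img = cv2.imread(path)                      # BGR fundus image
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)  # equalize lightness only
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=grid)
    lab = cv2.merge((clahe.apply(l), a, b))     # tiles are stitched with
    img = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)  # bilinear interpolation
    img = cv2.resize(img, (224, 224))           # match Inception-V3 input
    # A pretrained ESRGAN generator would be applied here; only the
    # normalization to the [-1, 1] range it expects is shown.
    return img.astype(np.float32) / 127.5 - 1.0
```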


**Figure 3.** Samples of the proposed image-enhancement techniques: the original, unedited image; a rendition of the same image with CLAHE; and the final enhanced image after applying ESRGAN.

**Figure 4.** CLAHE architecture.

#### 3.2.2. Data Augmentation

Data augmentation was implemented on the training set to increase the number of images and alleviate the issue of an imbalanced dataset before exposing Inception-V3 to the dataset images. In most cases, deep learning models perform better when given more data to learn from. We can exploit the characteristics of DR photos by applying several modifications to each image. A deep neural network (DNN) is unaffected by such changes to the input image, including scaling it up or down, flipping it horizontally or vertically, or rotating it by a certain number of degrees. Regularizing the data, minimizing overfitting, and rectifying imbalances in the dataset are all accomplished through the use of data augmentations (i.e., shifting, rotating, and zooming). One of the transformations used in this investigation was horizontal shift augmentation, which shifts the pixels of an image horizontally while maintaining the image's aspect ratio, with the step size specified by a fraction between 0 and 1. Another kind of transformation is rotation, in which the image is arbitrarily rotated by an angle between 0 and 180 degrees. To create fresh samples for the network, all of the preceding alterations were applied to the training set's images.
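In the TensorFlow Keras implementation used later in this work, these transformations could be expressed roughly as below; the specific shift fraction, zoom range, and directory layout are illustrative placeholders, since the exact values are not reported.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Placeholder ranges: the shift fraction, zoom range, and directory
# layout below are illustrative, not values reported in the study.
augmenter = ImageDataGenerator(
    width_shift_range=0.1,   # horizontal shift as a fraction of width
    rotation_range=180,      # random rotation in [0, 180] degrees
    zoom_range=0.1,          # random zoom in and out
    fill_mode='nearest',
)

train_flow = augmenter.flow_from_directory(
    'aptos/train',           # hypothetical directory of class subfolders
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical',
)
```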

In this study, two scenarios were utilized to train Inception-V3. The first was to apply augmentation to the enhanced images, as depicted in Figure 6, and the second was to apply augmentation to the raw images, as depicted in Figure 7. Both scenarios attempt to expand the data volume by making slightly modified copies of the current data or by synthesizing new data from existing data while keeping all other parameters constant (Figures 6 and 7), with the same total number of images in both cases.


**Figure 6.** Illustrations of the same image, augmented with enhancement.

**Figure 7.** Illustrations of the same image, augmented without enhancement.

In a second use of data augmentation techniques, the issues of inconsistent sample sizes and complicated classifications were resolved. As seen in Table 2, the APTOS dataset exemplifies an "imbalanced class" problem because the samples are not distributed evenly across the several classes. After applying augmentation techniques to the dataset, the classes are clearly balanced in both scenarios, as depicted in Figure 8.
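One plausible way to realize this balancing, sketched below, is to oversample each minority class with augmented copies until all five classes match the largest one; the helper name and strategy are illustrative assumptions, `images` and `labels` are assumed NumPy arrays, and `augmenter` is the generator sketched earlier.

```python
import numpy as np

def balance_by_augmentation(images, labels, augmenter, n_classes=5):
    """Oversample minority classes with augmented copies until every
    class matches the largest one (a sketch of one plausible route to
    the balanced distribution in Figure 8)."""
    counts = np.bincount(labels, minlength=n_classes)
    target = counts.max()
    out_x, out_y = [images], [labels]
    for c in range(n_classes):
        deficit = target - counts[c]
        if deficit <= 0:
            continue
        picks = np.random.choice(np.where(labels == c)[0], deficit)
        batch = augmenter.flow(images[picks], shuffle=False,
                               batch_size=deficit)
        out_x.append(next(batch))               # augmented copies
        out_y.append(np.full(deficit, c))
    return np.concatenate(out_x), np.concatenate(out_y)
```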

#### 3.2.3. Learning Model (Inception-V3)

In this section, the approach's fundamental theory is outlined and explained. Inception-v3 [11,12] is among the pretrained transfer-learning models, superseding the original Inception-v1 [42] and Inception-v2 [43] architectures. The Inception-v3 model is trained on the ImageNet datasets [44,45], which contain the information required for identifying one thousand classes. Its top-five error rate on ImageNet is 3.5%, while its top-one error rate was lowered to 17.3%.

Inception was influenced in particular by the technique of Serre et al. [46], which processes information at multiple scales. By adopting the method of Lin et al. [47], the developers of Inception were able to improve the precision of the neural networks, making this approach a significant design element. The dimension reduction to 1 × 1 convolutions also protected the networks from computational constraints. Researchers were able to significantly reduce the amount of time and effort spent on DL picture classification using Inception [48]. Guided by the theoretical analysis offered by Arora et al. [49], they emphasized finding an optimal spot between the typical technique of improving performance (increasing both depth and size) and layer separability. When utilized independently, both procedures are computationally expensive. This was the fundamental goal of the 22-layer architecture employed by the Inception DL system, in which all filters are learned. On the basis of research by Arora et al. [49], a correlation statistical analysis was developed to generate highly associated categories that were fed into the subsequent layer. The 1 × 1, 3 × 3, and 5 × 5 convolution layers were all inspired by the concept of multiscale processing of visual data. Each of the larger layers is preceded by a set of 1 × 1 convolutions [48] that perform dimension reduction.
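For illustration, the multiscale module described above can be sketched in Keras as parallel 1 × 1, 3 × 3, and 5 × 5 branches, each larger branch preceded by a 1 × 1 dimension-reduction convolution; the filter counts are illustrative defaults, not values taken from Inception-V3 itself.

```python
from tensorflow.keras import layers

def inception_module(x, f1=64, f3_red=96, f3=128, f5_red=16, f5=32,
                     fpool=32):
    """GoogLeNet-style Inception block: parallel 1x1, 3x3, and 5x5
    convolutions, with 1x1 dimension reductions ahead of the larger
    filters.  Filter counts are illustrative."""
    b1 = layers.Conv2D(f1, 1, padding='same', activation='relu')(x)
    b3 = layers.Conv2D(f3_red, 1, padding='same', activation='relu')(x)
    b3 = layers.Conv2D(f3, 3, padding='same', activation='relu')(b3)
    b5 = layers.Conv2D(f5_red, 1, padding='same', activation='relu')(x)
    b5 = layers.Conv2D(f5, 5, padding='same', activation='relu')(b5)
    bp = layers.MaxPooling2D(3, strides=1, padding='same')(x)
    bp = layers.Conv2D(fpool, 1, padding='same', activation='relu')(bp)
    return layers.Concatenate()([b1, b3, b5, bp])
```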


**Figure 8.** Number of training images after using augmentation techniques (No DR: 1996; Mild DR: 1813; Moderate DR: 1972; Severe DR: 1716; Proliferative DR: 1863).

#### **4. Experimental Results**

#### *4.1. Training and Setup of Inception-V3*

To demonstrate the effectiveness of the deployed DL system and to compare results to industry standards, tests were carried out on the APTOS dataset. The dataset was divided into three subsets in accordance with the suggested training method. Eighty percent of the data was used for training (9952 photographs), ten percent for testing (1012 photos), and the remaining ten percent was randomly selected as a validation set (1025 photos) to evaluate performance and save the best weight combinations. All photographs were reduced to 224 × 224 × 3 pixel resolution during training. We tested the proposed system's TensorFlow Keras implementation on a Linux desktop equipped with an RTX 3060 GPU and 8 GB of RAM.
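A minimal sketch of this 80/10/10 split with scikit-learn follows; `paths` and `labels` stand for arrays of image file paths and five-class DR grades, and the stratification and fixed seed are illustrative assumptions rather than reported details.

```python
from sklearn.model_selection import train_test_split

# 80/10/10 split as described above; stratification and the seed are
# assumptions, and `paths`/`labels` are hypothetical input arrays.
train_x, rest_x, train_y, rest_y = train_test_split(
    paths, labels, test_size=0.20, stratify=labels, random_state=42)
val_x, test_x, val_y, test_y = train_test_split(
    rest_x, rest_y, test_size=0.50, stratify=rest_y, random_state=42)
```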

Using the Adam optimizer and a method that slows down training when learning has stalled for too long (i.e., validation patience), the proposed framework was first trained on the APTOS dataset. Throughout the training process, hyperparameters were supplied to the Adam optimizer. We used a range of 1 × 10<sup>−3</sup> to 1 × 10<sup>−5</sup> for the learning rate, 2–64 for the batch size (doubling at each step), 50 epochs, a patience of 10, and a momentum of 0.90. Training proceeded in mini-batches.
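The training loop might be configured in Keras roughly as below. The specific learning rate, the use of ReduceLROnPlateau as the "validation patience" mechanism, and the checkpointing are representative assumptions within the reported ranges; `model`, `train_flow`, and `val_flow` stand for the Inception-V3 network and the data iterators built elsewhere.

```python
import tensorflow as tf

# Representative values within the reported ranges; Adam's beta_1=0.9
# plays the role of the reported 0.90 momentum term.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, beta_1=0.9)
callbacks = [
    # Slow down training when learning has stalled: reduce the learning
    # rate on a validation-loss plateau, down to the 1e-5 lower bound.
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss', factor=0.1, patience=10, min_lr=1e-5),
    # Save the best weight combination seen on the validation split.
    tf.keras.callbacks.ModelCheckpoint(
        'best_inceptionv3.h5', monitor='val_accuracy',
        save_best_only=True),
]
model.compile(optimizer=optimizer, loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_flow, validation_data=val_flow, epochs=50,
          callbacks=callbacks)
```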

#### *4.2. Evaluative Parameters*

This section describes the evaluation methods and their results. Classifier accuracy (*Acc*) is a standard performance measure. It is determined by dividing the number of successfully categorized instances (images) by the total number of examples in the dataset (Equation (1)). Picture-categorization systems are often evaluated using precision (*Prec*) and recall (*Re*). As shown in Equation (2), precision is the fraction of images labeled positive that are truly positive, whereas recall, Equation (3), is the fraction of truly positive images in the dataset that are correctly categorized. The higher the *F*1-score, the more reliable the system's predictions. The *F*1-score (*F1sc*) can be determined using Equation (4). With respect to the study's last criterion, top-N accuracy, the model's N highest-probability answers from the softmax distribution are compared with the expected label: a classification is counted as accurate if at least one of the N predictions corresponds to the target label.

$$Accuracy = \frac{T^p + T^n}{T^p + T^n + F^p + F^n} \tag{1}$$

$$Precision = \frac{T^p}{T^p + F^p} \tag{2}$$

$$Recall = \frac{T^p}{T^p + F^n} \tag{3}$$

$$F1\text{-score} = 2 \ast \left(\frac{Prec \ast Re}{Prec + Re}\right) \tag{4}$$

True positives (*T<sup>p</sup>*) are correctly predicted positive cases, and true negatives (*T<sup>n</sup>*) are correctly predicted negative cases. False positives (*F<sup>p</sup>*) are incorrectly predicted positive cases, whereas false negatives (*F<sup>n</sup>*) are incorrectly predicted negative cases.
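These metrics can be computed with scikit-learn as sketched below; macro averaging over the five classes and N = 2 for top-N accuracy are assumptions, and `y_true` and `y_prob` stand for the ground-truth labels and the softmax outputs.

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score,
                             top_k_accuracy_score)

# y_true: integer class labels; y_prob: softmax outputs, shape (n, 5).
y_pred = y_prob.argmax(axis=1)
acc  = accuracy_score(y_true, y_pred)                    # Equation (1)
prec = precision_score(y_true, y_pred, average='macro')  # Equation (2)
rec  = recall_score(y_true, y_pred, average='macro')     # Equation (3)
f1   = f1_score(y_true, y_pred, average='macro')         # Equation (4)
top2 = top_k_accuracy_score(y_true, y_prob, k=2)         # top-N accuracy
```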

#### *4.3. Performance of Inception-V3 Model Outcomes*

Considering the APTOS dataset, two distinct cases were investigated, in which Inception-V3 was applied to our dataset in two scenarios: the first with enhancement (CLAHE + ESRGAN) and the second without, as depicted in Figure 2. We split the work this way to reduce the total time needed to conduct the project. Models were trained for 50 epochs, with learning rates ranging from 1 × 10<sup>−3</sup> to 1 × 10<sup>−5</sup> and batch sizes varying from 2 to 64. To achieve the highest possible precision, Inception-V3 was further tuned by freezing between 140 and 160 layers. Several iterations of the same model with the same parameters were used to generate a model ensemble; since random initial weights were generated for each iteration, the accuracy fluctuated from iteration to iteration. Mean and standard deviation statistics for this procedure are displayed in Tables 3 and 4, respectively, for the case with CLAHE + ESRGAN and the case without, with the first 143 layers frozen.
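A sketch of this transfer-learning setup, freezing the first 143 layers of a pretrained Inception-V3 in Keras, is shown below; the classification head (pooling and dense layers) is an illustrative assumption, as its exact form is not detailed here.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3

# ImageNet-pretrained backbone with the first 143 layers frozen, as in
# the best-performing runs reported below.
base = InceptionV3(weights='imagenet', include_top=False,
                   input_shape=(224, 224, 3))
for layer in base.layers[:143]:
    layer.trainable = False

# Hypothetical classification head for the five DR grades.
x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dense(512, activation='relu')(x)
out = layers.Dense(5, activation='softmax')(x)
model = models.Model(base.input, out)
```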


**Table 3.** Average and standard deviation accuracy with enhancement (CLAHE + ESRGAN).





The top performance from each iteration was saved and is shown in Tables 5 and 6 for case 1 and case 2, respectively, revealing that the best results produced with and without preprocessing using CLAHE + ESRGAN were 98.7% and 80.87%, respectively. Figure 9 depicts the optimal outcome for the two scenarios based on the evaluation metrics used: case 1 with CLAHE and ESRGAN, and case 2 without them.

**Table 5.** Best accuracy with enhancement (CLAHE + ESRGAN).



**Table 6.** Best accuracy without enhancement (CLAHE + ESRGAN).

Figures 10 and 11 show the confusion matrices with and without using CLAHE + ESRGAN, respectively.

**Figure 9.** Best results for both scenarios.

**Figure 10.** Best confusion matrix of Inception-V3 with enhancement (with CLAHE + ESRGAN).

Tables 7 and 8 show the total number of photos utilized for testing in each class of the APTOS dataset. The No DR class has the most images, with 504; its *Prec*, *Re*, and *F1sc* reach the highest values of 99, 100, and 100% for case 1, and 97, 97, and 97% for case 2.

**Table 7.** Detailed results for each class using CLAHE + ESRGAN.

**Table 8.** Detailed results for each class without using CLAHE + ESRGAN.


Using retinal pictures to improve the accuracy with which ophthalmologists identify DR, while reducing their effort, was demonstrated to be practical in real-world scenarios.

#### *4.4. Evaluation Considering a Variety of Other Methodologies*

The effectiveness of the proposed method was compared with that of other approaches. According to Table 9, our method exceeds the alternatives in terms of effectiveness and performance. The proposed Inception model achieved an overall accuracy rate of 98.7%, surpassing the existing methods.



#### **5. Discussion**

Based on CLAHE and ESRGAN, a novel DR categorization scheme is presented in this research. The developed model was tested on the DR images found in the APTOS 2019 dataset. There were two training scenarios: case 1 with CLAHE + ESRGAN applied to the APTOS dataset, and case 2 without CLAHE + ESRGAN. Through 80:20 hold-out validation, the model attained a five-class accuracy of 98.7% for case 1 and 80.87% for case 2. The proposed method classified both scenarios using the pretrained Inception-V3 infrastructure. Throughout model construction, we evaluated the classification performance of the two scenarios and found that the enhancement techniques produced the best results (Figure 9). The main contributing element in our methodology was the general resolution enhancement of CLAHE + ESRGAN, which the evidence shows is responsible for the large improvement in accuracy.

#### **6. Conclusions**

By identifying retinal images displayed in the APTOS dataset, we established a strategy for quickly and accurately diagnosing five distinct stages of DR. The proposed method employs case 1 with images enhanced with CLAHE and ESRGAN, and case 2 with original images. The case 1 scenario employs a four-stage picture-enhancement technique to increase the image's luminance and eliminate noise. CLAHE and ESRGAN were the two stages with the greatest impact on accuracy, as demonstrated by the experimental results. State-of-the-art techniques in preprocessed medical imagery were employed to train Inception-V3 with augmentation techniques that helped reduce overfitting and raised the overall competence of the suggested methodology. When using Inception-V3, the model achieved an accuracy of 98.7% ≈ 99% for the case 1 scenario and 80.87% ≈ 81% for the case 2 scenario, both of which are in line with the accuracy of trained ophthalmologists. The use of CLAHE and ESRGAN in the preprocessing step further contributed to the study's novelty and significance. The proposed methodology outperformed established models, as evidenced by a comparison of their respective strengths and weaknesses. To prove the effectiveness of the proposed method, it must be tested on a sizable and intricate dataset, ideally consisting of a significant number of potential DR instances. In the future, new datasets may be analyzed using DenseNet, VGG, or ResNet, as well as additional augmentation approaches. Additionally, ESRGAN and CLAHE can be applied independently to determine their individual impact on the classification procedure.

**Author Contributions:** Conceptualization, W.G.; Data curation, W.G.; Formal analysis, G.A. and W.G.; Funding acquisition, G.A.; Methodology, W.G. and M.H.; Project administration, M.H.; Supervision, W.G. and M.H.; Writing—original draft, W.G.; Writing—review & editing, W.G. and M.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research work through the project number 223202.

**Institutional Review Board Statement:** Not Applicable.

**Informed Consent Statement:** Not Applicable.

**Data Availability Statement:** Will be furnished on request.

**Acknowledgments:** The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research work through the project number 223202.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
