A Deep Learning Model for Detecting Diabetic Retinopathy Stages with Discrete Wavelet Transform

Mutawa, A. M.; Al-Sabti, Khalid; Raizada, Seemant; Sruthi, Sai

doi:10.3390/app14114428


by A. M. Mutawa ¹,*, Khalid Al-Sabti ², Seemant Raizada ² and Sai Sruthi

¹ Department of Computer Engineering, Kuwait University, Safat 13060, Kuwait
² Kuwait Specialized Eye Center, Kuwait City 35453, Kuwait
* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(11), 4428; https://doi.org/10.3390/app14114428

Submission received: 1 March 2024 / Revised: 16 May 2024 / Accepted: 20 May 2024 / Published: 23 May 2024

(This article belongs to the Special Issue Recent Advances in and Applications of Medical Image Processing and Analysis)


Abstract:

Diabetic retinopathy (DR) is the primary cause of vision impairment and blindness in people with diabetes. Uncontrolled diabetes can damage the retinal blood vessels. Early detection and prompt medical intervention are vital in preventing progressive vision impairment. The growing demand for medical care places a heavier workload and greater diagnostic demands on medical professionals. In the proposed study, a convolutional neural network (CNN) is employed to detect the stages of DR. The methodology combines two public datasets, which enhances the model’s capacity to generalize to unseen DR images, as each dataset encompasses unique demographics and clinical circumstances. The network can learn and capture complicated hierarchical image features with asymmetric weights. Each image is preprocessed using contrast-limited adaptive histogram equalization and the discrete wavelet transform. The model is trained and validated using the combined Dataset for Diabetic Retinopathy (DDR) and Asia-Pacific Tele-Ophthalmology Society (APTOS) datasets. The CNN model is tuned with different learning rates and optimizers. The CNN model with the Adam optimizer achieved an accuracy of 72% and an area under the curve (AUC) score of 0.90. These results may help reduce diabetes-related vision impairment through early identification of DR severity.

Keywords:

artificial intelligence; convolutional neural networks; diabetic retinopathy; deep learning; medical image analysis

1. Introduction

Diabetes is a debilitating, chronic condition characterized by insufficient production or use of insulin in the body. The number of individuals aged 20 to 79 with diabetes is projected to increase from 536.6 million in 2021 to 783.2 million by 2045 [1]. Type 1 and Type 2 are its two main forms. Within 20 years of diagnosis, retinopathy affects around 60% of people with type 2 diabetes and nearly all of those with type 1 diabetes. The worldwide prevalence of diabetes has been reported to be 9.8% [2].

Diabetic retinopathy (DR) is a leading cause of vision loss worldwide, affecting around 33% of individuals diagnosed with diabetes. Timely diagnosis and treatment of DR can significantly reduce the likelihood of permanent visual impairment, eye floaters, and eventual vision loss. Nevertheless, DR can progress without exhibiting any symptoms until it poses a risk to vision.

Manually inspecting fundus images and grading the severity of DR takes a significant amount of time and is prone to errors. Precise evaluation of DR severity requires highly experienced ophthalmologists [3]. The screening approach depends on an expert’s evaluation of color fundus photographs, which often leads to misdiagnosis and delayed treatment [4].

Medical imaging has employed deep learning algorithms for disease screening and detection. Diverse approaches are used to identify DR at its earlier stages, with machine learning playing a significant part. Artificial intelligence employs machine learning techniques to enhance the learning process independently, eliminating direct human interaction. Deep learning is often used in predicting medical conditions [5]. Deep learning and machine learning offer advantages in many fields, including computer vision, medical image analysis, and fraud detection systems [6,7].

Deep learning extracts fundamental features from massive datasets, and these features can then be used for assessment and prediction. Such algorithms can produce accurate outputs, simplifying early disease prediction and potentially saving many lives [8]. Computer vision and medical image processing applications make extensive use of deep learning techniques, notably convolutional neural networks (CNNs) [8,9,10,11,12].

Most conventional techniques cannot reveal hidden characteristics and their relationships. Instead of learning valuable information, they retain useless features such as size, brightness, and rotation, decreasing performance and accuracy [13]. Another concern is that irrelevant anatomical differences or other extraneous information in retinal images might hamper the model’s generalization capacity. Although not essential for the main diagnostic objective, these subtleties may lead to overfitting of the model. When the model is used on other datasets that do not have the same incidental characteristics, this overfitting might make it less valuable. It is crucial to build and train the model using pertinent factors that are constant over various datasets to overcome this issue. It will increase the model’s durability and flexibility in clinical situations [14].

Non-proliferative (NonPrDR) and proliferative (PrDR) are the two primary stages of DR. NonPrDR refers to the initial stages of DR and can be mild, moderate, and severe. Conversely, PrDR is the advanced stage of the illness. Many people do not experience symptoms until the disease progresses to the PrDR stage [15]. Figure 1 shows several stages of retinal imaging with DR from the Asia-Pacific Tele-Ophthalmology Society (APTOS) dataset [16].

The combination of machine learning and deep learning has the potential to transform the categorization of DR [17]. However, because the volume of collected patient data keeps growing, demand is increasing for automated technologies that can efficiently evaluate vast datasets while maintaining high accuracy. Efficient and cost-effective techniques for early DR identification are therefore of utmost importance [18]. As each dataset varies in demographics and clinical situations, this research combines two different datasets to improve the model’s capacity to generalize to unseen DR images.

Our work aimed to develop a deep learning model that predicts DR stages from fundus images using a CNN. The model was trained, validated, and tested on the combined APTOS-DDR fundus datasets. Images were preprocessed using contrast-limited adaptive histogram equalization (CLAHE) to enhance contrast and then the discrete wavelet transform (DWT) to extract the coefficients. The CNN model extracts the relevant features from the preprocessed images. The model is trained as a multi-class classifier that categorizes DR into five stages: normal or no DR, mild, moderate, severe, and PrDR. Limited academic research evaluates CLAHE and DWT with combined datasets, focusing on their performance in categorizing the DR stages.

Significant contributions to this study include the following:

The study utilized two public datasets, APTOS and the Dataset for Diabetic Retinopathy (DDR), to categorize each stage of DR. The APTOS and DDR datasets were merged to train, validate, and test the model. Generalizability concerns can be mitigated by blending diverse datasets and evaluating the model’s performance on an unseen test set.
A CNN model has been constructed to predict the DR stages. The learning rate is tuned during the model training, and optimizers such as adaptive moment estimation (Adam), root mean square propagation (RMSProp), Adamax, adaptive gradient algorithm (Adagrad), and stochastic gradient descent (SGD) are evaluated on the model. Data augmentation is used in the training set to mitigate the issue of overfitting.
We provide a novel image classification framework that carefully blends many effective processing algorithms to improve classification performance. Each image is preprocessed using CLAHE. Altering the intensity distribution in specific regions of the picture, this approach enhances the image contrast. We then employ DWT to divide the images into frequency components to recover spatially covered features.
Our methodology also offers new research paths, notably in integrating DWT with other deep learning architectures and applying it to complicated image processing problems.

The remainder of the article is organized as follows: Section 2 reviews the literature on DR, and Section 3 gives an overview of the methodology. The results are presented in Section 4 and discussed in Section 5. Finally, Section 6 concludes the study.

2. Literature Review

Deep learning has been employed in medical imaging for illness diagnosis. Several research papers have investigated the prediction of DR, utilizing CNN and pretrained models, including DenseNet, Visual Geometry Group (VGG), ResNet, and Inception [19]. Moreover, specific study is required to enhance the efficiency of the deep learning framework [20]. Usman et al. [4] observed that a pretrained CNN model detected lesions better, with 94.4% accuracy. The researchers used principal component analysis to characterize the fundus pictures.

Gargeya and Leng [21] introduced a sophisticated automated technique for identifying DR using deep learning methods, which offers a neutral alternative for detecting DR using any device. Another work by Areeb and Nadeem [22] categorizes DR photos using support vector machines and random forest methods, with convolutional layers employed to extract the features. The analysis revealed high sensitivity and specificity of random forest, surpassing other approaches in performance.

The surge in transfer learning over the past few years can be linked to the scarcity of supervised learning options for a diverse array of practical scenarios and the abundance of pretrained models. Several research studies have constructed DR models utilizing VGG16 [23,24], DenseNet [25], InceptionV3 [26], and ResNet [27]. In their study, Qian et al. [28] utilized transfer learning and attention approaches to classify DR and obtained an accuracy rate of 83.2%. Another paper [25] utilized a “convolutional block attention module” in conjunction with the APTOS dataset. The DR grading exercise was completed with an accuracy rate of 82%.

Zhang et al. [29] employed a limited retinal dataset to refine a pre-existing CNN model using a process known as fine-tuning. Using the EyePACS and Messidor datasets, they outperformed the accuracy and sensitivity of previous DR grading techniques. The research authors [30] investigated DR classification by combining Swin with wavelet transformation. Their performance attained an accuracy rate of 82%, along with a sensitivity of 0.4626. An attention model was used in conjunction with a pretrained DenseNet model by Dinpajhouh and Seyyedsalehi to determine the severity of DR, utilizing the APTOS database [31], with an accuracy of 83.69%.

A machine learning approach was suggested to determine the main causes of DR in individuals with elevated glucose levels [32]. Employing transfer learning approaches, this process isolates and organizes the features of DR into many classes. Before the classification phase, the entropy technique selects the most distinguishing attributes from a set of features. The objective of the model is to ascertain the level of severity of DR in the patient’s eye, which will be valuable in precisely categorizing its severity. Murugappan et al. [33] utilized a few-shot learning approach to develop a tool for detecting and evaluating DR. The approach employs an attention mechanism and episodic learning to train the model with limited training data. When evaluated on the APTOS dataset, the model achieves high DR classification and grading regarding accuracy and sensitivity.

The U-Net architecture, a CNN variant, is widely used for image segmentation [34]. Jena et al. [35] employed an asymmetric deep learning architecture based on U-Net for DR image segmentation, using green-channel images enhanced with CLAHE to improve performance. The paper [36] presents a computerized system that combines U-Net and transfer learning to diagnose DR automatically from fundus images. The researchers used the publicly accessible Kaggle DR Detection dataset and fine-tuned a pretrained Inception V3 model to extract features. The extracted features are then fed into a U-Net model to segment the retina and optic disc. The segmented images are then classified into five degrees of DR severity using a multi-layer perceptron classifier. On the Kaggle dataset, their strategy outperforms other state-of-the-art techniques with an accuracy of 88.3%.

Transfer learning has been utilized in the domain of DR to overcome the limitation of having little annotated data available to train deep learning models. The researchers utilized AlexNet, InceptionV3, VGG16, DenseNet169, VGG19, and ResNet50, trained on large datasets such as ImageNet, as feature extractors for DR pictures [36,37,38,39,40]. To achieve a highly precise DR classification, the pre-existing models were adjusted and optimized using a labeled dataset specifically designed for DR, such as the APTOS, EyePACS, and Messidor datasets. C. Zhang et al. [29] investigated a transfer learning method for assessing DR without needing a source. The researchers employed a pretrained CNN model and adjusted its parameters on a limited number of retinal pictures without needing more source data. The evaluation uses the EyePACS and Messidor datasets, surpassing other cutting-edge methodologies in accuracy and sensitivity for assessing DR.

Data augmentation is widely regarded as the predominant technique for addressing the problem of imbalanced data in image classification tasks [41]. It involves creating new data components from current data to enhance the data artificially. Image augmentation techniques, such as cropping, rotating, zooming, vertical and horizontal flip, width shift, and rescaling, can modify the pictures [42]. Mungloo-Dilmohamud [43] found an enhanced quality of transfer learning in DR classification using a data augmentation technique. Different datasets and CLAHE techniques were employed by Ashwini et al. [44] for detecting DR. The images were preprocessed using the DWT method. According to their study, the individual datasets performed better than the combined ones.

Dihin et al. [45] utilized a wavelet transform with a shifted window model on the APTOS dataset. Compared to the 98% accuracy of the binary classification, the multi-class classification has reached only 86%. Images were segmented by Cornforth et al. [46], who combined the wavelet assessment, supervised classifier likelihoods, adaptive threshold methods, and morphology-based approaches. Different retinal diseases were monitored by Yagmur et al. [47] by incorporating DWT in image processing. Another study by Rehman et al. [48] applied DWT with different machine learning classifiers, such as k-nearest neighbors (KNN) and support vector machine (SVM) models. They used Messidor data with binary classification.

The challenges of DR detection include proper image processing methods and accurate model development to detect different datasets [49]. Most of the studies employed a single image set for training and testing. The common performance measures used to analyze the data are accuracy, recall or sensitivity, and precision [50]. Table 1 describes the inferences from previous DR-related studies.

3. Materials and Methods

The workflow of the study is shown in Figure 2. The stages involved collecting retinal images from public data, preprocessing the images, designing the model, training, and validating the model with fine-tuning, and testing the model with an external dataset.

3.1. Dataset

The dataset employed in this study is public access data. The APTOS [16] and DDR [60] data were combined for training, testing, and validation. APTOS contained 3662 photos, each accompanied by its corresponding label. DDR had 12,524 images depicting various stages of DR. The two were merged, resulting in 16,186 images fed into the model with five labels (no DR, mild, moderate, severe, and PrDR). However, 16 images were excluded from the dataset due to missing valid image names in the file, so the total images were 16,170. Out of the total images, we employed 4333 images with 5 classes.

The class label no DR (class 0) has 1014 images, mild DR (class 1) 999 images, moderate DR (class 2) 684 images, severe DR (class 3) 429 images, and PrDR (class 4) 1207 images. Following a 70:15:15 data split, 3033 images were utilized for training, 650 for validation, and 650 for testing. The complete count for each class is depicted in Table 2.
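The split arithmetic above can be checked with a short sketch. It assumes a simple shuffled split; the source does not say whether the split was stratified by class, and the function name and seed are illustrative only.

```python
import random

# Class counts as reported in the text (class index -> number of images).
class_counts = {0: 1014, 1: 999, 2: 684, 3: 429, 4: 1207}
total = sum(class_counts.values())

def split_indices(n, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle indices 0..n-1 and cut them into train/validation/test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_val = round(n * val_frac)
    n_test = round(n * (1.0 - train_frac - val_frac))
    n_train = n - n_val - n_test  # the training set takes the remainder
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(total)
print(total, len(train_idx), len(val_idx), len(test_idx))
```

With the reported counts, the five classes sum to 4333 images and the three subsets have 3033, 650, and 650 images, matching the figures in the text.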

3.2. Data Augmentation

Deep learning methods perform best when applied to large-scale datasets, and a model’s performance typically improves with more data. Data augmentation supplements the training data with modified copies of existing samples. Each augmented image was assigned the same label as the original image from which it was generated. Before training, augmented images were added to the training set to reduce class imbalance [41].

Common augmentation techniques include rotating, shifting, flipping, zooming, and brightening the image. This study applied a rotation range of 40 degrees, a width shift of 0.2, a height shift of 0.2, and horizontal flipping to the training set. Figure 3 shows the augmented versions of a random sample image. We also applied elastic transformation, which increases the diversity of the dataset by randomly applying elastic deformations to the image, resulting in distorted versions.
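As a rough illustration of two of these operations, the sketch below applies a random shift of up to 20% of the image size and a random horizontal flip using NumPy only. Rotation and elastic transformation are left out because they need interpolation. The function name, the zero fill for pixels shifted in from outside, and the flip probability are assumptions, not details from the source.

```python
import numpy as np

def random_shift_flip(image, rng, shift_frac=0.2, flip_prob=0.5):
    """Randomly shift an H x W image by up to shift_frac of its size, then maybe flip it."""
    h, w = image.shape[:2]
    max_dy, max_dx = int(shift_frac * h), int(shift_frac * w)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))
    out = np.zeros_like(image)
    # Copy the overlapping region; pixels shifted in from outside stay zero (assumed fill).
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = image[src_y, src_x]
    if rng.random() < flip_prob:
        out = out[:, ::-1]  # horizontal flip
    return out

rng = np.random.default_rng(0)
image = rng.random((224, 224))
augmented = random_shift_flip(image, rng)
print(augmented.shape)
```

The augmented image keeps the label of the image it was generated from, as described above.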

3.3. Image Preprocessing

Image preprocessing is essential for machine learning and deep learning to succeed. The pixel values were first rescaled by a factor of 1/255, which normalizes them to the range [0, 1] and improves the stability and effectiveness of model training. Rescaling the pixel values helps prevent numerical instabilities and promotes quicker convergence during training. After rescaling, further preprocessing steps prepared the images for the later feature extraction and classification phases: conversion from RGB to grayscale, CLAHE, and DWT.

Converting the color image to grayscale is the next stage in preprocessing, followed by applying CLAHE. CLAHE is an image processing method employed to enhance the contrast and visibility of fine features in images, particularly those exhibiting uneven lighting conditions or poor contrast. CLAHE prevents contrast over-amplification in adaptive histogram equalization (AHE). Tiles, not the complete image, are what CLAHE processes. It removes spurious borders by combining nearby tiles using bilinear interpolation. This method boosts the visual contrast [57].

Unlike typical AHE, CLAHE limits contrast amplification by applying a clipping limit to the histogram, which reduces noise amplification [61]. In this study, after applying CLAHE, the images were resized to a common size. The image size varied between datasets; for example, the APTOS images were 3216 × 2136 pixels, and the DDR images were 512 × 512 pixels. Therefore, the image size was set to 224 × 224 pixels. Then, the DWT method was applied to each image.
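The clipping idea can be sketched on a single tile. This is not full CLAHE: a complete implementation (for example OpenCV's `cv2.createCLAHE`) equalizes every tile separately and blends neighboring tiles with bilinear interpolation. The clip limit of 2.0 and the function name are assumptions; the source does not report the values used.

```python
import numpy as np

def clipped_equalize(tile, clip_limit=2.0, n_bins=256):
    """Histogram-equalize one uint8 tile using a clipped histogram."""
    hist = np.bincount(tile.ravel(), minlength=n_bins).astype(np.float64)
    # Clip each bin at clip_limit times the mean bin height.
    limit = clip_limit * tile.size / n_bins
    excess = np.sum(np.maximum(hist - limit, 0.0))
    # Redistribute the clipped counts uniformly over all bins.
    hist = np.minimum(hist, limit) + excess / n_bins
    cdf = np.cumsum(hist) / tile.size
    lut = np.round(cdf * (n_bins - 1)).astype(np.uint8)  # gray-level lookup table
    return lut[tile]

# A low-contrast tile whose gray levels only span 100..120.
tile = (np.arange(4096) % 21 + 100).astype(np.uint8).reshape(64, 64)
out = clipped_equalize(tile)
print(tile.min(), tile.max(), out.min(), out.max())
```

Clipping caps the slope of the mapping, so the output range widens without the extreme stretching that plain histogram equalization would give on such a tile.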

Discrete Wavelet Transform

A mathematical method for processing signals and images, the DWT allows the examination of distinct signal components at varying resolutions or scales. In particular, the DWT breaks down a signal into two sets of coefficients: approximation and detail, which represent the signal’s low-frequency and high-frequency components, respectively [62]. The wavelet transform is among the most widely used frequency-domain techniques in image processing, as it allows the information in images to be represented collectively [63]. Using the Python wavelets (PyWavelets) package, the DWT is computed as

cA, (cH, cV, cD) = dwt2(image)

(1)

where cA represents the approximation coefficients, and cH, cV, and cD are the horizontal, vertical, and diagonal detail coefficients, respectively.

The first step in the DWT method is to run the original image through two filters: a high-pass filter to extract the cH, cV, and cD coefficients and a low-pass filter to extract the cA component, as shown in Equation (1). These filters separate the signal’s fine and coarse features. Following filtering, both sets of coefficients are downsampled by two, halving the image resolution. As our primary wavelet for the DWT procedure, we chose the ‘db1’ wavelet, the Daubechies wavelet with one vanishing moment, which is equivalent to the Haar wavelet. We used a single-level decomposition of the original retinal images. Figure 4 depicts the image decomposition into different coefficients.

We manipulated these coefficients after the decomposition to produce a four-channel composite image. As each channel is devoted to a single set of coefficients, the whole information from the original image was captured such that our CNN could efficiently learn from it.
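A minimal NumPy sketch of this step is shown below. Because 'db1' is the Haar wavelet, a single-level decomposition reduces to sums and differences over 2 × 2 pixel blocks. The function names are hypothetical, the sign and naming conventions of the detail bands may differ from those of PyWavelets' `dwt2`, and stacking the four bands in the order cA, cH, cV, cD is an assumption about how the composite image was built.

```python
import numpy as np

def haar_dwt2(image):
    """Single-level 2-D Haar ('db1') transform of an image with even height and width."""
    x = image.astype(np.float64)
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    cA = (a + b + c + d) / 2.0  # low-pass in both directions
    cH = (a + b - c - d) / 2.0  # difference between rows
    cV = (a - b + c - d) / 2.0  # difference between columns
    cD = (a - b - c + d) / 2.0  # diagonal difference
    return cA, (cH, cV, cD)

def to_four_channels(image):
    """Stack the four coefficient bands into an (H/2, W/2, 4) composite."""
    cA, (cH, cV, cD) = haar_dwt2(image)
    return np.stack([cA, cH, cV, cD], axis=-1)

image = np.random.default_rng(0).random((224, 224))
composite = to_four_channels(image)
print(composite.shape)
```

A 224 × 224 input yields a 112 × 112 × 4 composite. The transform is orthonormal, so the four channels together carry the same energy as the input image.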

3.4. The Deep Learning Model

The proposed model aims to precisely classify and predict the DR stages by analyzing the image. The input image preprocessed data is used as features, while the label supplied to each image acts as the target variable. Multiple filters and layers extracted the image information and attributes throughout the feature extraction process. Following this stage, the images were categorized based on the desired target labels. The recommended model’s architecture is presented in Figure 5.

The input layer was modified by the input shape from the training data. The preprocessed image (converting RGB to grayscale and applying CLAHE and DWT) was fed into the CNN model. After applying the DWT method, the image resolution was halved.

Convolutional layers apply convolution operations to summarize and condense the input features. The resulting feature maps respond to image structures such as corners and edges, and later layers use these maps to learn higher-level features of the input image. To reduce computational cost, the pooling layer shrinks each convolved feature map, operating on every feature map separately. The kernel size was 3, and the four convolutional layers had 16, 32, 64, and 128 filters, respectively. The activation function employed was ‘relu’. Max pooling keeps the largest element in each window of the feature map, with a pool size of two. BatchNormalization normalizes the activations between layers, which helps to improve the efficiency and accuracy of model training and has a regularizing effect. The Flatten layer converts the feature maps of the last layer into a one-dimensional vector for the fully connected layers.
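To make the layer sizes concrete, the sketch below traces feature-map shapes and parameter counts through the four convolutional blocks. Several details are assumptions rather than facts from the source: a 112 × 112 × 4 input (the four-channel DWT composite of a 224 × 224 image), 'valid' padding with stride 1, one max-pooling layer after every convolution, and BatchNormalization parameters left out of the counts. The dense-layer sizes (256 units and 5 outputs) are the ones stated in this section.

```python
def conv_stack_shapes(size=112, channels=4, filters=(16, 32, 64, 128), kernel=3, pool=2):
    """Trace feature-map sizes and parameter counts through conv + max-pool blocks."""
    rows = []
    for f in filters:
        params = (kernel * kernel * channels + 1) * f  # weights plus one bias per filter
        size = size - kernel + 1  # 'valid' convolution with stride 1 (assumed)
        size = size // pool       # non-overlapping max pooling
        channels = f
        rows.append((f, size, params))
    return rows, size * size * channels

rows, flat = conv_stack_shapes()
for f, size, params in rows:
    print(f"{f:>3} filters -> {size}x{size}x{f}, {params} parameters")
print("flattened length:", flat)
print("dense(256) parameters:", flat * 256 + 256)
print("output(5) parameters:", 256 * 5 + 5)
```

Under these assumptions most of the parameters sit in the first dense layer, which is one reason a dropout layer is commonly placed there.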

The fully connected part consisted of two dense layers. The first dense layer had 256 units with ‘relu’ activation. A dropout layer with a rate of 0.5 mitigated overfitting. These layers were then connected to the output dense layer, which had ‘softmax’ activation and classified the data into five DR stages. The loss function was sparse categorical cross-entropy, a variant of the categorical cross-entropy loss that is used when the class labels are provided as integers instead of one-hot encoded vectors (as shown in Equation (2)).

Loss(w) = - \frac{1}{N} \sum_{i = 1}^{N} \log (p_{i, y_{i}})

(2)

where w refers to the model parameters, N is the number of samples, y_i is the integer class label of sample i, and p_{i, y_i} is the predicted (softmax) probability that the model assigns to that class.
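The sparse categorical cross-entropy named in the text can be sketched in NumPy as the mean negative log-probability of the true class. The function names and the small clipping constant are illustrative choices, not taken from the source.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(labels, probs, eps=1e-12):
    """Mean negative log-probability assigned to the integer true class of each sample."""
    labels = np.asarray(labels)
    probs = np.asarray(probs, dtype=np.float64)
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(true_class_probs, eps, 1.0))))

# Three samples, five DR stages; the labels are plain integers, not one-hot vectors.
logits = np.array([[2.0, 0.1, 0.0, -1.0, 0.3],
                   [0.0, 0.0, 3.0, 0.0, 0.0],
                   [0.5, 0.5, 0.5, 0.5, 0.5]])
labels = [0, 2, 4]
print(sparse_categorical_crossentropy(labels, softmax(logits)))
```

A model that predicts a uniform distribution over the five stages scores log(5), about 1.609, which is a useful baseline when reading loss values.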

Model Optimizer

Optimization methods are essential to machine and deep learning model training for enhancing the model’s predictions. Various techniques have been created to tackle specific optimization problems, including handling sparse gradients or quickening convergence [64]. In this study, we utilized different optimizers to ascertain the most effective one. SGD, Adam, Adagrad, Adamax, and RMSProp were the five optimizers employed.

One of the most straightforward yet most powerful optimization methods is SGD. It uses a small portion of training data to update the model’s weights instead of the entire dataset, speeding up computation significantly.

Adagrad adapts the learning rate of each parameter by scaling it inversely proportional to the square root of the sum of that parameter’s past squared gradients. It works exceptionally well with sparse data.

RMSprop uses a moving average of squared gradients to solve the Adagrad problem of declining learning rates. This technique dynamically modifies the learning rate for each parameter, increasing it for steps with small gradients to expedite convergence and decreasing it for steps with significant gradients to avoid divergence.

By modifying the learning rate according to a moving average of the gradients and their squares, Adam combines the benefits of Adagrad and RMSprop. Because of its ability to traverse the optimization landscape efficiently, Adam is one of the most often used optimizers for deep learning model training.

Adamax is a variant of Adam based on the infinity norm. Despite being less popular than Adam, Adamax can help in some situations where its normalization has benefits.

The particulars of the training job and the type of data can influence the choice of the optimizer, as each optimizer has advantages and uses [65]. For example, Adagrad or RMSprop may be favored for jobs with sparse data, although Adam is frequently used due to its broad usefulness across various situations.
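The update rules described above can be written compactly. The sketch below is a plain NumPy rendering of the textbook formulas, not the TensorFlow implementations used in the study. The hyperparameter values are common defaults chosen for illustration, and the learning rate of 0.01 and the quadratic toy objective are assumptions that keep the demonstration short.

```python
import numpy as np

def sgd(w, g, s, lr=0.01):
    return w - lr * g

def adagrad(w, g, s, lr=0.01, eps=1e-8):
    s["G"] = s.get("G", 0.0) + g * g  # running sum of squared gradients
    return w - lr * g / (np.sqrt(s["G"]) + eps)

def rmsprop(w, g, s, lr=0.01, rho=0.9, eps=1e-8):
    s["v"] = rho * s.get("v", 0.0) + (1 - rho) * g * g  # moving average, not a sum
    return w - lr * g / (np.sqrt(s["v"]) + eps)

def adam(w, g, s, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    s["t"] = s.get("t", 0) + 1
    s["m"] = b1 * s.get("m", 0.0) + (1 - b1) * g      # moving average of gradients
    s["v"] = b2 * s.get("v", 0.0) + (1 - b2) * g * g  # moving average of squared gradients
    m_hat = s["m"] / (1 - b1 ** s["t"])               # bias correction
    v_hat = s["v"] / (1 - b2 ** s["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

def adamax(w, g, s, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    s["t"] = s.get("t", 0) + 1
    s["m"] = b1 * s.get("m", 0.0) + (1 - b1) * g
    s["u"] = np.maximum(b2 * s.get("u", 0.0), np.abs(g))  # infinity-norm accumulator
    return w - lr / (1 - b1 ** s["t"]) * s["m"] / (s["u"] + eps)

def minimize(step, w0, n_steps=200):
    """Run an update rule on f(w) = sum(w**2), whose gradient is 2w; return the final loss."""
    w, state = np.asarray(w0, dtype=float), {}
    for _ in range(n_steps):
        w = step(w, 2.0 * w, state)
    return float(np.sum(w ** 2))

for step in (sgd, adagrad, rmsprop, adam, adamax):
    print(step.__name__, minimize(step, [1.0, -2.0]))
```

On this toy problem Adagrad makes the slowest progress because its accumulated sum keeps shrinking the step size, which is the behavior RMSprop's moving average is meant to avoid.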

3.5. Performance Measures

Accuracy in DR detection measures how often the model correctly classifies positive and negative cases (Equation (3)). Precision measures the proportion of predicted positive cases that are real positives (actual DR) (Equation (4)). Recall, or sensitivity (Equation (5)), measures how well the model detects all real occurrences of DR, minimizing false negatives. The F1-score accounts for both false positives and false negatives (Equation (6)), which is beneficial when positive and negative cases are imbalanced. The area under the curve (AUC) measures the area beneath the model’s performance curve, typically the receiver operating characteristic (ROC) curve, and summarizes how well the model separates the classes. The closer the AUC is to 1, the better the model is [66].

Accuracy = (truePos + trueNeg)/(truePos + trueNeg + falsePos + falseNeg),

(3)

where truePos is true positive, trueNeg is true negative, falsePos is false positive, and falseNeg is false negative.

Precision = (truePos)/(truePos + falsePos),

(4)

Recall = (truePos)/(truePos + falseNeg),

(5)

F1-score = (2 ∗ truePos)/(2 ∗ truePos + falsePos + falseNeg),

(6)
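For a multi-class problem, Equations (4) to (6) are usually applied per class in a one-vs-rest fashion, while accuracy is the share of samples on the diagonal of the confusion matrix. The sketch below follows that reading; the 3 × 3 matrix is made-up toy data, not a result from the study.

```python
import numpy as np

def per_class_metrics(cm):
    """cm[i, j] = number of samples with true class i that were predicted as class j."""
    cm = np.asarray(cm, dtype=np.float64)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as the class but belonging to another
    fn = cm.sum(axis=1) - tp  # belonging to the class but predicted as another
    precision = tp / np.maximum(tp + fp, 1.0)  # the guard avoids division by zero
    recall = tp / np.maximum(tp + fn, 1.0)
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1.0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1

toy_cm = [[5, 1, 0],
          [2, 3, 1],
          [0, 0, 4]]
accuracy, precision, recall, f1 = per_class_metrics(toy_cm)
print(accuracy, precision, recall, f1)
```

Averaging the per-class values gives macro-averaged scores; the source does not state which averaging scheme produced its reported precision, recall, and F1-score.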

4. Results

4.1. Experiment Setup

The models in this work were implemented using Python packages such as TensorFlow [67] and Python wavelets [68]. An NVIDIA Titan V GPU was used to run the model. To address the problem of overfitting, the deep learning models used the EarlyStopping and ReduceLROnPlateau callbacks. The maximum number of epochs was set to 70. For ReduceLROnPlateau, the patience parameter was set to 2: if the validation loss stays constant or trends upward for that many epochs, the learning rate is reduced to 0.1 times its previous value. For EarlyStopping, training ends if the validation loss fails to improve for ten epochs. Table 3 displays the training parameters of the model.
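The interaction of the two callbacks can be illustrated with a simplified pure-Python replay of a validation-loss history. This is a sketch of the general logic only: the Keras callbacks have further options (such as a minimum improvement threshold and a cooldown period) that are ignored here, and the loss values below are invented for the example.

```python
def simulate_callbacks(val_losses, lr=1e-3, factor=0.1, lr_patience=2, stop_patience=10):
    """Replay validation losses through plateau-based LR reduction and early stopping."""
    best = float("inf")
    since_best = 0     # epochs since the best loss, used for early stopping
    since_lr_cut = 0   # epochs without improvement since the last LR reduction
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, since_best, since_lr_cut = loss, 0, 0
        else:
            since_best += 1
            since_lr_cut += 1
            if since_lr_cut >= lr_patience:
                lr *= factor  # reduce the learning rate to 0.1 times its value
                since_lr_cut = 0
        history.append((epoch, lr))
        if since_best >= stop_patience:
            break  # early stopping
    return history

# Invented history: the loss improves twice, then stalls.
losses = [1.0, 0.9] + [0.95] * 12
history = simulate_callbacks(losses)
for epoch, lr in history:
    print(epoch, lr)
```

In this replay the learning rate drops every second stalled epoch, and training stops ten epochs after the last improvement, well before the 70-epoch limit.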

4.2. Model Evaluation

The preprocessed image is depicted in Figure 6. The APTOS-DDR combined data are employed for training and validating the model. For training, 70% of the data are used, and 15% for validation. The CNN layers extract the features, and the fully connected layers classify the data according to the class labels provided.

The validation accuracy and loss are displayed in Table 4. The initial learning rate was 0.001 and was adjusted during training by ReduceLROnPlateau. The models used the sparse categorical cross-entropy loss. The Adam optimizer reached approximately 75.14% validation accuracy with a loss of 0.8257.

The accuracy of training and validation is shown in Figure 7. The EarlyStopping parameter stops the Adam model at the 27th epoch, the RMSProp at the 30th, the SGD at the 31st, the Adagrad at the 22nd, and the Adamax at the 27th. Similarly, the loss of training and validation is depicted in Figure 8.

The test data contain 650 images from the combined APTOS-DDR dataset. Table 5 illustrates the model’s performance. The Adam optimizer model performs better than the others, with 71.85% accuracy and a recall of 0.72. The F1-score is also highest for the Adam model, with a value of 0.71. The RMSProp optimizer has the lowest accuracy at 62.31%, followed by Adagrad at 65.59%.

Finally, the results are plotted in Figure 9 to easily visualize each model’s comparison with the test dataset.

The confusion matrix in Figure 10 shows the model’s performance for each class. The diagonal cells give the proportion of correctly classified examples in each class. The model correctly classifies 90% of class 0 (normal) images, 63% of class 1 (mild), 78% of class 2 (moderate), only 28% of class 3 (severe), and 73% of class 4 (proliferative). Adding more images for the severe class (class 3), which has the fewest samples and the lowest accuracy, could improve performance in future work.

An AUC closer to 1 indicates a better model. Table 6 lists the AUC score of each model. The scores show that all models separate the classes reasonably well. The Adam optimizer model has an AUC of 0.9042, SGD 0.8834, RMSProp 0.8831, Adagrad 0.8815, and Adamax 0.8759.
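One common way to obtain a single AUC for a five-class model is to compute a one-vs-rest ROC AUC per class and average the results. The sketch below does this with the pairwise-ranking definition of AUC. The macro averaging is an assumption, since the source does not say how its multi-class AUC was computed, and the labels and probabilities are toy values.

```python
import numpy as np

def binary_auc(is_positive, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    is_positive = np.asarray(is_positive, dtype=bool)
    scores = np.asarray(scores, dtype=np.float64)
    pos, neg = scores[is_positive], scores[~is_positive]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))

def macro_ovr_auc(labels, probs):
    """Average of one-vs-rest AUCs; every class needs at least one positive and one negative."""
    labels = np.asarray(labels)
    probs = np.asarray(probs, dtype=np.float64)
    aucs = [binary_auc(labels == k, probs[:, k]) for k in range(probs.shape[1])]
    return float(np.mean(aucs))

toy_labels = [0, 1, 2, 1]
toy_probs = [[0.7, 0.2, 0.1],
             [0.3, 0.5, 0.2],
             [0.2, 0.2, 0.6],
             [0.5, 0.3, 0.2]]
print(macro_ovr_auc(toy_labels, toy_probs))
```

Because AUC depends only on how samples are ranked, a model can have a high AUC and still a moderate accuracy, which is consistent with the gap between the two figures reported here.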

5. Discussion

The proposed study employs a CNN model with preprocessing methods such as CLAHE and DWT (Figure 4). The model predicts five stages of DR. Not all previous studies used multiple datasets for evaluation, which limits the generalizability of their models. Notably, few studies combine CLAHE and DWT preprocessing for DR image data. Adam is one of the most widely used optimizers for deep learning because it traverses the optimization landscape efficiently and works well across a broad range of situations [64]. The performance comparison of the study with previous research is illustrated in Table 7.

Although our CNN model’s accuracy of 72% and AUC of 0.90 may seem modest in comparison, we feel our work brings substantial value to the area of DR identification for the following reasons:

Data Integration Approach: Our work merges the APTOS and DDR datasets, which improves the model’s robustness and mitigates generalizability concerns. This approach helps counter dataset-specific biases and should ease adaptation of the model to real-world situations in many populations.
Integration of DWT in Image Preprocessing: Integrating DWT is a methodological step forward in the processing and categorization of DR images. DWT helps to highlight the intricate features in the retinal images that are essential for recognizing the different DR stages. This hybrid methodology strengthens the extraction of distinctive characteristics and improves the model’s sensitivity to subtle signs of advancing disease.
Foundation for Future Research: Although our findings offer a strong foundation for future research, we acknowledge the need for ongoing enhancement. Our work provides explicit guidelines for future research aimed at improving accuracy and AUC scores to a greater extent.
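
To make the DWT preprocessing step concrete, the following is a minimal sketch of a level-1 2D decomposition that halves a 224 by 224 input to 112 by 112, as described in the caption of Figure 4. It assumes the Haar wavelet, which is an illustrative choice only since the wavelet family is not stated here, and it is written in plain Python rather than with the PyWavelets package cited in [68]. The function and band names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2_level1(image: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One level of a 2D orthonormal Haar transform computed on 2x2 blocks.

    Returns (approx, detail_rows, detail_cols, detail_diag); each band has
    half the input height and half the input width.
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("this sketch expects an even height and width")
    approx, d_rows, d_cols, d_diag = [], [], [], []
    for r in range(0, height, 2):
        row_a, row_r, row_c, row_d = [], [], [], []
        for c in range(0, width, 2):
            a, b = image[r][c], image[r][c + 1]
            e, f = image[r + 1][c], image[r + 1][c + 1]
            row_a.append((a + b + e + f) / 2)  # scaled local average
            row_r.append((a + b - e - f) / 2)  # top row minus bottom row
            row_c.append((a - b + e - f) / 2)  # left column minus right column
            row_d.append((a - b - e + f) / 2)  # diagonal difference
        approx.append(row_a)
        d_rows.append(row_r)
        d_cols.append(row_c)
        d_diag.append(row_d)
    return approx, d_rows, d_cols, d_diag


# Hypothetical input: a flat 224 x 224 image, the size given for Figure 4.
flat = [[1.0] * 224 for _ in range(224)]
approx, d_rows, d_cols, d_diag = haar_dwt2_level1(flat)
print(len(approx), len(approx[0]))

# A single 2x2 block shows how the four bands split the signal.
small = haar_dwt2_level1([[1.0, 2.0], [3.0, 4.0]])
print(small)
```

A flat image yields zero detail bands, so all of its energy ends up in the approximation band. Figure 6c shows a sample of approximation coefficients from the DWT preprocessing.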

Limitations and Future Works

The model was tuned for only two hyperparameters, namely, the learning rate and the optimizer. A CNN with only four layers was employed in this study. Assessing image quality and using pretrained models are possible future enhancements of the study. Another direction for future work is to collect a local dataset and evaluate the model's performance on it.

6. Conclusions

Classifying DR into five stages from retinal images is a complex area of research. Despite extensive efforts to classify DR in binary or multi-class settings, existing methods have not reliably detected its early stages from retinal images. Traditional approaches take longer and have limited prediction accuracy, which makes early identification difficult. CNNs are widely used for image analysis. CLAHE and DWT are adopted to enhance the clarity of the images and to help the CNN model extract the most distinctive features. By identifying the severity of DR at earlier stages, the proposed approach could contribute to preventing diabetic eye impairment.

Author Contributions

Conceptualization, A.M.M. and K.A.-S.; methodology, A.M.M., S.R. and S.S.; software, S.S.; validation, A.M.M.; data curation, A.M.M. and S.S.; writing—original draft preparation, A.M.M.; writing—review and editing, K.A.-S. and S.R.; supervision, A.M.M.; funding acquisition, A.M.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Kuwait University Research Grant No. EO04/18.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: [16,60].

Acknowledgments

The first and fourth authors thank Kuwait University for their continuous support in completing this work.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Sun, H.; Saeedi, P.; Karuranga, S.; Pinkepank, M.; Ogurtsova, K.; Duncan, B.B.; Stein, C.; Basit, A.; Chan, J.C.N.; Mbanya, J.C.; et al. IDF Diabetes Atlas: Global, regional and country-level diabetes prevalence estimates for 2021 and projections for 2045. Diabetes Res. Clin. Pract. 2022, 183, 109119.
2. World Bank Group. World Development Indicators: Diabetes Prevalence (% of Population Ages 20 to 79). Available online: https://databank.worldbank.org/reports.aspx?dsid=2&series=SH.STA.DIAB.ZS (accessed on 19 October 2023).
3. Tan, T.-E.; Wong, T.Y. Diabetic retinopathy: Looking forward to 2030. Front. Endocrinol. 2023, 13, 1077669.
4. Usman, T.M.; Saheed, Y.K.; Ignace, D.; Nsang, A. Diabetic retinopathy detection using principal component analysis multi-label feature extraction and classification. Int. J. Cogn. Comput. Eng. 2023, 4, 78–88.
5. Zhu, S.; Xiong, C.; Zhong, Q.; Yao, Y. Diabetic Retinopathy Classification with Deep Learning via Fundus Images: A Short Survey. IEEE Access 2024, 12, 20540–20558.
6. Sun, R.; Pang, Y.; Li, W. Efficient Lung Cancer Image Classification and Segmentation Algorithm Based on an Improved Swin Transformer. Electronics 2023, 12, 1024.
7. Jeong, Y.; Hong, Y.J.; Han, J.H. Review of Machine Learning Applications Using Retinal Fundus Images. Diagnostics 2022, 12, 134.
8. Oishi, A.M.; Tawfiq-Uz-Zaman, M.; Emon, M.B.H.; Momen, S. A Deep Learning Approach to Diabetic Retinopathy Classification. In Cybernetics Perspectives in Systems: Proceedings of 11th Computer Science On-line Conference 2022, Vol. 3; Springer: New York, NY, USA, 2022; Volume 503, p. 417.
9. Vipparthi, V.; Rao, D.R.; Mullu, S.; Patlolla, V. Diabetic Retinopathy Classification using Deep Learning Techniques. In Proceedings of the 2022 3rd International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India, 17–19 August 2022; pp. 840–846.
10. Jagan Mohan, N.; Murugan, R.; Goel, T. Deep Learning for Diabetic Retinopathy Detection: Challenges and Opportunities. In Next Generation Healthcare Informatics; Tripathy, B.K., Lingras, P., Kar, A.K., Chowdhary, C.L., Eds.; Springer Nature: Singapore, 2022; pp. 213–232.
11. Tsiknakis, N.; Theodoropoulos, D.; Manikis, G.; Ktistakis, E.; Boutsora, O.; Berto, A.; Scarpa, F.; Scarpa, A.; Fotiadis, D.I.; Marias, K. Deep learning for diabetic retinopathy detection and classification based on fundus images: A review. Comput. Biol. Med. 2021, 135, 104599.
12. Qureshi, I.; Ma, J.; Abbas, Q. Diabetic retinopathy detection and stage classification in eye fundus images using active deep learning. Multimed. Tools Appl. 2021, 80, 11691–11721.
13. Qummar, S.; Khan, F.G.; Shah, S.; Khan, A.; Shamshirband, S.; Rehman, Z.U.; Khan, I.A.; Jadoon, W. A Deep Learning Ensemble Approach for Diabetic Retinopathy Detection. IEEE Access 2019, 7, 150530–150539.
14. Jain, A.K.; Jalui, A.; Jasani, J.; Lahoti, Y.; Karani, R. Deep Learning for Detection and Severity Classification of Diabetic Retinopathy. In Proceedings of the 2019 1st International Conference on Innovations in Information and Communication Technology (ICIICT), Tamil Nadu, India, 23 March 2019; pp. 1–6.
15. Higuera, V. The 4 Stages of Diabetic Retinopathy. Available online: https://www.healthline.com/health/diabetes/diabetic-retinopathy-stages (accessed on 25 December 2023).
16. APTOS 2019 Blindness Detection. Available online: https://www.kaggle.com/competitions/aptos2019-blindness-detection/overview (accessed on 5 September 2022).
17. Mukherjee, N.; Sengupta, S. Application of deep learning approaches for classification of diabetic retinopathy stages from fundus retinal images: A survey. Multimed. Tools Appl. 2023, 83, 43115–43175.
18. Xu, S.; Huang, Z.; Zhang, Y. Diabetic Retinopathy Progression Recognition Using Deep Learning Method. Available online: http://cs231n.stanford.edu/reports/2022/pdfs/20.pdf (accessed on 21 November 2022).
19. Mutawa, A.M.; Alnajdi, S.; Sruthi, S. Transfer Learning for Diabetic Retinopathy Detection: A Study of Dataset Combination and Model Performance. Appl. Sci. 2023, 13, 5685.
20. Uppamma, P.; Bhattacharya, S. Deep Learning and Medical Image Processing Techniques for Diabetic Retinopathy: A Survey of Applications, Challenges, and Future Trends. J. Healthc. Eng. 2023, 2023, 2728719.
21. Gargeya, R.; Leng, T. Automated identification of diabetic retinopathy using deep learning. Ophthalmology 2017, 124, 962–969.
22. Areeb, Q.M.; Nadeem, M. A Comparative Study of Learning Methods for Diabetic Retinopathy Classification. Adv. Data Comput. Commun. Secur. 2022, 106, 239–249.
23. Khan, Z.; Khan, F.G.; Khan, A.; Rehman, Z.U.; Shah, S.; Qummar, S.; Ali, F.; Pack, S. Diabetic retinopathy detection using VGG-NIN a deep learning architecture. IEEE Access 2021, 9, 61408–61416.
24. da Rocha, D.A.; Ferreira, F.M.F.; Peixoto, Z.M.A. Diabetic retinopathy classification using VGG16 neural network. Res. Biomed. Eng. 2022, 38, 761–772.
25. Farag, M.M.; Fouad, M.; Abdel-Hamid, A.T. Automatic Severity Classification of Diabetic Retinopathy Based on DenseNet and Convolutional Block Attention Module. IEEE Access 2022, 10, 38299–38308.
26. Kaur, J.; Kaur, P. UNIConv: An enhanced U-Net based InceptionV3 convolutional model for DR semantic segmentation in retinal fundus images. Concurr. Comput. Pract. Exp. 2022, 34, e7138.
27. Vij, R.; Arora, S. A novel deep transfer learning based computerized diagnostic Systems for Multi-class imbalanced diabetic retinopathy severity classification. Multimed. Tools Appl. 2023, 82, 34847–34884.
28. Qian, Z.; Wu, C.; Chen, H.; Chen, M. Diabetic Retinopathy Grading Using Attention based Convolution Neural Network. In Proceedings of the 2021 IEEE 5th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chongqing, China, 12–14 March 2021; pp. 2652–2655.
29. Zhang, C.; Lei, T.; Chen, P. Diabetic retinopathy grading by a source-free transfer learning approach. Biomed. Signal Process. Control 2022, 73, 103423.
30. Dihin, R.A.; AlShemmary, E.; Al-Jawher, W. Diabetic Retinopathy Classification Using Swin Transformer with Multi Wavelet. J. Kufa Math. Comput. 2023, 10, 167–172.
31. Dinpajhouh, M.; Seyyedsalehi, S.A. Automated detecting and severity grading of diabetic retinopathy using transfer learning and attention mechanism. Neural Comput. Appl. 2023, 35, 23959–23971.
32. Zia, F.; Irum, I.; Qadri, N.N.; Nam, Y.; Khurshid, K.; Ali, M.; Ashraf, I.; Khan, M.A. A multilevel deep feature selection framework for diabetic retinopathy image classification. Comput. Mater. Contin. 2022, 70, 2261–2276.
33. Murugappan, M.; Prakash, N.; Jeya, R.; Mohanarathinam, A.; Hemalakshmi, G. A Novel Attention Based Few-shot Classification Framework for Diabetic Retinopathy Detection and Grading. Measurement 2022, 200, 111485.
34. Bilal, A.; Zhu, L.; Deng, A.; Lu, H.; Wu, N. AI-Based Automatic Detection and Classification of Diabetic Retinopathy Using U-Net and Deep Learning. Symmetry 2022, 14, 1427.
35. Jena, P.K.; Khuntia, B.; Palai, C.; Nayak, M.; Mishra, T.K.; Mohanty, S.N. A Novel Approach for Diabetic Retinopathy Screening Using Asymmetric Deep Learning Features. Big Data Cogn. Comput. 2023, 7, 25.
36. Bilal, A.; Sun, G.; Mazhar, S.; Imran, A.; Latif, J. A Transfer Learning and U-Net-based automatic detection of diabetic retinopathy from fundus images. Comput. Methods Biomech. Biomed. Eng. Imaging Vis. 2022, 10, 663–674.
37. Khalifa, N.E.M.; Loey, M.; Taha, M.H.N.; Mohamed, H.N.E.T. Deep Transfer Learning Models for Medical Diabetic Retinopathy Detection. Acta Inform. Med. 2019, 27, 327–332.
38. Kandel, I.; Castelli, M. Transfer learning with convolutional neural networks for diabetic retinopathy image classification. A review. Appl. Sci. 2020, 10, 2021.
39. Al-Smadi, M.; Hammad, M.; Baker, Q.B.; Sa’ad, A. A transfer learning with deep neural network approach for diabetic retinopathy classification. Int. J. Electr. Comput. Eng. 2021, 11, 3492.
40. Zhou, Y.; Wang, B.; Huang, L.; Cui, S.; Shao, L. A Benchmark for Studying Diabetic Retinopathy: Segmentation, Grading, and Transferability. IEEE Trans. Med. Imaging 2021, 40, 818–828.
41. Van Dyk, D.A.; Meng, X.-L. The art of data augmentation. J. Comput. Graph. Stat. 2001, 10, 1–50.
42. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 1–48.
43. Mungloo-Dilmohamud, Z.; Heenaye-Mamode Khan, M.; Jhumka, K.; Beedassy, B.N.; Mungloo, N.Z.; Peña-Reyes, C. Balancing Data through Data Augmentation Improves the Generality of Transfer Learning for Diabetic Retinopathy Classification. Appl. Sci. 2022, 12, 5363.
44. Ashwini, K.; Dash, R. Grading diabetic retinopathy using multiresolution based CNN. Biomed. Signal Process. Control 2023, 86, 105210.
45. Dihin, R.; Alshemmary, E.; Al-Jawher, W. Wavelet-Attention Swin for Automatic Diabetic Retinopathy Classification. Baghdad Sci. J. 2024, 21, 8.
46. Cornforth, D.J.; Jelinek, H.J.; Leandro, J.J.G.; Soares, J.V.B.; Cesar, R.M., Jr.; Cree, M.J.; Mitchell, P.; Bossomaier, T. Development of retinal blood vessel segmentation methodology using wavelet transforms for assessment of diabetic retinopathy. Complex. Int. 2005, 11, 50–61.
47. Yagmur, F.D.; Karlik, B.; Okatan, A. Automatic recognition of retinopathy diseases by using wavelet based neural network. In Proceedings of the 2008 First International Conference on the Applications of Digital Information and Web Technologies (ICADIWT), Ostrava, Czech Republic, 4–6 August 2008; pp. 454–457.
48. Rehman, M.U.; Abbas, Z.; Khan, S.H.; Ghani, S.H.; Najam. Diabetic retinopathy fundus image classification using discrete wavelet transform. In Proceedings of the 2018 2nd International Conference on Engineering Innovation (ICEI), Bangkok, Thailand, 5–6 July 2018; pp. 75–80.
49. Selvachandran, G.; Quek, S.G.; Paramesran, R.; Ding, W.; Son, L.H. Developments in the detection of diabetic retinopathy: A state-of-the-art review of computer-aided diagnosis and machine learning methods. Artif. Intell. Rev. 2023, 56, 915–964.
50. Sebastian, A.; Elharrouss, O.; Al-Maadeed, S.; Almaadeed, N. A Survey on Deep-Learning-Based Diabetic Retinopathy Classification. Diagnostics 2023, 13, 345.
51. Rajamani, S.; Sasikala, S. Artificial Intelligence Approach for Diabetic Retinopathy Severity Detection. Informatica 2023, 46.
52. Jiwani, N.; Gupta, K.; Sharif, M.H.U.; Datta, R.; Habib, F.; Afreen, N. Application of Transfer Learning Approach for Diabetic Retinopathy Classification. In Proceedings of the 2023 International Conference on Power Electronics and Energy (ICPEE), Bhubaneswar, India, 3–5 January 2023; pp. 1–4.
53. Gu, Z.; Li, Y.; Wang, Z.; Kan, J.; Shu, J.; Wang, Q. Classification of Diabetic Retinopathy Severity in Fundus Images Using the Vision Transformer and Residual Attention. Comput. Intell. Neurosci. 2023, 2023, 1305583.
54. Mondal, S.S.; Mandal, N.; Singh, K.K.; Singh, A.; Izonin, I. EDLDR: An Ensemble Deep Learning Technique for Detection and Classification of Diabetic Retinopathy. Diagnostics 2023, 13, 124.
55. Kalyani, G.; Janakiramaiah, B.; Karuna, A.; Prasad, L.V.N. Diabetic retinopathy detection and classification using capsule networks. Complex Intell. Syst. 2023, 9, 2651–2664.
56. Ali, G.; Dastgir, A.; Iqbal, M.W.; Anwar, M.; Faheem, M. A Hybrid Convolutional Neural Network Model for Automatic Diabetic Retinopathy Classification From Fundus Images. IEEE J. Transl. Eng. Health Med. 2023, 11, 341–350.
57. Hayati, M.; Muchtar, K.; Roslidar; Maulina, N.; Syamsuddin, I.; Elwirehardja, G.N.; Pardamean, B. Impact of CLAHE-based image enhancement for diabetic retinopathy classification through deep learning. Procedia Comput. Sci. 2023, 216, 57–66.
58. Alwakid, G.; Gouda, W.; Humayun, M. Deep Learning-Based Prediction of Diabetic Retinopathy Using CLAHE and ESRGAN for Enhancement. Healthcare 2023, 11, 863.
59. Dihin, R.; AlShemmary, E.; Al-Jawher, W. Automated Binary Classification of Diabetic Retinopathy by SWIN Transformer. J. Al-Qadisiyah Comput. Sci. Math. 2023, 15, 169.
60. Li, T.; Gao, Y.; Wang, K.; Guo, S.; Liu, H.; Kang, H. Diagnostic assessment of deep learning algorithms for diabetic retinopathy screening. Inf. Sci. 2019, 501, 511–522.
61. Setiawan, A.W.; Mengko, T.R.; Santoso, O.S.; Suksmono, A.B. Color retinal image enhancement using CLAHE. In International Conference on ICT for Smart Society; IEEE: New York, NY, USA, 2013; pp. 1–3.
62. Shensa, M.J. The discrete wavelet transform: Wedding the a trous and Mallat algorithms. IEEE Trans. Signal Process. 1992, 40, 2464–2482.
63. Othman, G.; Zeebaree, D.Q. The Applications of Discrete Wavelet Transform in Image Processing: A Review. J. Soft Comput. Data Min. 2020, 1, 31–43.
64. Hassan, E.; Shams, M.Y.; Hikal, N.A.; Elmougy, S. The effect of choosing optimizer algorithms to improve computer vision tasks: A comparative study. Multimed. Tools Appl. 2023, 82, 16591–16633.
65. Mehmood, F.; Ahmad, S.; Whangbo, T.K. An Efficient Optimization Technique for Training Deep Neural Networks. Mathematics 2023, 11, 1360.
66. Fan, J.; Upadhye, S.; Worster, A. Understanding receiver operating characteristic (ROC) curves. Can. J. Emerg. Med. 2006, 8, 19–20.
67. Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv 2016, arXiv:1603.04467.
68. Lee, G.R.; Gommers, R.; Waselewski, F.; Wohlfahrt, K.; O’Leary, A. PyWavelets: A Python package for wavelet analysis. J. Open Source Softw. 2019, 4, 1237.
69. Patel, S.; Lohakare, M.; Prajapati, S.; Singh, S.; Patel, N. DiaRet: A browser-based application for the grading of diabetic retinopathy with integrated gradients. In Proceedings of the 2021 IEEE International Conference on Robotics, Automation and Artificial Intelligence (RAAI), Hong Kong, 21–23 April 2021; pp. 19–23.
70. Rahhal, D.; Alhamouri, R.; Albataineh, I.; Duwairi, R. Detection and Classification of Diabetic Retinopathy Using Artificial Intelligence Algorithms. In Proceedings of the 2022 13th International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, 21–23 June 2022; pp. 15–21.

Figure 1. The stages of DR from the APTOS dataset. (a) No DR or normal retina, (b) mild DR, (c) moderate DR, (d) severe DR, and (e) proliferative DR.

Figure 2. The suggested study’s workflow.

Figure 3. Example of an augmented image from a random image in training data. From left to right: the original image, followed by a horizontally flipped image, height-shifted image, width-shifted image, and rotated image.

Figure 4. The level-1 decomposition of DWT. The input image size is 224 height by 224 width, which is then downsampled to half the resolution.

Figure 5. The architecture of the proposed model.

Figure 6. The original and preprocessed image in the model: (a) raw image, (b) CLAHE preprocessed image, and (c) sample of approximation coefficients from DWT preprocessing.

Figure 7. The accuracy plot of training and validation data: (a) Adam optimizer, (b) RMSProp optimizer, (c) SGD optimizer, (d) Adagrad optimizer, and (e) Adamax optimizer.

Figure 8. The loss plot of training and validation data: (a) Adam optimizer, (b) RMSProp optimizer, (c) SGD optimizer, (d) Adagrad optimizer, and (e) Adamax optimizer.

Figure 9. The comparison of the proposed model’s performance with the test dataset.

Figure 10. The confusion matrix of the best model with the Adam optimizer.

Table 1. Inferences from previous studies related to the classification of DR stages.

Reference	Model	Dataset	Advantage	Disadvantage
[19]	DenseNet121	APTOS, EyePACS, ODIR	Able to classify based on different datasets.	The stage of DR is not detected.
[51]	CNN	Kaggle	The model achieves an accuracy of 89%.	The model needs to be tested on different datasets.
[52]	VGG16	IDRiD	The model could detect different stages of DR.	The model was tested on very few images.
[53]	Vision Transformer	DDR, IDRiD	The model could detect different stages of DR.	The model has a class imbalance problem and tests on a few images.
[54]	ResNext, DenseNet	APTOS	The model classifies the DR with high performance.	The model has a class imbalance problem and tests on a single dataset.
[55]	Capsule network	Messidor	The model detects the stages of DR.	Only four stages are detected and tested only on a single dataset.
[56]	InceptionV3, Resnet50, CNN	Messidor, IDRiD	The model classifies the DR with high performance.	The features are extracted using only two pretrained models.
[35]	U-Net	APTOS, Messidor	The model segments and detects the stages of DR.	Only four stages are detected, and parameter tuning can be performed.
[57]	EfficientNet, VGG16, InceptionV3	APTOS	The model classifies DR after CLAHE preprocessing.	The stage of DR is not detected, and only a single dataset is evaluated.
[58]	InceptionV3	APTOS	The model classifies DR stages after CLAHE preprocessing.	Only a single dataset and model are evaluated.
[59]	Swin Transformer	APTOS	The model classifies DR with high performance.	The stage of DR is not detected, and the model can be tuned with more datasets.
[43]	VGG16	APTOS, Mauritius	The model detects the stages of DR.	The model needs to be tuned for moderate and proliferative DR.
[45]	Wavelet with Swin Transformer	APTOS	The classification accuracy was improved.	The study utilized only a single image set for testing the model.
[48]	DWT with KNN, SVM	Messidor	The model classifies the normal and DR images perfectly.	The stage of DR is not detected, and the dataset contains fewer samples.

Table 2. The count of each class in training, validating, and testing datasets.

Dataset	Class 0	Class 1	Class 2	Class 3	Class 4
Training	715	711	483	299	825
Validation	151	139	95	71	194
Testing	148	149	106	59	188
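
The counts in Table 2 can be checked with a few lines of arithmetic. The sketch below sums each row and computes the share of each split and of each training class. The variable names are illustrative; the numbers are copied from the table.

```python
# Image counts per class (class 0 to class 4), copied from Table 2.
counts = {
    "Training": [715, 711, 483, 299, 825],
    "Validation": [151, 139, 95, 71, 194],
    "Testing": [148, 149, 106, 59, 188],
}

# Total images per split and overall.
totals = {name: sum(row) for name, row in counts.items()}
grand_total = sum(totals.values())

# Fraction of all images that falls in each split.
split_share = {name: total / grand_total for name, total in totals.items()}

# Fraction of the training images that belongs to each class.
train_class_share = [n / totals["Training"] for n in counts["Training"]]
smallest_class = train_class_share.index(min(train_class_share))

print(totals, grand_total)
print({name: round(share, 3) for name, share in split_share.items()})
print([round(share, 3) for share in train_class_share], smallest_class)
```

The totals correspond to roughly a 70/15/15 split of 4333 images. Class 3 (severe) is the smallest training class at about 10% of the training images, and it is also the class with the lowest proportion of correct predictions in the confusion matrix of Figure 10.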

Table 3. The training parameters of the model.

Parameter	Value
Image size	224
Initial Learning rate	0.001
Optimizer	Adam, SGD, Adamax, Adagrad, RMSProp
Loss function	SparseCategoricalCrossentropy
Epoch	70
Batch Size	32

Table 4. The validation accuracy of the APTOS-DDR dataset.

Optimizers	Accuracy	Loss
Adam	75.14%	0.8257
SGD	73.31%	0.7446
Adamax	73.13%	0.8103
Adagrad	61.54%	0.9456
RMSProp	62.77%	0.8991

Table 5. The test accuracy results of all optimizers.

Optimizer	Accuracy	Recall	Precision	F1-Score	Loss
Adam	71.85%	0.72	0.71	0.71	0.8049
SGD	68.43%	0.68	0.69	0.64	0.8536
Adamax	69.16%	0.69	0.59	0.61	1.0942
Adagrad	65.69%	0.66	0.64	0.64	0.8918
RMSProp	62.31%	0.62	0.63	0.61	0.8922
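
In every row of Table 5, the recall equals the accuracy to two decimal places (for example, 0.72 and 71.85% for Adam). This is what support-weighted averaging produces, because the weighted mean of the per-class recalls is always equal to the overall accuracy. The sketch below assumes that averaging scheme, which the source does not state, and reuses a hypothetical three-class count matrix rather than the study's results.

```python
from typing import Dict, List


def weighted_metrics(matrix: List[List[int]]) -> Dict[str, float]:
    """Support-weighted precision, recall and F1, plus accuracy.

    Rows of `matrix` are true classes, columns are predicted classes.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    precision = recall = f1 = 0.0
    for c in range(n):
        tp = matrix[c][c]
        support = sum(matrix[c])                         # true samples of class c
        predicted = sum(matrix[r][c] for r in range(n))  # samples predicted as c
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    accuracy = sum(matrix[c][c] for c in range(n)) / total
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical 3-class count matrix (not the study's results).
toy_matrix = [[3, 1, 0], [0, 1, 1], [1, 0, 3]]
metrics = weighted_metrics(toy_matrix)
print(metrics)
```

Macro averaging, which weights every class equally, would generally give a recall that differs from the accuracy when the classes are imbalanced as in Table 2.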

Table 6. The AUC scores of the test results.

Optimizer	Class 0	Class 1	Class 2	Class 3	Class 4
Adam	0.9781	0.8461	0.9646	0.7626	0.8897
SGD	0.9633	0.8209	0.8414	0.7740	0.7756
Adamax	0.9504	0.8000	0.8190	0.8267	0.7681
Adagrad	0.9719	0.7765	0.9499	0.7629	0.8739
RMSProp	0.9715	0.7877	0.9502	0.7963	0.8604
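
One simple way to summarize Table 6 is the unweighted (macro) mean of the five per-class AUC values for each optimizer. The sketch below computes it from the table values; the variable names are illustrative. The overall scores quoted in the text (for example, 0.9042 for Adam) are higher than these macro means, so they were presumably obtained with a different averaging scheme that is not specified here.

```python
# Per-class AUC (class 0 to class 4) for each optimizer, copied from Table 6.
auc_by_class = {
    "Adam": [0.9781, 0.8461, 0.9646, 0.7626, 0.8897],
    "SGD": [0.9633, 0.8209, 0.8414, 0.7740, 0.7756],
    "Adamax": [0.9504, 0.8000, 0.8190, 0.8267, 0.7681],
    "Adagrad": [0.9719, 0.7765, 0.9499, 0.7629, 0.8739],
    "RMSProp": [0.9715, 0.7877, 0.9502, 0.7963, 0.8604],
}

# Unweighted mean over the five classes.
macro_auc = {name: sum(values) / len(values)
             for name, values in auc_by_class.items()}

# Class with the lowest AUC for each optimizer.
weakest_class = {name: values.index(min(values))
                 for name, values in auc_by_class.items()}

best_optimizer = max(macro_auc, key=macro_auc.get)
print({name: round(value, 4) for name, value in macro_auc.items()})
print(weakest_class, best_optimizer)
```

Under this summary, Adam still ranks first, and for Adam and Adagrad the weakest class is class 3 (severe).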

Table 7. Comparison of the proposed study with previous studies.

Reference	Year	Model	Class Type	Dataset	Accuracy	F1-Score	AUC
[69]	2021	InceptionResNetV2	5-class	DDR	52.50%	-	0.80
[70]	2022	CNN	5-class	DDR	66.68%	-	-
[53]	2023	Vision Transformer	6-class	DDR	91.54%	0.67	-
[45]	2024	Swin Transformer, DWT	5-class	APTOS	86.00%	-	-
Proposed model		CNN with CLAHE, DWT	5-class	APTOS + DDR	71.85%	0.71	0.90

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

