2.2. Dataset
This study employed the EyePACS dataset, encompassing 35,155 images, each classified according to the severity of diabetic retinopathy (DR), from level 0 (no retinopathy) to level 4 (proliferative retinopathy). In this research, the integer labels 0, 1, 2, 3, and 4 were used to denote DR severity. Each grade represents a distinct stage of DR, allowing for an organized and efficient analysis of the disorder’s progression.
Level 0: No apparent retinopathy, 25,849 images.
Level 1: Mild non-proliferative retinopathy, 2,438 images: In the earliest stage of diabetic retinopathy, the walls of the blood vessels in the retina weaken. Tiny bulges protrude from the vessel walls, sometimes leaking fluid and blood into the retina. Nerve fibers in the retina may begin to swell, and central vision may be affected.
Level 2: Moderate non-proliferative retinopathy, 5,288 images: As the disease progresses, blood vessels that nourish the retina may swell and distort. They may also lose their ability to transport blood. Both conditions cause characteristic changes to the appearance of the retina and may contribute to DME (diabetic macular edema).
Level 3: Severe non-proliferative retinopathy, 872 images: Many more blood vessels are blocked, depriving blood supply to areas of the retina. These areas secrete growth factors that signal the retina to grow new blood vessels.
Level 4: Proliferative retinopathy, 708 images: At this advanced stage, the signals sent by the retina trigger the growth of new blood vessels. This process is called neovascularization. However, these new blood vessels are abnormal and fragile. They grow along the retina and the surface of the clear gel that fills the inside of the eye. By themselves, these blood vessels do not cause symptoms or vision loss. However, they have thin, fragile walls. If they leak blood, severe vision loss and even blindness can result.
2.3. Preprocessing Techniques
In this research, three key preprocessing techniques were implemented across all three models to prepare the image data: grayscale conversion via OpenCV, data augmentation, and image background removal using REMBG.
Grayscale conversion: This process was conducted using OpenCV, converting the original-colored images into grayscale. The objective of this conversion was to lessen the computational burden and to avoid inconsistencies arising from the color variance in the images. By standardizing the images to grayscale, the models could focus on distinguishing patterns and features without the additional complexity of color variations.
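As an illustration, a minimal sketch of this step using OpenCV is shown below; the file paths and helper name are hypothetical, as the study does not detail its I/O pipeline.

```python
import cv2

def to_grayscale(input_path: str, output_path: str) -> None:
    """Convert a color fundus image to grayscale and save it (illustrative paths)."""
    image = cv2.imread(input_path)                    # OpenCV loads images as BGR
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)    # collapse to one intensity channel
    cv2.imwrite(output_path, gray)

# Hypothetical usage:
# to_grayscale("retina_0001.jpeg", "retina_0001_gray.jpeg")
```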
Data augmentation: To enhance the robustness of the models and combat potential overfitting, data augmentation techniques were applied. These included image manipulations such as rotation, zooming, and horizontal flipping. By introducing these transformations, the model was exposed to a wider variety of data scenarios, helping it generalize better to unseen data and improve its predictive power.
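A minimal sketch of such an augmentation pipeline, assuming the Keras preprocessing layers in TensorFlow (the paper does not name the exact library, and the transformation factors below are illustrative), might look as follows:

```python
import tensorflow as tf

# Horizontal flip, rotation, and zoom, as described above.
# The factor values (0.1) are assumptions, not the study's settings.
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])
```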
Image background removal: The third preprocessing step was the use of REMBG to remove the backgrounds from the images. By doing so, the model could concentrate solely on the crucial features of the retina, potentially improving its accuracy in predicting the severity of diabetic retinopathy.
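A minimal sketch of this step with the REMBG library is shown below; the paths are hypothetical, and the output is saved as PNG to preserve the transparent background.

```python
from rembg import remove
from PIL import Image

def strip_background(input_path: str, output_path: str) -> None:
    """Remove the background with REMBG, leaving only the retinal disc."""
    with Image.open(input_path) as img:
        result = remove(img)   # returns the image with its background removed
    result.save(output_path)

# Hypothetical usage:
# strip_background("retina_0001.jpeg", "retina_0001_nobg.png")
```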
These preprocessing steps were instrumental in enhancing the effectiveness and performance of the models in distinguishing and categorizing various levels of diabetic retinopathy severity. The outcome of these preprocessing steps can be visualized in Figure 3, Figure 4, Figure 5 and Figure 6.
2.4. Model Development
In this study, we implemented and analyzed three main deep learning models for diagnosing the severity of diabetic retinopathy (DR) from retinal images. The models share the same architecture, employing convolutional neural networks (CNNs), a well-established method for image-based machine learning tasks.
Training in this research was meticulously set up to ensure the integrity and reliability of the results. Training was executed with a batch size of 128 images, with each image resized to a uniform 180 × 180 pixels. To ensure robust model training and validation, the dataset was split into two parts: 80% for training and 20% for validation. Importantly, this division was conducted before any image transformations or preprocessing steps were applied, to avoid the risk of data leakage. Data leakage, where the model obtains access to information it should not have during training, is a common issue in machine learning and can lead to overly optimistic performance estimates. In this study, the strict separation of training and validation data means that the model never sees the validation data during training, eliminating any chance of it being exposed to the answers in advance.
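A minimal sketch of this setup, assuming TensorFlow's `image_dataset_from_directory` loader and a directory with one subfolder per DR grade (both assumptions, as the paper does not specify them), is given below:

```python
import tensorflow as tf

IMG_SIZE = (180, 180)   # uniform resize, as described above
BATCH_SIZE = 128

# 80/20 split performed on the raw images, before any augmentation,
# so the validation subset never leaks into training.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "eyepacs/",              # hypothetical dataset root
    validation_split=0.2,
    subset="training",
    seed=42,                 # a fixed seed keeps the two subsets disjoint
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "eyepacs/",
    validation_split=0.2,
    subset="validation",
    seed=42,
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
)
```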
Moreover, this study adopted a compact model design to expedite the training process given the computational resources at hand. Utilizing a smaller model not only facilitated rapid training but also allowed for the effective management of resources. This strategy made it possible to conduct extensive experiments without overtaxing the available computational capacity. The resultant model, despite its size, was robust and capable of accurately diagnosing diabetic retinopathy from image data, highlighting the feasibility and efficiency of smaller model architectures for tasks of this nature. The model architecture was built using several layers, each tailored to a specific task, optimizing the network for image feature extraction and classification.
Input Layer: This layer receives the preprocessed images and passes them into the network.
Conv2D Layers: The first two layers in the architecture are 2D convolutional layers, each with 16 filters. These layers use learned filters to conduct convolution operations on the input data, aiming to capture local features within the image such as edges and corners.
Max Pooling Layer: Following the Conv2D layers is a max pooling layer, which reduces the spatial dimensions of the input by selecting the maximum value within each pooling window.
More Conv2D and Max Pooling Layers: The architecture continues with an alternating pattern of Conv2D and MaxPooling2D layers. The number of filters in these layers progressively increases (from 16 to 32 and then to 64), each time reducing the spatial dimensions.
Flatten Layer: The 2D matrix produced by the preceding layer is transformed into a 1D vector by a flatten layer, preparing it for input to the subsequent dense layers.
Dense Layers: The flattened output is then passed to a fully connected layer with 128 neurons, which performs classification based on the features extracted by the previous layers.
Dropout Layer (0.5): Following the dense layer is a dropout layer with a dropout rate of 0.5. This means that during each training update, 50% of the neurons are randomly set to zero. We chose a dropout probability of 0.5 because it is widely recognized as an optimal value for preventing overfitting without significantly under-training the model. A dropout rate of 0.5 provides a balance between retaining sufficient network capacity for learning and introducing enough regularization to improve generalization. Compared to lower dropout rates (e.g., 0.1 or 0.2), a 0.5 rate introduces more noise during training, which helps the model avoid becoming too reliant on any particular set of neurons, thus enhancing its ability to generalize to new, unseen data [31].
Output Layer: The final layer in the architecture is another dense layer, serving as the output layer of the model. This layer has five neurons, representing the five classes of DR severity, and outputs the probabilities of the input image belonging to each class.
The complete model, therefore, alternates layers of convolution and max pooling for feature extraction, followed by a flattening operation and dense layers for classification based on these features. In total, the model has 3,991,605 trainable parameters and no non-trainable parameters, which are explained in Figure 7 and Table 2.
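As an illustration of the layer sequence described above, a minimal Keras sketch follows. Kernel sizes, padding, and activations are assumptions (the paper does not state them), so the printed parameter count may differ slightly from the 3,991,605 reported.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 5  # DR severity grades 0-4

model = tf.keras.Sequential([
    layers.Input(shape=(180, 180, 3)),                        # preprocessed input images
    layers.Conv2D(16, 3, padding="same", activation="relu"),  # first two conv layers, 16 filters each
    layers.Conv2D(16, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                                      # 50% of neurons zeroed per update
    layers.Dense(NUM_CLASSES, activation="softmax"),          # class probabilities
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()  # prints layer shapes and parameter counts
```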
This study was conducted to experimentally build and compare three distinct models to identify the one with superior performance. While all three models share the same architecture, their primary difference lies in the type of input data they handle: the first model uses grayscale image data, the second uses normal color image data, and the third employs normal color image data but with fewer epochs during training.
Model 1 (Grayscale Image Data): This model was trained for 200 epochs using grayscale image data. Subsequently, the same model was retrained for another 200 epochs with data augmentation applied to the original dataset. Finally, the background was removed from the same dataset, and training of the model continued for 10 more epochs. The resulting model and its experimental results comprise Model 1.
Model 2 (Normal Image Data): This model was constructed in the same way as Model 1, but with normal color image data instead of grayscale images. The model was first trained for 200 epochs, then retrained with augmented data for another 200 epochs. Subsequently, the background was removed from the same dataset, and training continued for 10 additional epochs. The resulting model and its experimental results comprise Model 2.
Model 3 (Normal Image Data with Fewer Epochs): The procedure was again similar to Models 1 and 2, but with fewer epochs. The model was first trained for 40 epochs, then retrained with augmented data for another 40 epochs. After the background was removed from the same dataset, training continued for an additional 10 epochs. The resulting model and its experimental results comprise Model 3.
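A minimal sketch of this staged protocol for Model 3, continuing from the `model` and dataset objects sketched earlier (the augmented and background-removed dataset objects below are hypothetical stand-ins), might look as follows:

```python
# Stage 1: train on the original color images (40 epochs for Model 3).
model.fit(train_ds, validation_data=val_ds, epochs=40)

# Stage 2: continue training the same model on the augmented dataset.
model.fit(train_aug_ds, validation_data=val_ds, epochs=40)

# Stage 3: continue training on the background-removed dataset.
model.fit(train_nobg_ds, validation_data=val_nobg_ds, epochs=10)
```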
Upon completion of training the three models, we performed a comparative analysis to evaluate their performance based on specific metrics such as validation accuracy and loss values. Further details and results from this comparison can be seen in Figure 7.
Please note that while Models 1 and 2 were trained and tested with the same number of epochs for fairness, the number of epochs was reduced in Model 3 to investigate the impact of fewer training iterations on model performance. Furthermore, the variation in data preprocessing techniques (grayscale conversion and background removal) provides a more comprehensive understanding of their effect on the model’s performance for DR diagnosis.
2.5. Evaluation: Confusion Matrix
In this study, accuracy served as the principal metric for performance evaluation. However, we also utilized the confusion matrix, a robust tool that provides a comprehensive representation of a classification model’s performance. The confusion matrix offers a detailed overview of the model’s predictions in comparison to the actual labels.
For the purpose of this research, a “positive instance” refers to an image that actually depicts diabetic retinopathy (DR), whereas a “negative instance” refers to an image that does not. With these definitions in mind, the confusion matrix allows us to derive the following quantities:
True Positive (TP): The count of positive instances correctly identified as positive.
True Negative (TN): The count of negative instances correctly identified as negative.
False Positive (FP): The count of negative instances incorrectly identified as positive (Type I error).
False Negative (FN): The count of positive instances incorrectly identified as negative (Type II error).
Using these values, we calculated the following performance measures:
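The exact list is assumed here to be the standard set derived from these four counts, with accuracy, the study’s principal metric, defined alongside precision, recall, specificity, and F1-score:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall (Sensitivity)} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$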
By considering these metrics, we obtained a comprehensive understanding of the model’s performance, taking into account both positive and negative predictions. The confusion matrix is crucial in unveiling the model’s strengths and weaknesses in distinguishing between different classes. For the purposes of this research, both the confusion matrix and validation accuracy were used to interpret the results.
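As a closing illustration, a minimal sketch of computing the confusion matrix and accuracy with scikit-learn is shown below; the label arrays are hypothetical placeholders for the validation ground truth and the model’s predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# y_true: ground-truth DR grades (0-4) for the validation set.
# y_pred: predicted grades, e.g. np.argmax(model.predict(val_ds), axis=1).
y_true = np.array([0, 0, 1, 2, 3, 4, 2, 0])   # illustrative placeholders
y_pred = np.array([0, 0, 1, 2, 4, 4, 1, 0])

# A 5x5 matrix: rows are actual grades, columns are predicted grades.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3, 4])
print(cm)
print("Accuracy:", accuracy_score(y_true, y_pred))
```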