Article

Deep Red Lesion Classification for Early Screening of Diabetic Retinopathy

by Muhammad Nadeem Ashraf 1,2, Muhammad Hussain 3 and Zulfiqar Habib 1,*
1 Department of Computer Science, COMSATS University Islamabad, Lahore 54700, Pakistan
2 Department of Computer Science and Information Technology, The University of Lahore, Lahore 54000, Pakistan
3 Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(5), 686; https://doi.org/10.3390/math10050686
Submission received: 26 December 2021 / Revised: 6 February 2022 / Accepted: 8 February 2022 / Published: 23 February 2022
(This article belongs to the Special Issue Computer Graphics, Image Processing and Artificial Intelligence)

Abstract: Diabetic retinopathy (DR) is an asymptomatic and vision-threatening complication among working-age adults. To prevent blindness, a deep convolutional neural network (CNN) based diagnosis can help to classify less-discriminative and small-sized red lesions in early screening of DR patients. However, training deep models with minimal data is a challenging task. Fine-tuning through transfer learning is a useful alternative, but performance degradation, overfitting, and domain adaptation issues further demand architectural amendments to effectively train deep models. Various pre-trained CNNs are fine-tuned on an augmented set of image patches. The best-performing ResNet50 model is modified by introducing reinforced skip connections, a global max-pooling layer, and the sum-of-squared-error loss function. The performance of the modified model (DR-ResNet50) on five public datasets is found to be better than state-of-the-art methods in terms of well-known metrics. The highest scores (0.9851, 0.991, 0.991, 0.991, 0.991, 0.9939, 0.0029, 0.9879, and 0.9879) for sensitivity, specificity, AUC, accuracy, precision, F1-score, false-positive rate, Matthews correlation coefficient, and kappa coefficient are obtained within a 95% confidence interval for unseen test instances from e-Ophtha_MA. This high sensitivity and low false-positive rate demonstrate the worth of the proposed framework. It is suitable for early screening due to its performance, simplicity, and robustness.

1. Introduction

Diabetes mellitus (DM) is a chronic metabolic disorder that can severely affect various human organs [1,2], including the eyes, through vision-threatening DR [3,4]. According to the International Diabetes Federation (IDF), 463 million people have been affected by DM, and by 2045 this number might rise to 700 million [5]. DM rarely spares any tissue in the human body [6]. Retinopathy, neuropathy, cardiovascular disease, and nephropathy are potential complications of DM.
DR is a leading cause of preventable blindness among working-age adults [3,7,8,9]. It is usually asymptomatic in its early stage [10], and late identification leads to a substantial loss of vision [11,12]. Microaneurysms (MAs) are the earliest signs of DR; they may leak and cause hemorrhages (HEs). Small HEs vary in size, appear similar to MAs, and are usually known as "small red dots". These spots are collectively referred to as the red lesions of DR and are abbreviated as HMAs [3,13,14,15,16,17].
Red lesions are tiny spots with no definite boundaries. The intensities and color of these lesions are similar to background areas and blood vessels (the major distractors) inside a retina. The tiny lesions may develop anywhere inside the retina. Sometimes these are connected to blood vessels and cannot be identified easily. The poor contrast and uneven illumination in fundus images make identification even more challenging.
Fundus images are captured in a non-invasive and cost-effective manner. Thus, these photographs are usually analyzed to diagnose DR and many other ocular diseases [10]. Early screening of DR patients, by classifying red lesions, is a tedious task due to these issues. However, it is crucial to prevent blindness among a large number of diabetic patients.
Manual diagnosis creates a huge burden on ophthalmologists. An automated system can assist them to diagnose DR to initiate treatment in a timely way. Due to its significance, researchers have considered the retina and proposed different artificial intelligence-based systems instead of focusing on other ophthalmology branches [18].
The first automated technique was proposed by Lay in 1983 [19]. It was further extended by Baudoin et al. [20]. Afterward, several classical methods have been formulated to diagnose and grade DR by computing handcrafted features to classify different lesions. The classical techniques are not found to be robust during clinical trials due to the poor representation power of the less discriminative handcrafted features. Modern deep CNNs have emerged as a powerful tool to compute the most discriminative features. These are superior to classical methods. The depth of a CNN plays a vital role in successful image analysis; however, the literature shows that applying deep CNNs for medical image analyses is not a trivial task.
Due to the unavailability of a large amount of annotated data, millions of learnable parameters of these models cannot be trained from scratch. Fine-tuning a pre-trained CNN model, by transferring prior knowledge, is usually suggested in these situations to avoid local minima and saddle points [21]. However, a successful knowledge transfer depends on the resemblance of the source and target domains/tasks [22,23]. Making a CNN architecture adaptive is beneficial to train for new tasks. Performance degradation and vanishing/exploding gradient issues are experienced when deep models are trained with a minimal dataset. This study shows that architectural enhancements can effectively overcome these issues. Another hurdle to fairly using deep CNN for DR diagnosis is due to the hidden internal decision-making process. A visual interpretation would be useful to apply CNNs for DR diagnoses while maintaining trust [24].
This paper presents a deep CNN-based framework to classify red lesions for early screening of DR patients. While the detection and counting of individual lesions are useful for DR grading, it is not within the scope of this work. The following are the main contributions/novelties brought by the current study.
The architecture of a pre-trained CNN model is enhanced to address the above-mentioned issues. Instead of applying the CNN globally to a fundus image, it is applied to small regions for a precise analysis, which also increases the number of training instances available to effectively train deep CNNs. The suggested modifications are simple and robust, and they are found to be effective for early DR diagnosis despite the small and less prominent lesions. Such modifications would also be useful for enhancing many other CNN architectures to solve problems in related domains. The results obtained on five annotated benchmark datasets [13,25,26,27,28] were better than state-of-the-art methods in terms of well-known metrics, which is favorable for practical utilization in automated DR diagnosis. The internal decision-making of the CNN was interpreted by computing gradient-weighted class activation mapping (Grad-CAM) [29] to qualitatively analyze each classification decision for clinical trials. The proposed algorithm was formulated after reviewing the existing automated methods for early DR diagnosis, with an emphasis on red lesions.
The literature reveals that computer-aided diagnosis (CAD) for DR can be broadly divided into two categories: classical handcrafted techniques and modern automated feature extraction-based techniques using CNNs. The classical methods usually comprise preprocessing, segmentation, feature computation, and classification phases. Retinal images are preprocessed to enhance DR lesions. Candidate regions are then extracted by thresholding, filtering, and morphology-based methods, and the features computed from them are classified using a classifier. Modern deep CNN-based techniques compute the most discriminative features in an end-to-end manner. However, the tasks of the preprocessing phase are usually performed for both categories. This phase has been reviewed comprehensively by Ashraf et al. [30].
The classical approaches either considered MAs [26,31,32] or HEs [33] individually, while both lesions were also detected simultaneously [14,15,34,35,36,37,38,39,40,41,42,43,44,45]. These lesions were detected by extracting a set of potential candidates in the first stage and then refining this set with a classifier, trained over the traditional handcrafted features in the second stage [14,15,46]. Table 1 summarizes the handcrafted-based red lesion detection.
The main advantage of the classical methods is that their machine learning models can be trained using a small number of medical images. Blood vessels are usually removed before computing handcrafted features, and false detections are mainly due to leftover vessel pixels. To overcome this problem, various methods avoided segmenting the retinal vessels [15,40,41,42,43]. Traditional handcrafted methods depend heavily on the properties (shape, size, texture, intensity, color, etc.) of the less conspicuous red lesions. Thus, these techniques are not found to be very robust for clinical validation. Reliable automated feature extraction-based modern methods have contributed to overcoming the limitations of classical strategies.
Deep CNNs have been applied to diagnose DR [47,48,49], usually by analyzing fundus images globally [50,51,52,53,54,55,56,57]. However, small image patches are also used to train a deep CNN model for the detection of DR lesions [16,17,33,58]. The most relevant methods [16,17] are briefly reviewed below. The results obtained in those works are summarized in Table 2.
Orlando et al. [16] suggested a hybrid approach that computes handcrafted and CNN-based features and classifies them using a random forest (RF). A custom six-layer CNN model was trained from scratch on an augmented set of 32 × 32 image patches. Combining handcrafted and CNN features produced better results than either feature set alone. Blood vessels were segmented by applying morphology during candidate region extraction, which can be risky for the accurate detection of red lesions [15,40,41,42,43]. Zago et al. [17] suggested a dual CNN-based method (a LeNet5 [59] for patch selection and a pre-trained VGG16 [60] fine-tuned on the selected patches) for the rough localization of red lesions by classifying 65 × 65 small image patches. A huge number of patches was extracted after down-sampling the original images to 512 × 512. This down-sampling may increase the loss of tiny lesions. A dual CNN strategy can also be less efficient than a single model for clinical validation.
The literature shows that, due to the ability to compute the most discriminative features, CNN-based techniques are better than classical methods. However, applying deep CNNs for DR diagnosis demands further architectural reforms. This study aimed to devise an effective automated early-DR screening method to assist specialists to prevent blindness among a large number of diabetic patients. The hidden decision-making process of the CNN was shown by computing Grad-CAM while diagnosing COVID-19 from chest images [61], which is useful to solve the current task.

2. Materials and Methods

An effective early DR diagnosis is suggested using the power of a deep CNN model. Contrary to earlier methods [16,17], a single suitable CNN is used by strengthening its architecture to be fine-tuned by a limited amount of data.

2.1. Datasets

Six datasets of fundus images are publicly available [13,25,26,27,28,62]. Four datasets are annotated at the lesion level [13,25,26,28], while the remaining two [27,62] are annotated at the image level. For DR screening, lesion-level annotations are suitable to accurately classify lesions. Thus, all lesion-level annotated datasets were analyzed to select a suitable one to train the proposed method, while the remaining three were used for cross-validation at the lesion level. An extensively used image-level annotated dataset [27] was also considered, only to compare the per-image results with state-of-the-art methods.

2.1.1. e-Ophtha_MA

The e-Ophtha database [13] contains non-mydriatic fundus images acquired from more than 25,000 examinations. Its subset, e-Ophtha_MA, consists of 148 unhealthy images containing approximately 1300 MAs or small hemorrhages, annotated and verified by experts on a per-pixel basis, along with 233 healthy images.

2.1.2. Retinopathy Online Challenge (ROC)

The ROC [26] consists of 100 annotated images, but ground truth is publicly available only for the 50 training images. A total of 336 MAs are annotated in the 37 unhealthy training images, while the remaining 13 images are healthy, without any DR signs.

2.1.3. Standard Diabetic Retinopathy Database Calibration Level 1 (DiaRetDB1) v2.1

This database contains 84 pathological and 5 healthy images, split into 28 training and 61 testing images [25]. In total, 182 lesion-level annotated red lesions are found in only 45 images [45]. Figure 1 shows a sample image with various DR lesions and landmarks marked. The deprecated versions (DIARETDB0 and DIARETDB1) are not considered in this study.

2.1.4. Indian Diabetic Retinopathy Image Dataset (IDRiD)

The IDRiD is divided into three parts [28]. The segmentation part contains 81 unhealthy, lesion-level annotated images, distributed into training/testing sets of 54/27 images. The test set contains equal numbers (27) of images annotated for MAs and HEs. The grading part contains images annotated at the image level from level 0 (no DR) to level 4 (proliferative DR) according to the DR severity scale [63]; the 134/34 healthy images used for training/testing are available only in the grading part.

2.1.5. Messidor

This dataset [27] contains 1200 image-level annotated images, marked as four categories of DR. R0 refers to healthy, while R1, R2, and R3 indicate the mild, moderate, and severity levels of DR, respectively. Another variant, Messidor-2, is also available with only referable and non-referable labels. Thus, it was not considered for the current DR screening task.
The above facts reveal that a large quantity of per pixel-based lesion-level annotated MAs and small HEs are only available in the e-Ophtha_MA dataset. It contains the most difficult cases of red lesions. Training and testing sets are not separated by the vendors. However, a small number of training instances are available in the other lesion-level annotated datasets [25,26,28]. Due to the explicit split, very few fundus images were used to generate the training instances for CNNs in earlier algorithms [16,17]. e-Ophtha_MA is suitable to train and validate the proposed algorithm by splitting it according to experimental requirements. Table 3 summarizes the properties and utilization of all public datasets used in the current study.

2.2. The Proposed Method

To devise an effective and efficient CNN-based early screening system, six famous pre-trained CNNs (AlexNet [64], VGG16 [60], GoogLeNet [65], Inception-v3 [66], ResNet50 [67], and DenseNet [68]) were fine-tuned through transfer learning, coupled with hyper-parameter tuning to select the most suitable model. The best performing one, ResNet50, was considered for further refinements. A modified version of ResNet50 was given the name “DR-ResNet50” to differentiate it from the baseline model.
The proposed strategy consisted of preprocessing, patch generation, and classification of patches as healthy or unhealthy cases. Post-processing included the visual interpretation of classification decisions through a Grad-CAM and associating it back onto the fundus image. Figure 2 illustrates an overview of the proposed method. This required fewer stages compared to earlier classical methods [14,15,34,35,36,37,38,39,40,41,42,43,44].

2.2.1. Preprocessing

The main objective of preprocessing is to focus only on the area inside the field of view (FOV) to speed up further processing. To avoid the loss of tiny red lesions, fundus images were not extensively preprocessed. Blood vessels were not segmented, to minimize false identifications caused by leftover vessel pieces.
The FOV boundary was extracted by first computing an FOV mask: the color fundus image was converted to grayscale (which contains less noise) and binarized using a threshold equal to the minimum gray level if it was non-zero, or 0.02 otherwise. Noisy regions were removed by morphological opening and closing with a disc-shaped structuring element of size 5. The outcome was further refined by morphological closing and erosion with disc-shaped structuring elements of sizes 3 and 9, respectively. All these values were selected empirically for their optimal results. The FOV boundary was obtained by subtracting the erosion result from the closing result, and the resulting mask was used to generate small image patches for training and validating the CNN models.
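The FOV-mask computation described above can be sketched in MATLAB as follows (a minimal sketch assuming a color fundus image I and the Image Processing Toolbox; the ordering of the morphological refinements is one plausible reading of the description, not the authors' exact code):
grayImg = rgb2gray(I);                                     % grayscale copy of the color fundus image
t = double(min(grayImg(:))) / 255;                         % minimum gray level as a normalized threshold
if t == 0
    t = 0.02;                                              % fall back to 0.02 when the minimum is zero
end
mask = imbinarize(grayImg, t);                             % binary FOV mask
mask = imclose(imopen(mask, strel('disk', 5)), strel('disk', 5));   % remove noisy regions (size-5 disc)
closedMask = imclose(mask, strel('disk', 3));              % refine with a size-3 disc
erodedMask = imerode(mask, strel('disk', 9));              % erode with a size-9 disc
fovBoundary = closedMask & ~erodedMask;                    % boundary = closing result minus erosion result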

2.2.2. Patch Generation

Patch generation consists of patch extraction followed by data augmentation to produce data instances. Because a limited number of images are available in the e-Ophtha_MA dataset, the 200 × 200 small patches, centered at the annotated lesions, are extracted from fundus images using their corresponding masks. Figure 3 shows the patch generation process.
The size of the patch was selected by analyzing the dimensions of the red lesions in all four lesion-level annotated datasets. The benefits of using this patch size are three-fold. First, it covers not only the candidate itself but also its surrounding area, which helps a CNN to capture both the candidate's internal features and information about its shape, border, and context. Second, this size is suitable for deep CNN models that apply convolution without padding, which reduces the feature-map size [69]. Third, a large number of small image patches can be processed efficiently by deep architectures.
The extracted patches were labeled according to their annotation instead of assuming the probability of designating any patch as a lesion. The patches were extracted without down-sampling the fundus images [17] to avoid the loss of tiny lesions. Rather than down-sampling, the image patches are up-sampled for size normalization before feeding them to a CNN. As such, tiny lesions become more prominent for their accurate classification.
A total of 1293 images patches, having at least one red lesion, were extracted from all 148 unhealthy images of the e-Ophtha_MA. Data instances were augmented by shifting the centroid to 50 pixels towards the up, down, left, and right directions, and were then flipped vertically and horizontally to produce fifteen patches against the single labeled lesion. This procedure is shown in Figure 3a using colored boxes. A few augmented patches, with centers that were shifted outside the FOV (Figure 3a), were discarded to generate 17,320 unhealthy instances. An equal number of same-sized patches were generated from the 233 healthy images to finally produce 34,640 augmented instances from both classes to fine-tune the CNNs without class imbalance issues at the dataset level. Each instance was saved by assigning it a unique name (Figure 3b) to refer back during post-processing.
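The fifteen-patches-per-lesion augmentation can be sketched as the hypothetical loop below (assuming a fundus image I, its FOV mask fovMask, and an annotated lesion center (cy, cx); boundary checks are omitted for brevity):
half = 100;                                                % 200 x 200 patches
shifts = [0 0; -50 0; 50 0; 0 -50; 0 50];                  % original center plus 50-pixel shifts (up/down/left/right)
patches = {};
for s = 1:size(shifts, 1)
    r = cy + shifts(s, 1);
    c = cx + shifts(s, 2);
    if ~fovMask(r, c)
        continue;                                          % discard augmented patches whose center leaves the FOV
    end
    p = I(r-half+1:r+half, c-half+1:c+half, :);            % 200 x 200 crop around the (shifted) lesion center
    patches{end+1} = p;                                    % original crop
    patches{end+1} = flip(p, 1);                           % vertical flip
    patches{end+1} = flip(p, 2);                           % horizontal flip
end
The five positions with three variants each give the fifteen patches per labeled lesion described above.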
By using small patches, the defect of uneven illumination becomes less significant. An attempt was nevertheless made to explicitly balance the illumination at the image level using a large median filter, and contrast-limited adaptive histogram equalization (CLAHE) was applied to the small patches to reduce the intensity variations within them. However, no improvements were found by fixing these two defects, so they were ignored in this work.
To use the proposed framework for clinical validation in real time, a sliding window with a fifty-pixel overlap is applied to extract 200 × 200 patches from a fundus image, keeping the centers of the patches inside the FOV, and to classify them as healthy/unhealthy regions. The patches may have a black portion outside the FOV, just like the training patches (Figure 3b), but the proposed framework will not skip lesions at the FOV boundaries. The overlap ensures that lesions at the boundary of an extracted patch are included in one of the overlapping regions.
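At screening time, the sliding-window extraction could look like the following sketch (the 150-pixel stride is one reading of "a fifty-pixel overlap" between 200 × 200 boxes; positions are remembered so that decisions can later be projected back onto the fundus image):
half = 100;  stride = 150;                                 % 200 x 200 boxes with a 50-pixel overlap
[rows, cols] = size(fovMask);
testPatches = {};  positions = [];
for r = half:stride:rows-half
    for c = half:stride:cols-half
        if fovMask(r, c)                                   % keep only patches whose center lies inside the FOV
            testPatches{end+1} = I(r-half+1:r+half, c-half+1:c+half, :);
            positions(end+1, :) = [r c];                   % used to map classifications back onto the image
        end
    end
end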

2.2.3. Dataset Distribution (Train/Development/Test Sets)

The dataset of small image patches is divided into the training, validation, and testing sets by distributing an equal number of instances from the healthy and unhealthy classes.
Scheme-1 splits the data into training/testing sets at a 90/10 ratio. The training set is further divided into training/validation subsets using the same 90/10 ratio to generate 28,058 training, 3118 validation, and 3464 test instances. This scheme ensures that the separate validation and test sets contain unseen instances. Thus, Scheme-1 instances were used to perform the different experiments to validate and test the proposed method.
The second scheme (Scheme-2) was to use all the data to train and test the proposed technique in a 10-fold cross-validation manner by splitting instances into disjointed folds. It was only applied to increase the number of training instances by generating 31,176/3464 instances in each fold to train/test a suitable pre-trained CNN model for red lesion classification.
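Scheme-1 maps directly onto MATLAB's datastore utilities; the sketch below assumes the augmented patches are stored in class-named folders (the folder names are placeholders, not the authors' layout):
imds = imageDatastore('patches', 'IncludeSubfolders', true, 'LabelSource', 'foldernames');
[imdsTrainAll, imdsTest] = splitEachLabel(imds, 0.9, 'randomized');         % 90/10 train/test split
[imdsTrain, imdsVal]     = splitEachLabel(imdsTrainAll, 0.9, 'randomized'); % further 90/10 train/validation split
% Scheme-2 (10-fold cross-validation) can instead partition the labels with cvpartition(imds.Labels, 'KFold', 10).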

2.2.4. CNN Training

Fine-tuning through transfer learning is usually considered when a huge amount of training data is not available to train a deep model from scratch [21]. The success of knowledge transfer depends on the distance and dissimilarity between the source and target data [22,23]. However, the potential to apply it in medical imaging has been shown by Tajbakhsh et al. [21]. All six pre-trained models were fine-tuned by replacing their last three layers. ResNet50 performed better than the others due to its superior architecture.
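Replacing the last three layers of a stock pre-trained model follows the standard layerGraph workflow; a sketch for ResNet50 is given below (the layer names follow the shipped MATLAB model and may differ across releases; the learn-rate factors of 20 are the values reported in Section 3.1):
net = resnet50;                                            % pre-trained on ImageNet
lgraph = layerGraph(net);
lgraph = replaceLayer(lgraph, 'fc1000', ...
    fullyConnectedLayer(2, 'Name', 'fc2', ...              % two output neurons: healthy / unhealthy
        'WeightLearnRateFactor', 20, 'BiasLearnRateFactor', 20));
lgraph = replaceLayer(lgraph, 'fc1000_softmax', softmaxLayer('Name', 'softmax2'));
lgraph = replaceLayer(lgraph, 'ClassificationLayer_fc1000', classificationLayer('Name', 'output2'));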

2.2.5. Baseline ResNet Architecture

The ResNet50 model is based on residual blocks connected by skip connections to address the degradation problem in very deep CNN models [67]. Skip connections allow a direct flow of feature maps from earlier to deeper layers, which is beneficial for the learning process [70]. In ResNets, these shortcuts allow the model to learn an identity function, ensuring that a deeper layer performs at least as well as the shallower layers. This architecture won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2015 [71] and also produced good results in the current task; thus, it was further fine-tuned by freezing its initial layers.

2.2.6. Freezing Initial Layers

A CNN learns features hierarchically, computing local features at the initial layers and global features at the deeper layers. Low-level local features are common to images of every domain but are quite significant for computing domain-specific global features. To prevent any change in the already-learned local features, the initial layers are frozen during knowledge transfer. This also speeds up the learning process, as fewer learnable parameters need to be updated in the remaining un-frozen layers. Freezing more layers would improve efficiency further, but too much of the preserved source knowledge may not be compatible with the new task.
To find the best ratio between the trained and frozen layers, to make the model more adaptive, an iterative approach was adopted from [21] to freeze initial layers up to certain residual blocks to maintain the core structure of the residual architecture of the ResNet50 model.
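Freezing can be expressed by zeroing the learn-rate factors of the layers to be preserved; the sketch below freezes the first 58 layers (the count reported in Section 3.1; the exact index of the fifth residual block depends on how the layers are enumerated):
for k = 1:58
    L = lgraph.Layers(k);
    if isprop(L, 'WeightLearnRateFactor')                  % convolutional / fully connected layers
        L.WeightLearnRateFactor = 0;
        L.BiasLearnRateFactor = 0;
        lgraph = replaceLayer(lgraph, L.Name, L);
    elseif isprop(L, 'ScaleLearnRateFactor')               % batch normalization layers
        L.ScaleLearnRateFactor = 0;
        L.OffsetLearnRateFactor = 0;
        lgraph = replaceLayer(lgraph, L.Name, L);
    end
end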

2.2.7. Architectural Enhancement

The baseline architecture of ResNet50 is enhanced by introducing skip connections and replacing/adding the new layers to reinforce the existing structure to increase its effectiveness. The modified version, the DR-ResNet50 model, is shown in Figure 4.

Reinforced Skip Connections

A plain CNN classifies images by considering high-level global features but ignores low-level features that contain rich information of the fine-grained structures [72]. The local features can be combined with global features by directly connecting the initial layers with subsequent deeper layers using skip connections [73] to produce better results [72,74]. As the activations of the intermediate layers also contain important clues about the local features of the objects in the images, these intermediate layers are also integrated with deeper layers in various manners [75,76,77].
In this study, the features of intermediate layers were fused with the semantic features of subsequent deeper layers through two reinforced skip connections (RSK1 and RSK2). Contrary to the baseline, these reinforced skip connections skip at least four residual blocks. Their connectivity details are given in Section 3.2. The residual mapping $H_{RSK}(x_0)$ fit by a reinforced skip connection that skips $n$ layers can be expressed as:
$H_{RSK}(x_0) := F(x_n) + x_n + x_0,$
where $x_0$ is the input to the reinforced skip connection $H_{RSK}(x_0)$. Since $H(x) := F(x) + x$ [67], the residual function for $n$ layers becomes:
$H_{RSK}(x_0) := H(x_n) + x_0.$
In terms of an aggregate function $G(x_0)$ over all $n$ layers skipped by the reinforced shortcut connection, the residual mapping can be written as:
$H_{RSK}(x_0) := G(x_0) + x_0.$
With the above modification, the residual blocks were reinforced by reusing the local features of intermediate layers with global features at deeper layers during a forward pass of the information flow. The new pathways also contributed to reducing the degradation problem of deep CNN models. During backpropagation, these pathways allow errors to directly propagate back to update learnable parameters of CNNs without vanishing/exploding gradient issues.
Both RSK1 and RSK2 are shown in Figure 4 by dotted lines to indicate the dimension mismatch at their sources and in their target blocks. This was resolved by introducing the 1 × 1 convolution. To normalize each input channel across a mini-batch, a batch normalization layer was used between convolutional and activation layers while applying the suggested reinforced shortcuts. The modified architecture (DR-ResNet50) was further refined by altering its deeper layers to accurately classify tiny red lesions.
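One plausible wiring of a single reinforced skip connection in the layerGraph is sketched below; the source/destination layer names ('src', 'dst', 'next') and the 1 × 1 filter count and stride are placeholders chosen to match the deeper block's feature-map size, not the authors' exact configuration:
rskBranch = [
    convolution2dLayer(1, 1024, 'Stride', 2, 'Name', 'rsk1_conv')   % 1 x 1 convolution for dimension matching
    batchNormalizationLayer('Name', 'rsk1_bn')];                    % batch normalization before the summation
lgraph = addLayers(lgraph, rskBranch);
lgraph = addLayers(lgraph, additionLayer(2, 'Name', 'rsk1_add'));
lgraph = connectLayers(lgraph, 'src', 'rsk1_conv');                 % activation of the earlier residual block
lgraph = connectLayers(lgraph, 'dst', 'rsk1_add/in1');              % output of the deeper residual block
lgraph = connectLayers(lgraph, 'rsk1_bn', 'rsk1_add/in2');          % projected skip path
lgraph = disconnectLayers(lgraph, 'dst', 'next');                   % re-route the main path through the new sum
lgraph = connectLayers(lgraph, 'rsk1_add', 'next');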

Layers Modification

A globally applied average pooling layer was replaced by the max-pooling layer to down-sample the data dimensions by dividing the input into pooling regions to compute the maximum of each region. Considering the maximum values instead of the average is more promising to classify less-prominent tiny red lesions. This also contributes to making the proposed CNN model translation-invariant, which is more suitable for the classification of red lesions that appear anywhere on the retina of diabetic patients.
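In the layerGraph, this swap is a single replacement (the pooling layer of the stock MATLAB ResNet50 is named 'avg_pool', and globalMaxPooling2dLayer is available from R2020a; both are assumptions about the specific release used):
lgraph = replaceLayer(lgraph, 'avg_pool', globalMaxPooling2dLayer('Name', 'gmp'));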
While choosing a suitable pre-trained CNN model, the default fully connected layer of ResNet50 was already changed by a new fc layer with two output neurons to solve the current two-class classification problem. Similarly, the classification layer was already replaced by a new classification layer to retrain the Softmax classifier on the new task of binary classification. The class output layer was replaced earlier by a new layer using the default error loss function.
This loss function is highly sensitive to class imbalance issues. It has not been found effective at detecting small objects to solve DR problems in the past [58,78]. Instead of cross-entropy loss, a dice loss function was applied by Chudzik et al. [58] to handle true negative cases, while classifying each pixel as either a lesion or a non-lesion. Hu et al. [78] improved the cross-entropy loss function to emphasize the difficult instances to effectively detect small objects and to overcome class imbalance issues.
In the current study, there was no class imbalance issue at the dataset level. However, the ratio of lesion to non-lesion pixels was not balanced in the unhealthy instances due to the varying sizes of the lesions. Compared to ordinary binary classification tasks, the current problem is more challenging: the proposed CNN must learn to map an input patch to the corresponding label for all possible lesion locations within the patch. Thus, this binary classification problem is effectively a regression problem in disguise. Hence, the sum-of-squared-error (SSE) loss is a natural choice to compute the error between the predicted and target outputs, and the default output layer is replaced by a custom output layer that computes the SSE loss. For predictions $Y$ and training targets $T$, the SSE loss $L$ between $Y$ and $T$ is computed as:
$L = \frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{K}\left(Y_{ni} - T_{ni}\right)^2,$
where $N$ is the number of observations in a mini-batch and $K$ is the number of classes. The SSE loss is differentiable (smooth everywhere), so the model parameters can be optimized using straightforward gradient-based methods. SSE loss is generally used for regression problems but was adapted to the current classification task in the following manner.
SSE loss measures the error between two continuous random variables. As the Softmax classifier predicts continuous scores, the intermediate results are converted into class labels by thresholding in the final classification stage. The continuous scores are used to compute SSE loss to optimize the CNN parameters using stochastic gradient descent with momentum (SGDM) to classify the input instances and finally produce a binary decision. Along with the many advantages, there are limitations in using SSE loss.
For instance, outliers in the dataset may dominate the optimization process because each error is squared when computing the SSE loss. A per-pixel, lesion-level annotated dataset was used to effectively train the suggested CNN model and mitigate such issues. The obtained results prove the worth of using the SSE loss in the above-discussed manner. The performance was analyzed by computing many metrics.
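A custom output layer implementing the SSE loss defined above can be sketched as follows (a minimal sketch, not the authors' code; the class must live in its own file, and the backward pass is omitted on the assumption that recent MATLAB releases derive it automatically when forwardLoss uses supported operations):
classdef sseClassificationLayer < nnet.layer.ClassificationLayer
    methods
        function layer = sseClassificationLayer(name)
            layer.Name = name;
            layer.Description = 'Sum-of-squared-error loss';
        end
        function loss = forwardLoss(layer, Y, T)
            % Y: predicted Softmax scores, T: one-hot targets (1 x 1 x K x N)
            N = size(Y, 4);                                % observations in the mini-batch
            loss = sum((Y - T).^2, 'all') / N;             % SSE averaged over the mini-batch
        end
    end
end
The default output layer is then swapped out with replaceLayer(lgraph, 'output2', sseClassificationLayer('sse_output')).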

2.2.8. Evaluation Protocol

The proposed strategy was evaluated on unseen test instances from e-Ophtha_MA. To cross-validate on a per-lesion basis, the 200 × 200 small image patches are extracted from DiaRetDB1 v2.1, ROC, and the segmentation part of the IDRiD datasets. For image-level evaluation, the Messidor dataset and the grading portion of IDRiD were used for cross-validation. For the Messidor dataset, the healthy images {R0} versus unhealthy categories {R1, R2, and R3} were used in this study in the same manner as considered in earlier works [15,16,17]. Sixty-nine unhealthy test images from the DR grading part of the IDRiD dataset are used to classify the image of category {0} versus {1, 2, 3, and 4} for healthy and unhealthy cases. The effectiveness of the suggested framework was evaluated both quantitatively and qualitatively.
Training performance was monitored by validation accuracy. The bias and variance issues were also analyzed to identify overfitting/under-fitting problems. To quantitatively measure the testing ability of the proposed framework on unseen test instances, all well-known performance metrics are computed. These include accuracy (Acc.), the confusion matrix [79], precision, sensitivity (Se.), F1-score, Matthews correlation coefficient (MCC) [80], specificity (Spec.), kappa coefficient (K) [81,82], and the area under the receiver operating characteristic curve (AUC) [83], all expressed in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) [84].
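From the confusion-matrix counts, the listed metrics reduce to a few lines of arithmetic:
N    = TP + TN + FP + FN;                                  % total number of test instances
Se   = TP / (TP + FN);                                     % sensitivity (true positive rate)
Spec = TN / (TN + FP);                                     % specificity (true negative rate)
FPR  = FP / (FP + TN);                                     % false-positive rate
Prec = TP / (TP + FP);                                     % precision
Acc  = (TP + TN) / N;                                      % accuracy
F1   = 2 * Prec * Se / (Prec + Se);                        % F1-score
MCC  = (TP*TN - FP*FN) / sqrt((TP+FP)*(TP+FN)*(TN+FP)*(TN+FN));   % Matthews correlation coefficient
Pe    = ((TP+FP)*(TP+FN) + (TN+FN)*(TN+FP)) / N^2;         % chance agreement for Cohen's kappa
Kappa = (Acc - Pe) / (1 - Pe);                             % kappa coefficient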
For the qualitative analysis, Grad-CAM was computed for each classification decision by calculating the gradient of the final classification score with respect to the final convolutional feature map. The places where this gradient is large are exactly the regions where the final score depends most on the data. Such high-dependency regions are marked in red on the heat map, as depicted in Figure 4.
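This qualitative check can be reproduced with the built-in gradCAM function (available in newer Deep Learning Toolbox releases, which is an assumption here; earlier releases require computing the score gradients with respect to the last convolutional feature map manually):
in = imresize(patch, [224 224], 'bicubic');                % match the network input size
[label, score] = classify(trainedNet, in);                 % classification decision for the patch
map = gradCAM(trainedNet, in, label);                      % class-discriminative heat map
imshow(in); hold on;
imagesc(map, 'AlphaData', 0.5); colormap jet; hold off;    % overlay: red marks high-dependency regions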

2.2.9. Experimental Setup

All experiments were performed on a computer with an Intel Core i7 CPU @ 2.54 GHz, 16 GB RAM, and an NVIDIA GeForce RTX-2060 graphics processing unit (GPU) with 6 GB of dedicated memory, running MS-Windows 10 (Education edition). MATLAB 2020a was used to develop the software modules in this study, using its deep learning, computer vision, and neural network toolboxes. The pre-trained CNN models were loaded with learnable parameters trained on the ImageNet dataset of natural images. The extracted 200 × 200 small image patches are up-sampled by applying bi-cubic interpolation to normalize their size according to the input requirements of the particular pre-trained CNN model being fine-tuned for the current task.

3. Results

Experiments were performed to fine-tune the pre-trained models to select the best performing CNN for architectural reform. The pseudocode for the main flow of execution is given in Appendix A.

3.1. Fine-Tuning

Six famous pre-trained CNN models were fine-tuned with instances of small patches, distributed by Scheme-1. Their maximum test accuracies (Table 4) were achieved by tuning various parameters. ResNet50 performed better than all other models. The highest results are obtained by setting various parameters using the following values.
The fc layer of the ResNet50 model was configured to classify two classes by setting its “Weight-Learn-Rate-Factor” and “Bias-Learn-Rate-Factor” to 20 to speed up the learning process in this layer without any degradation. The model was trained for 30 epochs, with a mini-batch size equal to 32 to evaluate the gradient of the loss function, using default cross-entropy for two-class classification with mutually exclusive classes. The learnable parameters were updated with an initial learning rate set to 0.001, using the stochastic gradient descent as a solver with a momentum value equal to 0.9. All these values were adjusted empirically by performing various experiments. Unless stated otherwise, the same settings were used to perform further experiments. The best performing ResNet50 model was considered for further investigations.
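These settings translate into a trainingOptions call such as the sketch below (the paper up-samples patches with bicubic interpolation; augmentedImageDatastore is used here only as a convenient way to match the network input size):
inputSize = lgraph.Layers(1).InputSize(1:2);               % 224 x 224 for ResNet50
augTrain = augmentedImageDatastore(inputSize, imdsTrain);
augVal   = augmentedImageDatastore(inputSize, imdsVal);
opts = trainingOptions('sgdm', ...
    'Momentum', 0.9, ...
    'InitialLearnRate', 1e-3, ...
    'MiniBatchSize', 32, ...
    'MaxEpochs', 30, ...
    'ValidationData', augVal, ...
    'Shuffle', 'every-epoch', ...
    'Plots', 'training-progress', ...
    'Verbose', false);
trainedNet = trainNetwork(augTrain, lgraph, opts);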
The initial layers of the ResNet50 model were frozen at six different levels, from the 3rd up to the 8th residual block, to find the amount of source knowledge that must be preserved when training for the current task. The results in Table 5 show that performance and efficiency were improved by freezing the initial 58 layers, up to the 5th residual block (RB5, as shown in Figure 4). The baseline architecture of the fine-tuned ResNet50 model was then modified to enhance its capability to effectively diagnose DR.

3.2. DR-ResNet50 Architecture

The baseline ResNet50 was further strengthened by combining local and global features using two reinforced skip connections. RSK1 was created to directly add the activations of the 7th residual block with the 13th residual block by skipping five blocks. RSK2 directly added the activations of the 11th block into the last 16th residual block by skipping four blocks. Suitable configurations were found empirically using the experiments. The results obtained by introducing RSK1 and RSK2 are presented in the first two rows of Table 6.
The fourth-to-last layer, i.e., the global average pooling layer of the default ResNet50 model, is replaced by a global max-pooling (GMP) layer. The 3rd row of Table 6 shows the results achieved by introducing the GMP layer along with RSK1 and RSK2. The improvement in the results demonstrates its significance in solving the current problem. Afterward, the default cross-entropy loss function was replaced by the SSE loss. This further boosted the performance, as shown by the results in the 4th row of Table 6. The model converged in fewer epochs; thus, instead of training for the full 30 epochs, the DR-ResNet50 model was trained with an early stopping option.
To further increase the number of training instances, the two remaining experiments were performed by splitting the data instances according to Scheme-2 to train DR-ResNet50 using 10-fold cross-validation. The training plot in Figure 5 shows that there were no bias or variance issues. The model converged successfully after only eight epochs without suffering from overfitting or under-fitting problems. These promising results were obtained using the same hyperparameter settings, i.e., an initial learning rate of 0.001 and a mini-batch size of 32.
It took 9 h, 39 min, and 53 s to train the suggested model with 31,176 training instances of small image patches. All 3464 unseen test cases were predicted within 156 s. On average, 45 milliseconds were required to analyze a small 200 × 200 image patch. Thus, a complete 2000 × 2000 high-resolution fundus image can be analyzed approximately in 450 milliseconds. Thus, screening of a patient could be accomplished by analyzing the fundus images of both eyes within one second. Such efficient computer-aided assistance would be beneficial for ophthalmologists to reduce their burden for screening DR patients at an early stage.
To further boost the efficiency and performance of the suggested framework, various other hyperparameter values were also considered. However, efficiency could not be improved further without performance degradation. More time is usually required for a model to converge with smaller learning rates, whereas higher values may lead to sub-optimal results. A very small mini-batch size usually produces noisy gradient estimates, while a larger size demands more resources to load the training instances into memory for a particular execution environment (CPU or GPU).
The performance of the proposed model on unseen test instances from e-Ophtha_MA images was measured by computing the confusion matrix, as shown in Figure 6. For critical evaluations, many well-known metrics were also evaluated within a 95% confidence interval (CI).
The mean values of the results, in terms of commonly computed performance measures, are summarized in the 5th row of Table 6. Various other metrics were also computed to comprehensively evaluate the suggested model. For this experiment, the mean values obtained for precision, F1-score, false-positive rate, Matthews’s correlation coefficient, and the kappa coefficient were equal to 0.9910, 0.9939, 0.0029, 0.9879, and 0.9879, respectively. These results were based on a per-lesion basis.
The per-image evaluation was also measured for unseen test instances from e-Ophtha_MA to evaluate the overall risk related to the entire image. The results are mentioned in the 6th row of Table 6, and the corresponding ROC curve is shown in Figure 7. The area under the ROC curve (AUC) is useful to measure the classifier’s ability to distinguish between two classes. The image-level results were compared with state-of-the-art methods.
For qualitative assessments, a visual interpretation was computed using Grad-CAM for each classification decision. The most challenging successful and unsuccessful cases for healthy and unhealthy classes are illustrated in Figure 8 and Figure 9, respectively. This visual representation helps to qualitatively analyze the correctness of the system, to use the proposed automated method for clinical validation with full confidence.
To assist ophthalmologists in efficient retinal image analyses, the classification decisions were also projected back on the original fundus images, as shown by the blue and white boxes in Figure 10. This was accomplished by using the names assigned to a specific patch during patch generation (Figure 3b) to associate back to its actual position on the fundus image. The proposed framework was also cross-validated on DiaRetDB1 v2.1, ROC, IDRiD, and Messidor datasets on a per-lesion and per-image basis. The obtained results are summarized in Table 7.
It can be inferred from the results obtained by the above experiments that the performance of the modified version of ResNet50, i.e., DR-ResNet50, was improved significantly. The suggested modifications are simple, robust, and were found to be very effective in enhancing a deep CNN architecture to make it more adaptive to the new task and to effectively train it with a small training set to diagnose tiny and less-prominent red lesions due to DR without vanishing/exploding gradients or performance degradation issues. The results are very promising to consider the suggested method for clinical trials.

4. Discussion

To devise an effective early DR screening system, the power of deep CNN is utilized in this work. Due to the unavailability of large training datasets, training a deep CNN from scratch is not feasible. On the other hand, only fine-tuning through transfer learning is not useful due to unrelated natural and medical imaging domains. To address these issues, architectural reforms are suggested to fine-tune CNNs, even when the source and target domains are not related to each other. Six famous pre-trained CNN models are considered to select the most suitable architecture. In 2018, a related study was conducted by Wan et al. [56] to globally classify images [62] for DR grading. In the current study, the models are fine-tuned over the dataset of small 200 × 200 image patches to accurately classify red lesions.
The results (Table 4) of all six fine-tuned CNN models show that errors are reduced with depth and more versatile architectures. AlexNet, VGG, and GoogLeNet consist of 8, 16, and 22 layers, respectively. VGG19 was also fine-tuned, but its performance was not found to be good for the current task, and its bias and variance issues were higher than those of the VGG16 model; thus, VGG19 is not considered in this study. ResNet50 performs better than the other models, and its parameter complexity is also lower than that of the VGG nets. The ResNet50 and DenseNet models are much deeper than the remaining CNNs.
Depth is important to learn complex functions, but deep models are difficult to train. Although DenseNet is even deeper than ResNet50 and features are reused in all dense blocks, its performance was not found to be better than that of ResNet50; the smaller number of available data instances could not sufficiently meet the training requirements of DenseNet's huge number of learnable parameters. The second reason for the better results of ResNet50 is its superior architecture. Residual blocks are connected through skip connections to learn residual functions with respect to the layer inputs, instead of learning the unreferenced functions usually learned by other CNNs. In ResNet50, the features of the previous layer are summed with those of the deeper layers, whereas in DenseNet the previous layers' features are concatenated with the deeper layers; each layer in ResNet50 has a single input, while the ith layer of a DenseNet has i inputs. The dense connections help to address the vanishing/exploding gradient problem, as do the identity shortcuts of ResNets (which can additionally be trained with stochastic depth). The strength of DenseNet lies in feature reuse through dense connectivity, whereas ResNet leverages great depth through residual learning. Due to these advantages, ResNet50 performed better than the other models and was selected for further enhancement of its generalization ability.
Preserving the local features by freezing the initial layers is also found to be effective to compute the domain-specific features of fundus images. The highest results (Table 5) are achieved by freezing the initial 58 layers of the baseline ResNet50 model. This shows that the local features up to these layers, computed earlier during its full training from scratch on natural images of the ImageNet dataset, are useful to compute semantic features for the current task. Table 5 also illustrates that freezing more layers causes a decrease in performance because deeper layers are computing domain-specific semantic features for the new task. The suggested fine-tuning contributed to adapting the ResNet50 architecture to the new problem of DR classification.
The learnable parameters of the next layers in two residual blocks (RB6 and RB7) are updated with the new data of small patches to learn features for red lesion classification. Figure 4 shows these two blocks in comparatively dark blue colors to indicate their more adaptive and optimized layers after fine-tuning.
The features of intermediate layers are reused by fusing them with global features to compute the most discriminative features using two reinforced skip connections (RSK1 and RSK2). Additional pathways have contributed to reducing the degradation and vanishing/exploding gradient problems. The results in the first two rows of Table 6 indicate improvements due to these amendments. Their target residual blocks, i.e., RB13 and RB16, are shown in Figure 4 by even darker/darkest blue to indicate even more optimized layers for the current task. Along with these enhancements, the last four layers are further refined to accurately classify healthy and unhealthy cases.
The default global average pooling (GAP) layer is replaced by a global max-pooling (GMP) layer to reduce data dimensions and to predict the less prominent tiny red lesions anywhere in the given sample patch. It is evident from the results, reported in the 3rd row of Table 6, that the performance is improved by using the max-pooling layer. The bias and variance issues are also reduced further by its implementation.
The fully connected (fc) and Softmax classification layers were already replaced by the new layers to train them completely for the current DR classification. SSE loss is introduced in a custom output layer to accurately classify small 200 × 200 patches with an unbalanced ratio of the lesion and non-lesion pixels. The results, reported in the 4th row of Table 6, confirm that the performance is further boosted by using the SSE loss. The bias and variance issues are also fixed by this modification. The improvements in results justify all the suggested modifications to solve the current task effectively.
To further increase the performance of the suggested DR-ResNet50 model, it was trained with more training instances, generated by Scheme-2. The per-lesion and per-image evaluations on 3464 unseen test instances are given in the 5th and 6th rows of Table 6. Comparing the 4th and 5th rows of Table 6, the per-lesion results are slightly improved using Scheme-2, but not by much, because only approximately three thousand extra training instances (31,176 − 28,058 = 3118) were used in Scheme-2. The suggested model is not overfitting with the small amount of training data. This is also confirmed by the training-progress plot: no bias or variance issues are visible in Figure 5.
The 5th and 6th rows of Table 6 show that there is no big difference between per-lesion and per-image results. Per-image evaluation is computed to judge the overall risk related to the entire image by applying the same protocol used by Niemeijer et al. [14]. The slight increase in sensitivity results, as shown in the 6th row of Table 6, would be due to more red lesions on a per-image basis. Similarly, the false positives are increased due to the increased number of healthy regions on a per-image basis.
The performance is quantitatively evaluated by computing well-known metrics. Sensitivity measures the true positive rate (TPR) to evaluate how well the model performs to diagnose a disease while specificity estimates the true negative rate (TNR). However, there is a trade-off between these performance measures. Table 6 indicates that the higher specificity values are obtained against the comparatively lower sensitivity results. An effective automated system must be able to accurately diagnose disease. Thus, higher sensitivity is more important than specificity for an effective DR diagnosis. The confusion matrix in Figure 6 demonstrates the overall performance of the suggested method. The false-negative rate is less than 1% (i.e., 0.9%), while the false positive rate is 0.1% on unseen test instances. The ROC plot, shown in Figure 7, and the obtained AUC results, further verify the effectiveness of the suggested method.
A CNN model works as a black box by hiding its internal working. The performance of the suggested model is assessed qualitatively by analyzing the hardest successful and unsuccessful cases from both categories, as already shown in Figure 8 and Figure 9.
Figure 8a,b depicts the true positive and true negative cases that are successfully classified by the proposed method. It can be seen that a positive case (a-1), due to the presence of a red lesion indicated by a green arrow, is classified accurately by the proposed method. The corresponding Grad-CAM heat map on the right side of the input instance highlights the lesion area in red. It is quite natural to see more than one lesion in a 200 × 200 image patch, as shown in case (a-2) of Figure 8. The corresponding visual interpretation by Grad-CAM indicates that the system makes the classification decision due to both lesions. This means that the system correctly classifies the image patch as unhealthy and also identifies all lesions inside the patch; in this manner, the patch classification is analogous to a per-lesion evaluation. Case (a-3) shows that the proposed strategy successfully classifies a less prominent lesion. Similarly, in case (a-4), a lesion at the patch boundary is classified accurately. Thus, Grad-CAM visualization helps to indicate where the red lesions appear inside a patch. However, using Grad-CAM for lesion localization is not appropriate, because it generates a heat map highlighting the image portion that dominates a particular decision; such dominant areas are also indicated in the Grad-CAM maps of healthy instances, as shown in all cases of Figure 8b. Along with the large majority of successful cases, there are very few misclassified cases from the healthy and unhealthy classes, as illustrated in Figure 9.
Figure 9a shows a few difficult instances of false-negative cases (c-1, c-2, c-3, and c-4) caused by the extremely poor contrast of the tiny red lesions. The false-positive cases are illustrated in Figure 9b; however, these false identifications are in favor of DR screening. In case (d-1), two bright lesions are present in extremely dark surroundings. As dark red lesions are absent in this small patch, the patch was labeled as healthy, but the suggested method identified it as unhealthy due to the bright lesions. This indicates that the proposed CNN model can flag unhealthy cases caused by both the dark red and the bright lesions of DR. In case (d-2), the false positive is due to extensive intensity variations of the background pixels. In case (d-3), the false-positive decision is caused by the vessel segment near the boundary of the image patch. The dark spot in case (d-4) is identified by the system as a red lesion of DR; this lesion was probably not marked by the experts in the ground truth of the dataset.
Qualitative investigation shows that the proposed framework is effective in successfully classifying a large number of healthy and unhealthy cases. The few false-positive identifications are also in favor of diagnosing the disease. However, in case d-3 the system was distracted by a small piece of vessel inside the small patch, and all the false-negative cases are due to the extremely poor contrast of the red lesions. Although automated DR screening does not require every single case to be identified correctly, this aspect may be addressed in the future to further improve the performance of the proposed method.
The cross-validation results on four public datasets (Table 7) show that the sensitivity values are higher than 80%, in line with the recommendations of the international guidelines for DR screening [85]. According to the British Diabetic Association, London, the specificity must be greater than 87% for retinopathy screening. However, a slight decrease in the cross-validation results is observed compared to e-Ophtha_MA, because images from different sources are used to cross-validate the suggested model. In the case of IDRiD, the difference is greater than with the other datasets. The probable reason is the use of fundus images from a different population group (an Indian population) to validate the suggested model. Another possible cause of the additional degradation on the IDRiD dataset is the use of images from different parts of this dataset. Such issues might be tackled by training a deep CNN on a large variety of fundus images from patients of different ethnicities and geographical locations. Even so, the performance of the suggested method is still higher than that of state-of-the-art deep CNN-based methods.
The comparison of image-level results in terms of sensitivity and AUC within 95% of the CI is summarized in Table 8. The variations in the obtained results lie in smaller intervals. Contrary to other methods, the specificity value in the current experiments is not fixed to 50% to assess the actual capability of the suggested method. In the case of a Messidor dataset, the performance of the two experts is also compared with the suggested method. The obtained results are better than others except for Expert A.
In 2005, a single traditional method [14] achieved 100% sensitivity on a per-image basis (Table 1); however, it achieved only 30% sensitivity on a per-lesion basis. The authors attributed this anomaly to some images containing more red lesions than others. Most traditional CADs are limited to the testing environment due to the poor representation abilities of handcrafted features. To avoid the constraints of handcrafted-based CADs, a more reliable CNN-based technique is suggested in this study.
The domain adaption, performance degradation, and vanishing/exploding gradient issues of a deep CNN model are adequately addressed by proposing interesting modifications. Such alterations will also be useful to enhance many CNNs to solve related problems with minimal datasets. Promising results demonstrate the effectiveness and robustness of the proposed DR-ResNet50 framework to recommend practical implementations to reduce the burden of eye specialists. The visual clarification of each classification also supports the practical utilization of the suggested method for clinical trials with more satisfaction. It would be helpful to develop a reliable healthcare system by assisting ophthalmologists to save many diabetic patients from blindness.
Along with the above experiments, a Dropout layer was also added to DR-ResNet50 after the fc layer, with 60%, 50%, and 40% dropout rates, to regularize the suggested model. However, this amendment did not further improve the best results. One possible reason is that the batch normalization layers already regularize this architecture. Similarly, leaky ReLU and center loss were not found to be effective in further boosting the performance of the DR-ResNet50 model. Thus, their outcomes are not reported in the final results.
DR-ResNet50 was also used as a feature extractor by extracting features from the last fc layer. A support vector machine (SVM) with a radial basis function (RBF) kernel was applied to classify these features after optimizing its parameters. The false-positive rate (FPR) was slightly reduced by this effort; however, the overall accuracy could not be enhanced, because the features were extracted after fine-tuning the DR-ResNet50 model with the Softmax classifier, owing to its better performance in end-to-end learning. For this reason, the SVM might be unable to further improve the performance. Using an SVM as the classifier during the fine-tuning of DR-ResNet50, instead of Softmax, could be useful for improvement, and an ensemble of more advanced classifiers might also be beneficial.
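That experiment corresponds to the following sketch ('fc2' is the replacement fully connected layer name used in the earlier sketches, and augTest is a hypothetical datastore of test patches):
featTrain = activations(trainedNet, augTrain, 'fc2', 'OutputAs', 'rows');   % deep features from the fc layer
featTest  = activations(trainedNet, augTest,  'fc2', 'OutputAs', 'rows');
svm = fitcsvm(featTrain, imdsTrain.Labels, ...
    'KernelFunction', 'rbf', 'Standardize', true, ...
    'OptimizeHyperparameters', {'BoxConstraint', 'KernelScale'});
predLabels = predict(svm, featTest);                                        % SVM decisions on the test features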
Due to the advantages of the SSE loss, it was also applied when fine-tuning the other renowned pre-trained models, i.e., AlexNet, VGG16, GoogLeNet, Inception-v3, and DenseNet. Their performance also improved compared to the default cross-entropy loss, but these models could not surpass the best results achieved by the proposed DR-ResNet50 model. These facts further confirm the strength of the SSE loss for the current red lesion classification task.
The major limitation of applying a CNN to medical image analysis is the unavailability of a large variety of annotated fundus images. Generative adversarial networks (GANs) could be useful for generating such images. The suggested model is also distracted by the extremely poor contrast of red lesions in the presence of distractors. More effective contrast enhancement might help to classify these lesions correctly for early screening of DR.

5. Conclusions

A deep CNN-based framework is suggested to classify red lesions for early DR diagnosis. Training a CNN model from scratch, or fine-tuning it through transfer learning, is challenging due to the unavailability of large datasets. These issues are addressed by modifying the architecture of a pre-trained CNN model. A sufficient amount of prior knowledge is preserved by freezing the initial layers while adapting the architecture to the current task. Performance degradation and vanishing/exploding gradient issues are addressed by combining local and global features through two skip connections. The experiments revealed that the skip connections effectively improve the performance of the proposed DR-ResNet50 model. A global max-pooling layer also plays a vital role in classifying healthy and unhealthy cases, and the SSE loss is found to be more effective for fine-tuning the modified CNN. The visual interpretation of each result shows that the model draws its decisions from features extracted in the regions of the red lesions. Compared with state-of-the-art methods, the proposed method produces promising results, giving high sensitivity with a low false-positive rate, which is significant for early screening of DR. In future work, generative adversarial networks will be investigated to produce a large variety of fundus images for training the proposed framework more effectively, and a better contrast enhancement technique will be formulated for the accurate classification of red lesions with extremely poor contrast in the presence of distractors.

Author Contributions

Conceptualization, M.H. and Z.H.; Data curation, M.N.A.; Formal analysis, M.N.A. and M.H.; Funding acquisition, Z.H.; Investigation, M.N.A.; Methodology, M.N.A. and M.H.; Project administration, Z.H.; Resources, Z.H.; Supervision, M.H. and Z.H.; Writing—original draft, M.N.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the PDE-GIR project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 778035. The research was also supported under the Researchers Supporting Project number (RSP-2019/109), King Saud University, Riyadh, Saudi Arabia.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The publicly archived datasets [13,25,26,27,28,62] have been analyzed during this study and are already cited in Section 2.1.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

This appendix contains pseudocode for the main flow of execution to classify both types of red lesions for early screening of DR patients; an illustrative Python sketch of the per-image testing flow follows the pseudocode.
Table A1. Pseudocode showing the main steps to classify red lesions for DR screening.
FOV extraction
Small patch generation
  Extract patches centered at the lesion centroids from unhealthy images
  Extract adjacent patches centered within FOVs from healthy images
  Data augmentation and distribution into training, validation, and test sets using Scheme 1 or 2
Choose a suitable pre-trained CNN model through transfer learning
Develop the DR-ResNet50 model
Training phase
  Fine-tuning through transfer learning coupled with
    Hyper-parameter settings
    Freezing of initial layers
  Architectural modifications
    Reinforced skip connections
    Deep-layer modifications
      Replace GAP with GMP
      Replace Softmax for the new task
      Replace fc for binary classification
      Replace the output layer with the SSE loss
  Training and validation with performance monitoring
Testing phase
  Load the trained DR-ResNet50 model
  Prediction on unseen test instances
if cross-validation then
  Extract all overlapped patches within FOVs
  Prediction to cross-validate on different datasets
Post-processing to evaluate performance by computing quantitative metrics and Grad-CAMs
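The illustrative Python sketch below mirrors the testing/cross-validation flow of Table A1 at a per-image level. The patch size, stride, normalization, and the rule that an image is flagged unhealthy if any patch is classified as a red lesion are assumptions for illustration only.

# Illustrative per-image testing flow; helper choices are assumptions.
import numpy as np
import torch

PATCH, STRIDE = 200, 200   # 200 x 200 patches; a non-overlapping stride is assumed

def classify_image(model, fundus_rgb, fov_mask, device="cpu"):
    """Flag an image as unhealthy if any FOV patch is predicted to contain a red lesion.

    fundus_rgb: H x W x 3 uint8 array; fov_mask: H x W boolean array.
    Intensities are scaled to [0, 1]; any further normalization is omitted here.
    """
    model.eval().to(device)
    unhealthy_boxes = []
    h, w, _ = fundus_rgb.shape
    for r in range(0, h - PATCH + 1, STRIDE):
        for c in range(0, w - PATCH + 1, STRIDE):
            if not fov_mask[r:r + PATCH, c:c + PATCH].any():
                continue                               # skip patches outside the FOV
            patch = fundus_rgb[r:r + PATCH, c:c + PATCH]
            x = torch.from_numpy(patch.copy()).permute(2, 0, 1).float().unsqueeze(0) / 255.0
            with torch.no_grad():
                pred = model(x.to(device)).argmax(dim=1).item()
            if pred == 1:                              # class 1 assumed to mean "red lesion"
                unhealthy_boxes.append((r, c, PATCH, PATCH))
    return ("unhealthy" if unhealthy_boxes else "healthy"), unhealthy_boxes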

References

  1. Baena-Díez, J.M.; Peñafiel, J.; Subirana, I.; Ramos, R.; Elosua, R.; Marín-Ibañez, A.; Guembe, M.J.; Rigo, F.; Tormo-Díaz, M.J.; Moreno-Iribas, C.; et al. Risk of Cause-Specific Death in Individuals With Diabetes: A Competing Risks Analysis. Diabetes Care 2016, 39, 1987. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  2. Cho, N.H.; Shaw, J.E.; Karuranga, S.; Huang, Y.; da Rocha Fernandes, J.D.; Ohlrogge, A.W.; Malanda, B. IDF Diabetes Atlas: Global estimates of diabetes prevalence for 2017 and projections for 2045. Diabetes Res. Clin. Pract. 2018, 138, 271–281. [Google Scholar] [CrossRef] [PubMed]
  3. Wong, T.Y.; Aiello, L.P.; Ferris, F.; Gupta, N.; Kawasaki, R.; Lansingh, V.; Maia, M.; Mathenge, W. International Council of Ophthalmology (ICO): Updated 2017 Guidelines for Diabetic Eye Care; International Council of Ophthalmology (ICO): San Francisco, CA, USA, 2017. [Google Scholar]
  4. Cheung, N.; Mitchell, P.; Wong, T.Y. Diabetic retinopathy. Lancet 2010, 376, 124–136. [Google Scholar] [CrossRef]
  5. Williams, R.; Colagiuri, S.; Almutairi, R.; Montoya, P.A.; Abdul, B.; Beran, D.; Besançon, S.; Bommer, C.; Borgnakke, W.; Boyko, E.; et al. International Diabetes Federation. IDF Diabetes Atlas, 9th ed.; International Diabetes Federation: Brussels, Belgium, 2019. [Google Scholar]
  6. Stiefelhagen, M.P. Mehr als eine Stoffwechselerkrankung: Der Diabetes verschont kaum ein Organ. MMW Fortschritte Medizin 2019, 161, 21–22. [Google Scholar]
  7. Klein, B.E.K. Overview of Epidemiologic Studies of Diabetic Retinopathy. Ophthalmic Epidemiol. 2007, 14, 179–183. [Google Scholar] [CrossRef]
  8. Scanlon, P.H. Diabetic retinopathy. Medicine 2015, 43, 13–19. [Google Scholar] [CrossRef]
  9. Butler, J.M.; Guthrie, S.M.; Koc, M.; Afzal, A.; Caballero, S.; Brooks, H.L.; Mames, R.N.; Segal, M.S.; Grant, M.B.; Scott, E.W. SDF-1 is both necessary and sufficient to promote proliferative retinopathy. J. Clin. Investig. 2005, 115, 86–93. [Google Scholar] [CrossRef] [Green Version]
  10. Abràmoff, M.D.; Garvin, M.K.; Sonka, M. Retinal imaging and image analysis. IEEE Rev. Biomed. Eng. 2010, 3, 169–208. [Google Scholar] [CrossRef] [Green Version]
  11. Karadeniz, S.; Zimmet, P.; Aschner, P.; Belton, A.; Cavan, D.; Jalang’o, A.; Gandhi, N.; Hill, L.; Makaroff, L.; Mesurier, R.L.; et al. Diabetes Eye Health: A guide for Health Care Professionals; International Diabetes Federation: Brussels, Belgium, 2015. [Google Scholar]
  12. Klonoff, D.C.; Schwartz, D.M. An economic analysis of interventions for diabetes. Diabetes Care 2000, 23, 390. [Google Scholar] [CrossRef] [Green Version]
  13. Decencière, E.; Cazuguel, G.; Zhang, X.; Thibault, G.; Klein, J.C.; Meyer, F.; Marcotegui, B.; Quellec, G.; Lamard, M.; Danno, R.; et al. TeleOphta: Machine learning and image processing methods for teleophthalmology. IRBM 2013, 34, 196–203. [Google Scholar] [CrossRef]
  14. Niemeijer, M.; Ginneken, B.v.; Staal, J.; Suttorp-Schulten, M.S.A.; Abramoff, M.D. Automatic detection of red lesions in digital color fundus photographs. IEEE Trans. Med. Imaging 2005, 24, 584–592. [Google Scholar] [CrossRef] [Green Version]
  15. Seoud, L.; Hurtut, T.C.J.; Cheriet, F.J.M.; Langlois, P. Red lesion detection using dynamic shape features for diabetic retinopathy screening. IEEE Trans. Med. Imaging 2016, 35, 1116–1126. [Google Scholar] [CrossRef] [PubMed]
  16. Orlando, J.I.; Prokofyeva, E.; Del Fresno, M.; Blaschko, M.B. An ensemble deep learning based approach for red lesion detection in fundus images. Comput. Methods Programs Biomed. 2018, 153, 115–127. [Google Scholar] [CrossRef] [Green Version]
  17. Zago, G.T.; Andreão, R.V.; Dorizzi, B.; Teatini Salles, E.O. Diabetic retinopathy detection using red lesion localization and convolutional neural networks. Comput. Biol. Med. 2020, 116, 103537. [Google Scholar] [CrossRef] [PubMed]
  18. Schmidt-Erfurth, U.; Sadeghipour, A.; Gerendas, B.S.; Waldstein, S.M.; Bogunovic, H. Artificial intelligence in retina. Prog. Retin. Eye Res. 2018, 67, 1–29. [Google Scholar] [CrossRef] [PubMed]
  19. Laÿ, B. Analyse Automatique Des Images Angio Fluorographiques Au Cours De La Retinopathie Diabetique. Ph.D. Thesis, Centre of Mathematical Morphology, Paris School of Mines, Paris, France, 1983. [Google Scholar]
  20. Baudoin, E.C.; Laÿ, B.J.; Klein, J.C. Automatic detection of microaneurysms in diabetic fluorescein angiographies. Rev. D’Épidémiol. Sante Publique 1984, 32, 254–261. [Google Scholar]
  21. Tajbakhsh, N.; Shin, J.Y.; Gurudu, S.R.; Hurst, R.T.; Kendall, C.B.; Gotway, M.B.; Liang, J. Convolutional neural networks for medical image analysis: Full training or fine Tuning? IEEE Trans. Med. Imaging 2016, 35, 1299–1312. [Google Scholar] [CrossRef] [Green Version]
  22. Azizpour, H.; Razavian, A.S.; Sullivan, J.; Maki, A.; Carlsson, S. From generic to specific deep representations for visual recognition. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Boston, MA, USA, 7–12 June 2015; pp. 36–45. [Google Scholar]
  23. Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359. [Google Scholar] [CrossRef]
  24. Wang, Y.; Wang, H.; Peng, Z. Rice diseases detection and classification using attention based neural network and bayesian optimization. Expert Syst. Appl. 2021, 178, 114770. [Google Scholar] [CrossRef]
  25. Kauppi, T.; Kalesnykiene, V.; Sorri, I.; Raninen, A.; Voutilainen, R.; Kamarainen, j.; Lensu, L.; Uusitalo, H. Diabetic Retinopathy Database and Evaluation Protocol (DiaRetDB1 V2.1); Machine Vision and Pattern Recognition Laboratory, Lappeenranta University of Technology: Lappeenranta, Finland, 2009. [Google Scholar]
  26. Niemeijer, M.; Ginneken, B.v.; Cree, M.J.; Mizutani, A.; Quellec, G.; Sanchez, C.I.; Zhang, B.; Hornero, R.; Lamard, M.; Muramatsu, C.; et al. Retinopathy Online Challenge: Automatic Detection of Microaneurysms in Digital Color Fundus Photographs. IEEE Trans. Med. Imaging 2010, 29, 185–195. [Google Scholar] [CrossRef]
  27. Decencière, E.; Zhang, X.; Cazuguel, G.; Lay, B.; Cochener, B.; Trone, C.; Gain, P.; Ordonez, R.; Massin, P.; Erginay, A.; et al. Feedback on a publicly distributed image database: The Messidor Database. Image Anal. Stereol. 2014, 33, 231–234. [Google Scholar] [CrossRef] [Green Version]
  28. Prasanna, P.; Samiksha, P.; Ravi, K.; Manesh, K.; Girish, D.; Vivek, S.; Fabrice, M. Indian Diabetic Retinopathy Image Dataset (IDRiD). IEEE Dataport 2018. [Google Scholar] [CrossRef]
  29. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
  30. Ashraf, M.N.; Hussain, M.; Habib, Z. Review of Various Tasks Performed in the Preprocessing Phase of a Diabetic Retinopathy Diagnosis System. Curr. Med. Imaging 2020, 16, 397–426. [Google Scholar] [CrossRef] [PubMed]
  31. Wu, B.; Zhu, W.; Shi, F.; Zhu, S.; Chen, X. Automatic detection of microaneurysms in retinal fundus images. Comput. Med. Imaging Graph. 2017, 55, 106–112. [Google Scholar] [CrossRef]
  32. Yadav, D.; Kumar Karn, A.; Giddalur, A.; Dhiman, A.; Sharma, S.; Muskan; Yadav, A.K. Microaneurysm Detection Using Color Locus Detection Method. Measurement 2021, 176, 109084. [Google Scholar] [CrossRef]
  33. Grinsven, M.J.J.P.v.; Ginneken, B.v.; Hoyng, C.B.; Theelen, T.; Sánchez, C.I. Fast Convolutional Neural Network Training Using Selective Data Sampling: Application to Hemorrhage Detection in Color Fundus Images. IEEE Trans. Med. Imaging 2016, 35, 1273–1284. [Google Scholar] [CrossRef]
  34. Sinthanayothin, C.; Boyce, J.F.; Williamson, T.H.; Cook, H.L.; Mensah, E.; Lal, S.; Usher, D. Automated detection of diabetic retinopathy on digital fundus images. Diabet. Med. 2002, 19, 105–112. [Google Scholar] [CrossRef] [Green Version]
  35. Larsen, M.; Godt, J.; Larsen, N.; Lund-Andersen, H.; Sjølie, A.K.; Agardh, E.; Kalm, H.; Grunkin, M.; Owens, D.R. Automated Detection of Fundus Photographic Red Lesions in Diabetic Retinopathy. Investig. Ophthalmol. Vis. Sci. 2003, 44, 761–766. [Google Scholar] [CrossRef]
  36. Grisan, E.; Ruggeri, A. Segmentation of Candidate Dark Lesions in Fundus Images Based on Local Thresholding and Pixel Density. In Proceedings of the Proceedings of the 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS2007), Lyon, France, 23–26 August 2007; pp. 6735–6738. [Google Scholar]
  37. García, M.; Sánchez, C.I.; López, M.I.; Díez, A.; Hornero, R. Automatic Detection of Red Lesions in Retinal Images Using a Multilayer Perceptron Neural Network. In Proceedings of the 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS2008), Vancouver, BC, Canada, 20–25 August 2008; pp. 5425–5428. [Google Scholar]
  38. Zhang, B.; Wu, X.; You, J.; Li, Q.; Karray, F. Hierarchical Detection of Red Lesions in Retinal Images by Multiscale Correlation Filtering; SPIE: Orlando, FL, USA, 2009; Volume 7260. [Google Scholar]
  39. Saleh, M.D.; Eswaran, C. An automated decision-support system for non-proliferative diabetic retinopathy disease based on MAs and HAs detection. Comput. Methods Programs Biomed. 2012, 108, 186–196. [Google Scholar] [CrossRef]
  40. Ashraf, M.N.; Habib, Z.; Hussain, M. Texture Feature Analysis of Digital Fundus Images for Early Detection of Diabetic Retinopathy. In Proceedings of the 11th International Conference on Computer Graphics, Imaging and Visualization: New Techniques and Trends, CGIV, Singapore, 6–8 August 2014; pp. 57–62. [Google Scholar]
  41. Ashraf, M.N.; Habib, Z.; Hussain, M. Computer Aided Diagnose of Diabetic Retinopathy; LAP LAMBERT Academic Publishing: Deutschland, Germany, 2015. [Google Scholar]
  42. Srivastava, R.; Wong, D.W.K.; Duan, L.; Liu, J.; Wong, T.Y. Red Lesion Detection In Retinal Fundus Images Using Frangi-Based Filters. In Proceedings of the 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, 25–29 August 2015; pp. 5663–5666. [Google Scholar]
  43. Srivastava, R.; Duan, L.; Wong, D.W.K.; Liu, J.; Wong, T.Y. Detecting retinal microaneurysms and hemorrhages with robustness to the presence of blood vessels. Comput. Methods Programs Biomed. 2017, 138, 83–91. [Google Scholar] [CrossRef]
  44. Xiao, Z.; Zhang, X.; Geng, L.; Zhang, F.; Wu, J.; Tong, J.; Ogunbona, P.O.; Shan, C. Automatic non-proliferative diabetic retinopathy screening system based on color fundus image. Biomed. Eng. Online 2017, 16, 122. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  45. Colomer, A.; Igual, J.; Naranjo, V. Detection of Early Signs of Diabetic Retinopathy Based on Textural and Morphological Information in Fundus Images. Sensors 2020, 20, 1005. [Google Scholar] [CrossRef] [Green Version]
  46. Walter, T.; Massin, P.; Arginay, A.; Ordonez, R.; Jeulin, C.; Klein, J.C. Automatic detection of microaneurysms in color fundus images. Med. Image Anal. 2007, 11, 555–566. [Google Scholar] [CrossRef] [PubMed]
  47. Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A.; Ciompi, F.; Ghafoorian, M.; van der Laak, J.; van Ginneken, B.; Sanchez, C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42, 60–88. [Google Scholar] [CrossRef] [Green Version]
  48. Suzuki, K. Overview of deep learning in medical imaging. Radiol. Phys. Technol. 2017, 10, 257–273. [Google Scholar] [CrossRef] [PubMed]
  49. Asiri, N.; Hussain, M.; Al Adel, F.; Alzaidi, N. Deep learning based computer-aided diagnosis systems for diabetic retinopathy: A survey. Artif. Intell. Med. 2019, 99, 101701. [Google Scholar] [CrossRef] [Green Version]
  50. Haloi, M. Improved Microaneurysm Detection using Deep Neural Networks. arXiv 2015, arXiv:1505.04424. [Google Scholar]
  51. Pratt, H.; Coenen, F.; Broadbent, D.M.; Harding, S.P.; Zheng, Y. Convolutional Neural Networks for Diabetic Retinopathy. Procedia Comput. Sci. 2016, 90, 200–205. [Google Scholar] [CrossRef] [Green Version]
  52. Gulshan, V.; Peng, L.; Coram, M.; Stumpe, M.C.; Wu, D.; Narayanaswamy, A.; Venugopalan, S.; Widner, K.; Madams, T.; Cuadros, J.; et al. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA 2016, 316, 2402–2410. [Google Scholar] [CrossRef]
  53. Abramoff, M.D.; Lou, Y.; Erginay, A.; Clarida, W.; Amelon, R.; Folk, J.C.; Niemeijer, M. Improved Automated Detection of Diabetic Retinopathy on a Publicly Available Dataset Through Integration of Deep Learning. Investig. Ophthalmol. Vis. Sci. 2016, 57, 5200–5206. [Google Scholar] [CrossRef] [Green Version]
  54. Gargeya, R.; Leng, T. Automated Identification of Diabetic Retinopathy Using Deep Learning. Ophthalmology 2017, 124, 962–969. [Google Scholar] [CrossRef]
  55. Quellec, G.; Charriere, K.; Boudi, Y.; Cochener, B.; Lamard, M. Deep image mining for diabetic retinopathy screening. Med. Image Anal. 2017, 39, 178–193. [Google Scholar] [CrossRef] [Green Version]
  56. Wan, S.; Liang, Y.; Zhang, Y. Deep convolutional neural networks for diabetic retinopathy detection by image classification. Comput. Electr. Eng. 2018, 72, 274–282. [Google Scholar] [CrossRef]
  57. Voets, M.; Møllersen, K.; Bongo, L.A. Reproduction study using public data of: Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. PLoS ONE 2019, 14, e0217541. [Google Scholar] [CrossRef]
  58. Chudzik, P.; Majumdar, S.; Caliva, F.; Al-Diri, B.; Hunter, A. Microaneurysm detection using fully convolutional neural networks. Comput. Methods Programs Biomed. 2018, 158, 185–192. [Google Scholar] [CrossRef] [PubMed]
  59. LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Comput. 1989, 1, 541–551. [Google Scholar] [CrossRef]
  60. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 2015 International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  61. Panwar, H.; Gupta, P.K.; Siddiqui, M.K.; Morales-Menendez, R.; Bhardwaj, P.; Singh, V. A deep learning and grad-CAM based color visualization approach for fast detection of COVID-19 cases using chest X-ray and CT-Scan images. Chaos Solitons Fractals 2020, 140, 110190. [Google Scholar] [CrossRef] [PubMed]
  62. Cuadros, J.; Bresnick, G. EyePACS: An Adaptable Telemedicine System for Diabetic Retinopathy Screening. J. Diabetes Sci. Technol. 2009, 3, 509–516. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  63. Wu, L.; Fernandez-Loaiza, P.; Sauma, J.; Hernandez-Bogantes, E.; Masis, M. Classification of diabetic retinopathy and diabetic macular edema. World J. Diabetes 2013, 4, 290–294. [Google Scholar] [CrossRef] [PubMed]
  64. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 6–8 December 2013; pp. 1097–1105. [Google Scholar]
  65. Szegedy, C.; Wei, L.; Yangqing, J.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper With Convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
  66. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
  67. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  68. Huang, G.; Liu, Z.; Maaten, L.v.d.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar]
  69. Schirrmeister, R.T.; Springenberg, J.T.; Fiederer, L.D.J.; Glasstetter, M.; Eggensperger, K.; Tangermann, M.; Hutter, F.; Burgard, W.; Ball, T. Deep learning with convolutional neural networks for EEG decoding and visualization. Hum. Brain Mapp. 2017, 38, 5391–5420. [Google Scholar] [CrossRef] [Green Version]
  70. Drozdzal, M.; Vorontsov, E.; Chartrand, G.; Kadoury, S.; Pal, C. The Importance of Skip Connections in Biomedical Image Segmentation. In Deep Learning and Data Labeling for Medical Applications; Springer: Cham, Switzerland, 2016; pp. 179–187. [Google Scholar]
  71. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef] [Green Version]
  72. Hua, Y.; Mou, L.; Zhu, X.X. LAHNet: A Convolutional Neural Network Fusing Low- and High-Level Features for Aerial Scene Classification. In Proceedings of the IGARSS 2018–2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 4728–4731. [Google Scholar]
  73. Bishop, C.M. Neural Networks For Pattern Recognition; Oxford University Press: Oxford, UK, 1995. [Google Scholar]
  74. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks For Semantic Segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
  75. Liu, Y.; Guo, Y.; Georgiou, T.; Lew, M.S. Fusion that matters: Convolutional fusion networks for visual recognition. Multimed. Tools Appl. 2018, 77, 29407–29434. [Google Scholar] [CrossRef] [Green Version]
  76. Agrawal, P.; Girshick, R.; Malik, J. Analyzing the Performance of Multilayer Neural Networks for Object Recognition. In Proceedings of the Computer Vision–ECCV 2014; Springer International Publishing: Cham, Switzerland, 2014; pp. 329–344. [Google Scholar]
  77. Yoo, D.; Park, S.; Lee, J.; In So, K. Multi-scale pyramid pooling for deep convolutional representation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Boston, MA, USA, 7–12 June 2015; pp. 71–80. [Google Scholar]
  78. Hu, K.; Zhang, Z.; Niu, X.; Zhang, Y.; Cao, C.; Xiao, F.; Gao, X. Retinal vessel segmentation of color fundus images using a multiscale convolutional neural network with an improved cross-entropy loss function. Neurocomputing 2018, 309, 179–191. [Google Scholar] [CrossRef]
  79. Stehman, S.V. Selecting and interpreting measures of thematic classification accuracy. Remote Sens. Environ. 1997, 62, 77–89. [Google Scholar] [CrossRef]
  80. Matthews, B.W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta (BBA) Protein Struct. 1975, 405, 442–451. [Google Scholar] [CrossRef]
  81. McHugh, M.L. Interrater reliability: The kappa statistic. Biochem. Med. 2012, 22, 276–282. [Google Scholar] [CrossRef]
  82. Landis, J.R.; Koch, G.G. The measurement of observer agreement for categorical data. Biometrics 1977, 33, 159–174. [Google Scholar] [CrossRef] [Green Version]
  83. Swets, J.A. ROC analysis applied to the evaluation of medical imaging techniques. Investig. Radiol. 1979, 14, 109–121. [Google Scholar] [CrossRef]
  84. Hajian-Tilaki, K. Receiver Operating Characteristic (ROC) Curve Analysis for Medical Diagnostic Test Evaluation. Casp. J. Intern. Med. 2013, 4, 627–635. [Google Scholar]
  85. Chakrabarti, R.; Harper, C.A.; Keeffe, J.E. Diabetic retinopathy management guidelines. Expert Rev. Ophthalmol. 2012, 7, 417–439. [Google Scholar] [CrossRef]
  86. Sánchez, C.I.; Niemeijer, M.; Dumitrescu, A.V.; Suttorp-Schulten, M.S.A.; Abràmoff, M.D.; van Ginneken, B. Evaluation of a Computer-Aided Diagnosis System for Diabetic Retinopathy Screening on Public Data. Investig. Ophthalmol. Vis. Sci. 2011, 52, 4866–4871. [Google Scholar] [CrossRef]
Figure 1. (a) DR lesions along with healthy landmarks (blood vessels and optic discs). (b) Extracted region of interest (ROI) containing red lesions.
Figure 2. An overview of the proposed deep CNN-based red lesion classification technique for early DR screening.
Figure 3. (a) A 200 × 200 patch extracted from a fundus image. (b) A few generated patches, shown with their specific naming conventions for later reference.
Figure 4. A schematic diagram of the proposed method, showing the DR-ResNet50 architecture to classify red lesions of DR using small image patches.
Figure 5. Training progress of the proposed DR-ResNet50 model on the e-Ophtha_MA dataset.
Figure 6. Confusion matrix results obtained on the 3460 test instances after splitting the data with Scheme 2. (a) Percentage of two-class predictions, and (b) total numbers of correctly and incorrectly classified instances from both classes.
Figure 7. Per-image evaluation on e-Ophtha_MA for DR screening.
Figure 8. Successfully classified cases of small patches from e-Ophtha_MA images along with their Grad-CAMs. (a) True positive cases; (b) true negative cases.
Figure 9. Unsuccessfully classified cases of small patches from e-Ophtha_MA images along with their Grad-CAMs. (a) False negative cases; (b) false positive cases.
Figure 10. Back-association of extracted patches on a fundus image for region marking according to classification decisions. The blue boxes depict unhealthy regions, while the white boxes represent healthy regions classified by the system.
Table 1. Handcrafted features-based DR diagnosis by considering red lesions.
Ref. | Major Aspects | Se. (%) | Spec. (%) | ACC. (%) | AUC (%)
Sinthanayothin et al., 2002 [34] | A recursive region growing method to segment both vessels and red lesions. | 78 | 89 | -- | --
Larsen et al., 2003 [35] | Fundus images were obtained after dilation of the pupil. | 96.7 | 71.4 | -- | --
Niemeijer et al., 2005 [14] (per image) | Computed sixty-eight handcrafted features. | 100 | 87 | -- | --
Niemeijer et al., 2005 [14] (per lesion) |  | 30 | -- | -- | --
Grisan and Ruggeri, 2007 [36] | Local thresholding and pixel density-based detection. | -- | -- | 94 for HEs | --
García et al., 2008 [37] | Introduced a feature selection step and a neural network, following Grisan and Ruggeri [36], for image-based classification. | -- | -- | 80 | --
Zhang et al., 2009 [38] | A multi-scale correlation filtering and dynamic-thresholding technique. | -- | Avg. FP = 0.1856 | -- | --
Saleh and Eswaran, 2012 [39] (MAs) | Size and shape-based features. | 84.31 | 93.63 | -- | --
Saleh and Eswaran, 2012 [39] (HEs) |  | 87.53 | 95.08 | -- | --
Ashraf et al., 2014 [40] | Texture feature analysis by local binary patterns. | 87.48 | 85.99 | 86.15 | 87
Ashraf et al., 2015 [41] | Higher-order statistical features. | 87.92 | 87.54 | 87.44 | 87
Srivastava et al., 2015 [42] and 2017 [43] | Frangi filter-based features. | AUC for MAs = 97 and HEs = 87 [42]; 0.92 for both lesions [43]
Seoud et al., 2016 [15] | Dynamic shape-based features. | -- | -- | -- | 89.9
Xiao et al., 2017 [44] | Phase congruency-based MA detection; HE detection by k-means clustering and SVM. | ACC. of 92 and 93 for MAs and HEs
Colomer et al., 2020 [45] | DR lesions were differentiated by combining texture and morphological information. | 75.61 | 75.62 | -- | 83.30
Table 2. Deep CNN-based early DR diagnosis considering both types of red lesions.
Ref. | Major Aspects | AUC (%) | Se. | Dataset
Orlando et al., 2018 [16] | Handcrafted and CNN-based features were combined to detect red lesions. | 90.31 | -- | e-Ophtha_MA
Orlando et al., 2018 [16] |  | 89.32 | 0.916 | Messidor
Zago et al., 2020 [17] | Dual CNN models were trained over 65 × 65 image regions. | 91.2 | 0.940 | Messidor
Zago et al., 2020 [17] |  | 81.8 | 0.841 | IDRiD
Table 3. Summary of five public datasets used in the current study.
Dataset | Number of Images | Resolution | Special Procedures | Dataset Distribution | Purpose
e-Ophtha_MA [13] | 381 | 1440 × 960 and 2048 × 1360 | Preprocessing for FOV extraction; 200 × 200 patch generation; data augmentation | Whole set | Training, validation, and testing
DiaRetDB1 v2.1 [25] | 89 | 1500 × 1152 | Preprocessing for FOV extraction; 200 × 200 patch generation | Test set | Cross-validation
ROC [26] | 100 | 768 × 576, 1058 × 1061, and 1389 × 1383 |  | Training set | Cross-validation
IDRiD [28] | 516 | 4288 × 2848 |  | Test set | Cross-validation
Messidor [27] | 1200 | 1440 × 960, 2240 × 1488, and 2304 × 1536 |  | Test set | Cross-validation
Table 4. Highest achieved results of various CNN models after fine-tuning.
Sr. | Model | ACC (%) | Bias | Variance
1 | AlexNet | 90.06 | Very High | Very High
2 | VGG16 | 91.57 | High | High
3 | GoogLeNet | 92.08 | Moderate | High
4 | Inception-v3 | 90.93 | Moderate | High
5 | ResNet50 | 93.45 | Mild | Moderate
6 | DenseNet | 92.86 | Moderate | Moderate
Table 5. Performance of the fine-tuned ResNet50 model for 30 epochs after freezing different initial layers.
Freeze up to | Residual Blocks | Training Time (Minutes) | Se. (%) (95% CI) | Spec. (%) (95% CI) | AUC (%) (95% CI) | ACC (%) (95% CI)
No freeze | 0 | 2176 | 89.66 (89.60–89.72) | 94.99 (94.98–95) | 93.42 (93.40–93.44) | 93.45 (93.41–93.49)
Layer 36 | 3 | 2167 | 91.06 (91.02–91.10) | 97.99 (97.98–98) | 94.73 (94.70–94.76) | 94.71 (94.70–94.72)
Layer 48 | 4 | 2160 | 91.44 (91.35–91.53) | 98.13 (97.57–98.69) | 94.30 (93.76–94.84) | 95.02 (94.50–95.54)
Layer 58 | 5 | 2156 | 91.94 (91.58–92.30) | 98.66 (98.57–98.75) | 95.66 (95.32–96) | 95.68 (94.86–96.50)
Layer 68 | 6 | 2154 | 89.66 (82.82–96.5) | 98.27 (97.5–99.04) | 93.97 (93.44–94.5) | 93.94 (93.38–95.50)
Layer 78 | 7 | 2151 | 90.13 (88.26–92) | 97.41 (95–99.82) | 93.78 (92.56–95) | 93.97 (92.94–95)
Layer 90 | 8 | 2146 | 85.96 (85.92–86) | 97.51 (96.15–98.87) | 91.74 (91.48–92) | 91.71 (91.42–92)
Table 6. Performance of the DR-ResNet50 model on e-Ophtha_MA after fine-tuning and architectural modifications of the baseline ResNet50.
Exp. | Evaluation Type | Data Split Scheme | Enhancement | Loss Function | Bias and Variance | Se. (95% CI) | Spec. (95% CI) | AUC (95% CI) | ACC (95% CI)
1 | Per lesion | 1 | RSK1 | Cross entropy | Low | 0.9267 (0.925–0.9284) | 0.9890 (0.988–0.99) | 0.9578 (0.9576–0.958) | 0.9579 (0.9577–0.9581)
2 | Per lesion | 1 | RSK1 and RSK2 | Cross entropy | Very low | 0.9349 (0.933–0.9368) | 0.9886 (0.9885–0.9887) | 0.9675 (0.967–0.968) | 0.9676 (0.9672–0.968)
3 | Per lesion | 1 | RSK1, RSK2, and GMP | Cross entropy | Very low | 0.9496 (0.945–0.951) | 0.9890 (0.988–0.99) | 0.9678 (0.9666–0.969) | 0.9679 (0.966–0.969)
4 | Per lesion | 1 | RSK1, RSK2, and GMP | SSE | None | 0.9815 (0.98–0.983) | 0.9972 (0.9971–0.9973) | 0.9893 (0.989–0.9896) | 0.9893 (0.989–0.9896)
5 | Per lesion | 2 | RSK1, RSK2, and GMP | SSE | None | 0.9850 (0.982–0.988) | 0.9911 (0.991–0.992) | 0.9910 (0.99–0.992) | 0.9910 (0.99–0.992)
6 | Per image | 2 | RSK1, RSK2, and GMP | SSE | None | 0.9851 (0.9821–0.9881) | 0.9910 (0.99–0.992) | 0.9910 (0.99–0.992) | 0.9910 (0.99–0.992)
Table 7. Cross-validation of various publicly available datasets for DR screening.
Dataset | Evaluation Type | Se. (95% CI) | Spec. (95% CI) | AUC (95% CI) | ACC (95% CI)
DiaRetDB1 v2.1 | Per lesion basis | 0.9498 (0.9410–0.9586) | 0.9660 (0.957–0.975) | 0.9578 (0.955–0.9606) | 0.9448 (0.9551–0.9603)
ROC | Per lesion basis | 0.9270 (0.917–0.937) | 0.9230 (0.917–0.929) | 0.925 (0.916–0.934) | 0.925 (0.916–0.934)
IDRiD | Per lesion basis | 0.8665 (0.866–0.8670) | 0.8046 (0.8041–0.8051) | 0.8355 (0.8347–0.8363) | 0.8356 (0.8347–0.8365)
Messidor | Per image basis | 0.9421 (0.9411–0.9431) | 0.8940 (0.891–0.897) | 0.9185 (0.905–0.932) | 0.9186 (0.906–0.9312)
IDRiD | Per image basis | 0.8426 (0.8398–0.8454) | 0.8044 (0.7907–0.8181) | 0.8235 (0.8176–0.8294) | 0.8235 (0.8076–0.8394)
Table 8. Comparison with state-of-the-art deep learning-based methods for DR screening.
Dataset | Method | Se. (95% CI) | AUC (95% CI)
e-Ophtha_MA | Orlando et al. [16] | -- | 0.9031
e-Ophtha_MA | This work | 0.985 (0.982–0.988) | 0.991 (0.988–0.993)
IDRiD | Zago et al. [17] | 0.841 (0.753–0.948) | 0.818 (0.742–0.898)
IDRiD | This work | 0.8426 (0.8398–0.8454) | 0.8235 (0.8176–0.8294)
Messidor | Expert A [86] | 0.945 | 0.922 (0.902–0.936)
Messidor | Expert B [86] | 0.912 | 0.865 (0.789–0.925)
Messidor | Orlando et al. [16] | 0.916 (0.894–0.943) | 0.893 (0.875–0.912)
Messidor | Zago et al. [17] | 0.940 (0.921–0.959) | 0.912 (0.897–0.928)
Messidor | This work | 0.942 (0.941–0.943) | 0.918 (0.905–0.932)
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
