1. Introduction
Ocular diseases have attracted remarkable interest because they are serious and affect a large portion of the population. An example of a relevant ocular disease is diabetic retinopathy (DR) [1], because it may cause blindness in working-age patients. DR is a progressive pathology that can evolve from mild to severe non-proliferative disease, and it is an important factor contributing to blindness. It affects up to 80% of those who have suffered from diabetes for 10 years or more [2]. Due to increases in life expectancy and other factors, the number of people affected by exudates is growing [3]. Exudates are one of the main symptoms of DR, caused by the leakage of proteins and lipids from damaged blood vessels in the retina. They can be recognized as yellow lesions distributed on the interior surface of the eye, with different sizes and locations according to the severity of the retinopathy (Figure 1a). From a practical point of view, it must be noted that DR does not show salient symptoms in its early stages [4]; therefore, it is a challenge to identify the disease early enough to give a better chance of successful treatment [5]. Diagnosis and treatment require screening checks that, at present, demand professional clinicians. On the other hand, automated mass screening is an active area of research with promising potential in the field of computer vision [3]. Other lesions that may appear in the retina include hemorrhages, the escape of blood from vessels that appears as red lesions, and other types of deposits [2]. The World Diabetes Foundation estimates that more than 640 million people will suffer from diabetes by the year 2040 [6].
In the machine-learning literature, many clustering methods have been presented, each with advantages and disadvantages. These methods perform either crisp (hard) or fuzzy (soft) clustering. The former defines specific boundaries between clusters; hence, each pixel strictly belongs to a particular cluster. Fuzzy clustering allows data to belong to more than one cluster with different membership values; the higher the membership value, the higher the confidence in assigning the object to that cluster [7]. Fuzzy clustering is convenient in non-deterministic problems, such as overlapping intensities, poor contrast, and noise and homogeneity variations. This motivated us to use the fuzzy C-means (FCM) clustering technique to locate exudates in fundus images, as they can benefit from these non-deterministic properties. FCM clustering is becoming increasingly popular in many disciplines because it is an unsupervised clustering tool that is easy to implement [8], and various applications have proved its superiority with respect to other well-known algorithms, such as K-means [9,10].
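The standard FCM update equations behind this choice can be sketched as follows. This is a minimal, generic implementation; the fuzzifier m, tolerance, and random initialization are illustrative defaults, not the exact configuration used in this work:

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Minimal fuzzy C-means on a (n_samples, n_features) array.

    Returns the cluster centers and the fuzzy membership matrix U of shape
    (n_clusters, n_samples), where U[:, k] sums to 1 for each sample k.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, normalized per sample.
    U = rng.random((n_clusters, n))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Centers: membership-weighted means of the samples.
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Distance of every sample to every center.
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        d = np.fmax(d, 1e-10)  # avoid division by zero
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centers, U
```

A hard segmentation, when needed, is obtained by assigning each pixel to the cluster with the highest membership (`U.argmax(axis=0)`), while the memberships themselves carry the confidence information discussed above.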
Numerous algorithms have been developed to detect DR semi-automatically or fully automatically. To the best of our knowledge, they have not been applied to tracking DR progress over time. Typically, the early stages of DR do not show particular symptoms, which become predominant only as the number and severity of the lesions increase [4]. Tracking DR requires the ability to define exudates as landmarks, to match them, and to compare their status periodically through exact identification.
A supervised classification to detect hard exudates (HE) in human fundus images was proposed in Reference [1]; that method applied contrast-adaptive histogram equalization preprocessing to the fundus images and extracted a set of influential features (i.e., standard deviation of the intensity, mean intensity, size, edge strength, and compactness of the segmented candidates). Those features were fed to an echo state neural network to classify the segments into pathological or normal regions. A set of 50 normal/pathological fundus images was selected; 35 of the images were used for training and 15 for testing. A system to detect diabetic macular edema through the automatic detection of exudates, using the publicly available datasets MESSIDOR, DIARETDB1, and HEI-MED (which included patients with different levels of diabetic macular edema), was introduced in Reference [11]. It was designed to automatically detect exudates with some confidence level, in order to separate true/false positives, depending on color-space information and wavelet analysis. A subsequent combination of feature sets and classifiers (nearest neighbor, naïve Bayes, support vector machine, and random forests) was evaluated, to select an appropriate classification strategy. The final results were evaluated by hold-out and cross-validation.
Exudate segmentation approaches are based mainly on thresholding, region growing, classification techniques, and mathematical morphology. The detection of hard exudates in Reference [12] was based on morphological features; moreover, post-processing techniques were implemented to distinguish exudates from other bright artifacts, such as cotton-wool spots and the optic disc. In Reference [13], hard exudates were detected on two hundred proprietary retinal images by using four color and 25 texture features fed to a fuzzy support vector machine capable of avoiding the effect of outliers, noise, and artifacts. The optic disc was first detected and excluded, as its color is quite similar to that of hard exudates. It was emphasized in Reference [14] that numerous studies are devoted to detecting exudates automatically, in order to provide decision support and reduce ophthalmologist workload. Exudate-detection approaches were categorized into thresholding, region growing, clustering, morphology, classification, and others, and a list of 71 algorithms with the corresponding sensitivity, specificity, and accuracy performances was presented.
A new optimized mathematical morphological approach to hard-exudate detection for diabetic retinopathy screening was described in Reference [15]. The method utilizes multi-space intensity features, geometric features, a gray-level co-occurrence matrix-based texture descriptor, and a gray-level size-zone matrix-based texture descriptor to construct handcrafted features. A deep convolutional neural network learns the deep information of HE, and a random forest identifies HE among the candidates. Four datasets (three of them publicly available) were considered in the validation process, and the results were compared with the manually labeled ground truth at the pixel and image levels. A recent work [16] presented a weakly supervised multitask architecture with three stages to segment bright and red lesions in fundus images. The data were augmented through a random combination of one or more transformations, and patches of 256 × 256 pixels were extracted as a compromise. Five datasets were used for training and testing, with pixel- and image-level validation.
These methods can be applied to evident disease, while the proposed approach aims to help the physician follow DR in its early stages and track its progress over time.
Section 2 of this manuscript is devoted to describing the materials and the preprocessing methods. Section 3 introduces our methodology, which in turn is divided into three steps: detecting expected exudates, feature extraction, and FCM clustering. Experimental results, discussion, and conclusions are reported in Section 4 and Section 5.
2. Materials and Methods
Four retinal datasets relevant to exudates were used to train and evaluate the method: DIARETDB0/1, IDRID, and e-optha. The Standard Diabetic Retinopathy Databases Calibration Levels 0/1 (known as DIARETDB0/1) [4] were used to train and to experimentally validate our automatic exudate-detection method. These datasets are publicly available and collect 1152 × 1500 true-color lossless images obtained by using a digital fundus camera with a 50° field of view.
The DIARETDB0 dataset consists of 130 fundus images, of which 20 present normal cases and 110 contain signs of DR (hard exudates, soft exudates, microaneurysms, hemorrhages, and neovascularization). This dataset is referred to as “calibration level 0 fundus images”, and it was published in 2006. The DIARETDB1 dataset consists of 89 fundus images, of which 84 contain at least mild non-proliferative signs (microaneurysms) of DR and 5 are considered normal cases that do not contain any sign of DR, according to all experts who participated in the evaluation. The data correspond to a good (not necessarily typical) practical situation. This dataset is referred to as “calibration level 1 fundus images”, and it was published in 2007. Both datasets correspond to practical situations and can be used to evaluate the general performance of diagnostic methods.
The ground truth in DIARETDB0 is defined precisely in each image, including both the center coordinates and the type of each lesion as an exudate/hemorrhage/cotton-wool area, while, in DIARETDB1, the ground truth was prepared by four experts, indicating the affected regions rather than delineating the infected areas. A set of random images containing hard exudates from these datasets was segmented manually by our expert ophthalmologist; the set included 19 DIARETDB0 and 10 DIARETDB1 images affected by DR.
Figure 1 shows a sample image from DIARETDB1, together with its ground truth provided in the dataset and created by our ophthalmologist.
The ground truths provided with the two datasets are not suitable for our methodology: the former reports only the center of each exudate, while the latter indicates the affected region only roughly. A comparison between our expert's annotation and the DIARETDB1 ground truth (GT) showed 95.7% sensitivity and 90.8% specificity.
Another dataset used to evaluate our method was the popular Indian Diabetic Retinopathy Image Dataset (IDRID). This dataset was used exclusively for testing, without being incorporated into the training process, in order to confirm and compare the reliability of our proposed methodology on images outside the training set. The dataset includes 81 colored retinal images (jpg files) with a resolution of 4288 × 2848 pixels and fields of view devoted to hard exudates. These images were annotated at the pixel level, with ground-truth segmentations provided as binary masks (tif files). The use of 80% of the whole dataset for training and 20% for testing was suggested in Reference [17]. As this work is devoted to the detection of hard exudates, 50 images were selected from this dataset for testing.
The fourth dataset used to test and evaluate our proposed algorithm was e-optha [18], which includes colored fundus images (jpg files) together with binary masks (png files) providing exact contoured annotations made by an expert ophthalmologist and revised by another expert. Images of healthy patients without lesions are also provided. The dataset contains 47 images with exudates and 35 images without exudates, in four different sizes ranging from 1440 × 960 to 2544 × 1696 pixels; all images were acquired under the same view angle. A set of 25 random images was selected.
Exudates, optic discs, and some vessels (particularly when there is a light reflex) all exhibit a similar yellowish color and can be partially similar in shape, so a classifier can easily be misled. Hence, removing the optic disc and the vessels from the image was expected to improve the overall classification performance. The optic disc and vessel detection methods are illustrated in the following.
2.1. Preprocessing—Optic Disc Segmentation
Since the brightness of the optic disc (OD) dominates the rest of the retina, many authors have suggested automated techniques for detecting the OD in fundus images; for instance, texture descriptors and a regression-based method to locate the OD were described in Reference [19]. Here we chose to use the intensity channel (I) of the HSI color space. A median filter was then applied to reduce the presence of small objects, outliers, and artifacts, while preserving the main edges. To obtain better contrast within the image, contrast-limited adaptive histogram equalization (CLAHE) [1] was applied. A morphological closing with a disk-shaped structuring element (SE) was enough to eliminate the blood vessels and to locate the OD. The size of the SE was defined by testing a series of values, concluding that 15 pixels was the most appropriate size for the considered dataset.
The resulting image was then binarized by using a global threshold based on the standard-deviation approach; thresholding is particularly useful to remove unnecessary details or variations while highlighting the details of interest.
Although the binary image contained structures other than the OD, all small structures could be deleted by label filtering, keeping the largest object, which represents the OD. A connected-component labeling technique was used: it isolates individual objects by using eight-connected neighborhoods and label propagation. After labeling the connected areas, all components of less than 700 pixels were removed from each image, thus obtaining the OD alone. The effect of the whole pipeline of operations can be seen in Figure 2.
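The OD pipeline described above (median filtering, closing with a disk-shaped SE, standard-deviation-based thresholding, and label filtering) can be sketched roughly as follows. The CLAHE step is omitted for brevity, and the helper name and filter sizes other than those stated in the text are illustrative assumptions:

```python
import numpy as np
from scipy import ndimage as ndi

def locate_optic_disc(intensity, se_radius=15, min_area=700):
    """Rough sketch of the OD localization pipeline on the I channel."""
    # Median filter suppresses small objects and outliers.
    smoothed = ndi.median_filter(intensity, size=5)
    # Disk-shaped structuring element of the given radius.
    y, x = np.ogrid[-se_radius:se_radius + 1, -se_radius:se_radius + 1]
    disk = x * x + y * y <= se_radius * se_radius
    # Grey closing removes the dark vessels crossing the bright disc.
    closed = ndi.grey_closing(smoothed, footprint=disk)
    # Global threshold based on the standard deviation of intensities.
    binary = closed > closed.mean() + closed.std()
    # Label 8-connected components; keep the largest one as the OD,
    # provided it exceeds the min_area constraint (700 pixels).
    labels, n = ndi.label(binary, structure=np.ones((3, 3)))
    if n == 0:
        return np.zeros_like(binary)
    sizes = ndi.sum(binary, labels, index=np.arange(1, n + 1))
    keep = int(np.argmax(sizes)) + 1
    if sizes[keep - 1] < min_area:
        return np.zeros_like(binary)
    return labels == keep
```

On a synthetic image with a bright disc crossed by a thin dark vessel, the closing fills the vessel and the largest thresholded component recovers the disc region.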
2.2. Preprocessing—Vessel Segmentation
Retinal vessels present a wide range of colors, usually from red to yellow, which makes some of them very similar to exudates. To avoid this confusion, the vessels need to be automatically located and then excluded from the images. We verified that the green channel of the RGB color space holds the most informative content, because the vessels appear considerably darker than the background, whereas the red and blue channels suffer from noise and low contrast, respectively. This has motivated many researchers, including us, to focus on the green channel. To enhance the contrast of this image, we carried out CLAHE preprocessing, followed by Gaussian filtering to remove noise and irrelevant details (Figure 3a,b). The Gaussian filtering was accomplished in two steps: firstly, bright details smaller than a threshold were removed by area opening; secondly, a Gaussian filter was applied for noise reduction.
To refine the vessels, a top-hat filter was implemented by using disk SEs with radii from 5 to 15 pixels [20], producing an image containing the parts that were smaller than the SE and darker than their surroundings. Blood vessels appear as clear elongated objects, while the background is black, as shown in Figure 3c. An image with good contrast has sharp differences between black and white; hence, contrast adjustment was implemented to remap the image intensity values to the full display range of the data type. A preliminary segmented image was obtained again through a standard-deviation thresholding algorithm, followed by a mathematical opening operator. Erosion, as part of the opening operator, typically eliminates small bright objects, and the subsequent dilation tends to restore the shape of the objects that remain, hence preserving the structures. The same label-filtering technique previously discussed was used to delete the remaining non-vessel structures of small size, where, in this case, any area of less than 300 pixels was removed from each image. The final result for the blood-vessel network after label filtering, superimposed on the colored retinal image, is shown in Figure 3f.
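The vessel pipeline above can be sketched as follows. Since vessels are darker than the background, the top-hat step is expressed as a black top-hat with a single disk SE; the CLAHE, contrast-adjustment, and multi-radius sweep of the full method are omitted, and the helper name and defaults are illustrative assumptions:

```python
import numpy as np
from scipy import ndimage as ndi

def segment_vessels(green, se_radius=10, min_area=300):
    """Rough sketch of vessel segmentation on the green channel."""
    # Gaussian smoothing for noise reduction.
    smoothed = ndi.gaussian_filter(green, sigma=1.0)
    # Disk-shaped structuring element.
    y, x = np.ogrid[-se_radius:se_radius + 1, -se_radius:se_radius + 1]
    disk = x * x + y * y <= se_radius * se_radius
    # Black top-hat = closing(image) - image: bright where thin dark
    # structures (vessels) were filled by the closing.
    tophat = ndi.black_tophat(smoothed, footprint=disk)
    # Standard-deviation-based threshold.
    binary = tophat > tophat.mean() + tophat.std()
    # Opening (erosion then dilation) removes small isolated responses.
    binary = ndi.binary_opening(binary, structure=np.ones((3, 3)))
    # Label filtering: drop components of less than min_area pixels.
    labels, n = ndi.label(binary, structure=np.ones((3, 3)))
    sizes = ndi.sum(binary, labels, index=np.arange(n + 1))
    sizes[0] = 0  # background never kept
    return sizes[labels] >= min_area
```

On a synthetic image with a dark elongated stripe on a bright background, the stripe survives the label filtering while isolated noise does not.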
4. Experimental Results and Discussion
Testing our method on four publicly available datasets relevant to hard exudates makes it more reliable. The overall number of images used for training and evaluation was 114. The randomly selected images were of good quality, no patient was duplicated, and disease stratification was represented. The images for testing were distributed as follows: 25 from DIARETDB0/1, 50 from IDRID, and 25 from the e-optha dataset. Notice that supervised classifiers need a huge set of GT to define and calibrate the methodology, while, even with a reasonably small number of chosen images, the FCM was well calibrated and efficiently classified exudates.
In order to verify the correctness of the proposed methodology, we compared the output against the evaluation provided by our expert ophthalmologist for the DIARETDB0/1 dataset, while, for the IDRID and e-optha datasets, the provided ground truths were used. The customary performance measurements were used to report performance: sensitivity, specificity, positive predictive value, and accuracy. Indeed, labeling each pixel with respect to the ground truth leads to the following cases: true positive (TP), when a pixel is properly considered an exudate; false positive (FP), when a pixel is wrongly considered an exudate; false negative (FN), when a pixel is wrongly considered a non-exudate; and true negative (TN), when a pixel is properly considered a non-exudate. A high value of TP indicates correct identification of exudates [21,26,27,28]. From these pixel counts, further performance measures can be derived to define the overall performance of the methodology.
In this paper, two strategies were implemented to evaluate the abilities of our approach. In the classical method, the four classes (TP, TN, FP, and FN) were obtained by summing up the corresponding pixels of the GT and the resulting segmented image according to their mutual relation. The other method of evaluating the performance of exudate segmentation in retinal fundus images was, however, driven by the expert. Ophthalmologists do not totally agree with each other in their manual GT segmentation of exudates, for many reasons: the state of the image (clear, blurred, sharpened, brightness, outliers, etc.), the complication of the case, the spherical shape of the fundus, etc. This inconsistency in GT segmentation is also evident for other retinal structures, such as vessels (segmentation, crossings, and bifurcations) [29], in addition to the optic disc and other landmarks of the retina. This makes the segmentation task quite difficult, because no definite, general-purpose rules are followed. Indeed, the authors of References [15,16,30,31] believe that it is more reasonable to consider connected segmented areas as TP even though they only partially intersect the GT, under particular constraints [32]. In other words, segmented exudates are considered to be TP, even if the contours are not matched exactly, if and only if the connected components of the segmented portions touch the GT and satisfy a defined ratio of overlapping.
An example is given in the following. Let us suppose a GT and a corresponding segmentation are given as presented in Figure 9a,b; then each overlapping region between a part of the GT and a segmented portion should exceed a given factor, and the neighboring pixels connected to this part of the relevant GT and segment are considered as TP pixels [32]. In Figure 9, two regions are affected by the extension. The lower part of the segmented portion of the image overlaps the GT by 2 pixels out of a total region of 5 pixels, making a ratio of 2/5; conversely, the same part of the GT overlaps the relevant segmented portion with an intersection of 2 pixels out of the 4 pixels representing the GT, making a ratio of 2/4. As the criteria are satisfied in both cases, the connected pixels of this portion of the segmented image, as well as of the GT, are counted as TP. The upper-left part similarly satisfies the criteria (both the ratio of the upper-left segmented portion intersecting the GT and the ratio of the GT intersecting the relevant segmented part exceed the required factor), and, hence, it is also considered as TP. Hybrid validation techniques (pixel by pixel and the extended technique [32]) were implemented here to evaluate and compare our methodology with others under the same circumstances.
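The extended validation rule can be sketched as below: a whole connected component is promoted to TP when its overlap with the other mask reaches the required ratio, checked from both the segmentation side and the GT side. The default value of the overlap factor here is an assumed placeholder:

```python
import numpy as np
from scipy import ndimage as ndi

def extended_tp_mask(seg, gt, ratio=0.4):
    """Sketch of the extended (connected-component) TP evaluation.

    A component of `seg` is counted entirely as TP when the fraction of
    its pixels overlapping `gt` is at least `ratio`, and symmetrically
    for components of `gt` overlapping `seg`.
    """
    def qualifying(mask, other):
        labels, n = ndi.label(mask, structure=np.ones((3, 3)))
        out = np.zeros(mask.shape, dtype=bool)
        for i in range(1, n + 1):
            comp = labels == i
            # Fraction of this component covered by the other mask.
            if (comp & other).sum() / comp.sum() >= ratio:
                out |= comp
        return out
    # Pixels promoted to TP from either direction of the comparison.
    return qualifying(seg, gt) | qualifying(gt, seg)
```

For the 5-pixel segment and 4-pixel GT of the worked example (overlap of 2 pixels, i.e., ratios 2/5 and 2/4), both components qualify at a factor of 0.4, so all 7 distinct pixels are counted as TP.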
Sensitivity (Se = TP/(TP + FN)) evaluates the ability of the methodology to properly detect exudates. Specificity (Sp = TN/(TN + FP)) evaluates the ability to properly detect non-exudates. Accuracy (Acc = (TP + TN)/(TP + TN + FP + FN)) evaluates the ability to properly detect both exudates and non-exudates. Positive predictive value (PPV = TP/(TP + FP)) is the fraction of correctly detected exudate pixels out of all detected exudate pixels.
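Given a binary segmentation and its GT mask, these measures follow directly from the pixel-wise confusion counts; a minimal sketch:

```python
import numpy as np

def pixel_metrics(seg, gt):
    """Pixel-wise confusion counts and the derived measures:
    Se = TP/(TP+FN), Sp = TN/(TN+FP), Acc = (TP+TN)/total,
    PPV = TP/(TP+FP)."""
    seg = seg.astype(bool)
    gt = gt.astype(bool)
    tp = int(np.sum(seg & gt))    # exudate pixels correctly detected
    fp = int(np.sum(seg & ~gt))   # non-exudate pixels wrongly detected
    fn = int(np.sum(~seg & gt))   # exudate pixels missed
    tn = int(np.sum(~seg & ~gt))  # non-exudate pixels correctly rejected
    return {
        "Se": tp / (tp + fn) if tp + fn else 0.0,
        "Sp": tn / (tn + fp) if tn + fp else 0.0,
        "Acc": (tp + tn) / (tp + fp + fn + tn),
        "PPV": tp / (tp + fp) if tp + fp else 0.0,
    }
```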
FCM was selected as the clustering tool because fundus image properties are not unified and standard: variations in color, the presence of noise, and a fuzzy look due to the spherical shape of the eye are drawbacks that complicate the task. The common dataset DIARETDB0/1, expressly made for exudate detection, was considered the benchmark. The preliminary exclusion of the optic disc and the vessels reduces/prevents false detections of exudates. A brute-force approach over the 77 features shown in Table 1 indicated the best feature set. We kept the following four features: hue, and the entropy and standard deviation for both the intensity in the HSI color space and the Y channel of the XYZ color space (Figure 10).
The standard deviation of intensity, with window sizes of 3 × 3, 7 × 7, 11 × 11, 15 × 15, and 17 × 17 pixels, was computed over the expected-exudate binary images. The results showed that the highest TP values were obtained for a standard-deviation window of 3 × 3 pixels. As the window size increased, small exudate areas were not properly detected; this loss of exudates reduces the sensitivity of the system. The FP value behaved in the opposite way to TP: as the window size increased, the number of FPs decreased, which favorably affected the specificity. To show the effect of varying the window size on exudate detection, an example is introduced in Figure 11.
As in the case of the standard deviation of intensity, entropy was computed with different window sizes, from 3 × 3 to 17 × 17 pixels. Increasing the window size affected small exudate areas, which started to disappear, so the TP value decreased, with a negative influence on sensitivity. The FP value also decreased as the window got larger, meaning fewer misclassified pixels. Again, TP and FP pull in opposite directions. Since TP is more influential on the performance of the method, it is convenient to keep TP as high as possible by choosing the smallest window size.
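The two sliding-window texture features compared above can be sketched as follows; the 16-bin quantization for the entropy is an illustrative assumption, not a parameter stated in the text:

```python
import numpy as np
from scipy import ndimage as ndi

def local_std(img, window=3):
    """Local standard deviation over a sliding window
    (window=3 gave the best TP values in the experiments above)."""
    mean = ndi.uniform_filter(img.astype(float), size=window)
    mean_sq = ndi.uniform_filter(img.astype(float) ** 2, size=window)
    # Var = E[x^2] - E[x]^2; clip guards against tiny negative rounding.
    return np.sqrt(np.clip(mean_sq - mean ** 2, 0.0, None))

def local_entropy(img, window=3, bins=16):
    """Local Shannon entropy over a sliding window
    (a simple histogram-based sketch; sizes 3x3 to 17x17 were compared)."""
    # Quantize intensities so each window yields a small histogram.
    edges = np.linspace(img.min(), img.max() + 1e-9, bins)
    q = np.digitize(img, edges)
    def entropy(values):
        _, counts = np.unique(values, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())
    return ndi.generic_filter(q.astype(float), entropy, size=window)
```

Both maps are flat (zero) on homogeneous regions and respond along intensity transitions, which is why small windows preserve small exudate areas.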
As a side effect of the standard deviation/entropy of intensity, the pixels surrounding the optic disc are sometimes mistakenly labeled as exudates, because the color properties around the optic disc are, to some extent, similar to those of exudates. To eliminate this drawback, a disk-shaped SE was applied around the optic disc; an illustration is depicted in Figure 12. To define the best SE size, SEs ranging from 25 to 100 pixels were applied to all training images, showing the best overall TP values for an SE of 45 pixels.
A set of nine training images from DIARETDB0/1 that satisfied the abovementioned criteria was chosen randomly. Recalling that FCM is an unsupervised classifier, these training images were used only to tune and fit the parameters. The remaining DIARETDB images were used to evaluate the performance of the proposed methodology.
To reduce FP exudates, a majority rule was applied to combine the outputs of the four features, taking into consideration the following priorities: hue, standard deviation, and then entropy of the HSI intensity. The standard approach to calculating TP, FP, TN, and FN without the extension was followed, and consequently Se, Sp, and Acc were measured for each image in the testing datasets. The performance on DIARETDB was as follows: Se = 83.2%, Sp = 98.5%, and Acc = 98.3%. These pixel data are treated as landmark information that can be forwarded to a time series of retina images of the same patient, so as to perform a follow-up.
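A plain majority combination of per-feature binary maps can be sketched as below; note that this simple sketch does not reproduce the priority ordering among the features described above, which would require weighting the votes:

```python
import numpy as np

def majority_vote(feature_masks):
    """Combine per-feature binary exudate maps by majority rule: a pixel
    is kept only when more than half of the feature detectors flag it."""
    stack = np.stack([m.astype(bool) for m in feature_masks])
    return stack.sum(axis=0) > stack.shape[0] / 2
```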
The extended-area approach used to evaluate the statistical measures is applied here to compare the performance of our approach on the different datasets, as in Table 2: the average values are quite similar, except for sensitivity, perhaps because the training data were defined on DIARETDB images, so the parameters fit this dataset more accurately. Though a marginal difference is acceptable, this difference might also be influenced by the varying image sizes of IDRID and the dark nature of the e-optha images. Moreover, the worst results (i.e., minimum values) are due to very low contrast between the exudates and the background. The relatively low values of the standard deviation support the stability of the algorithm.
Figure 13 compares the average values of specificity, sensitivity, and accuracy for the datasets, showing stable outputs, which is credited to the methodology and might indicate the ability to identify exudates in other suitable datasets.
A review of the results on the training and testing data showed good performance, with acceptable variation in extracting the exudates, which can be attributed to the differences among the datasets; for instance, the e-optha dataset has darker images, while the IDRID dataset has high-quality, larger-size images. A numerical comparison of the average performance of the proposed method is shown in Table 3, where each dataset is illustrated explicitly for the three measures used. The first comparison presented here is made against studies that, like ours, used FCM to achieve automatic exudate detection (Table 3). Consequently, this provides ophthalmologists with an additional tool to diagnose and treat a part of ophthalmic diseases with minimal effort [33]. The proposed approach is comparable to the others on any of the used datasets, especially for Sp and Acc.
Our methodology outperforms the state-of-the-art approaches in both specificity and accuracy on the same datasets. In this work, the performance is measured pixel by pixel, and we observe good results, with an average Se = 78.4% on the testing data; on the other hand, the training data reached Se = 83.2%. This decrease is due to Image #11, which had 60.1% sensitivity. A typical output of the algorithm is represented in Figure 14, noting that the FCM classified the probable exudate pixels into C1, representing non-exudates, and C2, representing exudate pixels.
The proposed algorithm showed remarkably high specificity, which reflects its ability to avoid recognizing a non-exudate pixel as an exudate. However, the sensitivity was lower due to exudates with low intensity, which are still hard to detect perfectly [34]. For the sake of completeness, we report for the DIARETDB dataset the best cases (Image #27 with Se = 99.2%/99.8%; Image #16 with Acc = 97.6%/99.0%) and the worst cases (Image #16 with Se = 62.9%/76.0%; Image #27 with Acc = 93.7%/94.9%), evaluated by using both the pixel-by-pixel and the extended technique [32]. The same two images present opposite behaviors depending on the measure used (Figure 15). On the other hand, the overall average performance of the algorithm was as follows: Se = 83.3%, Sp = 99.2%, and Acc = 99.1%. The performance for all tested images in DIARETDB is shown in Figure 16. Finally, Table 4 compares state-of-the-art algorithms [15,16,35] with our proposed one.