Article

Enhancing Cataract Detection through Hybrid CNN Approach and Image Quadration: A Solution for Precise Diagnosis and Improved Patient Care

Department of Electrical Engineering, Ming Chi University of Technology, New Taipei City 243, Taiwan
* Author to whom correspondence should be addressed.
Electronics 2024, 13(12), 2344; https://doi.org/10.3390/electronics13122344
Submission received: 7 May 2024 / Revised: 29 May 2024 / Accepted: 14 June 2024 / Published: 15 June 2024

Abstract

Cataracts, characterized by lens opacity, pose a significant global health concern, leading to blurred vision and potential blindness. Timely detection is crucial, particularly in regions with a shortage of ophthalmologists, where manual diagnosis is time-consuming. While deep learning and convolutional neural networks (CNNs) offer promising solutions, existing models often struggle with diverse datasets. This study introduces a hybrid CNN approach, training on both full retinal fundus images and quadrated parts (i.e., the fundus images divided into four segments). Majority voting is utilized to enhance accuracy, resulting in a superior performance of 97.12%, representing a 1.44% improvement. The hybrid model facilitates early cataract detection, aiding in preventing vision impairment. Integrated into applications, it supports ophthalmologists by providing rapid, cost-efficient predictions. Beyond cataract detection, this research addresses broader computer vision challenges, contributing to various applications. In conclusion, our proposed approach, combining CNNs and image quadration, enhances cataract detection’s accuracy, robustness, and generalization. This innovation holds promise for improving patient care and aiding ophthalmologists in precise cataract diagnosis.

1. Introduction

Cataract is a condition characterized by opacification of the lens of the eye, resulting in progressively blurred vision and potential blindness. According to the “World Report on Vision” of the World Health Organization [1,2], cataract-related vision impairment affects approximately 65.2 million people worldwide, the second-largest cause of vision impairment after unaddressed refractive errors such as myopia or hypermetropia (123.7 million). Hatch et al. suggested that approximately 4.3 million cataract surgeries will be required annually in the United States by 2036 [3]. Early cataract detection by ophthalmologists can prevent blindness and reduce the need for expensive treatments before the cataracts reach an advanced stage. However, manual detection is a time-consuming process that demands substantial human resources and professional expertise from ophthalmologists. Moreover, the shortage of ophthalmologists worldwide, particularly in low-income countries [4], poses a significant challenge for detecting and treating eye diseases, including cataracts.
Currently, with the rapid advancements in deep learning, several studies [5,6,7,8] have proposed cataract detection techniques leveraging deep learning and convolutional neural networks (CNNs) applied to retinal fundus images. These methods offer high accuracy, low cost, and easy applicability. However, these existing methods still rely on the traditional approach of training the CNN model with full retinal fundus images. This approach has certain drawbacks:
  • Firstly, these models may struggle to identify subtle patterns in the data because they typically use a holistic approach to process the entire image. Retinal fundus images are intricate, containing various features and structures such as retinal blood vessels, anomalies, and spots [9]. These features may exhibit variations across different regions, some of which can be exceedingly subtle and require a high level of visual acuity to capture. By processing the entire image, models may find it challenging to distinguish these subtle nuances due to the amalgamation of all of this information. In contrast, focusing on specific regions may make it more likely for the model to capture these subtle patterns as it can conduct a more in-depth analysis of the areas of interest [10].
  • Furthermore, training on the entire image can lead to weaker model resistance to occlusion and noise. Retinal fundus images can be subject to various interferences, such as eyelashes, tears, spots, and the presence of ocular blood vessels, which might partially obscure the structures of interest or introduce noise [11,12]. By training on the entire image, the model might perform poorly in handling these interferences because it may struggle to distinguish which parts are genuine features and which are disturbances.
  • Lastly, the model may face challenges in generalizing to new, unseen data as it has been trained on a limited set of examples. Retinal fundus images exhibit a wide diversity, with patients of different ages, ethnicities, and disease stages displaying markedly distinct image characteristics. If the model has only been trained on a specific subset of data, it may have difficulty adapting to the broad variations in retinal fundus images. This can result in a decrease in performance when dealing with new, unseen data as the model lacks the necessary diversity to accommodate variations in different scenarios. Therefore, the model needs better generalization capabilities to handle retinal fundus images from various sources [13].
Furthermore, note that the difficulties outlined are not exclusive to the analysis of retinal fundus images but extend to many other computer vision tasks. Recognizing subtle patterns, handling occlusions and noise, and achieving robust generalization are common hurdles across the broader field, including object recognition, scene understanding, and medical image analysis; addressing them can therefore yield significant benefits across a diverse range of applications. Despite the shared nature of these challenges, each task has its own characteristics and complexities. In the case of retinal fundus image analysis, there is a critical imperative for early disease detection, coupled with variations in retinal features among individuals. Thus, while the challenges are shared, the specific context and objectives of a given task may necessitate tailored solutions and strategies to effectively surmount these obstacles.
This study presents an innovative hybrid approach for detecting cataracts in retinal fundus images leveraging convolutional neural networks (CNNs). The methodology entails training five distinct CNN models incorporating both the complete image dataset and its corresponding quadrated parts (i.e., the fundus images divided into four segments). Final predictions are derived through majority voting based on outputs from these models. This novel approach combines the strengths of CNNs with the benefits of segmenting fundus images, enhancing the model’s ability to discern subtle patterns and handle occlusions and noise. Additionally, the journal version of the research extends the original conference [14] presentation by offering a comprehensive exploration. It introduces the hybrid approach in detail, discusses experimental findings, explores potential clinical applications, identifies research challenges, proposes future directions, and concludes with reflective insights. Moreover, the journal version strengthens the method’s validation by comparing its performance across different databases, further affirming its efficacy. This thorough analysis bolsters confidence in the approach’s reliability, marking a significant contribution to the field of ophthalmology and medical image analysis.
Experimental results underscored the superiority of the suggested hybrid approach over the conventional practice of training CNN models exclusively with complete image datasets. In comparison to the current state-of-the-art studies, this research achieved significantly improved performance, demonstrating its potential to advance cataract detection. The proposed hybrid model achieved an impressive accuracy of 97.12%, signifying a 1.44% enhancement as compared to that of the conventional approach of training CNN models by using complete image datasets. This innovation is particularly significant in the context of cataract detection, where early diagnosis is critical in preventing vision impairment and blindness. By incorporating this algorithm into a cataract detection application, the study aimed to provide robust support to ophthalmologists for accurate cataract diagnosis in patients. Cataracts, being a substantial ocular condition, can result in impaired vision and potential blindness. Timely detection is of paramount importance for prompt treatment and blindness prevention. Furthermore, the hybrid CNN model introduced in this study offers advantages such as speed, convenience, and cost-efficiency in cataract prediction, making it a promising tool for enhancing patient care and ophthalmology practice.

2. Related Works

Cataract detection represents a vital area of research within the fields of ophthalmology and medical image analysis. Over time, a variety of methods and approaches have been developed to tackle the challenge of identifying cataracts in retinal fundus images. In this article, we endeavor to provide a thorough exploration of these methodologies, shedding light on their respective strengths and limitations.

2.1. Traditional Image Processing Techniques

Traditional techniques such as thresholding and morphological operations have been foundational in cataract detection by extracting handcrafted features like texture, color, and shape characteristics from retinal images [15,16]. However, these methods often struggle with variations in image quality and complexity, limiting their diagnostic accuracy.

2.2. Machine-Learning-Based Approaches

Machine learning techniques have also found applications in cataract detection, falling into two main categories:
  • Supervised Learning: Algorithms like support vector machines (SVMs) and decision trees are trained on labeled datasets to identify cataract-affected regions [17,18,19]. These models rely on the quality and quantity of the training data, and their performance is highly dependent on the features provided by the data.
  • Deep Learning: Convolutional Neural Networks (CNNs) have become transformative in cataract detection, offering superior performance by automatically learning intricate features from retinal images [5,6,7,8]. These models excel in handling complex patterns and variations. For instance, studies have shown that deep learning models can achieve high accuracy rates, sometimes exceeding those of human experts in specific tasks [9].

2.3. Ensemble and Hybrid Models

Combining multiple models or techniques, such as traditional feature extraction with deep learning models, can enhance accuracy. Ensemble techniques like majority voting or stacking leverage the strengths of various methods, improving robustness and performance [20,21]. Moreover, they help mitigate the limitations that individual methods may exhibit. For example, by combining models, it is possible to reduce the risk of overfitting and improve generalization to new data.

2.4. Quadration-Based Approaches

Segmenting retinal images into quadrants helps pinpoint cataract-affected areas, allowing for detailed analysis of specific regions. However, these methods might miss the broader context of the entire image, presenting challenges in fine-grained cataract detection [22,23].

2.5. Dataset and Preprocessing Techniques

The success of cataract detection models hinges on the quality and scale of the datasets. Techniques such as image enhancement and normalization improve the quality of input data, essential for training accurate models [24].

2.6. Recent Advances in Ophthalmic Disease Diagnosis Techniques

Recent advancements include the use of Generative Adversarial Networks (GANs) for data augmentation, addressing the issue of limited datasets. For example, E. Y. Choi et al. demonstrated the effectiveness of GANs in generating synthetic retinal images, enhancing model robustness and generalization capabilities [25].
The landscape of cataract detection methods has evolved from traditional image processing techniques to advanced deep learning models. Deep learning, particularly the utilization of CNNs, has exhibited remarkable potential for achieving high accuracy. However, challenges related to dataset quality, model generalization, and the need for extensive clinical validation persist. Ensemble and hybrid models provide a pathway to enhance the reliability and accuracy of cataract detection. The choice of method often hinges on specific clinical requirements, available data, and computational resources.
The method developed in this study falls under the category of hybrid models, combining various models or strategies to enhance performance and robustness. In particular, this method uses five different CNN models, with each model utilizing both the complete image dataset and its corresponding four quadrated parts (i.e., the fundus images divided into four segments). The final prediction is determined using a majority voting mechanism, considering the outputs of these five models.
This innovative approach fully leverages the advantages of hybrid models by integrating the robust capabilities of CNNs with the benefits of image quadration, ultimately improving the performance and reliability of cataract detection. The use of a majority voting mechanism contributes to enhanced model reliability, as it consolidates opinions from different models, reducing the risk of misclassifications.
The strength of hybrid models lies in their ability to integrate various methods and strategies to compensate for their respective limitations. In this approach, the utilization of multiple CNN models helps to better adapt to the complexity and diversity of retinal fundus images. The comprehensiveness of this method makes it a more holistic solution and holds the potential for superior performance in the field of cataract detection.
In conclusion, this is an innovative hybrid model approach that enhances the overall model performance by integrating results from multiple sub-models. It provides a powerful and reliable tool for cataract detection, with the potential to positively affect patients’ eye health and ophthalmic medical practice.

3. Methods

This section introduces the dataset used in the experiment, along with several preprocessing methods used to enhance the fundus images. Additionally, it presents the methodology for creating the variant datasets that were used to train the hybrid CNN models for cataract detection. Furthermore, the section elaborates on the development of a window application using the Qt Designer tool. This application served as a platform where users could input fundus images of patients and receive predictions from the trained model. By following this comprehensive approach, the section aims to provide readers with a holistic understanding of the dataset, preprocessing techniques, model training, and the practical implementation of a user-friendly interface built with the Qt Designer 3.8 tool.

3.1. Experiment Dataset

The dataset played a critical role as it provided the necessary information for the model to learn and recognize patterns, particularly when training a CNN for image classification tasks. In this study, two fundus image datasets were collected from open sources: Dataset 1 (https://www.kaggle.com/datasets/andrewmvd/ocular-disease-recognition-odir5k, accessed on 15 May 2023) and Dataset 2 (https://www.kaggle.com/datasets/jr2ngb/cataractdataset, accessed on 15 May 2023), to train the CNN model for cataract detection. These datasets contain images of various eye diseases such as cataract, glaucoma, diabetes, and hypertension, as well as normal images. However, this study focused exclusively on cataract detection using fundus images, and thus, only cataract and normal fundus images were collected. The normal fundus images represented fundus images without any eye disease. The details of the datasets are presented in Table 1.
As shown in Table 1, the number of normal fundus images was approximately three times greater than that of cataract images. When the dataset was imbalanced, meaning that there was a significant difference in the number of samples across classes, it could lead to a biased model performance. The model tended to favor the majority class because of its dominance in the training data while struggling to effectively learn and recognize patterns from the minority class. To address this imbalance problem, this study randomly reduced the number of normal fundus images to match the number of cataract images in the dataset. The final size of the dataset is presented in Table 2, consisting of a total of 1388 fundus images, including both normal and cataract images. The training and testing datasets were divided in an 8:2 ratio, resulting in an original training dataset size of 1110 fundus images, with 888 images allocated for training and 222 images for validation in an 8:2 ratio. Additionally, the testing dataset consisted of 278 images.
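As an illustration of the balancing and splitting procedure described above, the following Python sketch undersamples the normal class and performs the two 8:2 splits with scikit-learn; the directory layout, file extension, and random seed are assumptions, not details reported in this study.

import glob
import random
from sklearn.model_selection import train_test_split

random.seed(42)

# Hypothetical directory layout; adjust the paths to local copies of the datasets.
cataract_paths = sorted(glob.glob("data/cataract/*.png"))
normal_paths = sorted(glob.glob("data/normal/*.png"))

# Randomly undersample the majority (normal) class to match the cataract class.
normal_paths = random.sample(normal_paths, min(len(normal_paths), len(cataract_paths)))

paths = cataract_paths + normal_paths
labels = [1] * len(cataract_paths) + [0] * len(normal_paths)   # 1 = cataract, 0 = normal

# 8:2 split into (train + validation) and test, then 8:2 again into train and validation.
trainval_x, test_x, trainval_y, test_y = train_test_split(
    paths, labels, test_size=0.2, stratify=labels, random_state=42)
train_x, val_x, train_y, val_y = train_test_split(
    trainval_x, trainval_y, test_size=0.2, stratify=trainval_y, random_state=42)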

3.2. Image Preprocessing Methods

Image preprocessing plays a critical role in optimizing data before CNN models are trained on them. Inconsistent data can result in time-consuming training, and unnecessary features may be learned, reducing model efficiency. Preprocessing involves preparing and refining images to ensure that they are in a suitable format for analysis.
With the application of preprocessing techniques, image quality can be enhanced, pixel values can be normalized, and noise or artifacts that may hinder accurate pattern recognition can be reduced. This step is essential for improving the robustness and reliability of the CNN model’s performance. The details of the image preprocessing methods applied in this study are shown in Figure 1. After the application of these methods, the images in the dataset underwent resizing to the dimensions of 224 × 224 pixels before being fed into the CNN models. This size was chosen to match the requirement for the input image size of the DenseNet121 model.

3.2.1. Recircle and Crop Black Areas

The dataset was collected from open sources, and it contained fundus images with inconsistent sizes and shapes (Figure 2). Training a CNN model for image classification is a computationally intensive task that requires significant resources for image feature extraction. Therefore, it was necessary to remove unnecessary areas from the input images to improve the training efficiency and reduce the training time of the model. Fundus images with unnecessary black areas and the results of applying the crop black area method are shown in Figure 1b. As shown, the crop black area method successfully eliminated the unnecessary black area in comparison to the original image.
The crop black area method (https://www.kaggle.com/code/taindow/pre-processing-train-and-test-images, accessed on 18 April 2024) involved converting the original image to grayscale and thresholding it, resulting in a binary image with two pixel values: 255 and 0. By comparing the pixel values of each row and column against this threshold, this study identified the outline of the fundus; the remaining areas were then considered background or areas unrelated to the object. The results of this process are illustrated in Figure 1b. By removing the unnecessary black areas from the original image, this study provided a basis for the model to extract features more efficiently, eliminating the need to process irrelevant black regions that did not contain details relevant to the detection of diseases in the fundus image.
Because of the existence of incomplete circular shapes in multiple fundus images within this study’s dataset (Figure 1b), a technique called recircle (https://www.kaggle.com/code/taindow/pre-processing-train-and-test-images, accessed on 18 April 2024) was utilized to standardize the shapes of diverse fundus images, ensuring a consistent circular form (Figure 1b).
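The following OpenCV/NumPy sketch illustrates the idea behind these two steps; it is a simplified reimplementation, not the Kaggle kernel cited above, and the intensity threshold as well as the square-padding approximation of recircling are assumptions.

import cv2
import numpy as np

def crop_black_area(img, threshold=10):
    # Keep only the rows and columns that contain non-black (fundus) pixels.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    mask = gray > threshold
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

def recircle(img):
    # Pad the cropped image to a square so the fundus approximates a full circle.
    h, w = img.shape[:2]
    size = max(h, w)
    top, left = (size - h) // 2, (size - w) // 2
    return cv2.copyMakeBorder(img, top, size - h - top, left, size - w - left,
                              cv2.BORDER_CONSTANT, value=(0, 0, 0))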

3.2.2. Contrast Limited Adaptive Histogram Equalization (CLAHE) on Green Channel

Contrast limited adaptive histogram equalization (CLAHE) [26] is an image enhancement technique that improves the contrast and details while preserving the local characteristics. It is a modified version of the traditional histogram equalization (HE) method.
The CLAHE method involved dividing the image into smaller regions known as tiles or patches. Rather than equalizing the histogram of the entire image, CLAHE performed histogram equalization on each tile individually (Figure 3). Figure 3a displays the histogram of the original image, while Figure 3b demonstrates the redistribution achieved by CLAHE across the histogram, resulting in a more balanced enhancement. Upon the application of CLAHE, the overall contrast of the image was improved, enhancing the visibility of details and structures. This technique helped to reveal important features that might be obscured in low-contrast regions or overshadowed by strong intensity variations.
In this study, instead of applying CLAHE to all RGB channels, the CLAHE method was used to enhance the contrast of the dataset by applying it to the green channel only (Figure 1). Figure 1d shows the image obtained after applying CLAHE specifically to the green channel. The decision to focus on the green channel was based on previous studies showing that, compared with the blue and red channels, the green channel contains a greater amount of information and features [27]. Furthermore, applying CLAHE specifically to the green channel has been shown to yield better performance than applying it to the entire RGB image [28,29]. As depicted in Figure 1, this study started by splitting all RGB fundus images within the dataset into their three individual channels: red, green, and blue. Next, the CLAHE method was applied to enhance the contrast of the green channel. Lastly, the enhanced green channel was merged with the original red and blue channels. By repeating this step for all fundus images in the dataset, this study obtained a contrast-enhanced fundus image dataset.
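A minimal OpenCV sketch of this green-channel enhancement is given below; the clip limit and tile grid size are illustrative assumptions, as the exact CLAHE parameters are not reported here.

import cv2

def clahe_green_channel(img, clip_limit=2.0, tile_grid_size=(8, 8)):
    # OpenCV loads color images in BGR order; enhance only the green channel.
    b, g, r = cv2.split(img)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid_size)
    return cv2.merge((b, clahe.apply(g), r))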

3.2.3. Image Normalization

Image normalization is a method of scaling and transforming the pixel values of an image to a specific range. The purpose of normalization is to bring the input data into a consistent, bounded range, which helps to stabilize and improve the training process of CNNs. In this study, the pixel values of the fundus images were scaled to the range [0, 1] by dividing each pixel value by 255. Working with values in this small range requires fewer computational resources than working with values in larger ranges, which is particularly important when training large-scale CNN models on limited computational resources, as it can help speed up the training and inference process. Figure 4 demonstrates the effect of image normalization on pixel values: Figure 4a presents the pixel values of the original image, while Figure 4b shows the image after normalization with pixel scaling.
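In code, this step reduces to a single operation, assuming 8-bit input images:

import numpy as np

def normalize(img):
    # Scale 8-bit pixel values to the [0, 1] range.
    return img.astype(np.float32) / 255.0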

3.2.4. Data Augmentation

Data augmentation is a technique to expand the scale of the training dataset by applying diverse transformations to the existing data [30], which can improve the performance and generalization of CNN models.
As this study had a limited amount of data, it was necessary to use techniques that increase the size of the training dataset, prevent overfitting, and enhance the diversity of image structures seen during the model’s training process. The data augmentation methods utilized in this study are shown in Figure 1g: random horizontal flip, random rotation of 30°, and zooming in. These augmentations were applied to the entire dataset before any subdivision of images, in order to enhance dataset variability and improve model generalization. After augmentation, the dataset was partitioned into training, validation, and testing sets. The composition of the dataset is provided in Table 2, giving a total of 1388 fundus images. The application of these techniques led to a threefold increase in the size of the training set, resulting in 2664 images (Table 3).
In this study, we employed random rotation and zoom-in techniques to augment the dataset. The random rotation was set to 30° based on empirical evidence indicating its effectiveness in enhancing dataset variability without significantly distorting the anatomical structures of the fundus images. This fixed angle strikes a balance between dataset augmentation and computational efficiency, ensuring that the augmented images remain realistic and useful for training.
For the zoom-in augmentation, a consistent zoom-in factor was applied. This approach focuses uniformly on finer details, helping the model learn to recognize small features consistently. It also avoids the potential risk of overfitting that might arise from using multiple zoom scales, thus maintaining the generalizability of the model. Additionally, this method simplifies the augmentation process, ensuring computational efficiency during training.
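As a sketch, these augmentation settings could be configured in Keras as shown below; the zoom-in factor is an assumption, since only the rotation angle (30°) is stated explicitly above.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random 30° rotation, horizontal flip, and zoom-in (Keras zoom factors below 1.0
# magnify the image content; the [0.8, 1.0] range is an assumed zoom-in setting).
augmenter = ImageDataGenerator(
    rotation_range=30,
    horizontal_flip=True,
    zoom_range=[0.8, 1.0],
)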

3.3. Transfer Learning with Pretrained CNN Model

3.3.1. CNN Model Selection

The selection of a suitable model is crucial when training a CNN for an image classification task, as the chosen architecture can have a significant effect on the performance and effectiveness of the trained model. Various types of CNN models are available in Keras Applications [31]. DenseNet is a type of CNN architecture introduced in 2017 by Huang et al. [32]. The DenseNet architecture connects each layer to every other layer in a feed-forward fashion, unlike traditional feed-forward networks where each layer is linked only to the subsequent layer. This results in densely connected blocks of layers, where each layer receives input from all of the preceding layers and passes its output to all of the subsequent layers.
DenseNet is a family of neural network architectures, and the naming convention is based on the number of layers in the network. The most commonly used DenseNet architectures are DenseNet-121, DenseNet-169, and DenseNet-201, which have 121, 169, and 201 layers, respectively. While the transition layers and dense blocks remain the same across different architectures, the number of layers in each model’s dense blocks varies. The number of layers in a DenseNet architecture determines its depth and capacity: deeper and more complex architectures have higher capacity and can learn more complex features, but they are more prone to overfitting and require more computational resources for training. In this study, DenseNet121 was chosen as the training model because its size is reasonable for a dataset as small as the one used here.
Thus, this study selected the DenseNet121 model and used transfer learning with our dataset. The transfer learning technique will be introduced in the next section. Overall, DenseNet-121 is a favorable choice for transfer learning with a small dataset because of its parameter efficiency, strong feature extraction capabilities, and pre-training on large-scale datasets such as ImageNet. With its dense connectivity, the network can effectively capture and leverage information from a limited number of samples without overfitting. By utilizing transfer learning, the pretrained DenseNet-121 model provided a solid foundation, as it had learned representations from millions of labeled images from datasets such as ImageNet. This enabled the model to classify the testing data well and mitigate the risk of overfitting. To objectively evaluate this model selection, this study performed experiments to compare the performance of other models with the DenseNet121 model.

3.3.2. Transfer Learning Technique

In the deep learning field, transfer learning is a technique that involves leveraging the knowledge acquired from one task or domain to enhance performance on a different but related task or domain. In transfer learning, a pretrained model (in this study, DenseNet121) that has been trained on a large dataset is used as a starting point for a new task, rather than training a model from scratch.
The goal of transfer learning is to transfer and apply the knowledge and learned representations acquired by a model on a source task (in this study, ImageNet classification, on which DenseNet121 was pretrained) to a target task, even if the datasets or tasks differ. Instead of learning the features from scratch, transfer learning enables the model to benefit from the previously learned features and patterns. Figure 5 demonstrates how transfer learning works: the pretrained model serves as a feature extractor for the target task, and only the final layers are trained specifically for the new task.
By utilizing transfer learning, the model can leverage the learned representations from the source task, which was trained on a large dataset, to improve performance on the target task with a potentially smaller dataset. This approach saves computational resources and training time, while still achieving good results.
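The following Keras sketch shows a typical transfer-learning setup of the kind described here, with DenseNet121 pretrained on ImageNet used as a frozen feature extractor and a small classification head trained for the cataract/normal task; the head layers, dropout rate, and learning rate are illustrative assumptions rather than the exact configuration of Table 4.

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet121

# DenseNet121 pretrained on ImageNet, used as a frozen feature extractor.
base = DenseNet121(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),   # binary output: cataract vs. normal
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy",
              metrics=["accuracy"])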

3.4. Proposed Hybrid CNN Models and Cataract Detection Application

This study suggested a novel approach that involved the fusion of complete features and four quadrated features extracted from the fundus image dataset to detect cataracts. The fundus image was partitioned into four quadrants, generating four partial images and resulting in five distinct input datasets. Following uniform preprocessing steps, each dataset was input into five separate CNN models of identical structure, yielding predictions from each model. Ultimately, the majority voting technique was used to derive the conclusive classification outcome. This hybrid approach leverages the strengths of ensemble learning, where multiple CNN models are trained on both complete and quadrated image datasets. By combining the predictions from these models through majority voting, our method not only enhances the robustness and accuracy of cataract detection but also significantly reduces the risk of overfitting and improves generalization. The well-designed user interface (UI) further facilitates ease of use and accessibility, making the application user-friendly for both ophthalmologists and patients. This ensemble strategy ensures that the model captures both global and local features of the fundus images, leading to superior performance compared to single-model approaches.

3.4.1. Subdivide Original Image into Partials

Rather than relying on a single complete fundus image for CNN model training and prediction, this study sought to improve accuracy by partitioning the initial fundus images into four discrete quadrants, as illustrated in Figure 6 (Partial Datasets: Top-1, Bottom-1, Top-2, and Bottom-2). This cropping procedure yielded five distinct datasets: one encompassing the entire image and four encompassing partial images, each associated with a specific region of the fundus images. This division into smaller quadrants enhanced the detection of diseases localized within specific areas of the fundus images [14,33]. Following the preprocessing steps, which included recircle and crop black areas, Contrast Limited Adaptive Histogram Equalization (CLAHE) on the green channel, and data augmentation, each image was then subdivided into these four quadrants. This process ensured that the augmented variability was maintained across all image quadrants. By doing so, we generated five distinct input datasets: one for the complete image and four for the segmented parts. The complete image had a resolution of 224 × 224, and after quadrant division, each of the four partial images was also resized to 224 × 224. Each of these datasets was used to train a separate CNN model. The final predictions were obtained using a majority voting mechanism, which consolidated the outputs from these five models.
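A minimal sketch of this quadration step is shown below; the mapping of the Top-1/Top-2/Bottom-1/Bottom-2 names to image positions is an assumption for illustration.

import cv2

def quadrate(img, size=224):
    # Split a preprocessed fundus image into four quadrants and resize each to 224 x 224.
    h, w = img.shape[:2]
    quadrants = {
        "top1": img[:h // 2, :w // 2],
        "top2": img[:h // 2, w // 2:],
        "bottom1": img[h // 2:, :w // 2],
        "bottom2": img[h // 2:, w // 2:],
    }
    return {name: cv2.resize(part, (size, size)) for name, part in quadrants.items()}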

3.4.2. Hybrid CNN Model Development

Figure 6 depicts the architecture of the proposed algorithm. The dataset’s fundus images underwent preprocessing procedures (as shown in Figure 1) to yield an enhanced dataset. Subsequently, transfer learning was applied through five DenseNet121 CNN models, each having supplementary layers integrated atop the base model (DenseNet121) for finetuning. These models were trained utilizing five distinct fundus image datasets, including a complete fundus image dataset and four partial datasets, each targeting specific image regions. The application of these five identical CNN models facilitated the extraction of diverse features, ensuring a comprehensive grasp of the fundus images.
Following this, a majority voting technique was utilized to combine the predictions from these five models and produce the definitive classification results. This majority-based voting, which pools predictions from multiple models, offered a direct route to enhancing the overall performance, as illustrated in Figure 6. The performance of both the five individual models and the hybrid model is discussed in Section 4.
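The voting step can be sketched as follows, assuming each of the five models outputs a sigmoid probability for the cataract class; with five voters a tie cannot occur.

def majority_vote(probabilities, threshold=0.5):
    # Convert each model's probability into a vote and return the majority label.
    votes = [int(p >= threshold) for p in probabilities]
    return int(sum(votes) > len(votes) / 2)

# Hypothetical outputs of the five models (full image + four quadrants): three vote "cataract".
print(majority_vote([0.91, 0.62, 0.43, 0.78, 0.35]))   # -> 1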

3.4.3. Environment and Hyperparameter Setting

TensorFlow is an open-source machine learning framework developed by the Google Brain team and released in 2015 [34]. It is one of the most popular and widely used libraries for building and training machine learning models. TensorFlow-GPU is a version of the TensorFlow library optimized for use with graphics processing units (GPUs), specialized hardware that excels at the matrix operations required for deep learning. With TensorFlow-GPU, machine learning models can be trained faster than with the CPU-only version of TensorFlow, owing to the parallel processing capability of GPUs. In this study, the training process utilized the Python programming language and the PyCharm integrated development environment (IDE) on an NVIDIA GeForce RTX 3060 GPU.
Hyperparameters play a crucial role in the training process as they directly affect the performance of the model. Through iterative adjustments, this study identified the optimal values of the hyperparameters of the five models. The corresponding optimal values, presented in Table 4, consistently yielded the best performance in our study.

3.4.4. Cataract Detection Application

The purpose of developing this application was to create user-friendly and reliable software that allows users to input retinal fundus images. The trained model embedded within the application analyzed and extracted features from the input fundus images, providing results indicating the presence or absence of the cataract disease. The user interface (UI) of this application was designed using the Qt Designer tool [35].
The Qt Designer tool utilized in this study’s application is a versatile and user-friendly software that simplifies the process of creating graphical user interfaces (GUIs) for Qt-based applications. It provides developers with a robust platform to design and prototype UIs, enabling a focus on the visual aspects and user experience of the software. Qt Designer streamlines the GUI creation process by offering a range of drag-and-drop features and pre-designed elements. Developers can conveniently place UI components such as buttons, menus, and other graphical elements to craft the desired interface. Moreover, Qt Designer supports the integration of various functionalities and allows for the creation of aesthetically pleasing, responsive, and intuitive interfaces. This tool is widely utilized in the development of applications across diverse domains, including industry-standard software, educational tools, and medical applications.
In the cataract detection application of this study, the UI was designed using the Qt Designer tool to ensure a user-friendly experience for individuals interacting with the software. The application processed the input fundus images, leveraging the trained model to determine the presence or absence of cataract disease. The main interface of the Qt Designer tool is shown in Figure 7. The UI created through this tool provided users with the option to input their retinal fundus images or those of a patient, as illustrated in Figure 8. This study involved designing specific elements for “Inserting Fundus Images”, “Running the Prediction Process”, and “Saving Prediction Results”. The layout of these components was easily achieved in Qt Designer by dragging, arranging, and configuring properties. The final GUI window was then connected using Python to facilitate communication between the designed elements and the program functionality.
While this study did not delve into specific examples of Qt-based applications, the Qt framework was used in various applications across different sectors, from business software to entertainment and healthcare solutions. Its versatility and robustness make it a go-to tool for creating high-quality, interactive user interfaces.

3.5. GradCAM Technique

Deep learning models, particularly CNN models, are often considered “Black Box” systems, as we have limited insight into their internal processes that lead to predictions. However, an algorithm called GradCAM (which stands for gradient-weighted class activation mapping) [36] provides a solution by allowing us to observe the specific areas of input fundus images that influence the model’s predictions.
In this study, the GradCAM technique was used to observe the heatmaps generated by the GradCAM of five models: Model 1, Model Top1, Model Bottom1, Model Top2, and Model Bottom2. By utilizing GradCAM heatmaps, this study could delve deeper into the reasons why combining the predictions of these five models yielded better performance than that of the standalone models.
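For reference, a minimal Grad-CAM implementation for a Keras model with a single sigmoid output is sketched below. It assumes the named convolutional layer is reachable from the top-level (functionally built) model, and the DenseNet121 layer name in the comment is an assumption rather than a detail reported in this study.

import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer_name):
    # conv_layer_name should point at the last convolutional block of the backbone
    # (e.g., "conv5_block16_concat" for DenseNet121 -- an assumed choice).
    grad_model = tf.keras.Model(model.inputs,
                                [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, prediction = grad_model(image[np.newaxis, ...])
        score = prediction[:, 0]                             # sigmoid cataract score
    grads = tape.gradient(score, conv_out)                   # gradient of score w.r.t. feature maps
    weights = tf.reduce_mean(grads, axis=(1, 2))             # global-average-pooled gradients
    cam = tf.reduce_sum(conv_out[0] * weights[0], axis=-1)   # weighted sum of feature maps
    cam = tf.nn.relu(cam)                                    # keep positive contributions only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()       # heatmap normalised to [0, 1]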

3.6. Performance Evaluation Metrics

Several metrics for performance evaluation are applied to assess the effectiveness and quality of a trained CNN model. These metrics offer valuable insights into the model’s performance on a specific task or dataset. The calculations of these metrics often involve the terms TP, TN, FP, and FN, which are explained in Figure 9.
These terms are fundamental in evaluating various performance metrics such as accuracy, precision, recall (sensitivity), and F1 Score. By measuring these metrics, we can gain a thorough understanding of the model’s predictive capabilities and its ability to correctly identify positive and negative instances. The formulas for evaluating performance metrics are shown in Equations (1)–(4), respectively.
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]
\[ \text{Precision} = \frac{TP}{TP + FP} \tag{2} \]
\[ \text{Recall} = \frac{TP}{TP + FN} \tag{3} \]
\[ \text{F1 Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4} \]
One commonly used representation of the model’s performance is the Confusion Matrix, which provides an overview of the model’s predictions. It presents the calculations of true positive, true negative, false positive, and false negative predictions.
The receiver operating characteristic (ROC) curve is a graphical representation that plots the true positive rate (TPR, sensitivity) against the false positive rate (FPR, 1 − specificity) at various classification thresholds. It provides a visual depiction of the trade-off between the two rates and allows for the evaluation of a model’s performance at different thresholds. The area under the ROC curve (AUC-ROC) is a vital metric for evaluating a model’s overall performance: it quantifies the model’s ability to distinguish between positive and negative instances, with higher values indicating superior performance and greater accuracy in predictions. The equations for evaluating the TPR and the FPR are presented in Equations (5) and (6), respectively.
\[ \text{TPR} = \frac{TP}{TP + FN} \tag{5} \]
\[ \text{FPR} = \frac{FP}{FP + TN} \tag{6} \]
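These metrics follow directly from the confusion-matrix counts, as in the short sketch below; the example counts are hypothetical and are not results of this study.

def classification_metrics(tp, tn, fp, fn):
    # Implements Equations (1)-(6).
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also the true positive rate (TPR)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)               # false positive rate
    return accuracy, precision, recall, f1, fpr

print(classification_metrics(tp=134, tn=136, fp=3, fn=5))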

4. Results

4.1. GradCAM Technique Consequence

Figure 10a illustrates the heatmap generated by GradCAM on Model 1 (using the full image dataset). Brighter colors in the heatmap indicate the areas where Model 1 primarily focused when extracting features from input images. In Figure 10a, the overlaid image shows the heatmap superimposed on the input image, allowing us to observe that Model 1 predominantly emphasized global features of the input image to generate predictions.
Additionally, Figure 10b–e display the heatmaps and overlay images generated by GradCAM on Model Top 1, Bottom 1, Top 2, and Bottom 2, respectively. These visualizations provide insights into the areas where the CNN models focused their attention when making predictions. The overlaid images from these four models revealed that the variant models primarily emphasized the local components in the input images. In contrast, Model 1 showed a preference for extracting global components from the input image.
With the integration of the predictions from the five aforementioned models, which captured both the local and the global components from the input images, a more robust and accurate result could be achieved. This approach surpassed relying solely on the single prediction from Model 1, which prioritized the extraction of the global components from the input images.

4.2. Confusion Matrix

Model 1 refers to the model that utilized the full image dataset, while the hybrid model combined the predictions from five models (Model 1, Top 1, Bottom 1, Top 2, and Bottom 2). Figure 11 demonstrates that the majority voting method (hybrid model) produced fewer incorrect predictions than Model 1. In particular, the hybrid model incorrectly predicted five images in the cataract label, whereas Model 1 had nine incorrect predictions, a decrease of nearly 45%. Furthermore, both methods produced the same number of incorrect predictions for the normal label, with only 3 images incorrectly predicted out of a total of 139 normal fundus images in the testing dataset. This low number underscored the strong potential of both models in accurately predicting normal images.
Figure 12 displays the confusion matrix of model Top 1, Bottom 1, Top 2, and Bottom 2, including the true labels and the predicted labels of both normal and cataract labels. Overall, the performance of these models did not surpass that of Model 1 or the hybrid model. This could be attributed, in part, to the fact that these models only utilized information and features from a quarter of the image. Therefore, it was understandable that their classification performance was lower. However, these models still exhibited a notable level of accuracy, particularly in the Top 1 and Top 2 models. Both the Top 1 and Top 2 models misclassified eight images for the normal label, while the Top 1 model misclassified five images for the cataract label, and the Top 2 model misclassified seven images for the cataract label. Notably, the Bottom 1 model only misclassified three images for the normal label, which was equivalent to Model 1 and the hybrid model. Combining the classifications from all models to create a hybrid model by using majority voting held the potential to yield favorable results.

4.3. Receiver Operating Characteristic Curve (ROC Curve)

Figure 13 showcases the ROC curve and AUC scores for both Model 1 and the hybrid model. The AUC score served as a comprehensive metric summarizing the model’s ability to discriminate between positive and negative samples. Higher AUC scores indicated better performance in distinguishing between these samples, suggesting a more accurate model. In this case, the AUC scores for Model 1 and the hybrid model were 0.957 and 0.97, respectively. Both of the models exhibited high AUC scores, approaching the maximum value of 1, which indicated a strong predictive capability for detecting cataract and non-cataract fundus images. Furthermore, the AUC score of the hybrid model surpassed that of Model 1 by more than 1.3%, confirming the superior performance of the hybrid model that leveraged the combined predictions from five models.

4.4. Performance Evaluation

Table 5 presents the best training, validation, and testing accuracy and loss of the five models (Model 1, Top 1, Bottom 1, Top 2, and Bottom 2). In general, there was only a slight difference in accuracy and loss between the models, and all five exhibited relatively high and stable performance during the training process.
Table 6 provides a comparative analysis of five standalone models and a hybrid model utilizing the majority voting method. Notably, the hybrid model demonstrated the highest accuracy among the models, achieving an outstanding accuracy of 97.12%. This result highlighted the effectiveness of a hybrid model approach, where the predictions of the five individual models were joined through the majority voting method. By adopting this ensemble approach, the hybrid model showcased superior performance on the testing dataset, which consisted of previously unseen data, thereby yielding improved results.
Table 6 and Figure 10 collectively elucidate the underlying rationale for the image quadration process, where each image was divided into four segments, and predictions were made on the basis of both individual segments and the complete image by using five distinct branches. The primary aim of this approach was to obtain a comprehensive understanding of the image’s features. Through this quadration, the models could effectively concentrate on both the local and the global components, acknowledging the presence of diverse cataract patterns and features within various regions of the eye.
In Table 6, the hybrid model, which amalgamated predictions from multiple models, distinctly exhibited the highest accuracy, underscoring the efficacy of this ensemble approach. This strategic methodology harnessed different models, each specializing in global and local features, culminating in a more resilient and precise evaluation of cataract presence.
Figure 10 further accentuates this rationale by visualizing the focus of each model. Model 1 prominently accentuates global features, whereas the other models center their attention on local intricacies. This emphasis underscored the multifaceted perspectives provided by these models. By amalgamating their predictions through the majority voting method, the hybrid model effectively mitigated the potential pitfall of relying solely on a single model’s viewpoint, guaranteeing that essential local information was not disregarded.
In summary, the approach of segmenting each image and leveraging multiple models with distinct areas of focus afforded a holistic comprehension of the image’s characteristics. This not only elevated the cataract detection precision but also minimized the likelihood of overlooking critical data. The success of the hybrid model, as evident in Table 6 and Figure 10, exemplified the advantages of incorporating both local and global components in cataract diagnosis, culminating in a more exhaustive and accurate assessment.
Table 7 presents the precision, recall, and F1 score of Model 1 and the hybrid model. Overall, the hybrid model performed better across all three metrics, further confirming its superiority over Model 1. In this study, we ensured that the training, validation, and testing sets for the majority of the CNN models were derived from consistent data sources to allow for a fair comparison. However, the datasets used for different models varied in size due to the specific characteristics and preprocessing steps of each study.
Consistent datasets for proposed models: The hybrid model and Model 1, both proposed in this study, used a dataset size of 1388 images. This dataset was partitioned into training, validation, and testing sets consistently to ensure a fair comparison of the two models’ performance.
Chakraborty and Jana Dataset [37]: The dataset used by Chakraborty and Jana is from the same source as our study but originally consisted of 1386 images. A total of 2 images were removed due to quality issues, resulting in a final dataset size of 1384 images. Despite this small difference, the data partitioning strategy was similar, allowing for a valid comparison.
Other dataset sizes: The other state-of-the-art methods compared in our study used datasets of varying sizes, as detailed in Table 8. For instance, Pratap and Koki used 800 images [38], Hossain et al. used 5718 images [7], and Junayed et al. used 4746 images [5]. Each of these studies had different data preprocessing and augmentation techniques tailored to their specific dataset characteristics.
Table 8 presents a comparison of the accuracy achieved in this study on the testing dataset with that of several other studies that utilized either the same dataset or the same model structure. When compared with the previous studies [7,37,38] that used transfer learning similar to this study, Model 1 achieved an accuracy of 0.9568, while the hybrid model achieved an accuracy of 0.9712 in classifying fundus images. However, the previous study [5] exhibited higher accuracy than both Model 1 and the hybrid model. This difference can largely be attributed to the substantial difference in dataset size: the previous study [5] used 4746 images, almost 3.4 times more than the dataset used in this study, which makes the relatively small gap between the hybrid model of this study (97.12%) and the model of that study (99.13%) understandable.

4.5. Cataract Detection Application

The main objective of this study was to develop a user-friendly and highly applicable software application. This application incorporated a high-performance trained model derived from this study for cataract detection. Users were required to input fundus images of themselves or their patients and initiate the prediction process. The input image was then passed to the trained model, which performed the prediction. Here is a detailed explanation of the application’s functionality along with a user manual:
  • Inserting fundus images:
    • Click the “Insert image” button to select and insert a single fundus image.
    • Click the “Insert folder of images” button if you have a folder containing multiple fundus images.
    • The application will read the input fundus images and display relevant information such as the File name, Location, and Dimensions of the input image (Figure 14).
  • Running the prediction process:
    • Once the fundus image(s) have been inserted, click the “Run prediction process” button.
    • The application will execute the program and send the input image to the trained CNN model.
    • The program will then analyze the image to detect the presence of cataracts and display the results in the “Prediction” box (Figure 15).
  • Saving prediction results:
    • Users can also save the prediction results in an Excel file by clicking the “Save results as Excel” button.
    • The application will automatically save the result, and users can choose the location of the saved Excel file.
    • The Excel file (Table 9) includes important information such as the file name, prediction results, implementation time, and file location.
    • This feature can be helpful for users when they need to perform a further analysis of the prediction results.
This software application provides a convenient and efficient way for users to input fundus images and obtain predictions regarding the presence of cataracts. The user manual and intuitive interface ensure ease of use and facilitate a further analysis of the prediction results.
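A minimal PyQt5 sketch of this workflow is given below; the widget names and the predict() placeholder are hypothetical, and the actual application builds its interface from a Qt Designer layout and uses the trained hybrid CNN model rather than the stand-in shown here.

import sys
from PyQt5 import QtWidgets

def predict(path):
    # Placeholder for the trained hybrid CNN; returns a label string.
    return "Cataract" if path else "Normal"

class CataractWindow(QtWidgets.QWidget):
    def __init__(self):
        super().__init__()
        self.path = ""
        self.insert_btn = QtWidgets.QPushButton("Insert image")
        self.run_btn = QtWidgets.QPushButton("Run prediction process")
        self.result = QtWidgets.QLabel("Prediction: -")
        layout = QtWidgets.QVBoxLayout(self)
        for widget in (self.insert_btn, self.run_btn, self.result):
            layout.addWidget(widget)
        self.insert_btn.clicked.connect(self.insert_image)
        self.run_btn.clicked.connect(self.run_prediction)

    def insert_image(self):
        # Let the user select a single fundus image from disk.
        self.path, _ = QtWidgets.QFileDialog.getOpenFileName(self, "Select fundus image")

    def run_prediction(self):
        # Send the selected image to the model and show the result in the "Prediction" box.
        self.result.setText("Prediction: " + predict(self.path))

if __name__ == "__main__":
    app = QtWidgets.QApplication(sys.argv)
    window = CataractWindow()
    window.show()
    sys.exit(app.exec_())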

5. Conclusions

This study introduced an innovative hybrid approach for cataract detection within retinal fundus images, leveraging CNNs. The proposed methodology entailed training five distinct CNN models, incorporating both the complete image dataset and its corresponding four partial segments. The ultimate prediction was obtained via majority voting based on the predictions generated by these five models.
The experimental results underscored the superiority of the suggested hybrid approach over the conventional practice of training CNN models solely with complete image datasets. In comparison to the prevailing state-of-the-art investigations, this study attained notably enhanced performance, showcasing its potential to advance cataract detection. By integrating this proposed algorithm into a cataract detection application, the study aimed to offer robust assistance to ophthalmologists in the accurate diagnosis of cataracts in patients.
In conclusion, our proposed approach, combining CNNs and image quadration, not only enhances cataract detection’s accuracy, robustness, and generalization but also demonstrates the effectiveness of our ensemble model. By integrating predictions from multiple CNN models, we achieved significant performance improvements, illustrating the power of ensemble learning in medical image analysis. Additionally, the development of a user-friendly and intuitive UI further underscores the practical applicability of our method, making it a valuable tool for ophthalmologists in clinical settings. These advantages highlight the meaningful contributions of our study to the field of cataract detection.

5.1. Limitations and Future Directions

In the domain of ophthalmology and medical image analysis, cataract detection stands as a pivotal field of research, one that has witnessed substantial progress over the years. However, as researchers continually strive for more accurate and reliable methods, they inevitably encounter specific limitations that necessitate innovative approaches for future exploration. This paper embarked on an exploration of these challenges while charting potential avenues for future research.

5.1.1. Limited Dataset Size

The foremost challenge in cataract detection research lies in the constraint of having a relatively small dataset. In our study, this limited dataset size presented significant challenges for both training and evaluation. Researchers were compelled to make judicious choices regarding model architecture and training strategies to optimize performance within these constraints. Unfortunately, this limitation inherently restricted the scope of exploration when it came to various model architectures.

5.1.2. Choice of Backbone Architecture

The selection of a model’s backbone architecture is a critical decision in any deep learning endeavor. In our study, the limited number of images within the datasets led to the selection of the DenseNet121 architecture as the foundational choice for our model. While this architecture displayed promise in delivering accurate results, it is imperative to recognize that alternative backbone architectures, such as ResNet, Inception, and EfficientNet, may offer unique advantages and insights. Regrettably, the confines imposed by the dataset size limited the extent to which these alternatives could be explored.

5.1.3. Absence of LOCS III

One significant limitation of our study is the absence of the Lens Opacities Classification System III (LOCS III), which is a widely recognized method for grading cataract severity based on lens opacity. The exclusion of LOCS III means our study may lack the standardized grading that facilitates consistent and comparable assessments across different studies and clinical settings. Additionally, LOCS III takes into account various factors affecting cataract formation, such as age, genetics, and systemic conditions, which were not fully considered in our approach. This could lead to variability in cataract detection and grading, potentially affecting the reliability of our results.

5.2. Future Research Directions

To address these limitations and advance cataract detection research, several promising directions emerge:
  • Explore Alternative Backbone Architectures: Future experiments should compare a broader set of backbones beyond DenseNet121, including ResNet, Inception, and EfficientNet. Comparative assessment of their performance under the same limited-dataset constraints can identify the most suitable architecture for cataract detection.
  • Data Augmentation and Expansion: Expanding the size and diversity of the dataset emerges as a paramount step toward improving model generalization. Techniques for data augmentation, coupled with collaboration with healthcare institutions, can facilitate this goal. Such an approach permits the evaluation of a broader spectrum of model architectures.
  • Transfer Learning: Investigating transfer learning from models pretrained on larger medical image datasets, followed by fine-tuning for cataract detection, is a promising avenue. Transfer learning leverages pre-existing knowledge and adapts it to the specific characteristics of cataract detection (a minimal fine-tuning sketch follows this list).
  • Multi-modality Fusion: The fusion of data from diverse imaging modalities, including optical coherence tomography (OCT) and ultrasound, holds the potential to forge a comprehensive cataract diagnosis system. This approach aspires to bolster accuracy and diagnostic capabilities.
  • Evaluation of Operating Efficiency: Although the current study focuses on the methodological advancements and their impact on diagnostic performance, future research should include a comprehensive evaluation of the proposed method’s operating efficiency. This would involve detailed comparisons with other state-of-the-art techniques to ensure that our approach is not only accurate but also computationally efficient and practical for real-world clinical applications.
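As a starting point for the transfer-learning direction above, the following sketch shows how a DenseNet121 backbone pretrained on ImageNet can be adapted to binary cataract classification in Keras. The input size, pooling head, and single-stage freezing are assumptions of this sketch rather than a description of the study's exact implementation; the dropout rate and learning rate follow Table 4.
```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet121

def build_transfer_model(input_shape=(224, 224, 3)):
    # Backbone pretrained on ImageNet; ResNet, Inception, or EfficientNet
    # variants from tf.keras.applications can be swapped in for comparison.
    base = DenseNet121(weights="imagenet", include_top=False,
                       input_shape=input_shape)
    base.trainable = False  # freeze pretrained features for the first stage

    x = layers.GlobalAveragePooling2D()(base.output)
    x = layers.Dropout(0.5)(x)                      # dropout rate from Table 4
    out = layers.Dense(1, activation="sigmoid")(x)  # binary: cataract vs. normal

    model = models.Model(base.input, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```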
In conclusion, cataract detection research presents both challenges and opportunities. Addressing the limitations above through the future work outlined here should not only raise model performance but also make cataract detection systems more adaptable to diverse clinical settings and datasets. Collaborative efforts and continued research hold the promise of improved patient care and ocular health.

Author Contributions

Conceptualization, C.-L.L. and V.-V.N.; methodology, C.-L.L. and V.-V.N.; software, V.-V.N.; validation, C.-L.L. and V.-V.N.; formal analysis, V.-V.N.; data curation, V.-V.N.; writing—original draft preparation, C.-L.L. and V.-V.N.; writing—review and editing, C.-L.L.; supervision, C.-L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets used in this study for training the CNN model for cataract detection are publicly available from the following sources: Dataset 1: Ocular Disease Recognition (ODIR-5K) is available at https://www.kaggle.com/datasets/andrewmvd/ocular-disease-recognition-odir5k, accessed on 18 April 2024. Dataset 2: Cataract Dataset is available at https://www.kaggle.com/datasets/jr2ngb/cataractdataset, accessed on 18 April 2024. These datasets include images of various eye diseases such as cataract, glaucoma, diabetes, hypertension, and normal conditions. They are openly accessible and can be used for research and educational purposes under the terms provided by their respective sources.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Mangi, M.; Bashir, M.K.; Inam, M. Outcome of Cataract Surgery at Secondary Eye Care Facility in Karachi. Pak. J. Ophthalmol. 2022, 38. [Google Scholar] [CrossRef]
  2. Nahar, B. Manual Small-Incision Cataract Surgery. inSIGHT 2022, 2, 7. [Google Scholar]
  3. Hatch, W.V.; Campbell, E.d.L.; Bell, C.M.; El-Defrawy, S.R.; Campbell, R.J. Projecting the growth of cataract surgery during the next 25 years. Arch. Ophthalmol. 2012, 130, 1479–1481. [Google Scholar] [CrossRef] [PubMed]
  4. Resnikoff, S.; Lansingh, V.C.; Washburn, L.; Felch, W.; Gauthier, T.-M.; Taylor, H.R.; Eckert, K.; Parke, D.; Wiedemann, P. Estimated number of ophthalmologists worldwide (International Council of Ophthalmology update): Will we meet the needs? Br. J. Ophthalmol. 2020, 104, 588–592. [Google Scholar] [CrossRef]
  5. Junayed, M.S.; Islam, M.B.; Sadeghzadeh, A.; Rahman, S. CataractNet: An automated cataract detection system using deep learning for fundus images. IEEE Access 2021, 9, 128799–128808. [Google Scholar] [CrossRef]
  6. Khan, M.S.M.; Ahmed, M.; Rasel, R.Z.; Khan, M.M. Cataract detection using convolutional neural network with VGG-19 model. In Proceedings of the 2021 IEEE World AI IoT Congress (AIIoT), Seattle, WA, USA, 10–13 May 2021; pp. 0209–0212. [Google Scholar]
  7. Hossain, M.R.; Afroze, S.; Siddique, N.; Hoque, M.M. Automatic detection of eye cataract using deep convolution neural networks (DCNNs). In Proceedings of the 2020 IEEE Region 10 Symposium (TENSYMP), Dhaka, Bangladesh, 5–7 June 2020; pp. 1333–1338. [Google Scholar]
  8. Lai, C.-J.; Pai, P.-F.; Marvin, M.; Hung, H.-H.; Wang, S.-H.; Chen, D.-N. The use of convolutional neural networks and digital camera images in cataract detection. Electronics 2022, 11, 887. [Google Scholar] [CrossRef]
  9. Vimala, G.A.G.; Kajamohideen, S. Diagnosis of diabetic retinopathy by extracting blood vessels and exudates using retinal color fundus images. WSEAS Trans. Biol. Biomed. 2014, 11, 20–28. [Google Scholar]
  10. Tsiknakis, N.; Theodoropoulos, D.; Manikis, G.; Ktistakis, E.; Boutsora, O.; Berto, A.; Scarpa, F.; Scarpa, A.; Fotiadis, D.I.; Marias, K. Deep learning for diabetic retinopathy detection and classification based on fundus images: A review. Comput. Biol. Med. 2021, 135, 104599. [Google Scholar] [CrossRef] [PubMed]
  11. Zhang, W.; Zhao, X.; Chen, Y.; Zhong, J.; Yi, Z. DeepUWF: An automated ultra-wide-field fundus screening system via deep learning. IEEE J. Biomed. Health Inform. 2020, 25, 2988–2996. [Google Scholar] [CrossRef]
  12. Szeto, S.K.-H.; Hui, V.W.K.; Siu, V.; Mohamed, S.; Chan, C.K.; Cheung, C.Y.L.; Hsieh, Y.T.; Tan, C.S.; Chhablani, J.; Lai, T.Y. Recent advances in clinical applications of imaging in retinal diseases. Asia-Pac. J. Ophthalmol. 2023, 12, 252–263. [Google Scholar] [CrossRef]
  13. Li, T.; Bo, W.; Hu, C.; Kang, H.; Liu, H.; Wang, K.; Fu, H. Applications of deep learning in fundus images: A review. Med. Image Anal. 2021, 69, 101971. [Google Scholar] [CrossRef] [PubMed]
  14. Nguyen, V.-V.; Lin, C.-L. Cataract Detection using Hybrid CNN Model on Retinal Fundus Images. In Proceedings of the 2023 9th International Conference on Applied System Innovation (ICASI), Chiba, Japan, 21–25 April 2023; pp. 42–44. [Google Scholar]
  15. Zhang, L.; Li, J.; Han, H.; Liu, B.; Yang, J.; Wang, Q. Automatic cataract detection and grading using deep convolutional neural network. In Proceedings of the 2017 IEEE 14th International Conference on Networking, Sensing and Control (ICNSC), Calabria, Italy, 16–18 May 2017; pp. 60–65. [Google Scholar]
  16. Faizal, S.; Rajput, C.A.; Tripathi, R.; Verma, B.; Prusty, M.R.; Korade, S.S. Automated cataract disease detection on anterior segment eye images using adaptive thresholding and fine tuned inception-v3 model. Biomed. Signal Process. Control 2023, 82, 104550. [Google Scholar] [CrossRef]
  17. Qiao, Z.; Zhang, Q.; Dong, Y.; Yang, J.-J. Application of SVM based on genetic algorithm in classification of cataract fundus images. In Proceedings of the 2017 IEEE International Conference on Imaging Systems and Techniques (IST), Beijing, China, 18–20 October 2017; pp. 1–5. [Google Scholar]
  18. Binus, A. Automatic Cataract Detection System based on Support Vector Machine (SVM). In Proceedings of the Second Asia Pacific International Conference on Industrial Engineering and Operations Management, Surakarta, Indonesia, 14–16 September 2021. [Google Scholar]
  19. Chande, K.; Jha, P.; Aulakh, K.K.; Shinde, S. Cataract detection using textural features and machine learning algorithms. In Proceedings of the 2nd International Conference on Artificial Intelligence: Advances and Applications (ICAIAA 2021), Hyderabad, India, 15–17 April 2021; pp. 595–606. [Google Scholar]
  20. Maaliw, R.R.; Alon, A.S.; Lagman, A.C.; Garcia, M.B.; Abante, M.V.; Belleza, R.C.; Tan, J.B.; Maaño, R.A. Cataract detection and grading using ensemble neural networks and transfer learning. In Proceedings of the 2022 IEEE 13th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, Canada, 12–15 October 2022; pp. 0074–0081. [Google Scholar]
  21. Ishtiaq, U.; Abdullah, E.R.M.F.; Ishtiaque, Z. A Hybrid Technique for Diabetic Retinopathy Detection Based on Ensemble-Optimized CNN and Texture Features. Diagnostics 2023, 13, 1816. [Google Scholar] [CrossRef] [PubMed]
  22. Sevani, N.; Tampubolon, H.; Wijaya, J.; Cuvianto, L.; Salomo, A. A Study of Convolution Neural Network Based Cataract Detection with Image Segmentation. In Proceedings of the 2022 IEEE International Conference on Communication, Networks and Satellite (COMNETSAT), Solo, Indonesia, 3–5 November 2022; pp. 216–221. [Google Scholar]
  23. Deshmukh, S.V.; Roy, A. Retinal Blood Vessel Segmentation Based on Modified CNN and Analyze the Perceptional Quality of Segmented Images. In Proceedings of the International Conference on Advanced Network Technologies and Intelligent Computing, Varanasi, India, 22–24 December 2022; pp. 609–625. [Google Scholar]
  24. Yadav, S.; Singh Yadav, J.K.P. Enhancing Cataract Detection Precision: A Deep Learning Approach. Trait. Signal 2023, 40, 1413–1424. [Google Scholar] [CrossRef]
  25. Choi, E.Y.; Han, S.H.; Ryu, I.H.; Kim, J.K.; Lee, I.S.; Han, E.; Kim, H.; Choi, J.Y.; Yoo, T.K. Automated detection of crystalline retinopathy via fundus photography using multistage generative adversarial networks. Biocybern. Biomed. Eng. 2023, 43, 725–735. [Google Scholar] [CrossRef]
  26. Sahu, S.; Singh, A.K.; Ghrera, S.; Elhoseny, M. An approach for de-noising and contrast enhancement of retinal fundus image using CLAHE. Opt. Laser Technol. 2019, 110, 87–98. [Google Scholar]
  27. Liang, Y.; He, L.; Fan, C.; Wang, F.; Li, W. Preprocessing study of retinal image based on component extraction. In Proceedings of the 2008 IEEE International Symposium on IT in Medicine and Education, Xiamen, China, 12–14 December 2008; pp. 670–672. [Google Scholar]
  28. Setiawan, A.W.; Mengko, T.R.; Santoso, O.S.; Suksmono, A.B. Color retinal image enhancement using CLAHE. In Proceedings of the International Conference on ICT for Smart Society, Jakarta, Indonesia, 13–14 June 2013; pp. 1–3. [Google Scholar]
  29. Alwazzan, M.J.; Ismael, M.A.; Ahmed, A.N. A hybrid algorithm to enhance colour retinal fundus images using a Wiener filter and CLAHE. J. Digit. Imaging 2021, 34, 750–759. [Google Scholar] [CrossRef] [PubMed]
  30. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 1–48. [Google Scholar] [CrossRef]
  31. Team, K. Keras Documentation: Keras Applications. Available online: https://keras.io/api/applications (accessed on 15 May 2023).
  32. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
  33. Xu, X.; Zhang, L.; Li, J.; Guan, Y.; Zhang, L. A hybrid global-local representation CNN model for automatic cataract grading. IEEE J. Biomed. Health Inform. 2019, 24, 556–567. [Google Scholar] [CrossRef] [PubMed]
  34. Pang, B.; Nijkamp, E.; Wu, Y.N. Deep learning with tensorflow: A review. J. Educ. Behav. Stat. 2020, 45, 227–248. [Google Scholar] [CrossRef]
  35. Willman, J.M. Creating GUIs with Qt Designer. In Beginning PyQt: A Hands-on Approach to GUI Programming with PyQt6; Springer: Berlin, Germany, 2022; pp. 217–258. [Google Scholar]
  36. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
  37. Chakraborty, S.; Jana, S. Early Prediction of Cataract using Convolutional Neural Network. In Proceedings of the 2023 IEEE Devices for Integrated Circuit (DevIC), Kalyani, India, 7–8 April 2023; pp. 446–450. [Google Scholar]
  38. Pratap, T.; Kokil, P. Computer-aided diagnosis of cataract using deep transfer learning. Biomed. Signal Process. Control 2019, 53, 101533. [Google Scholar] [CrossRef]
Figure 1. Image Preprocessing Methods. (a) Initial reading of fundus image from the dataset. (b) Example of a fundus image with a large, unnecessary black area. The ‘Crop Black Area’ method is employed to remove the unnecessary black area from the image, resulting in fundus images with incomplete circular shapes. Subsequently, the ‘Recircle’ method is applied to achieve uniformly circular fundus images. (c) Image separation into three channels: red, green, and blue. (d) CLAHE applied to the Green Channel. (e) Combination of the three channels to obtain the enhanced image. (f) Image normalization. (g) Data augmentation techniques, including random horizontal flip, random rotation of 30°, and zooming in on the original image.
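For steps (c)–(f) of Figure 1, a minimal OpenCV sketch of the channel-wise CLAHE enhancement is shown below; the clip limit, tile grid size, and [0, 1] scaling are illustrative defaults, not necessarily the exact settings used in this study.
```python
import cv2
import numpy as np

def enhance_fundus(bgr_image):
    """Channel-wise CLAHE enhancement following Figure 1, steps (c)-(f)."""
    b, g, r = cv2.split(bgr_image)                    # (c) split into B, G, R channels
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    g = clahe.apply(g)                                # (d) CLAHE on the green channel
    enhanced = cv2.merge((b, g, r))                   # (e) recombine the channels
    return enhanced.astype(np.float32) / 255.0        # (f) normalize pixels to [0, 1]
```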
Figure 2. Example of inconsistent fundus images.
Figure 3. (a) Histogram of the original image. (b) Histogram after CLAHE redistributes intensities to achieve a more balanced enhancement.
Figure 4. (a) Pixel values of original image. (b) Image normalization with pixel scaling.
Figure 5. Transfer learning retains knowledge learned by a pretrained model.
Figure 6. Hybrid CNN models.
Figure 7. Qt Designer’s main interface.
Figure 8. Cataract detection application’s UI.
Figure 9. TP, TN, FP, and FN metrics.
Figure 10. Heatmap of (a) Model 1 (trained using full image dataset), (b) Model Top 1 (trained using Top 1 dataset), (c) Model Bottom 1 (trained using Bottom 1 dataset), (d) Model Top 2 (trained using Top 2 dataset), and (e) Model Bottom 2 (trained using Bottom 2 dataset). Heatmaps generated by Grad-CAM on the CNN models. The first column shows the original input fundus images. The second column shows the Grad-CAM heatmaps overlaid on the original images, highlighting the regions where the models focused their attention during prediction. The third column presents the combined visualization for better interpretability.
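The heatmaps in Figure 10 follow Grad-CAM [36]. A minimal TensorFlow sketch of the computation is given below; the layer name, the single-sigmoid-output assumption, and the function names are illustrative rather than the study's exact code.
```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name="relu"):
    """Minimal Grad-CAM sketch. For Keras DenseNet121, the final convolutional
    feature map is typically exposed by the layer named 'relu' (an assumption;
    verify with model.summary())."""
    grad_model = tf.keras.models.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        score = preds[:, 0]                             # single sigmoid output assumed
    grads = tape.gradient(score, conv_out)              # d(score) / d(feature map)
    weights = tf.reduce_mean(grads, axis=(1, 2))        # global-average-pool the gradients
    cam = tf.reduce_sum(conv_out * weights[:, tf.newaxis, tf.newaxis, :], axis=-1)
    cam = tf.nn.relu(cam)[0]                            # keep positive influence only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()  # normalized heatmap in [0, 1]
```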
Figure 11. Confusion matrix of (a) Model 1 (using full image dataset) and (b) hybrid model.
Figure 12. Confusion matrix of (a) Model Top 1, (b) Bottom 1, (c) Top 2, and (d) Bottom 2.
Figure 13. ROC curve of (a) Model 1 (full image dataset) and (b) hybrid model (majority voting).
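Curves like those in Figure 13 can be reproduced from test-set labels and predicted cataract probabilities with scikit-learn; the arrays below are illustrative placeholders, not the study's data.
```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# y_true: ground-truth labels (0 = normal, 1 = cataract)
# y_score: predicted cataract probabilities from a model
fpr, tpr, _ = roc_curve(y_true, y_score)
plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.3f}")
plt.plot([0, 1], [0, 1], linestyle="--", label="Chance")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```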
Figure 14. The application presents several input images and relevant information.
Figure 15. The application displays the trained model's prediction result for an input fundus image.
Table 1. Number of normal and cataract fundus images in two datasets.
Dataset   | Normal | Cataract
Dataset 1 | 2002   | 594
Dataset 2 | 300    | 100
Total     | 2302   | 694
Table 2. Experiment dataset size.
Condition              | Number of Images | Total Number of Images
Normal fundus images   | 694              | 1388
Cataract fundus images | 694              |

Dataset            | Number of Images
Training dataset   | 888
Validation dataset | 222
Testing dataset    | 278
Table 3. Size of training dataset after application of data augmentation.
Original Training Dataset Size | Augmented Training Dataset Size
888                            | 2664
Table 4. Hyperparameters for training models.
Hyperparameter | Learning Rate | Dropout | Batch Size | Epoch | Patience
Value          | 5 × 10⁻⁵      | 0.5     | 32         | 50    | 15
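The hyperparameters in Table 4 map onto a Keras training loop roughly as follows; `build_transfer_model` (e.g., the DenseNet121 sketch given after the future-directions list), the dataset arrays, and the checkpoint path are illustrative names rather than the study's exact code.
```python
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# Table 4: learning rate 5e-5 (set in the model's optimizer), dropout 0.5
# (inside the model head), batch size 32, up to 50 epochs, patience 15.
model = build_transfer_model()
callbacks = [
    EarlyStopping(monitor="val_loss", patience=15, restore_best_weights=True),
    ModelCheckpoint("best_model.h5", monitor="val_loss", save_best_only=True),
]
history = model.fit(train_images, train_labels,
                    validation_data=(val_images, val_labels),
                    batch_size=32, epochs=50, callbacks=callbacks)
```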
Table 5. Training, validation accuracy, and loss of five models.
Data       | Performance | Model 1 | Model Top 1 | Model Bottom 1 | Model Top 2 | Model Bottom 2
Training   | Accuracy    | 0.9555  | 0.9760      | 0.9675         | 0.9724      | 0.9555
Training   | Loss        | 0.1542  | 0.0837      | 0.1532         | 0.1310      | 0.1642
Validation | Accuracy    | 0.9496  | 0.9460      | 0.9532         | 0.9424      | 0.9460
Validation | Loss        | 0.1496  | 0.2527      | 0.2089         | 0.1710      | 0.1816
Testing    | Accuracy    | 0.9568  | 0.9532      | 0.9496         | 0.9460      | 0.9424
Testing    | Loss        | 0.1030  | 0.2209      | 0.1551         | 0.1877      | 0.1473
Table 6. Testing accuracy and loss of five models and hybrid model.
Methods                        | Accuracy (%)
Hybrid model (majority voting) | 97.12
Model 1 (full image dataset)   | 95.68
Model Top 1                    | 95.32
Model Bottom 1                 | 94.96
Model Top 2                    | 94.60
Model Bottom 2                 | 94.24
Table 7. Precision, recall, and F1 score of Model 1 and hybrid model.
Methods      | Condition | Precision | Recall | F1 Score
Model 1      | Cataract  | 0.94      | 0.98   | 0.96
Model 1      | Normal    | 0.98      | 0.94   | 0.96
Hybrid Model | Cataract  | 0.96      | 0.98   | 0.97
Hybrid Model | Normal    | 0.98      | 0.96   | 0.97
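The per-class precision, recall, and F1 scores in Table 7, together with the confusion-matrix counts of Figure 9, follow the standard definitions and can be reproduced with scikit-learn; `y_true` and `y_pred` below are illustrative label arrays (0 = normal, 1 = cataract).
```python
from sklearn.metrics import classification_report, confusion_matrix

# With labels 0 = normal and 1 = cataract, confusion_matrix returns
# [[TN, FP], [FN, TP]] (the quantities illustrated in Figure 9).
print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=["Normal", "Cataract"]))
```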
Table 8. Comparison of proposed methods with state-of-the-art methods. *: 0.9248 is the mean accuracy of this study because the two models were trained with two separate datasets (Dataset 1 accuracy = 0.9747, Dataset 2 accuracy = 0.8750).
Methods                   | Year | Dataset Size | Model           | Accuracy
Hybrid Model              | 2024 | 1388         | DenseNet121     | 0.9712
Model 1                   | 2023 | 1388         | DenseNet121     | 0.9568
Pratap and Kokil [38]     | 2019 | 800          | AlexNet and SVM | 0.9291
Hossain et al. [7]        | 2020 | 5718         | ResNet-50       | 0.9577
Chakraborty and Jana [37] | 2023 | 1386         | VGG19           | 0.9248 *
Junayed et al. [5]        | 2021 | 4746         | CataractNet     | 0.9913
Table 9. Predictions from the application can be saved as an Excel file.
File Name     | Prediction Results | Implementation Time | File Location
220_left.jpg  | Normal   | 2023-05-25 15:37:38 | C:/Users/Andrew/Desktop/Fundus Imgae Dataset/Dataset Fundus Image/test_img/220_left.jpg
240_left.jpg  | Normal   | 2023-05-25 15:37:42 | C:/Users/Andrew/Desktop/Fundus Imgae Dataset/Dataset Fundus Image/test_img/240_left.jpg
260_left.jpg  | Cataract | 2023-05-25 15:37:46 | C:/Users/Andrew/Desktop/Fundus Imgae Dataset/Dataset Fundus Image/test_img/260_left.jpg
303_right.jpg | Normal   | 2023-05-25 15:37:50 | C:/Users/Andrew/Desktop/Fundus Imgae Dataset/Dataset Fundus Image/test_img/303_right.jpg
305_right.jpg | Cataract | 2023-05-25 15:37:54 | C:/Users/Andrew/Desktop/Fundus Imgae Dataset/Dataset Fundus Image/test_img/305_right.jpg
330_left.jpg  | Cataract | 2023-05-25 15:37:54 | C:/Users/Andrew/Desktop/Fundus Imgae Dataset/Dataset Fundus Image/test_img/330_left.jpg
368_right.jpg | Cataract | 2023-05-25 15:37:58 | C:/Users/Andrew/Desktop/Fundus Imgae Dataset/Dataset Fundus Image/test_img/368_right.jpg
392_left.jpg  | Normal   | 2023-05-25 15:38:02 | C:/Users/Andrew/Desktop/Fundus Imgae Dataset/Dataset Fundus Image/test_img/392_left.jpg
crop_img.jpg  | Normal   | 2023-05-25 15:38:10 | C:/Users/Andrew/Desktop/Fundus Imgae Dataset/Dataset Fundus Image/test_img/crop_img.jpg
NL_012.png    | Cataract | 2023-05-25 15:38:13 | C:/Users/Andrew/Desktop/Fundus Imgae Dataset/Dataset Fundus Image/test_img/NL_012.png
NL_020.png    | Cataract | 2023-05-25 15:38:17 | C:/Users/Andrew/Desktop/Fundus Imgae Dataset/Dataset Fundus Image/test_img/NL_020.png
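An export like Table 9 can be produced with pandas; the sketch below uses the table's column names, while the helper names and output path are illustrative. Writing .xlsx files additionally requires the openpyxl package.
```python
from datetime import datetime
import pandas as pd

results = []  # one record is appended after each prediction in the application

def log_prediction(file_path, label):
    results.append({"File Name": file_path.split("/")[-1],
                    "Prediction Results": label,   # "Normal" or "Cataract"
                    "Implementation Time": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
                    "File Location": file_path})

def export_to_excel(out_path="predictions.xlsx"):
    # Write the accumulated prediction records to an Excel file.
    pd.DataFrame(results).to_excel(out_path, index=False)
```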
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
