3.4.1. Deep Learning Techniques
Recent advancements in DL for AMD classification are discussed in this section, showing promising results in categorizing retinal fundus images across different AMD stages. Studies have explored network depth and transfer learning, indicating that training networks from scratch with sufficient data can yield higher accuracy than using pre-trained models [87,88]. However, addressing model limitations, such as generalizability to diverse datasets and aligning vision transformer (ViT) attention maps with the underlying pathology, requires further research. In diagnosing retinal diseases, DL tasks mainly involve segmentation and classification. Classification directly categorizes input images into disease groups, while segmentation reveals detailed information about retinal disorders from fundus images, including critical lesions and biomarkers [7,89,90,91].
Several studies have examined categorizing AMD modalities using different techniques and methodologies. For example, Karri et al. [19] proposed a transfer learning method based on the Inception network, utilizing the DHU database to classify retinal OCT images as dry AMD, DME, or normal using a pre-trained CNN (GoogLeNet). Initially, saturated pixels were replaced with pixels of intensity 10. Following RPE estimation, smoothing, and retinal flattening, the RPE lower contour was relocated to a fixed position (71% of the image height). The image was resized to a fixed size and filtered using the BM3D filter. The BM3D filter result was replicated three times, with each copy treated as one channel. The proposed method utilized the flattened and filtered images to classify retinal OCT images as normal, DME, or AMD. Across all validations (N1, N2, and N3), the average prediction accuracy for normal, DME, and AMD was 99%, 86%, and 89%, respectively, with the best model achieving 94% accuracy.
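The channel-replication step that adapts a single-channel denoised B-scan to a 3-channel pre-trained backbone can be sketched as follows. This is an illustrative numpy snippet, not the authors' code, and the 224 × 224 input size is an assumption chosen for a typical ImageNet-pretrained CNN:

```python
import numpy as np

def to_three_channel(filtered_bscan: np.ndarray) -> np.ndarray:
    """Replicate a denoised, grayscale OCT B-scan across three channels.

    ImageNet-pretrained CNNs (e.g., GoogLeNet) expect 3-channel input, so
    the single BM3D output is stacked three times along a channel axis.
    """
    if filtered_bscan.ndim != 2:
        raise ValueError("expected a 2-D grayscale B-scan")
    return np.stack([filtered_bscan] * 3, axis=-1)

# Toy example: a 224x224 denoised B-scan becomes a 224x224x3 array.
bscan = np.random.default_rng(0).random((224, 224))
rgb_like = to_three_channel(bscan)
```

Stacking identical copies preserves the image content while matching the input shape the pre-trained weights were trained on.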
Philippe et al. [92] utilized DL techniques, including transfer learning and universal features trained on fundus images, along with input from a clinical grader, to develop automated methods for AMD detection from fundus images. Their approach aimed to classify images into disease-free/early stages and referable intermediate/advanced stages using the AREDS database. The deep CNN achieved an accuracy ranging from 88.4% to 91.6% and an AUC between 0.94 and 0.96.
Li et al. [21] utilized VGG16 to classify CNV and normal AMD using a private retinal OCT image database. They employed image normalization but omitted image denoising, to avoid overfitting and enhance generalization. To address variations in image intensities, they normalized and resized OCT volumes to ensure uniform dimensions for processing with VGG16. They developed a normalization technique to adjust eye curvature, normalize volume intensities, and align Bruch's membrane (BM) layers, achieving 100% AUC, 98.6% accuracy, 97.8% sensitivity, and 99.4% specificity.
Bulut et al. [93] proposed a DL method for identifying retinal defects using color fundus images. They employed the Xception model and transfer learning, training on images from the eye diseases department of Akdeniz University Hospital and using additional open-access fundus datasets for testing. The study explored 50 potential parameter combinations of the Xception model to optimize performance. The fourth model achieved the highest accuracy of 91.39% on the training set, while the zeroth model attained the best accuracy of 82.5% on the validation set.
Xu et al. [94] devised a DL approach using a ResNet50 model on a private database from Peking Union Medical College Hospital. They also constructed alternative machine-learning models with random forest classifiers for comparison. Three retinal specialists independently diagnosed the same testing dataset for a human-machine comparison, with participant photographs presented in a single partition subset. Utilizing fundus and OCT image pairs, their dual deep CNN model classified polypoidal choroidal vasculopathy (PCV) and AMD. Transfer learning involved initially applying ResNet-50 weights to two separate models handling fundus and OCT images, refining the weights on new data, and then transferring them to the corresponding convolutional blocks. The final FC layer classified input pairs into four categories: PCV, wet AMD, dry AMD, and AMD. The bimodal DCNN outperformed the top expert (Human1, Cohen's κ = 0.810) with 87.4% accuracy, 88.8% sensitivity, 95.6% specificity, and full agreement with the diagnostic gold standard (Cohen's κ = 0.828).
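The bimodal late-fusion idea, where per-modality features from two CNN branches are concatenated and passed to one final FC layer, can be sketched as follows. The 2048-d features, weight values, and class count are illustrative stand-ins, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Stand-ins for the penultimate features of two ResNet-50 branches
# (2048-d each); the real model would compute these from an image pair.
fundus_feat = rng.standard_normal(2048)
oct_feat = rng.standard_normal(2048)

fused = np.concatenate([fundus_feat, oct_feat])   # 4096-d joint feature
W = rng.standard_normal((4096, 4)) * 0.01          # hypothetical final FC weights
b = np.zeros(4)
probs = softmax(fused @ W + b)                     # scores over the four categories
```

The fusion happens only at the final FC layer, so each branch can be fine-tuned on its own modality before the joint classifier is trained.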
Contributions: Studies [19,21,92,93,94] showed the effectiveness of fine-tuning models trained on medical/non-medical images for improved disease recognition compared to traditional methods. These studies also highlighted the feasibility of utilizing pre-trained models for faster convergence with less data. Some compared their method with alternative machine-learning models, such as random forest classifiers.
Limitations: Studies [19,21,92,93,94] encountered challenges such as overfitting, reliance on private databases, gradient vanishing or explosion, and slight class imbalance in the number of fundus images during training. Additionally, some studies relied only on accuracy as an evaluation metric. Furthermore, some relied solely on the AREDS dataset for evaluation, lacking validation on independent clinical datasets such as MESSIDOR. Finally, the reliance on transfer learning and a single model architecture may limit the generalizability of these approaches to different datasets or clinical settings, suggesting the need for further validation and exploration of alternative methodologies.
Rasti et al. [20] developed a CAD system using public OCT images (NEH dataset; DHU dataset) to differentiate between dry AMD, DME, and normal retina. Their approach utilized a multi-scale convolutional mixture of expert (MCME) ensemble model, a data-driven neural structure employing a cost function for discriminative and fast learning of image attributes. Image preparation involved normalization, retinal flattening, cropping, ROI selection, VOI creation, and augmentation. On the DHU dataset, the model achieved accuracy, recall, and AUC of 98.33%, 97.78%, and 99.90% for N1, N2, and N3, respectively, while on the NEH dataset, it achieved total precision, total recall, and total AUC of 99.39%, 99.36%, and 0.998.
Contributions: The study introduced a CAD system employing the MCME ensemble model for classifying OCT images into dry AMD, DME, or normal retina categories. By incorporating a mixture model, the study achieved high accuracy, recall, and AUC, demonstrating the effectiveness of CNNs on multiple-scale sub-images.
Limitations: The study lacked occlusion testing and qualitative assessment of model predictions. Additionally, it did not include image denoising, complete retinal layer segmentation, or lesion detection algorithms, which are essential for comprehensive analysis in clinical settings.
Thakoor et al. [41] developed a custom-built 3D CNN with four 3D convolutional layers, two dense layers, and a final SoftMax classification layer, and also discussed 3D–2D hybrid CNNs. The data used in this study were provided by patients aged 18 years or older treated by collaborating vitreoretinal faculty at Columbia University Medical Center. Three patient groups formed the CNN training set: patients with non-neovascular AMD, patients with no significant vascular pathology on OCTA (non-AMD), and patients with neovascular AMD with actionable CNV based on patient diagnosis. Using stacked 2D OCTA images of the retinal layers from 97 healthy, 80 neovascular AMD (NV-AMD), and 160 non-neovascular AMD (non-NV-AMD) eyes, they achieved a 93.4% testing accuracy in the binary categorization of neovascular AMD vs. non-AMD.
Contributions: The findings demonstrate the superiority of models with various imaging modalities concatenated into one, such as OCT volumes with b-scan images and OCTA, over models with a single input image modality. By using dataset balance, the models’ performance was improved. GradCAM visualization at each layer of the 3D input volumes indicates that model performance may be enhanced by including a high-resolution b-scan cube across the retina. Very modest angiographic results, particularly in early types of AMD, need to be confirmed by high-resolution b-scans.
Limitations: Class imbalance and limited data are the study’s main limitations.
Tan et al. [1] proposed a deep CNN for diagnosing AMD using fundus images. Their custom-designed CNN, consisting of 14 layers, aimed to classify images into AMD (dry and wet) and normal categories. Utilizing data from the Ophthalmology Department of Kasturba Medical College (KMC), they employed blindfold and 10-fold cross-validation methods, achieving high classifier accuracies of 91.17% and 95.45%, respectively.
Contributions: The study proposed a deep CNN for diagnosing AMD using fundus images, consisting of 14 layers, and aimed at classifying images into AMD (dry and wet) and normal categories. The model’s portability and affordability make it suitable for deployment in regions with limited access to ophthalmology services, facilitating rapid screening for AMD among the elderly.
Limitations: The study faced limitations such as the need for large amounts of labeled ground truth data for optimal model performance. Additionally, the complexity of the CNN model led to issues with overfitting and convergence, requiring continuous parameter adjustments for optimal performance. Moreover, CNN model training was laborious and time-consuming, although testing of fundus images became quick and precise once the model was trained.
Jain et al. [95] aimed to classify retinal fundus images as diseased or healthy without explicit segmentation or feature extraction. They developed the LCDNet system for binary categorization using DL techniques. Two datasets were utilized: one from the ML data repository at Friedrich-Alexander University and another from the Retinal Institute of Karnataka, India. The model achieved an accuracy ranging from 96.5% to 99.7%. Notably, color images performed worse than red-free images, aligning with medical community beliefs.
Contributions: The study developed the LCDNet system for binary categorization of retinal fundus images as diseased or healthy, utilizing DL techniques without explicit segmentation or feature extraction. The study highlighted the potential of DL for automated disease classification in retinal images, with red-free images showing better performance compared to color images, aligning with medical community beliefs.
Limitations: The study identified the need for a larger and more diverse dataset to enhance the model’s performance for specific diseases. Additionally, the focus on binary categorization limits the model’s ability to distinguish between different diseases or severity levels within diseased images. Comprehensive training with multi-class datasets would be necessary to address this limitation and improve the model’s applicability in clinical settings.
Islam et al. [96] proposed a CNN-based method to identify eight ocular disorders and their affected areas. After standard pre-processing, data are fed into the network for classification. Using the ODIR dataset and a state-of-the-art GPU, the model achieved an AUC of 80.5%, a kappa score of 31%, and an F-score close to 85%.
Contributions: This study marked the first evaluation of various eye disorders on a real-life dataset, demonstrating strong performance across different datasets. Additionally, the model showed flawless performance when tested on alternative datasets, highlighting its accuracy in identifying all eight categories of ocular disorders. The user-friendly nature of the system and its practical viability in real-time testing underscore its potential for revolutionizing public healthcare services through image processing and neural networks.
Limitations: The study did not address specific challenges or limitations encountered during model development or evaluation. Further exploration of potential limitations, such as computational resource requirements, dataset biases, or generalizability to diverse populations, could provide valuable insights for future research and implementation in clinical settings.
Bhuiyan et al. [97] utilized CNNs and the AREDS dataset to classify referable AMD in fundus images into no, early, intermediate, or advanced AMD and to predict late AMD progression: dry or wet. They developed six automated AMD prediction algorithms based on color fundus images by integrating DL, ML, and algorithms for AMD-specific image parameters. These methods produced image probabilities, which were combined with demographic and AMD-specific parameters in a machine-learning prediction model to identify individuals at risk of progressing from intermediate to late AMD.
Contributions: The study utilized CNNs and the AREDS dataset to develop six automated AMD prediction algorithms, classifying Referable AMD of fundus images into various stages and predicting late AMD progression. By integrating DL, ML, and algorithms for AMD-specific image parameters, the study produced image probabilities combined with demographic and AMD-specific parameters, facilitating the identification of individuals at risk of progressing from intermediate to late AMD. The immediate applicability of these methods in AMD studies using color photography could significantly reduce human labor in image classification, potentially advancing telemedicine-based automated screening for AMD and improving patient care in the field of public health.
Limitations: The study faced limitations, particularly in lower prediction accuracy when stratified according to GA and CNV. This discrepancy might be attributed to the predominance of non-incident events compared to pure dry and wet AMD cases during the construction of machine-learning models.
Zapata et al. [61] utilized the Optretina dataset, comprising 306,302 images, to develop a CNN-based classification algorithm for AMD disease vs. no disease. They designed five algorithms and involved three retinal specialists to classify all images. Three CNN architectures were employed, two of which aimed to reduce parameters while maintaining accuracy. The study achieved an accuracy of 0.863, an AUC of approximately 0.936, a sensitivity of 90.2%, and a specificity of 82.5%.
Contributions: The study utilized the Optretina dataset to develop a CNN-based classification algorithm for AMD Disease vs. No disease, involving three retinal specialists in image classification. The study designed five algorithms and employed three CNN architectures, achieving impressive performance metrics. These algorithms demonstrated the ability to assess image quality, distinguish between left and right eyes, and accurately identify AMD and GON with notable sensitivity, specificity, and accuracy.
Limitations: The study may have faced limitations such as potential biases introduced by the involvement of retinal specialists in image classification. Additionally, the performance metrics achieved might vary across different datasets or clinical settings, warranting further validation and evaluation on diverse datasets to ensure the generalizability and robustness of the algorithms.
Yellapragada et al. [98] developed a deep neural network with self-supervised Non-Parametric Instance Discrimination (NPID) to predict AMD severity in four steps (none, early, intermediate, and advanced AMD) and classify the disease as referable AMD (intermediate or advanced AMD) and advanced AMD (CNV or central GA). They utilized the Age-Related Eye Disease Study (AREDS) and applied three-step, four-step, and nine-step classification methods to evaluate the model’s performance in grading AMD severity without labels. Using spherical k-means clustering and hierarchical learning, they studied network behavior and compared balanced and unbalanced NPID accuracies against ophthalmologists and supervised-trained networks. NPID demonstrated flexibility in different AMD classification schemes and achieved balanced accuracies comparable to human ophthalmologists or supervised-trained networks in classifying advanced AMD (82% vs. 81–92% or 89%), referable AMD (87% vs. 90–92% or 96%), and the 4-step AMD severity scale (65% vs. 63–75% or 67%), all without the need for retraining.
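The core of NPID is scoring a query embedding against a memory bank of per-image embeddings with a temperature-scaled softmax, so every training image acts as its own class. The sketch below is a simplified numpy illustration; the bank size, embedding dimension, and temperature are assumptions (the 0.07 temperature follows the original NPID paper, not necessarily this study):

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(v: np.ndarray, axis: int = -1) -> np.ndarray:
    return v / np.linalg.norm(v, axis=axis, keepdims=True)

# Hypothetical 128-d unit embeddings: a memory bank of 1000 stored
# instances and a query embedding for the current fundus image.
memory_bank = l2_normalize(rng.standard_normal((1000, 128)))
query = l2_normalize(rng.standard_normal(128))

tau = 0.07                              # softmax temperature (assumed)
logits = memory_bank @ query / tau      # cosine similarity to every instance
p = np.exp(logits - logits.max())
p /= p.sum()                            # instance-level "classification" probabilities
```

Because the objective never uses disease labels, the learned embedding space can later be read out under any grading scheme (3-step, 4-step, or 9-step) without retraining.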
Contributions: The study introduced a deep neural network with self-supervised NPID to predict AMD severity and classify referable AMD and advanced AMD without the need for labeled data. By utilizing the AREDS dataset and various classification methods, they demonstrated the flexibility of their approach in different AMD grading schemes. The NPID algorithm achieved balanced accuracies comparable to human ophthalmologists and supervised-trained networks in classifying AMD severity and referable AMD, showcasing its potential for unbiased and data-driven classification of ocular phenotypes related to AMD and other conditions.
Limitations: The study’s reliance on a specific dataset may limit the generalizability of the results to other populations or clinical settings. Additionally, while the NPID algorithm demonstrated adaptability across different labeling schemes, further validation on diverse datasets is necessary to assess its robustness and applicability in real-world healthcare settings. Furthermore, the study’s emphasis on algorithmic performance highlights the need for comprehensive evaluation of clinical outcomes and patient impacts to ensure the effectiveness and safety of self-supervised learning-based diagnostic systems in clinical practice.
Thomas et al. [99] devised an algorithm for AMD diagnosis in retinal OCT images, employing statistical techniques, randomization, and RPE layer identification. The algorithm focuses on statistically categorizing AMD by utilizing the RPE layer and baseline to calculate Drusen height, indicative of disease severity. The methodology involves despeckling, RPE estimation, baseline estimation, Drusen height detection, and categorization. The evaluation was conducted on a publicly available dataset from Duke comprising 2130 slices from thirty participants. The proposed approach achieved an overall accuracy of 96.66%, surpassing comparable statistical techniques. An adaptive denoising method enhances RPE estimation by removing stray pixels under the RPE.
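The baseline-vs-RPE idea behind Drusen height can be illustrated with a short numpy sketch: fit a smooth baseline to the RPE contour and take the positive residual as per-column Drusen elevation. The polynomial degree and toy contour are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def drusen_height(rpe_contour: np.ndarray, poly_degree: int = 3) -> np.ndarray:
    """Estimate per-column Drusen height from an RPE contour.

    The baseline is a smooth polynomial fit to the (possibly bumpy) RPE
    contour; Drusen elevate the RPE above this baseline, so the positive
    residual approximates Drusen height (in pixels here).
    """
    x = np.arange(rpe_contour.size)
    baseline = np.polyval(np.polyfit(x, rpe_contour, poly_degree), x)
    return np.clip(rpe_contour - baseline, 0.0, None)

# Toy contour: a flat RPE with one Gaussian "bump" simulating a druse.
cols = np.arange(200)
contour = 5.0 + 8.0 * np.exp(-((cols - 100.0) ** 2) / (2.0 * 10.0 ** 2))
heights = drusen_height(contour)
```

Thresholding the resulting heights would then drive the severity categorization step described above.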
Contributions: The study developed an algorithm for diagnosing AMD in retinal OCT images, focusing on statistically categorizing AMD severity using the RPE layer and baseline to calculate Drusen height. Their methodology, which includes despeckling, RPE estimation, baseline estimation, Drusen height detection, and categorization, achieved an impressive overall accuracy of 96.66%. By utilizing statistical techniques and randomization, their approach surpassed comparable methods, showcasing its potential for accurate AMD diagnosis and disease severity assessment in OCT images.
Limitations: The study was conducted on a single publicly available dataset, which may limit the generalizability of the results to other datasets or clinical settings. Additionally, further validation on larger and more diverse datasets is necessary to assess the algorithm’s robustness and applicability in real-world healthcare scenarios. Furthermore, the study’s focus on statistical techniques may overlook potential nuances in AMD diagnosis that could be addressed by incorporating additional imaging modalities or clinical features.
Christiana et al. [100] proposed an automated AMD identification method from fundus images using DL algorithms. Image features were extracted with an Elite U Net model optimized with mustard sunflower optimization. Training and testing utilized the AREDS dataset, achieving high accuracy. Comparison with existing approaches indicates effectiveness in AMD identification. The clinical context presents challenges due to image variability, but the proposed method excels in distinguishing vessel bifurcations and crossings from AMDs with 98% accuracy and minimal error.
Contributions: The study introduced an automated method for identifying AMD in fundus images using DL algorithms. Their approach utilized an Elite U Net model optimized with mustard sunflower optimization to extract image features, achieving high accuracy on the AREDS dataset. By comparing with existing methods, they demonstrated the effectiveness of their approach in accurately identifying AMD, particularly excelling in distinguishing vessel bifurcations and crossings from AMDs with 98% accuracy and minimal error.
Limitations: The study was primarily conducted on a single dataset, which may limit its generalizability to diverse populations or clinical settings. Additionally, the clinical context of AMD diagnosis presents challenges due to image variability, suggesting the need for further validation on larger and more diverse datasets to assess the algorithm’s robustness across different imaging conditions and patient demographics. Furthermore, the reliance on DL algorithms may require careful consideration of computational resources and potential biases in training data to ensure reliable performance in real-world clinical practice.
Vaiyapuri et al. [101] introduced a multi-retinal disease detection model using the IDL-MRDD technique on fundus images. The model aims to classify images such as Normal, Hypertensive Retinopathy, AMD, DR, Glaucoma, Pathological Myopia, and Others. The IDL-MRDD model undergoes preprocessing, segmentation using AFA-SF, feature extraction with SqueezeNet, and classification via SSAE. AFA-SF-based multi-level thresholding enables accurate detection of infected zones. SqueezeNet generates feature vectors, while SSAE serves as the classifier. Evaluation on a benchmark multi-retinal disease dataset demonstrates superior performance with a maximum accuracy of 0.963 compared to state-of-the-art techniques.
Contributions: The study proposed a multi-retinal disease detection model, the IDL-MRDD technique, for classifying fundus images into categories such as Normal, Hypertensive Retinopathy, AMD, DR, Glaucoma, Pathological Myopia, and Others. Their approach integrated preprocessing, segmentation using AFA-SF, feature extraction with SqueezeNet, and classification via SSAE.
Limitations: The study was conducted on a specific benchmark dataset, which may limit its generalizability to other datasets or clinical scenarios. Additionally, further validation of diverse datasets representing various populations and clinical settings is necessary to assess the robustness and applicability of the model in real-world healthcare environments. Moreover, the computational complexity of the proposed technique, particularly involving multiple stages such as preprocessing, segmentation, feature extraction, and classification, may pose challenges in deployment, requiring consideration of computational resources and scalability for practical implementation.
Lee et al. [102] proposed two DL models, CNN-LSTM and CNN-Transformer, to predict the 2-year and 5-year risk of progression to late AMD by utilizing sequential information in longitudinal CFPs. LSTM and Transformer architectures were utilized to capture this sequential data. The models were evaluated using data from the Age-Related Eye Disease Study (AREDS), one of the largest longitudinal AMD cohorts with CFPs. Results showed that the proposed models outperformed baseline models, which only considered single-visit CFPs, in predicting the risk of late AMD (AUC of 0.879 vs. 0.868 for 2-year prediction, and 0.879 vs. 0.862 for 5-year prediction).
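The CNN-LSTM idea, running a recurrent cell over per-visit CNN features so the final hidden state summarizes the visit history, can be sketched with a minimal numpy LSTM step. The feature size, hidden size, and random weights are illustrative assumptions; the risk-prediction head is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step over a single visit's CNN feature vector x."""
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)                   # input, forget, output, candidate
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c + i * g                             # updated cell state
    h = o * np.tanh(c)                            # updated hidden state
    return h, c

d_in, d_h = 512, 64                               # assumed feature / hidden sizes
W = rng.standard_normal((4 * d_h, d_in)) * 0.01
U = rng.standard_normal((4 * d_h, d_h)) * 0.01
b = np.zeros(4 * d_h)

# Three sequential visits' CNN features; the final h would feed a
# late-AMD risk-prediction layer.
h, c = np.zeros(d_h), np.zeros(d_h)
for visit_feat in rng.standard_normal((3, d_in)):
    h, c = lstm_step(visit_feat, h, c, W, U, b)
```

A Transformer variant would instead attend over all visit features at once rather than folding them in sequentially.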
Contributions: The study proposed two DL models, CNN-LSTM and CNN-Transformer, for predicting the 2-year and 5-year risk of progression to late AMD by utilizing sequential information in longitudinal CFPs. Utilizing LSTM and Transformer architectures to capture sequential data, the models were evaluated using data from the AREDS, one of the largest longitudinal AMD cohorts with CFPs.
Limitations: The study was conducted on data from a specific cohort (AREDS), which may limit their generalizability to other populations or clinical settings. Additionally, the complexity of DL models may pose challenges in interpretability and computational resources, requiring careful consideration in practical implementation. Furthermore, further validation on diverse datasets and external validation cohorts is necessary to assess the robustness and generalizability of the models in real-world clinical practice.
Fang et al. [2] utilized multiple CNN models in the ADAM challenge, including EfficientNet, DenseNet, Inception-ResNet, ResNeXt, SENet, Xception, Inception-v3, ResNet50, DenseNet101, Autoencoder with ResNet50 as backbone, ResNet-101, and EfficientNet-B4, to classify AMD into four categories using the ODIR dataset. The objectives of the challenge included detecting AMD, detecting and segmenting the optic disc, finding the fovea, and detecting and segmenting lesions from fundus images. The achieved results showed high performance with an AUC of 0.9714 and AUCs for Early AMD (0.9159), Intermediate AMD (0.9964), Advanced AMD dry (0.9914), and Advanced AMD wet (0.9917).
Contributions: The study utilized multiple CNN models in the ADAM challenge to detect AMD and perform related tasks such as optic disc segmentation and fovea localization using the ODIR dataset. Their ensemble approach, combined with the incorporation of clinical domain information, led to high-performance outcomes. The study highlighted the importance of ensembling methods and the inclusion of clinical domain knowledge in enhancing DL model performance for automated solutions in ophthalmology.
Limitations: The study was conducted on a specific dataset (ODIR), which may limit the generalizability of the findings to other datasets or clinical scenarios. Additionally, the reliance on ensembling methods and clinical domain information may introduce complexities in model interpretation and practical implementation. Furthermore, the study emphasized the importance of including clinical domain information for tasks such as AMD classification, but the impact of this information may vary depending on the specific task and dataset, requiring further investigation and validation on diverse datasets.
El-Den et al. [103] proposed an integrated model using color fundus images to scale input images and classify retinas as normal or belonging to various grades of AMD. The method consists of two phases: in the first phase, a custom auto-encoder-based model resizes input images and performs preprocessing as needed. The output is then fed into the second phase, where a pre-trained ResNet50 model classifies the images into normal retinas, intermediate AMD, GA, and wet AMD grades. The model aims to facilitate early identification of AMD grades for prompt treatment, potentially slowing down disease progression and reducing severity. The dataset used for this study was gathered from the University of Pennsylvania-sponsored Comparisons of AMD Treatments Trials (CATT). The model comprises a CNN classification network and an auto-encoder-based network for scale adaptation. Through experiments, the model achieved an accuracy, sensitivity, and specificity of 96.2%, 96.2%, and 99%, respectively, outperforming other models.
Contributions: The study proposed an integrated model utilizing color fundus images to scale input images and classify retinas as normal or belonging to various grades of AMD. Through experiments, the model achieved impressive performance metrics outperforming other models.
Limitations: The study was conducted on a specific dataset (CATT), which may limit the generalizability of the findings to other datasets or clinical contexts. Additionally, the reliance on color fundus images may overlook potential complementary information from other imaging modalities, such as OCT or angiography, which could further improve the accuracy of AMD classification. Furthermore, while the model achieved high accuracy, further validation on external datasets and diverse populations is necessary to assess its robustness and generalizability in real-world clinical settings.
Kadry et al. [104] developed an automated method for distinguishing between AMD patients and non-AMD patients using a DL scheme (VGG16 CNN) by merging deep features (DF) and handcrafted features (HF). The database included OCT images for assessment and fundus retinal images from the iChallenge-AMD database. The study encompassed data processing, manual feature extraction, DF extraction with VGG16, optimum feature selection using the Mayfly method, feature concatenation, binary classification, and validation. Handcrafted features such as local binary pattern (LBP), pyramid histogram of oriented gradients (PHOG), and discrete wavelet transform (DWT) were extracted from test images to enhance performance and integrated with the DF. The system was evaluated independently using CFP and OCT images. A SoftMax classifier verified detection performance, followed by a comparison of VGG16’s performance with VGG19, AlexNet, and ResNet50. On fundus images, the proposed VGG16 achieved an AMD detection accuracy of 97.08%, precision of 97.48%, sensitivity of 96.67%, specificity of 97.50%, and NPV of 96.69%, while accuracies on OCT images ranged from 97.08% to 97.92%.
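The HF + DF fusion can be illustrated by computing one handcrafted descriptor (a basic 8-neighbour LBP histogram) and concatenating it with a deep-feature vector. This is a simplified sketch: the study also used PHOG and DWT features and Mayfly-based feature selection, omitted here, and the 4096-d deep vector is a stand-in for VGG16's FC features:

```python
import numpy as np

def lbp_histogram(img: np.ndarray) -> np.ndarray:
    """Basic 8-neighbour local binary pattern histogram (256 bins, normalized)."""
    c = img[1:-1, 1:-1]                       # interior pixels (pattern centers)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        codes |= (nb >= c).astype(np.uint8) << bit   # set bit if neighbour >= center
    hist = np.bincount(codes.ravel().astype(np.int64), minlength=256).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
img = rng.random((64, 64))
hf = lbp_histogram(img)                        # handcrafted features (256-d)
df = rng.standard_normal(4096)                 # stand-in for VGG16 deep features
fused = np.concatenate([hf, df])               # concatenated HF + DF vector
```

The fused vector would then go through feature selection before the SoftMax classifier.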
Contributions: The study developed an automated method for distinguishing between AMD and non-AMD patients using a DL scheme. The system demonstrated improved performance in AMD detection, especially with OCT images.
Limitations: The study was conducted on a specific dataset, which may limit its generalizability to other datasets or clinical settings. Additionally, the comparison of VGG16’s performance with other DL models (VGG19, AlexNet, ResNet50) could be further expanded to include a broader range of architectures for a comprehensive assessment. Furthermore, the study’s emphasis on accuracy metrics may overlook potential biases or limitations in the dataset or algorithm, highlighting the need for further validation and evaluation in real-world clinical settings.
Kihara et al. [105] introduced a binary classification ViT method for identifying eyes with neovascular AMD (neMNV). Using a 6 × 6 mm scan pattern from swept-source OCT angiography (SS-OCTA), they processed 500 B-scans to generate an integrated en face prediction map. This map was used with a segmentation model to compute prediction masks, distinguishing between Drusen and the double-layer sign (DLS) associated with neMNV. Individual B-scans were annotated and graded by human evaluators to test the model. The algorithm achieved sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of 82%, 90%, 79%, and 91%, respectively, with an area under the curve of 0.91. In comparison with human graders on the same set of 100 eyes, two junior graders independently identified DLSs in 24 out of 33 eyes with neMNV and correctly identified 56 out of 67 eyes without MNV, achieving 73%, 84%, 69%, and 86% for sensitivity, specificity, PPV, and NPV, respectively. The senior grader achieved 88%, 87%, 76%, and 94% for sensitivity, specificity, PPV, and NPV, respectively, detecting DLSs in 29 of 33 eyes with neMNV and correctly identifying absence in 58 of 67 eyes without MNV. The model demonstrated strong performance in detecting DLSs in eyes with late AMD, exhibiting substantial agreement with the senior human grader (Cohen's κ, p < 0.001). Junior and senior human graders also showed high agreement (Cohen's κ, p < 0.001) in their assessments.
Contributions: For identifying neMNV, a deep learning model was created to differentiate the DLS linked to neMNV from Drusen. The assessment showed that the segmentation step can significantly increase sensitivity and specificity in the final classification task and is consistent and dependable.
Limitations: The model should only be used with structural SS-OCT scans from AMD patients because it was evaluated only on such scans. The model’s ability to identify tiny lesions is also rather limited: lesion size and performance are highly correlated, and small lesions are typically hard for the model to resolve and hard to distinguish from Drusen based on the DLS. Including more cases in the training dataset could improve performance, as could adding lesion-size reweighting to the loss function while training the ViT segmentation model.
Xu et al. [
106] proposed a hierarchical vision transformer model for distinguishing normal eyes, dry AMD, wet AMD with type-I MNV, and wet AMD with type-II MNV using several CFP databases, including the WMUEH, ODIR, ADAM, and Ichallenge databases. Mixup and CutMix were the two augmentation techniques used. They employed “Center Crop” and “Random Resized Crop” to extract the most pertinent portion of an image while resizing it to a fixed size, increasing the variety and unpredictability of the training dataset. In addition, they used “Random Horizontal Flip” to flip the image horizontally and produce a mirror image, and included “Random Augment”, which automatically transforms images by combining arbitrary rotations, translations, scaling, and shearing. They also applied “Random Erasing”, which randomly erases rectangular sections of the image to further enlarge the dataset; by excluding specific portions of the image, the classification model is encouraged to concentrate on other areas and learn more robust characteristics. The hierarchical vision transformer’s overall design was divided into four stages, each with a distinct function. The first stage consists of a Linear Embedding module and a Swin Transformer block; the remaining three stages share the same structure, each comprising a patch merging module and a Swin Transformer block.
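The augmentation recipe above (Mixup, horizontal flipping, Random Erasing) can be illustrated with a minimal NumPy sketch. These are generic re-implementations for illustration, not the authors' code, and the parameter choices (the Beta distribution's alpha, the maximum erased fraction) are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Mixup: a convex combination of two images and their one-hot labels,
    with the mixing coefficient drawn from a Beta(alpha, alpha) distribution."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

def random_erase(img, max_frac=0.25):
    """Random Erasing: zero out a randomly placed rectangular region
    covering up to max_frac of each image dimension."""
    img = img.copy()
    h, w = img.shape[:2]
    eh = rng.integers(1, max(2, int(h * max_frac)))
    ew = rng.integers(1, max(2, int(w * max_frac)))
    top = rng.integers(0, h - eh + 1)
    left = rng.integers(0, w - ew + 1)
    img[top:top + eh, left:left + ew] = 0.0
    return img

def hflip(img):
    """Horizontal flip (the random coin-flip is omitted for clarity)."""
    return img[:, ::-1]
```

CutMix follows the same idea as Mixup but pastes a rectangular patch from one image into the other instead of blending whole images.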
Contributions: A deep learning model based on CFPs, DeepDrAMD, was proposed for differential AMD detection. Compared to ophthalmologists, DeepDrAMD is more efficient and lowers costs. They compared it against traditional deep learning models such as CNNs and MLPs. The outcomes showed how well DeepDrAMD (a vision transformer) performed in AMD identification, with a high AUC of 96.47% in the external Ichallenge-AMD cohort and 98.76% in the WMUEH test set.
Limitations: While augmentation of data can greatly improve model performance, there are drawbacks as well. For example, over-augmentation can lead to overfitting and increase processing resource consumption. Appropriate data augmentation techniques must be chosen and used sparingly to balance data augmentation with computational resources and prevent overuse. DeepDrAMD may perform differently on various datasets and demographics, necessitating more testing and improvement. Furthermore, the model’s applicability to actual clinical situations and its incorporation into current healthcare systems need to be carefully considered and validated.
Dominguez et al. [
107] extensively studied two families of deep learning models, CNNs and transformer architectures, for automatic diagnosis and severity grading of referable and non-referable AMD. They evaluated various scaling techniques and ensemble strategies to enhance the performance of convolutional architectures using fundus images from the UpRetina database. Among the eight CNNs (EfficientNet B3, EfficientNet v2, HRnet w32, Inception-v4, Inception-ResNetv2, ResNet50, ResnetRS50, and ResNeSt50) and nine transformer-based architectures (BeiT, CaiT, CoaT, DeiT, PiT, Swin, TNT, ViT, and Visformer), convolutional models consistently achieved an average AUROC of over 95%, a mean κ value of nearly 80%, and over 90% for other evaluation metrics. In contrast, transformer-based models performed less effectively, with mean κ values below 31% and mean AUROC values below 75% for AMD diagnosis. Their ensemble model for grading AMD severity achieved a mean accuracy (SD) of 82.55% (2.92) and a mean weighted κ coefficient (SD) of 84.76% (2.45). For diagnosing referable AMD, the models attained a mean F1-score (SD) of 92.60% (0.47), mean AUROC (SD) of 97.53% (0.40), and mean weighted κ coefficient (SD) of 85.28% (0.91).
Contributions: They thoroughly examined many deep learning architecture families, training plans, and ensemble techniques to automatically identify referable versus non-referable AMD and to provide a severity rating based on retinal fundus images. Convolutional architectures outperform transformer-based architectures in this setting, despite the encouraging results of the latter on natural images. A progressive scaling technique, where the models are first trained with large images and then with smaller images, enhanced the performance of convolutional architectures. Binary models trained to grade AMD severity scales perform comparably to models trained only to diagnose referable versus non-referable AMD. Lastly, test-time augmentation can further improve the performance of both binary and multi-class convolutional models, whereas other techniques, such as classical model ensembles or cascades of models, only enhance the output of multi-class models.
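Test-time augmentation, mentioned above as a way to boost both binary and multi-class models, simply averages a model's class probabilities over augmented copies of the input. A minimal sketch, where the flip set and the `predict_fn` interface are our assumptions rather than the paper's exact protocol:

```python
import numpy as np

def tta_predict(predict_fn, img):
    """Test-time augmentation: average the class-probability vectors returned
    by `predict_fn` over the original image and simple flipped variants."""
    variants = [img, img[:, ::-1], img[::-1, :]]   # original, h-flip, v-flip
    probs = np.stack([predict_fn(v) for v in variants])
    return probs.mean(axis=0)
```

For anatomically oriented fundus images, only augmentations that preserve the class label (e.g., horizontal flips, mild rotations) should be included in the variant set.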
Gholami et al. [
108] utilized OCT image data from three research datasets to propose a federated learning (FL) approach for training deep learning models such as ResNet18 and ViT to identify AMD. They addressed domain shift concerns across institutions by integrating four domain adaptation algorithms. Their study demonstrated that FL techniques enabled competitive performance similar to centralized models, even with local models trained on subset data. The Adaptive Personalisation FL method consistently performed well across tests. The research underscored the effectiveness of simpler architectures in image classification tasks, particularly for privacy-preserving decentralized learning. Further investigation into deeper models and FL techniques was recommended for a comprehensive performance assessment. On test sets, ResNet18 achieved 94.58% ± 0.62 accuracy on the Kermany dataset, while ViT achieved 98.18% ± 0.55 and 99.11% ± 0.39 on the Srinivasan and Li datasets, respectively. These findings highlight FL’s critical role in healthcare analytics, ensuring patient privacy and enabling insights from distributed data.
Contributions: Utilizing three different datasets, this study conducted a thorough set of tests to compare the efficacy of deploying DL models utilizing local, centralized, and FL approaches. Using ResNet18 and ViT encoders, the main goal was to classify OCT images into two binary categories: Normal and AMD. In addition, they incorporated four distinct DA techniques into the FL approach to address the widespread problem of domain shift.
Limitations: Throughout the training phase, they used an aggregation policy based only on a weighted average and a fairly simple DL architecture. Subsequent work will investigate more complex aggregation policies. Despite these limitations, their findings enhance our understanding of FL techniques and offer invaluable insights into the relative effectiveness of simpler architectures for image classification tasks. Exploring further FL techniques in future studies should shed more light on the subtleties of these models’ behavior. Another area that warrants attention is the distinct classification head, where amplitude normalization and an intelligent weight aggregation strategy may increase FL network efficiency. Finally, exploring more complex multi-layer perceptron designs and extra transformer blocks in deeper models like ResNet50, ResNet101, or ViTs may change the performance dynamics and provide new insights.
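The weighted-average aggregation policy discussed above corresponds to FedAvg-style aggregation, in which each client's parameters are averaged with weights proportional to its local sample count. The sketch below is illustrative (the function signature is ours); production FL frameworks add client sampling, secure aggregation, and server-side optimizers on top:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """FedAvg-style aggregation: average each parameter array across clients,
    weighted by the number of local training samples.
    `client_weights` is a list (one entry per client) of lists of arrays."""
    sizes = np.asarray(client_sizes, dtype=float)
    coefs = sizes / sizes.sum()                      # per-client weight
    n_params = len(client_weights[0])
    return [sum(c * cw[i] for c, cw in zip(coefs, client_weights))
            for i in range(n_params)]
```

A training round then consists of broadcasting the aggregated parameters, letting each client train locally, and re-aggregating.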
Retinal ViT is a revolutionary vision transformer design that integrates the self-attention mechanism into the area of medical image analysis employing fundus images, as reported by Wang et al. [
109]. The last component of the proposed model was a two-layer feed-forward multi-label classification head with a sigmoid activation function. On the publicly available ODIR-2019 dataset, the experimental results demonstrate that the proposed model outperforms state-of-the-art methods such as ResNet, VGG, DenseNet, and MobileNet, achieving an F1 score of 0.932 ± 0.06 and an AUC of 0.950 ± 0.03.
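The two-layer feed-forward multi-label head with sigmoid outputs can be sketched in a few lines. This is a generic illustration of the design (weights and dimensions are placeholders), not the Retinal ViT implementation; the key point is that each label gets an independent sigmoid probability rather than a softmax over classes:

```python
import numpy as np

def multilabel_head(features, W1, b1, W2, b2):
    """Two-layer feed-forward multi-label head: a hidden layer with ReLU
    followed by independent sigmoid outputs, one probability per disease
    label. `features` are the transformer's pooled embeddings."""
    h = np.maximum(0.0, features @ W1 + b1)      # hidden layer, ReLU
    logits = h @ W2 + b2
    return 1.0 / (1.0 + np.exp(-logits))         # one sigmoid per label
```

Training such a head uses binary cross-entropy summed over labels, so several diseases can be flagged for the same eye.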
Contributions: A unique vision transformer model was proposed to address the problem of multi-label categorization of retinal images. The suggested method may classify retinal images into a total of eight groups. This study demonstrated that transformer-based models can outperform CNN-based models in terms of performance. It should be noted that the suggested deep learning model’s utilized attention mechanism, which aims to identify the global correlations between distant pixels, may be responsible for this.
Limitations: The class imbalance of the utilized dataset was not taken into account in this investigation. The DR (D), normal (N), and other abnormalities (O) categories in the ODIR-2019 dataset contain far more images than the other five classes combined; consequently, the performance of the suggested method may be affected by the dataset’s unequal distribution. Second, the presented deep model is based on the original vision transformer, with the key change located at the output layer to accommodate multi-label classification; to obtain more precise results, the inner components of the vision transformer should also be optimized. Lastly, the experiments used only one particular dataset, which may not be sufficient to demonstrate the generalizability of the suggested vision transformer design.
To automatically distinguish between AMD, DME, and normal eyes, Jiang et al. [
13] suggested a CAD technique that uses a vision transformer to analyze OCT images from Duke University. Images were resized and then normalized. After model pruning, the classification accuracy remained unchanged while the recognition time per image dropped to 0.010 s. The pruned vision transformer showed superior recognition ability compared to CNN image classification models (VGG16, ResNet50, DenseNet121, and EfficientNet), indicating that the vision transformer is a strong alternative for more accurate diagnosis of retinal disorders. Regarding per-image identification speed, VGG16 and the vision transformer outperformed the other CNN models, and after pruning the vision transformer was the fastest, requiring only 0.010 s per image while retaining an identification accuracy of 99.69%. In terms of both identification speed and accuracy, the pruned vision transformer outperforms CNN models in identifying fundus OCT images.
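The pruning step can be illustrated with simple unstructured magnitude pruning, which zeroes the smallest-magnitude weights of a layer; the study does not specify its exact pruning criterion, so the sketch below shows only one common variant:

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.5):
    """Unstructured magnitude pruning: zero out the fraction `sparsity` of
    weights with the smallest absolute value (ties at the threshold are
    also pruned)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)
```

After pruning, models are typically fine-tuned briefly so the remaining weights compensate, which is how accuracy can stay unchanged while inference gets faster on sparse-aware hardware.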
Contributions: The study showed that vision transformers were the best at recognition and that their capacity to recognize objects quickly and accurately was unaffected by pruning, which indicates that there will be no missed diagnosis in practical applications. This research proposes a model that can more accurately represent the CAD of retinal disorders.
Table 6 summarizes the discussed DL-related studies.
3.4.2. Machine Learning Techniques (Handcrafted Features)
Various ML algorithms are utilized to classify fundus images of AMD by identifying patterns and features within the data. These algorithms fall under the umbrella of ML, which encompasses techniques employed across diverse sectors to address various issues [
10,
11,
110,
111,
112]. Learning algorithms can be categorized into two main domains based on the type of knowledge they acquire. Supervised learning involves explicit information or direct human involvement, while unsupervised or semi-supervised learning entails the system determining target-related patterns autonomously to varying degrees [
33,
113]. Over the past decade, computer vision applications like image classification and object identification have relied on handcrafted features. The complexity and quantity of these features have increased over time to better address the tasks at hand. To assess the performance of feature-based systems employing an ensemble of multiple features, various feature-based techniques are considered [
114].
Fraccaro et al. [
114] developed models to diagnose AMD using interpretable white-box methods such as decision trees and logistic regression, as well as less interpretable black-box methods like AdaBoost, random forests, and support vector machines (SVM). They utilized an Electronic Health Record (EHR) system in Genoa, Italy, collecting data during routine visits from patients with AMD, other retinal disorders, and healthy participants to detect AMD or other macular diseases. Patient demographics and clinical symptoms associated with AMD were recorded, including subretinal fibrosis, subretinal fluid, macula thickness, depigmentation area, subretinal hemorrhage, soft Drusen, retinal pigment epithelium defects/pigment mottling, and macular scar. The study included 487 patients (912 eyes), with SVM and decision trees achieving a mean AUC of 0.90, and AdaBoost, logistic regression, and random forests a mean AUC of 0.92. Age and soft Drusen were identified as the most discriminating elements in diagnosing AMD by clinicians. They found a potential trade-off between performance and complexity, noting that the relationship between increased complexity and better performance is not always consistent. Even though no single variable could lead to a successful diagnosis, decision trees (average AUC of 90%) and logistic regression (average AUC of 92%) performed comparably to random forests.
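The white-box versus black-box comparison follows a standard protocol: train each classifier family on the same tabular features and compare AUCs on held-out data. A minimal scikit-learn sketch on synthetic stand-in data (not the Genoa EHR data; feature counts and model settings are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the EHR features (age, soft Drusen, ...).
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

aucs = {}
for name, clf in [
    ("decision_tree", DecisionTreeClassifier(max_depth=4, random_state=0)),
    ("logistic_regression", LogisticRegression(max_iter=1000)),
    ("random_forest", RandomForestClassifier(random_state=0)),
]:
    clf.fit(X_tr, y_tr)
    aucs[name] = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
```

On well-separated tabular data, the interpretable models often come close to the ensemble, which is the trade-off the study highlights.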
Contributions: The study provided the identification of age and soft Drusen as the most discriminating elements in diagnosing AMD and provided valuable insights for clinicians. Furthermore, they highlighted the potential trade-off between model performance and complexity, emphasizing that increased complexity does not always guarantee better performance.
Limitations: The study revealed limitations in precisely identifying all consistent sets of diagnostic pathways, leading to occasional ambiguity in diagnostic decisions. Nonetheless, they suggested that an automated system incorporating longitudinal data and new variables could help overcome these limitations, enabling the detection of different decision paths and facilitating timely diagnosis, particularly in distinguishing ambiguous patient subsets.
Fundus images are utilized to differentiate between normal and AMD classes, including Early AMD, Intermediate AMD, and Late AMD, as described by Mookiah et al. [
12]. Features such as Pattern Occurrence (PO) and Linear Configuration Coefficients (CC) were extracted from these images and fed into various supervised classifiers such as SVM. The features were ranked by t-test p-value to separate the normal and AMD classes. The system’s performance was evaluated using ten-fold cross-validation on the Automated Retinal Image Analysis (ARIA) and Structured Analysis of the Retina (STARE) datasets, as well as a private dataset obtained from Kasturba Medical Hospital in Manipal, India. The private dataset achieved an accuracy of 93.52%, sensitivity of 91.11%, and specificity of 95.93%, while the ARIA dataset achieved an accuracy of 91.36%, sensitivity of 92.18%, and specificity of 90.00%. The proposed approach yielded the best results for the STARE dataset using 22 significant features, with an average accuracy of 97.78%, specificity of 97.50%, and sensitivity of 98.00%.
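The t-test p-value ranking used here can be sketched with SciPy: for each feature column, compare its values across the two classes with a two-sample t-test and sort features by p-value (smaller means more discriminative). The helper name is ours:

```python
import numpy as np
from scipy.stats import ttest_ind

def rank_features_by_ttest(X, y):
    """Rank feature columns by two-sample t-test p-value between the two
    classes in binary label vector `y`; returns (ranking indices, p-values).
    Smaller p-value = more class-discriminative feature."""
    pvals = np.array([ttest_ind(X[y == 0, j], X[y == 1, j]).pvalue
                      for j in range(X.shape[1])])
    return np.argsort(pvals), pvals
```

The top-ranked features (22 in the study) are then kept as the classifier's input.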
Contributions: By extracting features and employing t-test p-value sorting, the study achieved impressive classification accuracy. Evaluation on both ARIA and STARE datasets demonstrated high performance, with the STARE dataset yielding particularly excellent results, indicating the method’s potential for aiding clinicians in AMD diagnosis during mass eye screening programs.
Limitations: The study has some limitations. Firstly, while the classification accuracy is high, the reliance on supervised classifiers like SVM may limit the model’s ability to generalize to unseen data or populations with different characteristics. Moreover, the study does not provide comprehensive insight into the interpretability of the extracted features, which could hinder clinicians’ understanding and trust in the diagnostic process. Finally, the proposed approach may require further validation and refinement before widespread adoption in clinical settings, considering factors such as scalability, cost-effectiveness, and integration with existing diagnostic workflows.
Phan et al. [
111] proposed an automatic classification method for AMD to differentiate between Non-AMD, Early, Moderate, and Advanced stages in a telemedicine setting, ensuring robustness and reproducibility. Initially, a study was conducted using color, texture, and visual context in fundus images to identify the most critical aspects for AMD classification. The AREDS protocol was followed, employing a random forest and an SVM to evaluate feature importance and categorize images based on different AMD stages. The experiments utilized a database of 279 fundus photographs from a telemedicine platform. The results showed that local binary patterns in multiresolution were the most significant for AMD classification, irrespective of the classifier used. The technique exhibited promising performances across various classification tasks, with areas under the ROC curve for screening ranging from 0.739 to 0.874 and for grading from 0.469 to 0.685.
Contributions: By utilizing color, texture, and visual context in fundus images and following the AREDS protocol, they identified local binary patterns in multiresolution as the most critical aspect for AMD classification, regardless of the classifier used. The technique demonstrated promising performances across classification tasks, showing reliability in image quality and superior discriminating power of LBP features.
Limitations: The study’s reliance on a database of 279 fundus photographs from a telemedicine platform raises concerns about generalizability, emphasizing the need for further validation on a larger and more diverse sample size to fully validate their findings. Nonetheless, their proposed method represents a significant step forward in providing a reliable diagnostic tool for AMD in clinical settings and patient tracking.
Alfahaid et al. [
40] classified OCTA images from different retinal layers as wet AMD or normal control using a KNN classifier with rotation-invariant uniform local binary pattern texture features computed on 184 2D OCTA images (92 AMD, 92 healthy). The images used in this investigation were provided by Manchester Royal Eye Hospital. The classification was performed both on individual retinal layers and on all layers combined. The algorithm performed admirably, with a mean accuracy of 89% for all layers combined, and 98% for the outer, 89% for the superficial, 94% for the deep, and 100% for the choriocapillaris layers.
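A rotation-invariant uniform LBP feature of the kind used here can be computed as below for the 8-neighbour, radius-1 case. This is a simplified re-implementation for illustration (no circular interpolation), not the authors' code: each pixel's circular bit pattern maps to the count of '1' bits if it is "uniform" (at most two 0/1 transitions), otherwise to a single non-uniform bin, giving a 10-bin histogram per image:

```python
import numpy as np

def lbp_riu2_hist(img):
    """Rotation-invariant uniform LBP (8 neighbours, radius 1): returns the
    normalized 10-bin code histogram used as the texture feature vector."""
    c = img[1:-1, 1:-1]                              # centre pixels
    neigh = [img[0:-2, 0:-2], img[0:-2, 1:-1], img[0:-2, 2:], img[1:-1, 2:],
             img[2:, 2:], img[2:, 1:-1], img[2:, 0:-2], img[1:-1, 0:-2]]
    bits = np.stack([(n >= c).astype(int) for n in neigh])        # 8 x H x W
    transitions = np.abs(bits - np.roll(bits, 1, axis=0)).sum(axis=0)
    ones = bits.sum(axis=0)
    codes = np.where(transitions <= 2, ones, 9)                   # riu2 code
    return np.bincount(codes.ravel(), minlength=10) / codes.size
```

The resulting histograms are the feature vectors fed to the KNN classifier; rotation invariance matters because vessel patterns in OCTA have no preferred orientation.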
Contributions: The study’s most significant finding is that local texture features based on the LBPriu2 descriptor achieved outstandingly accurate classification results on OCTA images. The algorithm achieved promising results both when processing all layers together and when processing each layer separately. The highest accuracies were achieved with the outer and choriocapillaris layers, since the deformities of blood vessels in these layers are very clearly observable. Four different retinal layers were tested in this study, the main purpose being to identify the layer carrying the most discriminative information about the abnormal blood vessel patterns in wet AMD cases.
Limitations: The dataset that was employed was not very large. Extensive experiments should be conducted to determine the ideal value of K neighbors, and its reliability for use in clinical ophthalmology should be assessed. It should also be tested on a sizable dataset to determine its efficacy and strength when a larger dataset is used, and its ability to determine the severity of the disease should be tested using a dataset that includes AMD cases at different stages.
Wang et al. [
10] introduced a computer-aided diagnosis (CAD) model to differentiate between AMD, DME, and healthy macula using OCT data. The study utilized a publicly available OCT dataset from Duke University, Harvard University, and the University of Michigan. OCT images were analyzed using features based on linear configuration pattern (LCP) with correlation-based feature subset (CFS) selection. Various classification algorithms were employed, including neural network multi-layer perceptron (BP), quadratic programming-based sequential minimal optimization (SMO), SVM with polynomial kernel, logistic regression (LR), Naive Bayes, and Random Forest (RF) with J48 decision tree. The optimal model, based on the SMO approach, achieved an accuracy of 99.3% for each of the three sample classes. SMO surpassed BP and LR in specificity and sensitivity, ranking second in terms of AUC performance with a slight reduction. It excelled in AMD samples with an accuracy of 97.8% and ranked second in DME samples. LR performed well in detecting DME samples with an overall accuracy of 94.3%.
Contributions: Utilizing publicly available datasets from reputable institutions, the study employed a comprehensive approach, utilizing features based on LCP with CFS selection. The study achieved remarkable accuracy with the SMO approach leading with impressive metrics for each of the three sample classes.
Limitations: The high accuracy achieved may raise questions about overfitting, especially given the relatively small dataset utilized. Furthermore, while the SMO approach excelled in specificity and sensitivity, there was a slight reduction in AUC performance compared to other models. Additionally, while the study demonstrates the importance of selecting relevant features and enhancing feature extraction efficiency, the generalizability of the model to diverse datasets and clinical settings remains to be validated.
Nugroho et al. [
115] compared the effectiveness of deep neural network features with handcrafted features. The study utilized OCT images from the investigation by Kermany et al., categorizing scans into Normal, DME, Drusen (early stages of AMD), and CNV. Feature extractors included DenseNet-169, ResNet50, Local Binary Pattern (LBP), and Histogram of Oriented Gradient (HOG). A perceptron neural network without a hidden layer (logistic regression) served as the evaluated classifier, which is adequate for benchmarking with a linear classifier. Classifiers trained on features from deep neural networks performed best, achieving 89% accuracy for ResNet and 88% for DenseNet, compared to 50% for HOG and 42% for LBP. The LBP features yielded an F1-score and precision–recall of 0.23 and 0.42, respectively, while HOG features yielded 0.56 and 0.50; ResNet50 features scored 0.91, 0.89, and 0.89, and DenseNet-169 features scored 0.90, 0.88, and 0.88. Deep neural network-based techniques also demonstrated better results for underrepresented classes.
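The benchmarking protocol, a frozen feature extractor followed by a linear classifier (logistic regression, i.e., a perceptron with no hidden layer), can be sketched as follows. The toy "extractor" and data below are stand-ins for illustration, not the OCT features or networks from the study:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def benchmark_features(extract, images, labels):
    """Evaluate a frozen feature extractor by training only a linear
    classifier (logistic regression) on its features, reporting mean
    cross-validated accuracy."""
    X = np.stack([extract(im) for im in images])
    return cross_val_score(LogisticRegression(max_iter=1000), X, labels,
                           cv=3).mean()

# Toy stand-in data: class 1 images have a higher mean intensity.
labels = np.repeat([0, 1], 30)
images = [rng.normal(loc=lab, scale=0.5, size=(8, 8)) for lab in labels]

# A trivially discriminative 2-D feature (mean, std) for demonstration.
acc_mean = benchmark_features(lambda im: np.array([im.mean(), im.std()]),
                              images, labels)
```

In the study, `extract` would be HOG, LBP, or the penultimate-layer activations of ResNet50/DenseNet-169; comparing the resulting accuracies isolates feature quality from classifier capacity.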
Contributions: By comparing deep neural network features with handcrafted characteristics, the study provided valuable insights into the performance of various classifiers. Their outcome suggested that both deep neural network-based approaches provide superior feature sets for OCT image classification compared to HOG and LBP methods.
Limitations: The study revealed limitations. While deep neural network-based methods showed superior performance, they may require larger datasets and computational resources for training compared to handcrafted feature extraction techniques like LBP and HOG. Additionally, despite the promising accuracy rates, further validation on diverse datasets and clinical settings is necessary to assess the generalizability of the findings. Moreover, the study’s focus on OCT images may limit its applicability to other imaging modalities or multi-modal approaches, which could provide complementary information for disease diagnosis and monitoring.
Hussain et al. [
11] proposed a classification approach for automatically identifying individuals with DME or AMD using retinal characteristics from Spectral Domain Optical Coherence Tomography (SD-OCT) images. SD-OCT images were obtained from four sources: Tian et al., Duke University, New York University (NYU), and Centre for Eye Research Australia (CERA). Retinal parameters such as retinal thickness and distinct retinal layers, as well as the volume of diseases like Drusen and hyper-reflective intra-retinal spots, were utilized in the classification process. Ten clinically significant retinal characteristics were automatically extracted for classification by segmenting individual SD-OCT images. The efficacy of the retrieved characteristics was assessed on 251 participants (59 normal, 177 AMD, and 15 DME) using various classification techniques, including Random Forest. Fifteen-fold cross-validation tests were performed for three phenotypes—DME, AMD, and normal cases—using these datasets. The Random Forest classification method achieved an accuracy of over 95% for each dataset, and when trained as a two-class problem consisting of normal and pathological eyes, the system yielded an accuracy of over 96%. Each dataset also yielded an area under the receiver operating characteristic curve (AUC) value of 0.99.
Contributions: Their approach utilized retinal parameters obtained from multiple sources and utilized advanced segmentation techniques to extract ten clinically significant retinal characteristics for classification. Their findings revealed a strong correlation between the thicknesses of the layers and the projected diseases.
Limitations: The approach may have limitations, particularly in cases of severe diseases where even state-of-the-art segmentation techniques may miss retinal layers. This suggests the need for further investigation and refinement, particularly in enhancing sensitivity to improve outcomes in such scenarios.
Li et al. [
23] introduced three integration frameworks, including the innovative ribcage network (RC Net), which effectively combines manually crafted features with deep learning approaches such as VGG, DenseNet, and Xception. By demonstrating the effectiveness of adding handcrafted features like Gabor and scale-invariant feature transform (SIFT) in improving deep network classification accuracy, the study highlights the potential for enhancing DNN performance, particularly in scenarios with limited training data. The experimental results indicated that RC Net outperforms other integration techniques, achieving impressive accuracy, sensitivity, and specificity metrics. Notably, the study found that early integration techniques perform comparably to late integration methods but with reduced computational complexity, suggesting practical advantages in terms of computing time and model parameters. Additionally, the use of dense blocks and a sum operation in RC Net instead of convolutional blocks and concatenation contributes to parameter efficiency and network performance enhancement.
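Early integration of handcrafted and deep features typically amounts to normalizing each feature vector and joining them into one input for the downstream classifier. The sketch below shows plain concatenation with L2 normalization; the normalization choice is our assumption, and RC Net itself integrates features via dense blocks and a sum operation rather than simple concatenation:

```python
import numpy as np

def early_fusion(deep_feats, handcrafted_feats):
    """Early integration: L2-normalize each feature vector (so neither
    dominates by scale), then concatenate into a single input vector."""
    def l2(v):
        n = np.linalg.norm(v)
        return v / n if n > 0 else v
    return np.concatenate([l2(deep_feats), l2(handcrafted_feats)])
```

The computational advantage of early integration noted in the study comes from training a single downstream classifier instead of fusing the outputs of several full models.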
Contributions: The work significantly advanced eye disease categorization by introducing integration frameworks that combine deep learning and handcrafted features, notably proposing RC Net for feature integration. By utilizing dense blocks and a sum operation, RC Net outperformed the other integration techniques, achieving high accuracy, sensitivity, and specificity. The study emphasized the importance of incorporating handcrafted features to enhance DNN performance, especially with limited training data, and highlighted the efficiency of early integration techniques in reducing computing time and parameters.
Limitations: The retrospective nature of the OCT image dataset may introduce biases, and the generalizability of the findings to other populations or imaging modalities needs further exploration. Moreover, while the study demonstrates the efficacy of integrating handcrafted features, the specific selection and combination of these features may require further validation across diverse datasets and disease categories. Overall, Li et al.’s work provides valuable insights into the integration of deep and handcrafted features for improved disease categorization, paving the way for further advancements in medical image analysis.
Govindaiah et al. [
110] proposed a method to improve prediction probabilities by combining five different ML and statistical algorithms (Random Forest, Naïve Bayes, Logistic Model Tree, Simple Logistic, and Multilayer Perceptron) in each of six models, using data from the AREDS study, which included 566 participants. Each prediction model contained two independent subsystems to predict the late wet AMD and late dry AMD categories. ML models were developed to predict late AMD and its category at a single visit and at 2, 5, and 10 years, using different combinations of genetic, sociodemographic (S-D)/clinical, and retinal imaging data. Their investigation employed a wide range of factors, many of which might be difficult to find in a comparable study conducted elsewhere for external validation. They used the “NAT-2” dataset from an AMD study to test a model based on retinal images, with encouraging outcomes. Including genetic, clinical, and sociodemographic variables during model training may enhance the overall efficacy of late AMD prediction models. Each model’s performance was evaluated using 10-fold cross-validation due to the limited dataset. The model yielded 72.9% accuracy, with 73.8% sensitivity and 72.7% specificity.
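Combining the prediction probabilities of several classifiers, as done here across five algorithms, can be sketched as soft voting: average the per-class probabilities and take the argmax. Whether the study used exactly this averaging rule is not stated, so the sketch is illustrative:

```python
import numpy as np

def soft_vote(prob_list):
    """Soft voting: average the per-class probability vectors from several
    models; the final prediction is the argmax of the averaged vector."""
    avg = np.mean(np.stack(prob_list), axis=0)
    return avg, int(np.argmax(avg))
```

Averaging probabilities (rather than majority-voting on hard labels) lets a confident minority model outweigh uncertain ones, which is one way an ensemble can improve prediction probabilities.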
Contributions: The study demonstrated the potential for developing dependable prediction systems for late AMD in real-world settings. Furthermore, the finding that imaging alone can accurately predict late AMD for shorter timeframes without genome sequencing is noteworthy, indicating practical implications for early detection and intervention.
Limitations: The dataset used for training and evaluation is relatively small, which may affect the generalizability of the findings to broader populations. Additionally, while the models demonstrated promising accuracy, sensitivity, and specificity, there is room for improvement, particularly in predicting late AMD for longer durations and in distinguishing between dry and wet forms. Moreover, the reliance on retrospective data from the AREDS study and the NAT-2 dataset may introduce biases and limit the applicability of the models to more contemporary datasets. Finally, the study acknowledges the potential for enhancing predictions with spectral-domain optical coherence tomography (SD-OCT), suggesting avenues for future research to incorporate advanced imaging techniques for improved accuracy and reliability.
Table 7 summarizes the discussed ML-related studies.
3.4.3. Retinal Experts Classification
One of the main causes of permanent vision loss is AMD, whose macular neovascularization (MNV) can be treated with injections of anti-vascular endothelial growth factor (anti-VEGF) medicines. Several studies have demonstrated the superiority of OCTA over other imaging methods for identifying MNV in eyes with macular degeneration.
For instance, Corvi et al. [
43] assessed the efficacy of OCTA in comparison to OCT, indocyanine green angiography (ICGA), and fluorescein angiography (FA) in detecting MNV in eyes suffering from atrophy. Multimodal imaging using FA, ICGA, structural OCT, and OCTA was performed on eyes with MNV and atrophy (also known as macular atrophy or MA) attributable to AMD, as well as AMD eyes with GA without MNV. Senior retina experts used all imaging modalities to determine the existence of MNV, which was regarded as the gold standard reference. Two professional readers then independently assessed each individual imaging modality for the existence of MNV. The morphologic properties of the MNV were assessed on a specially designed OCTA slab. They enrolled 21 patients with GA alone and 21 with MA+MNV. Manual segmentation on OCTA demonstrated high specificity (95.2%) and sensitivity (95.2%), identifying MNV in 4.7% of eyes with GA and in 95.2% of eyes with MA+MNV. FA, ICGA, and OCT identified MNV in 57.1%, 52.3%, and 66.7% of eyes with MA+MNV, and in 14.2%, 9.5%, and 42.8% of eyes with GA, respectively. The corresponding values were 85.7% and 57.1% for FA, 90.5% and 52.4% for ICGA, and 66.7% and 57.1% for OCT. Their approach of using data from various modalities, assessed by a senior specialist, is a fair choice in the absence of a histopathological reference.
Contributions: The study demonstrated the ability of OCTA to detect MNV in eyes with atrophy compared with FA, ICGA, and OCT. Its strengths include a prospective design, the use of multiple graders, a masked grading process, and the use of deeper-penetrating SS-OCTA. The authors also evaluated the morphological features of MNV associated with MA.
Limitations: This study has some limitations. First, the sample size is modest; new MNV developing in an eye that has already experienced atrophy may simply be uncommon. Second, the cohort was assembled with a 50/50 mix of patients with and without MNV. The group is therefore “selected” and does not represent the overall clinic population of atrophy-affected eyes, so the reported sensitivity, specificity, PPV, and NPV should be interpreted accordingly. A further constraint is the reference standard: histology would have been the ideal gold standard, but it is not feasible.
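The effect of the balanced case mix noted above can be made concrete. The sketch below (not part of the original study) recomputes PPV and NPV from the reported OCTA sensitivity and specificity (95.2% each, i.e., 20/21) under the study's 50% prevalence and under an assumed lower real-world MNV prevalence; the 20% figure is purely illustrative.

```python
def predictive_values(sens, spec, prev):
    """PPV and NPV from sensitivity, specificity, and disease prevalence
    (standard Bayes' rule identities for a binary diagnostic test)."""
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv

sens = spec = 20 / 21  # ~95.2%, as reported for manual OCTA segmentation

# Balanced 21/21 cohort used in the study (prevalence 50%)
ppv_balanced, npv_balanced = predictive_values(sens, spec, 0.5)

# Illustrative lower prevalence, closer to an unselected atrophy population
ppv_clinic, npv_clinic = predictive_values(sens, spec, 0.2)

print(f"prev=50%: PPV={ppv_balanced:.3f}, NPV={npv_balanced:.3f}")
print(f"prev=20%: PPV={ppv_clinic:.3f}, NPV={npv_clinic:.3f}")
```

At 50% prevalence, PPV and NPV both equal the reported 95.2%; at an assumed 20% prevalence, PPV falls to about 83% while NPV rises to about 99%, which is why the “selected” cohort caveat matters.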
A procedure to assess and harmonize a cutting-edge nomenclature for reporting neovascular AMD data was developed by Spaide et al. [
116]. They developed a consensus classification for neovascular AMD: its components were identified and grouped, and the effects of macular neovascularization on the disease were outlined. Three key documents were produced: a description of AMD, a classification of the forms of neovascular AMD, and a framework for a consensus nomenclature system. A common set of terminology will make it easier to compare patient groups and research projects, and the proposed framework is easily updated and modified, a process expected to occur periodically. The development of OCT and, in particular, OCT angiography, which allow detailed three-dimensional investigation of the vascular anatomic and topographic features within neovascular AMD lesions, proved essential to the proposed classification. The use of recognized categories and nomenclature will improve the standardization, clarity, and comparability of AMD investigation and reporting.
Contributions: In addition to improving standardization and communication among researchers, clinicians, and patients, the classification offers a way to group neovascularization and other lesion components in neovascular AMD. The new categorization makes future research more thorough and broadly applicable and helps harmonize the definitions used by clinical investigators and reading centers. Its broad structure allows additional data, such as analyses of the repeatability and accuracy of the definitions, to be incorporated to improve it further.
Limitations: This method has been criticized because cRORA (complete retinal pigment epithelium and outer retinal atrophy) and MNV are distinct conditions with distinct pathophysiologies and, as such, should not be compared. The staging system's foundation dates from a time when late AMD carried an exceptionally high risk of serious visual impairment: if fluorescein angiography revealed a neovascular lesion, the patient was usually at high risk of significant vision loss, and the aim was to predict the risk of late disease progression rather than visual function. With current treatment, patients with MNV are likely to have their visual acuity stabilized or improved; grading the disease's severity, that is, evaluating the full impact of the illness on an organ, including both reversible and permanent effects, therefore becomes necessary.
OCT angiography was used to assess eyes with AMD and high-risk features for CNV by Palejwala et al. [
117]. They demonstrated the usefulness of OCTA for early detection of CNV and identified early (Type 1) CNV in their series, which was challenging to identify using conventional FA and SD-OCT. OCT angiography revealed Type 1 CNV in two (6%) of the 32 eyes; these lesions were associated with neither fluid on OCT nor leakage on fluorescein angiography. When nonexudative CNV lesions are present, they are difficult to distinguish with fluorescein angiography and OCT. Study participants were recruited from the retina clinics of the Casey Eye Institute at Oregon Health and Science University in Portland, Oregon.
Contributions: They assessed eyes with AMD and high-risk features for CNV. Nonexudative CNV lesions are challenging to distinguish using FA and OCT; OCTA, however, can detect their presence.
Limitations: The dataset used in this study is small; the approach should be evaluated on a sizable dataset to determine its efficacy and robustness.
Coscas et al. [
118] evaluated 80 eyes with wet (exudative) AMD using OCTA and identified several CNV patterns. The cohort also underwent routine multimodal imaging, based on FA, spectral-domain OCT (SD-OCT), and ICGA, to determine whether treatment was required. The 80 exudative AMD eyes presented various forms of CNV: Type I, Type II, mixed Type I and II, retinal angiomatous proliferation, and polyps associated with AMD. The OCTA results were used to distinguish two distinct CNV patterns; the investigation was carried out at the Odeon Ophthalmology Centre in Paris, France. The OCTA findings were then compared with those of multimodal imaging to assess any relationship between the OCTA appearance of the CNV and the treatment decision. Conventional multimodal imaging revealed a CNV lesion requiring treatment (Group A) in 58 eyes (72.5%). On OCTA, 59 eyes (73.7%) had a Pattern I lesion and 21 eyes (26.3%) a Pattern II lesion. Pattern I CNV on OCTA corresponded to Group A cases on conventional multimodal imaging in 94.9% of eyes, and Pattern II CNV corresponded to Group B (non-treatment-requiring) cases in 90.5%.
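The reported correspondences can be summarized as an inter-method agreement statistic. The sketch below computes Cohen's kappa between the OCTA pattern and the multimodal treatment group; the 2x2 cell counts are inferred from the reported marginals and percentages (59 Pattern I eyes with 94.9% in Group A gives 56; 21 Pattern II eyes with 90.5% in Group B gives 19) and are an assumption, not figures stated in the study.

```python
def cohens_kappa(table):
    """Cohen's kappa for a 2x2 agreement table [[a, b], [c, d]],
    rows = OCTA pattern (I, II), cols = multimodal group (A, B)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    po = (a + d) / n                                        # observed agreement
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2   # chance agreement
    return (po - pe) / (1 - pe)

# Cell counts inferred from the reported data (assumption):
table = [[56, 3],   # Pattern I:  56 in Group A, 3 in Group B
         [2, 19]]   # Pattern II:  2 in Group A, 19 in Group B

kappa = cohens_kappa(table)
print(f"observed agreement = {(56 + 19) / 80:.3f}, kappa = {kappa:.2f}")
```

Under these inferred counts, overall agreement is about 94% and kappa is roughly 0.84, i.e., substantial to almost perfect agreement between the OCTA pattern and the conventional treatment decision.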
Contributions: To help patients with exudative AMD make treatment decisions, they compared OCTA with OCT, FA, and ICG. While OCT displays fluid buildup and its changes and fluorescein angiography is still the gold standard for detecting leakage, OCTA provides noninvasive monitoring of the CNV and supports every treatment decision made during the follow-up. Based on the availability of both functional and morphological information from a single OCT scan, the study proposes that OCTA is a valuable tool for noninvasive monitoring of the state of CNV in AMD. This kind of observation might be useful in directing treatment choices and assessing how well CNV responds to medication.
Limitations: The use of a prototype device, the small number of patients, the lack of a three-dimensional reconstruction of the CNV, and the qualitative assessment of OCTA and multimodal imaging results are the primary limitations of this work.
In their publication, de Carlo et al. [
119] detailed the features, sensitivity, and specificity of CNV detection by OCTA. The study included 61 participants (72 eyes in total): 48 eyes of 43 subjects with CNV and 24 eyes of 18 subjects without CNV. Patients of the New England Eye Centre scanned with the OCTA system were evaluated. In patients whose OCTA revealed CNV, the features of the CNV were assessed, including appearance, size (measured as the greatest linear dimension), and the presence of intraretinal and subretinal fluid. To determine the sensitivity and specificity of OCTA in identifying CNV with FA as the ground truth, a second cohort of patients who underwent same-day OCTA and fluorescein angiography (FA) for suspected CNV was analyzed concurrently. Thirty-one of the 48 CNV eyes had CNV associated with neovascular AMD. Compared with FA, the specificity of CNV detection was high (91%), while the sensitivity was low (50%, 4/8).
Contributions: OCTA allows a clinician to noninvasively visualize CNV and potentially detect the condition and guide its treatment. Compared with FA, the specificity of CNV detection using OCTA appears to be high.
Limitations: This study’s modest sample size and primarily cross-sectional design are its main limitations. Furthermore, in certain cases the automated prototype software did not allow the segmentation to exclude the choriocapillaris, since the user could not manually adjust the automatically detected curvature lines delineating the region of interest.
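The small-sample caveat above can be quantified with a confidence interval. The sketch below (not part of the original study's analysis) applies the Wilson score interval to the reported 4/8 OCTA sensitivity:

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion
    (z = 1.96 gives an approximate 95% interval)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

# Reported OCTA sensitivity: 4 of 8 CNV-positive eyes detected
lo, hi = wilson_interval(4, 8)
print(f"sensitivity 4/8 = 50%, 95% CI ~ ({lo:.1%}, {hi:.1%})")
```

With only eight positive eyes, the 95% interval spans roughly 22% to 78%, so the point estimate of 50% sensitivity is highly uncertain, which underlines why a larger cohort is needed.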
Table 8 summarizes the discussed experts-related studies.