Comprehensive Survey of OCT-Based Disorders Diagnosis: From Feature Extraction Methods to Robust Security Frameworks
Abstract
1. Introduction
1.1. Optical Coherence Tomography
1.2. Feature Extraction Techniques
1.3. Other Survey Literature on OCT
- Provides a systematic review of existing feature extraction methods for OCT images, categorizing them into hand-crafted and deep learning-based approaches.
- Evaluates these methods against standard performance metrics, including accuracy, precision, sensitivity, specificity, and F1 score (a minimal computation sketch follows this list).
- Traces the evolution from hand-crafted features to deep learning techniques such as CNNs and transformers for feature extraction from OCT images.
- Assesses the impact of dataset choice on the performance of feature extraction methods.
- Explores the emerging field of adversarial conditions in medical imaging, particularly in OCT, to propose future directions for research that could lead to more robust, accurate, and clinically relevant feature extraction technologies.
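For concreteness, the comparison metrics named above reduce to simple functions of confusion-matrix counts. The minimal sketch below (binary disease-vs-normal case, with made-up labels purely for demonstration) shows how they are computed:

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Survey comparison metrics from binary labels (1 = diseased, 0 = normal)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # recall / true-positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true-negative rate
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision,
            "sensitivity": sensitivity, "specificity": specificity, "f1": f1}

# Illustrative labels for 8 OCT B-scans (not drawn from any dataset in this survey):
print(binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0]))
```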
2. Review of OCT Datasets for Ocular Disorder Classification
2.1. OCT Datasets Details
2.2. Dataset Bias, Imaging Heterogeneity, and Domain Shift
3. Hand-Crafted Feature Extraction Techniques
4. Deep Learning Approaches
4.1. CNNs
4.2. CNN with Attention
4.3. CNN Ensembles and Multi-Scale
4.4. CNN Augmentations
4.5. Transformers
5. Comparative Analysis
5.1. Accuracy Comparisons Based on Datasets
5.2. Task Relevance, Data Dependency, Interpretability, or Robustness
- Task relevance assesses the model’s effectiveness in addressing specific diagnostic objectives, such as detecting age-related macular degeneration (AMD) or differentiating between multiple ocular diseases.
- Data dependency examines the scale and complexity of the datasets required for training, which directly impacts model generalizability in data-constrained settings.
- Interpretability considers the extent to which the model’s decision-making process can be understood and trusted by clinicians, fostering confidence and facilitating clinical integration.
- Lastly, robustness evaluates a model’s resilience to variations in input data, including noise, artifacts, and rare disease presentations, ensuring reliable performance in diverse clinical environments.
6. Future Work
6.1. Enhancing Robustness Against Adversarial Attacks in Medical Imaging
- Lack of Comprehensive Threat Models for OCT: Most current research on adversarial attacks focuses on generic image attacks. Future work needs to develop OCT-specific adversarial attack models that account for the unique characteristics of OCT data (e.g., speckle noise, depth information, volumetric nature, specific retinal layers) and their clinical relevance. The subtle, structured nature of retinal pathologies requires highly targeted and clinically plausible perturbations.
- Real-world Applicability of Defenses: While adversarial training [87,88,89], feature fusion (MEFF [91]), multi-view classification with dissonance measures [92], and knowledge-guided training [93] show promise, their effectiveness, computational overhead, and generalizability in diverse, high-stakes clinical OCT environments are yet to be fully validated. A key challenge is developing defense mechanisms that are both effective against a wide range of unknown attacks and computationally feasible for rapid clinical deployment without hindering diagnostic speed.
- Robustness to Diverse Perturbations and Imaging Artifacts: Existing studies often focus on a limited set of synthetic noise types (Gaussian, Salt and Pepper, contrast degradation [83,84]) or specific attack methods (FGSM, PGD [85,86]). Future research must develop models resilient not only to a broader spectrum of adversarial perturbations but also to realistic, naturally occurring imaging artifacts (e.g., motion blur, saturation, poor fixation, operator-dependent variations) which can mimic adversarial effects in real-world clinical OCT acquisition (a minimal FGSM/adversarial-training sketch follows this list).
- Quantifying and Certifying Robustness: A significant challenge is to establish rigorous and standardized metrics, benchmarks, and formal verification frameworks for quantifying and, ideally, certifying the adversarial robustness of OCT diagnostic models. This is critical for building clinician trust, facilitating regulatory approval, and ensuring patient safety.
- Impact of 3D Volumetric OCT: While [94] briefly mentions 3D MRI, the complex challenges of adversarial attacks on 3D volumetric OCT data, which contain significantly more information and structural dependencies than 2D images, are largely unexplored. Perturbing 3D data subtly across slices while maintaining clinical plausibility is a major technical hurdle.
- Cross-Domain Robustness: Models trained on specific OCT devices or patient populations may be vulnerable to domain shift. A challenge is to develop methods that ensure adversarial robustness across different OCT scanners, imaging protocols, and demographic groups, without requiring extensive re-training.
- Adversarial Training Data Generation: Generating sufficiently diverse and clinically relevant adversarial examples for robust training, especially for rare diseases or subtle pathological changes, remains a challenge. Synthetic data generation and advanced data augmentation techniques could play a role but need careful validation. Table 9 summarizes the techniques discussed above.
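To ground the FGSM and PGD attacks, and the adversarial training defenses, referenced above, the following is a minimal PyTorch sketch rather than the method of any cited work: `model` is assumed to be an arbitrary differentiable OCT classifier returning class logits, and `epsilon` is an illustrative perturbation budget.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, epsilon=2 / 255):
    """Single-step FGSM: shift each pixel by epsilon along the sign of the loss gradient."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    adv = images + epsilon * images.grad.sign()
    return adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid intensity range

def adversarial_training_step(model, optimizer, images, labels, epsilon=2 / 255):
    """One adversarial-training update: fit the model on FGSM examples from this batch."""
    model.train()
    adv = fgsm_attack(model, images, labels, epsilon)
    optimizer.zero_grad()  # discard gradients accumulated while crafting the attack
    loss = F.cross_entropy(model(adv), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Iterating the FGSM step with projection back onto the epsilon-ball yields PGD, the stronger multi-step attack cited above.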
6.2. Incorporating Large Language Models (LLMs) in Ophthalmic Diagnostics
- Multimodal Integration Challenges and Architectures: A key challenge lies in effectively and efficiently integrating visual features extracted from OCT images (which are often high-dimensional and complex) with the textual reasoning capabilities of LLMs. This requires developing sophisticated, robust, and often computationally expensive architectures that can seamlessly fuse image embeddings with patient history, symptoms, clinical notes, and potentially even genetic data, enabling holistic diagnostic reasoning beyond current capabilities (a toy fusion sketch follows this list).
- Interpretability and Explainability of LLM-driven Diagnostics: While LLMs can generate rich textual explanations, ensuring their reasoning aligns with medical best practices, is medically sound, and is truly interpretable by clinicians is crucial. The challenge is to develop methods where LLMs can not only make predictions but also provide transparent, evidence-based, and auditable justifications that clinicians can trust, verify, and use for their own decision-making process. This includes pinpointing specific image regions or textual elements contributing to a diagnosis.
- Handling Medical Nuances, Context, and “Hallucinations”: LLMs, trained on vast general text corpora, may struggle with the subtle nuances, specific terminology, rare disease presentations, and context-dependent reasoning inherent in complex medical cases. Ensuring LLMs do not “hallucinate” clinically inaccurate information or provide misleading medical advice remains a critical safety and ethical challenge, particularly in a field where slight misinterpretations can have severe consequences.
- Data Privacy and Security for Sensitive PHI: The use of LLMs, especially large-scale cloud-based models, raises significant data privacy and security concerns when handling sensitive patient health information (PHI). Future research must explore secure, privacy-preserving LLM integration methods, potentially involving privacy-preserving fine-tuning, on-device processing, or federated learning approaches, to comply with strict medical regulations (e.g., HIPAA, GDPR).
- Validation and Benchmarking for Multimodal Systems: Robust and clinically meaningful validation frameworks are needed to rigorously evaluate LLM-integrated diagnostic systems on diverse, real-world OCT datasets and patient cohorts. This requires developing new benchmarks for multimodal reasoning tasks that go beyond simple classification accuracy, assessing their reliability, clinical utility, and the consistency of their explanations across various ocular disorders and patient demographics.
- Integration into Clinical Workflow: A significant practical challenge is the seamless and user-friendly integration of LLM-powered diagnostic tools into existing clinical workflows without overburdening clinicians or disrupting established practices. This includes intuitive interfaces and efficient information flow.
- Computational Resources and Accessibility: Deploying and fine-tuning large LLMs, especially for specialized medical tasks, often requires substantial computational resources. Making these powerful tools accessible and scalable for clinics and research institutions with limited resources is a practical challenge.
- Ethical, Regulatory, and Accountability Considerations: The deployment of LLM-driven diagnostic tools necessitates careful consideration of complex ethical implications (e.g., bias amplification, patient autonomy), navigating stringent regulatory guidelines, and establishing clear accountability frameworks in cases of misdiagnosis or adverse outcomes. This involves defining the roles of AI and human clinicians.
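One way to picture the fusion problem described above is prefix-style conditioning: projecting an OCT image embedding into the LLM’s token-embedding space and prepending it to the clinical-text tokens. The module below is a hypothetical, minimal PyTorch sketch; the class name, dimensions (`img_dim`, `llm_dim`), and prefix length are assumptions for illustration, not a published architecture.

```python
import torch
import torch.nn as nn

class OCTPrefixFusion(nn.Module):
    """Project an OCT encoder's embedding into an LLM's token space and
    prepend it as 'visual tokens' to the clinical-text token embeddings."""
    def __init__(self, img_dim=512, llm_dim=768, n_prefix=4):
        super().__init__()
        self.proj = nn.Linear(img_dim, n_prefix * llm_dim)
        self.n_prefix, self.llm_dim = n_prefix, llm_dim

    def forward(self, oct_embedding, text_token_embeddings):
        # oct_embedding: (B, img_dim) from any OCT feature extractor (CNN/ViT)
        # text_token_embeddings: (B, T, llm_dim) from the LLM's embedding layer
        prefix = self.proj(oct_embedding).view(-1, self.n_prefix, self.llm_dim)
        return torch.cat([prefix, text_token_embeddings], dim=1)

# Toy shapes only; a real system would pass the fused sequence through the LLM body.
fused = OCTPrefixFusion()(torch.randn(2, 512), torch.randn(2, 16, 768))
print(fused.shape)  # torch.Size([2, 20, 768])
```

In practice the image embedding would come from a frozen OCT encoder, and often only the projection layer is trained while the LLM body remains fixed.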
6.3. Proposals for Future Research Directions
- Robustness to Diverse Noise and Degradation: Future research in OCT disorder prediction must prioritize the inclusion of OCT images corrupted by various types of noise (e.g., Gaussian, Salt and Pepper, uniform, speckle, Rayleigh noise, as shown in Figure 5) and realistic clinical artifacts (a corruption sketch follows this list). Incorporating these into training and validation datasets is crucial for rigorously assessing the robustness of deep learning models under less-than-ideal, real-world clinical conditions. Furthermore, LLMs could be explored to assist in identifying and characterizing different types of noise, enabling automated and adaptive preprocessing techniques. This approach could complement traditional noise reduction strategies by providing more precise noise recognition and guiding targeted denoising, thereby leading to enhanced model performance.
- Adversarial Testing and Defense Frameworks for OCT: Another promising direction involves the systematic incorporation of adversarial testing into OCT feature extraction and classification frameworks. Methods and frameworks designed to test the resilience of OCT models against a broad spectrum of adversarial attacks, including those specific to 3D OCT, are essential. This includes developing preprocessing techniques specifically tailored to detect and remove adversarial samples. These techniques might involve advanced adversarial training, where models are explicitly exposed to and learn to defend against diverse adversarial examples during training, or utilizing robust denoising autoencoders to filter out subtle perturbations before inference. By proactively addressing the challenge of adversarial robustness, future OCT-based AI models can be made significantly more reliable, maintaining high accuracy and sensitivity even under adverse clinical conditions.
- Data Dependency and Availability: While CNNs excel with large datasets, their reliance on extensive, high-quality annotated datasets remains a significant limitation in clinical settings, where data acquisition is often costly, time-consuming, and prone to scarcity.
- Dataset Bias, Imaging Heterogeneity, and Domain Shift: Inherent biases within datasets (e.g., class imbalance), variations introduced by different OCT device types and acquisition protocols (imaging heterogeneity), and performance degradation when models are deployed in new clinical environments (domain shift) profoundly impact model generalizability and reliability. Addressing these factors is paramount for clinical translation.
- Interpretability and Trust: Despite high accuracy, the “black-box” nature of many deep learning models makes it challenging for clinicians to understand how a diagnosis is reached. Enhancing model interpretability is vital for fostering trust and facilitating clinical adoption.
- Integration of Multimodal Information and Reasoning: The effective incorporation of advanced models like Large Language Models (LLMs) alongside image analysis presents both an opportunity and a significant challenge. Successfully fusing visual OCT data with textual clinical information (patient history, symptoms) and enabling complex clinical reasoning is crucial for comprehensive diagnostic support, yet it introduces complexities in model architecture, training, and validation.
- Explainable AI (XAI) for LLM-integrated Multimodal OCT Systems: Future work should focus on developing advanced XAI techniques tailored explicitly for multimodal LLM-integrated OCT diagnostic systems. This involves not only generating comprehensive textual explanations of diagnostic reasoning but also creating intuitive visual saliency maps from the OCT images, directly linked to the LLM’s decision-making pathways. The goal is to provide clinicians with transparent, verifiable, and actionable diagnostic insights, bridging the gap between AI predictions and clinical understanding.
- Federated Learning and Privacy-Preserving AI for OCT: Given data privacy concerns, exploring federated learning approaches for training robust OCT diagnostic models across multiple institutions without centralizing sensitive patient data is crucial. This extends to developing privacy-preserving techniques for LLM integration, such as differential privacy or secure multiparty computation, to ensure that patient data remains protected while leveraging collective data insights.
- Longitudinal OCT Data Analysis and Prognosis with LLMs: Expanding beyond single-time-point diagnosis, future research should explore how LLMs can integrate sequential OCT scans and patient history to predict disease progression, treatment response, and long-term prognosis. This involves challenges in handling time-series data and complex patient narratives.
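To make the proposed noise-robustness protocol concrete, the sketch below corrupts a normalized OCT image with the noise families listed in the first bullet. The parameterizations are illustrative assumptions; a real evaluation would sweep `level` and re-score a trained classifier on the corrupted copies.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(img, kind="speckle", level=0.1):
    """Corrupt a float image in [0, 1] with one of the named noise models."""
    if kind == "gaussian":          # additive sensor noise
        out = img + rng.normal(0.0, level, img.shape)
    elif kind == "speckle":         # multiplicative noise typical of OCT
        out = img * (1.0 + rng.normal(0.0, level, img.shape))
    elif kind == "salt_pepper":     # impulse noise
        out = img.copy()
        mask = rng.random(img.shape)
        out[mask < level / 2] = 0.0
        out[mask > 1.0 - level / 2] = 1.0
    elif kind == "uniform":         # bounded additive noise
        out = img + rng.uniform(-level, level, img.shape)
    elif kind == "rayleigh":        # skewed additive noise
        out = img + rng.rayleigh(level, img.shape)
    else:
        raise ValueError(f"unknown noise kind: {kind}")
    return np.clip(out, 0.0, 1.0)

# Robustness curves: accuracy vs. level, per noise kind, on a held-out test set.
```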
7. Conclusions
- Validate the potential of automating OCT image interpretation, enhancing ocular diagnostics, improving patient outcomes, and optimizing clinical decision-making and healthcare practices.
- Highlight the critical role of OCT in refining patient-specific therapeutic approaches, guiding clinicians toward increasingly personalized management of ocular disease.
- Lead to the optimization of treatment selection, monitoring, and follow-up, further improving patient clinical management.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Viedma, I.A.; Alonso-Caneiro, D.; Read, S.A.; Collins, M.J. Deep learning in retinal optical coherence tomography (OCT): A comprehensive survey. Neurocomputing 2022, 507, 247–264. [Google Scholar] [CrossRef]
- Usman, M.; Fraz, M.M.; Barman, S.A. Computer Vision Techniques Applied for Diagnostic Analysis of Retinal OCT Images: A Review. Arch. Comput. Methods Eng. 2017, 24, 449–465. [Google Scholar] [CrossRef]
- Meiburger, K.M.; Salvi, M.; Rotunno, G.; Drexler, W.; Liu, M. Automatic Segmentation and Classification Methods Using Optical Coherence Tomography Angiography (OCTA): A Review and Handbook. Appl. Sci. 2021, 11, 9734. [Google Scholar] [CrossRef]
- Pan, L.; Chen, X. Retinal OCT Image Registration: Methods and Applications. IEEE Rev. Biomed. Eng. 2023, 16, 307–318. [Google Scholar] [CrossRef]
- Elsharkawy, M.; Elrazzaz, M.; Ghazal, M.; Alhalabi, M.; Soliman, A.; Mahmoud, A.; El-Daydamony, E.; Atwan, A.; Thanos, A.; Sandhu, H.S.; et al. Role of Optical Coherence Tomography Imaging in Predicting Progression of Age-Related Macular Disease: A Survey. Diagnostics 2021, 11, 2313. [Google Scholar] [CrossRef] [PubMed]
- Bharuka, R.; Mhatre, D.; Patil, N.; Chitnis, S.; Karnik, M. A Survey on Classification and Prediction of Glaucoma and AMD Based on OCT and Fundus Images. In Proceedings of the International Conference on Mobile Computing and Sustainable Informatics (ICMCSI 2020), Lalitpur, Nepal, 23–24 January 2020; Raj, J.S., Ed.; EAI/Springer Innovations in Communication and Computing; Springer: Cham, Switzerland, 2021. [Google Scholar] [CrossRef]
- Kiefer, R.; Steen, J.; Abid, M.; Ardali, M.R.; Amjadian, E. A Survey of Glaucoma Detection Algorithms using Fundus and OCT Images. In Proceedings of the 2022 IEEE 13th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, Canada, 12–15 October 2022; pp. 191–196. [Google Scholar] [CrossRef]
- Naveed, M.; Ramzan, A.; Akram, M.U. Clinical and technical perspective of glaucoma detection using OCT and fundus images: A review. In Proceedings of the 2017 1st International Conference on Next Generation Computing Applications (NextComp), Mauritius, 19–21 July 2017; pp. 157–162. [Google Scholar] [CrossRef]
- Ran, A.R.; Tham, C.C.; Chan, P.P.; Cheng, C.-Y.; Tham, Y.-C.; Rim, T.H.; Cheung, C.Y. Deep learning in glaucoma with optical coherence tomography: A review. Eye 2021, 35, 188–201. [Google Scholar] [CrossRef] [PubMed]
- Akpinar, M.H.; Sengur, A.; Faust, O.; Tong, L.; Molinari, F.; Acharya, U.R. Artificial Intelligence in Retinal Screening Using OCT Images: A Review of the Last Decade (2013–2023). Comput. Methods Programs Biomed. 2024, 254, 108253. [Google Scholar] [CrossRef]
- Nugroho, K.A. A Comparison of Handcrafted and Deep Neural Network Feature Extraction for Classifying Optical Coherence Tomography (OCT) Images. In Proceedings of the 2018 2nd International Conference on Informatics and Computational Sciences (ICICoS), Semarang, Indonesia, 30–31 October 2018; pp. 1–6. [Google Scholar] [CrossRef]
- Naren, O.S. Retinal OCT-C8. 2021. Available online: https://www.kaggle.com/datasets/obulisainaren/retinal-oct-c8 (accessed on 1 January 2024).
- Dash, S.K.; Sethy, P.K.; Das, A.; Jena, S.; Nanthaamornphong, A. Advancements in Deep Learning for Automated Diagnosis of Ophthalmic Diseases: A Comprehensive Review. IEEE Access 2024, 12, 171221–171240. [Google Scholar] [CrossRef]
- Song, D.; Fu, B.; Li, F.; Xiong, J.; He, J.; Zhang, X.; Qiao, Y. Deep Relation Transformer for Diagnosing Glaucoma with Optical Coherence Tomography and Visual Field Function. IEEE Trans. Med. Imaging 2021, 40, 2392–2402. [Google Scholar] [CrossRef]
- Gholami, P.; Sheikhhassani, M.; Zelek, J.S.; Lakshminarayanan, V.; Parthasarathy, M.K.; Azar, F.S.; Intes, X. Classification of optical coherence tomography images for diagnosing different ocular diseases. In Proceedings of the SPIE 10487, Multimodal Biomedical Imaging XIII, San Francisco, CA, USA, 16 March 2018; p. 1048705. [Google Scholar] [CrossRef]
- Yu, Y.-W.; Lin, C.-H.; Lu, C.-K.; Wang, J.-K.; Huang, T.-L. Automated Age-Related Macular Degeneration Detector on Optical Coherence Tomography Images Using Slice-Sum Local Binary Patterns and Support Vector Machine. Sensors 2023, 23, 7315. [Google Scholar] [CrossRef]
- Lemaître, G.; Rastgoo, M.; Massich, J.; Cheung, C.Y.; Wong, T.Y.; Lamoureux, E.; Milea, D.; Mériaudeau, F.; Sidibé, D. Classification of SD-OCT Volumes Using Local Binary Patterns: Experimental Validation for DME Detection. J. Ophthalmol. 2016, 2016, 3298606. [Google Scholar] [CrossRef] [PubMed]
- Alsaih, K.; Lemaitre, G.; Vall, J.M.; Rastgoo, M.; Sidibe, D.; Wong, T.Y.; Lamoureux, E.; Milea, D.; Cheung, C.Y.; Meriaudeau, F. Classification of SD-OCT volumes with multi pyramids, LBP and HOG descriptors: Application to DME detections. In Proceedings of the 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Orlando, FL, USA, 16–20 August 2016; pp. 1344–1347. [Google Scholar] [CrossRef]
- Liew, A. Multi-kernel Wiener local binary patterns for OCT ocular disease detections with resiliency to Gaussian noises. In Proceedings of the SPIE 13033, Multimodal Image Exploitation and Learning 2024, National Harbor, MD, USA, 7 June 2024; p. 130330H. [Google Scholar] [CrossRef]
- Kayadibi, I.; Güraksın, G.E.; Köse, U. A Hybrid R-FTCNN based on principal component analysis for retinal disease detection from OCT images. Expert Syst. Appl. 2023, 230, 120617. [Google Scholar] [CrossRef]
- Diao, S.; Su, J.; Yang, C.; Zhu, W.; Xiang, D.; Chen, X.; Peng, Q.; Shi, F. Classification and segmentation of OCT images for age-related macular degeneration based on dual guidance networks. Biomed. Signal Process. Control 2023, 84, 104810. [Google Scholar] [CrossRef]
- Barua, P.D.; Chan, W.Y.; Dogan, S.; Baygin, M.; Tuncer, T.; Ciaccio, E.J.; Islam, N.; Cheong, K.H.; Shahid, Z.S.; Acharya, U.R. Multilevel Deep Feature Generation Framework for Automated Detection of Retinal Abnormalities Using OCT Images. Entropy 2021, 23, 1651. [Google Scholar] [CrossRef] [PubMed]
- Ji, Q.; He, W.; Huang, J.; Sun, Y. Efficient Deep Learning-Based Automated Pathology Identification in Retinal Optical Coherence Tomography Images. Algorithms 2018, 11, 88. [Google Scholar] [CrossRef]
- Alqudah, A.M. AOCT-NET: A convolutional network automated classification of multiclass retinal diseases using spectral-domain optical coherence tomography images. Med. Biol. Eng. Comput. 2020, 58, 41–53. [Google Scholar] [CrossRef] [PubMed]
- Fang, L.; Jin, Y.; Huang, L.; Guo, S.; Zhao, G.; Chen, X. Iterative fusion convolutional neural networks for classification of optical coherence tomography images. J. Vis. Commun. Image Represent. 2019, 59, 327–333. [Google Scholar] [CrossRef]
- Rajan, R.; Kumar, S.N. IoT based optical coherence tomography retinal images classification using OCT Deep Net2. Meas. Sens. 2023, 25, 100652. [Google Scholar] [CrossRef]
- Tsuji, T.; Hirose, Y.; Fujimori, K.; Hirose, T.; Oyama, A.; Saikawa, Y.; Mimura, T.; Shiraishi, K.; Kobayashi, T.; Mizota, A.; et al. Classification of optical coherence tomography images using a capsule network. BMC Ophthalmol. 2020, 20, 114. [Google Scholar] [CrossRef]
- Bridge, J.; Harding, S.P.; Zhao, Y.; Zheng, Y. Dictionary Learning Informed Deep Neural Network with Application to OCT Images. In Ophthalmic Medical Image Analysis; Fu, H., Garvin, M., MacGillivray, T., Xu, Y., Zheng, Y., Eds.; OMIA 2019. Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2019; Volume 11855. [Google Scholar] [CrossRef]
- Shaker, F.; Baharlouei, Z.; Plonka, G.; Rabbani, H. Application of Deep Dictionary Learning and Predefined Filters for Classification of Retinal Optical Coherence Tomography Images. IEEE Access 2025, 13, 596–607. [Google Scholar] [CrossRef]
- Wang, X.; Tang, F.; Chen, H.; Luo, L.; Tang, Z.; Ran, A.-R.; Cheung, C.Y.; Heng, P.-A. UD-MIL: Uncertainty-Driven Deep Multiple Instance Learning for OCT Image Classification. IEEE J. Biomed. Health Inform. 2020, 24, 3431–3442. [Google Scholar] [CrossRef]
- Rasti, R.; Mehridehnavi, A.; Rabbani, H.; Hajizadeh, F. Automatic diagnosis of abnormal macula in retinal optical coherence tomography images using wavelet-based convolutional neural network features and random forests classifier. J. Biomed. Opt. 2018, 23, 1–10. [Google Scholar] [CrossRef]
- Fang, L.; Wang, C.; Li, S.; Rabbani, H.; Chen, X.; Liu, Z. Attention to Lesion: Lesion-Aware Convolutional Neural Network for Retinal Optical Coherence Tomography Image Classification. IEEE Trans. Med. Imaging 2019, 38, 1959–1970. [Google Scholar] [CrossRef]
- Mishra, S.S.; Mandal, B.; Puhan, N.B. Multi-Level Dual-Attention Based CNN for Macular Optical Coherence Tomography Classification. IEEE Signal Process. Lett. 2019, 26, 1793–1797. [Google Scholar] [CrossRef]
- Mishra, S.S.; Mandal, B.; Puhan, N.B. Perturbed Composite Attention Model for Macular Optical Coherence Tomography Image Classification. IEEE Trans. Artif. Intell. 2022, 3, 625–635. [Google Scholar] [CrossRef]
- Liu, X.; Bai, Y.; Cao, J.; Yao, J.; Zhang, Y.; Wang, M. Joint disease classification and lesion segmentation via one-stage attention-based convolutional neural network in OCT images. Biomed. Signal Process. Control 2022, 71 Part A, 103087. [Google Scholar] [CrossRef]
- Huang, X.; Ai, Z.; Wang, H.; She, C.; Feng, J.; Wei, Q.; Hao, B.; Tao, Y.; Lu, Y.; Zeng, F. GABNet: Global attention block for retinal OCT disease classification. Front. Neurosci. 2023, 17, 1143422. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
- Das, V.; Prabhakararao, E.; Dandapat, S.; Bora, P.K. B-Scan Attentive CNN for the Classification of Retinal Optical Coherence Tomography Volumes. IEEE Signal Process. Lett. 2020, 27, 1025–1029. [Google Scholar] [CrossRef]
- Abd Elaziz, M.; Mabrouk, A.; Dahou, A.; Chelloug, S.A. Medical Image Classification Utilizing Ensemble Learning and Levy Flight-Based Honey Badger Algorithm on 6G-Enabled Internet of Things. Comput. Intell. Neurosci. 2022, 2022, 5830766. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
- Hassan, B.; Hassan, T.; Li, B.; Ahmed, R.; Hassan, O. Deep Ensemble Learning Based Objective Grading of Macular Edema by Extracting Clinically Significant Findings from Fused Retinal Imaging Modalities. Sensors 2019, 19, 2970. [Google Scholar] [CrossRef]
- Das, V.; Dandapat, S.; Bora, P.K. Multi-scale deep feature fusion for automated classification of macular pathologies from OCT images. Biomed. Signal Process. Control 2019, 54, 101605. [Google Scholar] [CrossRef]
- Thomas, A.; Harikrishnan, P.; Ramachandran, R.; Ramachandran, S.; Manoj, R.; Palanisamy, P.; Gopi, V.P. A novel multiscale and multipath convolutional neural network based age-related macular degeneration detection using OCT images. Comput. Methods Programs Biomed. 2021, 209, 106294. [Google Scholar] [CrossRef]
- Thomas, A.; Harikrishnan, P.M.; Krishna, A.K.; Palanisamy, P.; Gopi, V.P. A novel multiscale convolutional neural network based age-related macular degeneration detection using OCT images. Biomed. Signal Process. Control 2021, 67, 102538. [Google Scholar] [CrossRef]
- Sotoudeh-Paima, S.; Jodeiri, A.; Hajizadeh, F.; Soltanian-Zadeh, H. Multi-scale convolutional neural network for automated AMD classification using retinal OCT images. Comput. Biol. Med. 2022, 144, 105368. [Google Scholar] [CrossRef]
- Akinniyi, O.; Rahman, M.M.; Sandhu, H.S.; El-Baz, A.; Khalifa, F. Multi-Stage Classification of Retinal OCT Using Multi-Scale Ensemble Deep Architecture. Bioengineering 2023, 10, 823. [Google Scholar] [CrossRef]
- Rasti, R.; Rabbani, H.; Mehridehnavi, A.; Hajizadeh, F. Macular OCT Classification Using a Multi-Scale Convolutional Neural Network Ensemble. IEEE Trans. Med. Imaging 2018, 37, 1024–1034. [Google Scholar] [CrossRef]
- Das, V.; Dandapat, S.; Bora, P.K. Automated Classification of Retinal OCT Images Using a Deep Multi-Scale Fusion CNN. IEEE Sens. J. 2021, 21, 23256–23265. [Google Scholar] [CrossRef]
- Rong, Y.; Xiang, D.; Zhu, W.; Yu, K.; Shi, F.; Fan, Z.; Chen, X. Surrogate-Assisted Retinal OCT Image Classification Based on Convolutional Neural Networks. IEEE J. Biomed. Health Inform. 2019, 23, 253–263. [Google Scholar] [CrossRef]
- Das, V.; Dandapat, S.; Bora, P.K. A Data-Efficient Approach for Automated Classification of OCT Images Using Generative Adversarial Network. IEEE Sens. Lett. 2020, 4, 7000304. [Google Scholar] [CrossRef]
- Das, V.; Dandapat, S.; Bora, P.K. Unsupervised Super-Resolution of OCT Images Using Generative Adversarial Network for Improved Age-Related Macular Degeneration Diagnosis. IEEE Sens. J. 2020, 20, 8746–8756. [Google Scholar] [CrossRef]
- Ma, Z.; Xie, Q.; Gao, X.; Zhu, J. HCTNet: A Hybrid ConvNet-Transformer Network for Retinal Optical Coherence Tomography Image Classification. Biosensors 2022, 12, 542. [Google Scholar] [CrossRef]
- He, J.; Wang, J.; Han, Z.; Ma, J.; Wang, C.; Qi, M. An interpretable transformer network for the retinal disease classification using optical coherence tomography. Sci. Rep. 2023, 13, 3637. [Google Scholar] [CrossRef]
- Playout, C.; Duval, R.; Boucher, M.C.; Cheriet, F. Focused Attention in Transformers for interpretable classification of retinal images. Med. Image Anal. 2022, 82, 102608. [Google Scholar] [CrossRef]
- Cai, L.; Wen, C.; Jiang, J.; Liang, C.; Zheng, H.; Su, Y.; Chen, C. Classification of diabetic maculopathy based on optical coherence tomography images using a Vision Transformer model. BMJ Open Ophthalmol. 2023, 8, e001423. [Google Scholar] [CrossRef]
- Hammou, B.A.; Antaki, F.; Boucher, M.-C.; Duval, R. MBT: Model-Based Transformer for retinal optical coherence tomography image and video multi-classification. Int. J. Med. Inform. 2023, 178, 105178. [Google Scholar] [CrossRef]
- Shen, J.; Hu, Y.; Zhang, X.; Gong, Y.; Kawasaki, R.; Liu, J. Structure-Oriented Transformer for retinal diseases grading from OCT images. Comput. Biol. Med. 2023, 152, 106445. [Google Scholar] [CrossRef]
- Wang, H.; Guo, X.; Song, K.; Sun, M.; Shao, Y.; Xue, S.; Zhang, H.; Zhang, T. OCTFormer: An Efficient Hierarchical Transformer Network Specialized for Retinal Optical Coherence Tomography Image Recognition. IEEE Trans. Instrum. Meas. 2023, 72, 2532217. [Google Scholar] [CrossRef]
- Hemalakshmi, G.R.; Murugappan, M.; Sikkandar, M.Y.; Begum, S.S.; Prakash, N.B. Automated retinal disease classification using hybrid transformer model (SViT) using optical coherence tomography images. Neural. Comput. Applic. 2024, 36, 9171–9188. [Google Scholar] [CrossRef]
- Dutta, P.; Sathi, K.A.; Hossain, M.A.; Dewan, M.A.A. Conv-ViT: A Convolution and Vision Transformer-Based Hybrid Feature Extraction Method for Retinal Disease Detection. J. Imaging 2023, 9, 140. [Google Scholar] [CrossRef] [PubMed]
- Yu, Y.; Zhu, H. Transformer-based cross-modal multi-contrast network for ophthalmic diseases diagnosis. Biocybern. Biomed. Eng. 2023, 43, 507–527. [Google Scholar] [CrossRef]
- Li, Z.; Han, Y.; Yang, X. Multi-Fundus Diseases Classification Using Retinal Optical Coherence Tomography Images with Swin Transformer V2. J. Imaging 2023, 9, 203. [Google Scholar] [CrossRef]
- Wen, H.; Zhao, J.; Xiang, S.; Lin, L.; Liu, C.; Wang, T.; An, L.; Liang, L.; Huang, B. Towards more efficient ophthalmic disease classification and lesion location via convolution transformer. Comput. Methods Programs Biomed. 2022, 220, 106832. [Google Scholar] [CrossRef]
- Azizi, M.M.; Abhari, S.; Sajedi, H. Stitched vision transformer for age-related macular degeneration detection using retinal optical coherence tomography images. PLoS ONE 2024, 19, e0304943. [Google Scholar] [CrossRef] [PubMed]
- Ashtari-Majlan, M.; Dehshibi, M.M.; Masip, D. Spatial-aware Transformer-GRU Framework for Enhanced Glaucoma Diagnosis from 3D OCT Imaging. arXiv 2024, arXiv:2403.05702. [Google Scholar] [CrossRef] [PubMed]
- Jiang, Z.; Wang, L.; Wu, Q.; Shao, Y.; Shen, M.; Jiang, W.; Dai, C. Computer-aided diagnosis of retinopathy based on vision transformer. J. Innov. Opt. Health Sci. 2022, 15, 2250009. [Google Scholar] [CrossRef]
- Kihara, Y.; Shen, M.; Shi, Y.; Jiang, X.; Wang, L.; Laiginhas, R.; Lyu, C.; Yang, J.; Liu, J.; Morin, R.; et al. Detection of Nonexudative Macular Neovascularization on Structural OCT Images Using Vision Transformers. Ophthalmol. Sci. 2022, 2, 100197. [Google Scholar] [CrossRef]
- Oghbaie, M.; Araújo, T.; Emre, T.; Schmidt-Erfurth, U.; Bogunović, H. Transformer-Based End-to-End Classification of Variable-Length Volumetric Data. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2023; Greenspan, H., Madabhushi, A., Mousavi, P., Salcudean, S., Duncan, J., Syeda-Mahmood, T., Taylor, R., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2023; Volume 14225. [Google Scholar] [CrossRef]
- Zhou, Z.; Niu, C.; Yu, H.; Zhao, J.; Wang, Y.; Dai, C. Diagnosis of retinal diseases using the vision transformer model based on optical coherence tomography images. In Proceedings of the SPIE 12601, SPIE-CLP Conference on Advanced Photonics 2022, San Diego, CA, USA, 28 March 2023; p. 1260102. [Google Scholar] [CrossRef]
- Srinivasan, P.P.; Kim, L.A.; Mettu, P.S.; Cousins, S.W.; Comer, G.M.; Izatt, J.A.; Farsiu, S. Fully automated detection of diabetic macular edema and dry age-related macular degeneration from optical coherence tomography images. Biomed. Opt. Express 2014, 5, 3568–3577. [Google Scholar] [CrossRef] [PubMed]
- Sotoudeh-Paima, S. Labeled Retinal Optical Coherence Tomography Dataset for Classification of Normal, Drusen, and CNV Cases, Mendeley Data, V1. 2021. Available online: https://paperswithcode.com/dataset/labeled-retinal-optical-coherence-tomography (accessed on 1 January 2024).
- Kermany, D.; Zhang, K.; Goldbaum, M. Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images for Classification. Mendeley Data, V2. 2018. Available online: https://data.mendeley.com/datasets/rscbjbr9sj/3 (accessed on 1 January 2024).
- Farsiu, S.; Chiu, S.J.; O’Connell, R.V.; Folgar, F.A.; Yuan, E.; Izatt, J.A.; Toth, C.A. Quantitative Classification of Eyes with and without Intermediate Age-related Macular Degeneration Using Optical Coherence Tomography. Ophthalmology 2014, 121, 162–172. [Google Scholar] [CrossRef]
- Liu, Y.-Y.; Chen, M.; Ishikawa, H.; Wollstein, G.; Schuman, J.S.; Rehg, J.M. Automated macular pathology diagnosis in retinal OCT images using multi-scale spatial pyramid and local binary patterns in texture and shape encoding. Med. Image Anal. 2011, 15, 748–759. [Google Scholar] [CrossRef]
- Lemaître, G.; Rastgoo, M.; Massich, J.; Sankar, S.; Mériaudeau, F.; Sidibé, D. Classification of SD-OCT volumes with LBP: Application to DME detection. In Proceedings of the Ophthalmic Medical Image Analysis Second International Workshop, OMIA 2015, Held in Conjunction with MICCAI2015, Munich, Germany, 9 October 2015; pp. 9–16. [Google Scholar] [CrossRef]
- Gholami, P.; Roy, P.; Parthasarathy, M.K.; Lakshminarayanan, V. OCTID: Optical Coherence Tomography Image Database. Comput. Electr. Eng. 2020, 81, 106532. [Google Scholar] [CrossRef]
- Liew, A.; Agaian, S.; Benbelkacem, S. Distinctions between Choroidal Neovascularization and Age Macular Degeneration in Ocular Disease Predictions via Multi-Size Kernels ξcho-Weighted Median Patterns. Diagnostics 2023, 13, 729. [Google Scholar] [CrossRef] [PubMed]
- Liew, A.; Ryan, L.; Agaian, S. Alpha mean trim texture descriptors for optical coherence tomography eye classification. In Proceedings of the SPIE 12100, Multimodal Image Exploitation and Learning 2022, Orlando, FL, USA, 27 May 2022; p. 121000F. [Google Scholar] [CrossRef]
- Xu, J.; Yang, W.; Wan, C.; Shen, J. Weakly supervised detection of central serous chorioretinopathy based on local binary patterns and discrete wavelet transform. Comput. Biol. Med. 2020, 127, 104056. [Google Scholar] [CrossRef] [PubMed]
- Yu, Y.-W.; Lin, C.-H.; Lu, C.-K.; Wang, J.-K.; Huang, T.-L. Distinct Feature Labeling Methods for SVM-Based AMD Automated Detector on 3D OCT Volumes. In Proceedings of the 2022 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 7–9 January 2022; pp. 1–5. [Google Scholar] [CrossRef]
- Hussain, A.; Bhuiyan, A.; Luu, C.D.; Smith, R.T.; Guymer, R.H.; Ishikawa, H.; Schuman, J.S.; Ramamohanarao, K.; Vavvas, D.G. Classification of healthy and diseased retina using SD-OCT imaging and Random Forest algorithm. PLoS ONE 2018, 13, e0198281. [Google Scholar] [CrossRef]
- Thomas, A.; Sunija, A.P.; Manoj, R.; Ramachandran, R.; Ramachandran, S.; Varun, P.G.; Palanisamy, P. RPE layer detection and baseline estimation using statistical methods and randomization for classification of AMD from retinal OCT. Comput. Methods Programs Biomed. 2021, 200, 105822. [Google Scholar] [CrossRef]
- Mousavi, E.; Kafieh, R.; Rabbani, H. Classification of dry age-related macular degeneration and diabetic macular oedema from optical coherence tomography images using dictionary. IET Image Process. 2020, 14, 1571–1579. [Google Scholar] [CrossRef]
- Sun, Y.; Li, S.; Sun, Z. Fully automated macular pathology detection in retina optical coherence tomography images using sparse coding and dictionary learning. J. Biomed. Opt. 2017, 22, 016012. [Google Scholar] [CrossRef]
- Liew, A.; Agaian, S.; Zhao, L. Mitigation of adversarial noise attacks on skin cancer detection via ordered statistics binary local features. In Proceedings of the SPIE 12526, Multimodal Image Exploitation and Learning, Orlando, FL, USA, 15 June 2023; p. 125260O. [Google Scholar] [CrossRef]
- Liew, A.; Agaian, S.S.; Zhao, L. Enhancing the resilience of wireless capsule endoscopy imaging against adversarial contrast reduction using color quaternion modulus and phase patterns. In Proceedings of the SPIE 13033, Multimodal Image Exploitation and Learning 2024, National Harbor, MD, USA, 7 June 2024; p. 130330l. [Google Scholar] [CrossRef]
- Paschali, M.; Conjeti, S.; Navarro, F.; Navab, N. Generalizability vs. Robustness: Adversarial Examples for Medical Imaging. arXiv 2018, arXiv:1804.00504. [Google Scholar] [CrossRef]
- Puttagunta, M.K.; Ravi, S.; Nelson Kennedy Babu, C. Adversarial examples: Attacks and defences on medical deep learning systems. Multimed. Tools Appl. 2023, 82, 33773–33809. [Google Scholar] [CrossRef]
- Ma, X.; Niu, Y.; Gu, L.; Wang, Y.; Zhao, Y.; Bailey, J.; Lu, F. Understanding adversarial attacks on deep learning based medical image analysis systems. Pattern Recognit. 2021, 110, 107332. [Google Scholar] [CrossRef]
- Finlayson, S.G.; Bowers, J.D.; Ito, J.; Zittrain, J.L.; Beam, A.L.; Kohane, I.S. Adversarial attacks on medical machine learning: Emerging vulnerabilities demand new conversations. Science 2019, 363, 1287–1290. [Google Scholar] [CrossRef] [PubMed]
- Hirano, H.; Minagi, A.; Takemoto, K. Universal adversarial attacks on deep neural networks for medical image classification. BMC Med. Imaging 2021, 21, 9. [Google Scholar] [CrossRef]
- Chen, F.; Wang, J.; Liu, H.; Kong, W.; Zhao, Z.; Ma, L.; Liao, H.; Zhang, D. Frequency constraint-based adversarial attack on deep neural networks for medical image classification. Comput. Biol. Med. 2023, 164, 107248. [Google Scholar] [CrossRef]
- Alzubaidi, L.; Al–Dulaimi, K.; Obeed, H.A.-H.; Saihood, A.; Fadhel, M.A.; Jebur, S.A.; Chen, Y.; Albahri, A.; Santamaría, J.; Gupta, A.; et al. MEFF–A model ensemble feature fusion approach for tackling adversarial attacks in medical imaging. Intell. Syst. Appl. 2024, 22, 200355. [Google Scholar] [CrossRef]
- Yue, X.; Dong, Z.; Chen, Y.; Xie, S. Evidential dissonance measure in robust multi-view classification to resist adversarial attack. Inf. Fusion 2025, 113, 102605. [Google Scholar] [CrossRef]
- Jiang, S.; Wu, Z.; Yang, H.; Xiang, K.; Ding, W.; Chen, Z.-S. A prior knowledge-guided distributionally robust optimization-based adversarial training strategy for medical image classification. Inf. Sci. 2024, 673, 120705. [Google Scholar] [CrossRef]
- Li, Y.; Zhang, H.; Bermudez, C.; Chen, Y.; Landman, B.A.; Vorobeychik, Y. Anatomical context protects deep learning from adversarial perturbations in medical imaging. Neurocomputing 2020, 379, 370–378. [Google Scholar] [CrossRef] [PubMed]
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.-A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. LLaMA: Open and Efficient Foundation Language Models. arXiv 2023, arXiv:2302.13971. [Google Scholar] [CrossRef]
- Li, J.; Guan, Z.; Wang, J.; Cheung, C.Y.; Zheng, Y.; Lim, L.-L.; Lim, C.C.; Ruamviboonsuk, P.; Raman, R.; Corsino, L.; et al. Integrated image-based deep learning and language models for primary diabetes care. Nat. Med. 2024, 30, 2886–2896. [Google Scholar] [CrossRef] [PubMed]
- Xu, P.; Chen, X.; Zhao, Z.; Shi, D. Unveiling the Clinical Incapabilities: A Benchmarking Study of GPT-4V(ision) for Ophthalmic Multimodal Image Analysis. Br. J. Ophthalmol. 2024, 108, 1384–1389. [Google Scholar] [CrossRef] [PubMed]
- Liu, X.; Wu, J.; Shao, A.; Shen, W.; Ye, P.; Wang, Y.; Ye, J.; Jin, K.; Yang, J. Uncovering Language Disparity of ChatGPT on Retinal Vascular Disease Classification: Cross-Sectional Study. J. Med. Internet Res. 2024, 26, e51926. [Google Scholar] [CrossRef] [PubMed]
- Dossantos, J.; An, J.; Javan, R. Eyes on AI: ChatGPT’s Transformative Potential Impact on Ophthalmology. Cureus 2023, 15, e40765. [Google Scholar] [CrossRef] [PubMed]
Feature Comparison | [1] | [2] | [3] | [4] | [5] | [6] | [7] | [8] | [9] | [10] | [11] | [13] | OUR |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Covers both DL and hand-crafted features | - | - | - | - | - | - | - | - | - | - | ✓ | - | ✓ |
In-depth discussion on hand-crafted features | - | - | - | - | - | - | - | ✓ | ✓ | - | ✓ | ✓ | ✓ |
In-depth discussion on CNNs | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
In-depth discussion and comparison of each type of CNN | - | - | - | - | - | - | - | - | - | - | - | - | ✓ |
In-depth discussion on Vision Transformers | - | - | - | - | - | - | - | - | - | - | ✓ | - | ✓ |
In-depth comparisons between types of CNNs | - | - | - | - | - | - | - | - | - | - | ✓ | ✓ | ✓ |
Includes comparative analysis of DL and HCF | - | - | - | - | - | - | - | - | - | - | - | - | ✓ |
Includes in-depth discussion of ocular disorders | - | - | - | - | - | - | - | - | - | - | ✓ | ✓ | - |
Discusses latest advancements | - | ✓ | ✓ | ✓ | - | - | ✓ | ✓ | ✓ | - | ✓ | ✓ | ✓ |
Review of multiple OCT datasets | - | - | - | - | - | - | - | - | - | - | ✓ | - | ✓ |
Reviews specific OCT imaging techniques | - | - | - | - | - | - | - | - | - | - | ✓ | ✓ | ✓ |
Identifies gaps in current research | - | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | - | ✓ | ✓ | ✓ |
Suggests future research into adversarial attacks | - | - | - | - | - | - | - | - | - | - | - | - | ✓ |
Suggests future research in LLMs | - | - | - | - | - | - | - | - | - | - | - | - | ✓ |
Survey | Scope of the Survey | Limitations/Gaps |
---|---|---|
[1] | Comprehensive narrative review of DL techniques for retinal layer segmentation in OCT; contrasts DL with traditional and early ML methods. | No benchmarking or comparative performance table; focuses only on segmentation |
[2] | Disease-specific review linking symptoms and OCT manifestations; emphasizes CAD system design. | No quantitative comparisons or benchmarking; limited detail on DL methods |
[3] | Comparative review of OCTA-based methods across applications; structured as a handbook for researchers. | Limited dataset discussion |
[4] | Focused review on OCT image registration; discusses applications like speckle noise reduction and longitudinal tracking. | Limited discussion of datasets and comparative performance |
[5] | Highlights performance metrics of CAD tools for AMD diagnosis; discusses early detection and telemedicine potential. | Focuses only on AMD; lacks detailed algorithmic comparison and dataset analysis |
[6] | Joint analysis of OCT and Fundus-based approaches for GL and AMD; supports early detection research. | No detailed performance evaluation or dataset review |
[7] | Machine learning approaches by modality and method; highlights major technical trends in both fundus and OCT domains. | Lacks a detailed comparative performance analysis |
[8] | Clinical and technical features for glaucoma detection using OCT and Fundus imaging. Bridges clinical features with computational methods and highlights anatomical biomarkers for diagnosis. | Lacks quantitative comparison or algorithmic benchmarking |
[9] | Deep learning applications for glaucoma detection using OCT (2D/3D/B-scans) and highlights DL’s clinical potential with OCT, covering 2D and 3D data. | Lacks specific dataset info |
[10] | Systematic review of AI-based retinal screening using OCT images, which surveys numerous articles from a decade, 2013–2023, covering broad disease coverage; also comparing ML vs. DL. | Lacks dataset-level comparison. |
[11] | Comparison of hand-crafted vs. deep features for OCT classification, where a quantitative comparison shows that DL outperforms hand-crafted features. | Lacks in-depth hand-crafted vs. deep features and dataset comparisons |
[13] | Deep learning methods for automated ophthalmic disease diagnosis, where 99 deep learning studies are reviewed. Discusses modality-specific challenges and trends; emphasizes clinical integration and interpretability. | No benchmarking; lacks dataset-level discussion |
Dataset | Classes and Counts | Dataset Characteristics | Institutional Source |
---|---|---|---|
1 | 15 DME volume images, 15 AMD volume images, and 15 normal volume images | Includes images of two major causes of vision impairment and a healthy baseline for comparison, which can be employed to train models to recognize diabetic swelling and age-related retinal changes, respectively. | Duke University, Harvard University, and University of Michigan |
2 | 48 AMD volume images, 50 DME images, 50 normal images | Provides a larger collection of images than D1, representing two leading causes of vision impairment along with healthy cases, offering a strong baseline for training models to detect diabetic swelling and age-related retinal abnormalities. | Noor Eye Hospital in Tehran (NEH) |
3 | 120 normal volume images, 160 drusen volume images, and 161 CNV volume images; 16,822 3D OCT images total | Focuses on identifying changes associated with AMD, particularly by isolating drusen as an early indicator of the disease, enabling monitoring of progression toward advanced stages like CNV. | Noor Eye Hospital in Tehran (NEH)
4* | 37,206 CNV 2D images, 11,349 DME 2D images, 8617 drusen 2D images, 51,140 normal 2D images | Includes both wet and dry AMD, which necessitate different treatment strategies, enabling trained models to distinguish between these forms of AMD while identifying DME and normal retinal conditions. | University of California San Diego, Guangzhou Women and Children’s Medical Center
4 | Trimmed-down version of 4*, referred to as OCT2017: 37,455 CNV 2D images, 11,598 DME 2D images, 8866 drusen 2D images, and 26,565 normal 2D images; 84,484 OCT images total | This trimmed-down version of Dataset 4* reduces computational demand, which leads to less training time. | University of California San Diego, Guangzhou Women and Children’s Medical Center
5 | 269 intermediate AMD volume images and 115 normal volume images | A simplified dataset narrows the diagnostic focus and improves the model’s ability to detect the intermediate stage of AMD. | Boards of Devers Eye Institute, Duke Eye Center, Emory Eye Center, and National Eye Institute
6 | 3000 AMD images, 3000 CNV images, 3000 DME images, 3000 MH images, 3000 DR images, 3000 CSR images, 3000 drusen images, 3000 normal images; 24,000 total 2D OCT images | This dataset enables differentiation between various AMD types and rarer conditions like CSR and requires the model to learn from a larger volume of data to detect the eight classes. | Boards of Devers Eye Institute, Duke Eye Center, Emory Eye Center, and National Eye Institute
7 | Normal macular (316), macular edema (261), macular hole (297), AMD (284) | This dataset trains models to detect key macular diseases, each with distinct structural changes such as retinal swelling and fluid buildup in OCT images. | UPMC Eye Center, Eye and Ear Institute, Ophthalmology and Visual Science Research Center |
7* | 3319 OCT images total, 1254 early DME, 991 advanced DME, 672 severe DME, and 402 atrophic maculopathy | This dataset helps models learn the detailed progression of diabetic eye disease and macular damage across different severity levels. | Renmin Hospital of Wuhan University |
8 | 16 DME volume images and 16 normal volume images | This dataset helps models to learn distinctions between healthy and diabetic macular edema cases. | Singapore Eye Research Institute (SERI) |
9 | Macular holes, MH (102), AMD (55), diabetic retinopathy, DR (107), and normal retinal images (206) | This dataset helps train models to detect a range of retinal conditions, including macular holes, which cause central vision loss, while DR results from diabetes-related vessel damage. | Cirrus HD-OCT machine (Carl Zeiss Meditec, Inc., Dublin, CA, USA) at Sankara Nethralaya (SN) Eye Hospital in Chennai, India |
10 | 1395 samples (697 glaucoma and 698 non-glaucoma) | This helps improve the model’s ability to distinguish between healthy eyes and glaucoma. | Zhongshan Ophthalmic Center, Sun Yat-sen University |
Refs. | Method | Method’s Descriptions | Performance Summary |
---|---|---|---|
[16] | LBP Slice-Sum and SVM | Low-complexity slice-sum feature vector with an SVM classifier. | D5 – Acc. (%), Sen. (%), Spec. (%); LBP-RIU2: 90.80, 93.85, 87.72
[17] | 3D-LBP | Global descriptors extracted from 2D feature images for LBPs and from the 3D volume OCT image. Features are fed into a classifier for predictions. | D9,V – Acc. (%), F1 (%), SE (%), SP (%); Global-LBP: 81.2, 78.5, 68.7, 93.7; Local-LBP: 75.0, 75.0, 75.0, 75.0; Local-LBP-TOP: 75.0, 73.3, 68.7, 81.2
[18] | HOG + LBP | Histogram of Oriented Gradients (HOG) and LBP features are extracted and combined. These features are fed into a linear SVM classifier. | D9,V – Sens., Spec., Prec., F1, Acc.; HOG: 0.69, 0.94, 0.91, 0.81, 0.78; HOG + PCA: 0.75, 0.87, 0.85, 0.80, 0.81
[19] | Multi-Kernel Wiener Local Binary Patterns (MKW-LBP) | Image denoised using a Wiener filter. The MKW-LBP descriptor calculates the mean and variance of neighboring pixels. SVMs, AdaBoost, and Random Forest are used for classification. | D1 – Kernel/Classifier: Prec. (%), Sen. (%), Spec. (%), Acc. (%); 3 × 3/SVM-Poly: 97.84, 97.48, 98.89, 97.86; 3 × 5/SVM-Poly: 98.84, 98.59, 99.41, 98.85; 5 × 5/SVM-Poly: 98.19, 98.05, 99.15, 98.33
[75] | Multi-Size Kernels ξcho-Weighted Median Patterns (MSKξMP) | Image denoised using a median filter and flattened. MSKξMP is a variant of LBP that selects a weighted median pixel in a kernel and is applied to the preprocessed image. Also employs Singular Value Decomposition and a Neighborhood Component Analysis-based weighted feature selection method. | Prec., Sens., Spec., Acc.; D1 – SVM-Poly: 0.9976, 0.9971, 0.9989, 0.9978; D2 – SVM-Poly: 0.9662, 0.9663, 0.9833, 0.9669; D3 – SVM-RBF: 0.8952, 0.8758, 0.9395, 0.8887
[76] | Alpha Mean Trim Local Binary Patterns (AMT-LBP) | Image denoised using a median filter and flattened. AMT-LBP is a variant of LBP that encodes by averaging all pixel values in a kernel, omitting the highest and lowest values. SVM is employed for classification. | D1 – SVM-Poly under three trim settings (tr1 = 0, tr2 = 2 || tr1 = 2, tr2 = 0 || tr1 = 2, tr2 = 2); Precision: 0.9796 || 0.9846 || 0.9710; Sensitivity: 0.9751 || 0.9813 || 0.9654; Specificity: 0.9887 || 0.9920 || 0.9854; Accuracy: 0.9774 || 0.9836 || 0.9700; F-measure: 0.9773 || 0.9829 || 0.9680; AUC: 0.9740 || 0.9802 || 0.9697
[77] | H-F-V&H-LBP + T | Combines discrete wavelet transform (DWT) image decomposition, LBP-based texture feature extraction, and multi-instance learning (MIL). LBP is chosen for its ability to handle low-contrast, low-quality images. | D3,B – Acc.: 99.58%
[78] | Slice-chain labeling; slice-threshold labeling | OCT B-scans of a volume image are employed, where each slice is labeled and thresholded to extract features. | D3,B/D5 – Acc.: 92.50% (slice-chain); Acc.: 96.36% (slice-threshold)
[79] | Retinal thickness method | The thickness of the retinal layers is measured, and each OCT image is classified according to the thickness. | D3,B/D1 – Acc.: 97.33%, Sen.: 94.67%, Spec.: 100%, F1: 97.22%, AUC: 0.99
[80] | RPE layer detection and baseline estimation using statistical methods | Pixel grouping/iterative elimination, guided by layer intensities, is employed to detect the RPE layer and is enhanced by randomization techniques. | D1,V – AMD Acc: 100%, Normal Acc: 93.3%, DME Acc: 96.6%
[68] | Histogram of Oriented Gradients (HOG) descriptors and SVM | Noise removal using sparsity-based block matching and 3D filtering. HOG and SVM are employed for the classification of AMD and DME. | D1,V – AMD Acc: 100%, Normal Acc: 86.67%, DME Acc: 100%
[81] | Dictionary learning (COPAR, FDDL, and LRSDL) | Image denoising, flattening of the retinal curvature, cropping, extraction of HOG features, and classification using a dictionary learning approach. | D1,V – AMD Acc: 100%, Normal Acc: 100%, DME Acc: 95.13%
[82] | Sparse Coding Dictionary Learning | Preprocessing aligns the retina and crops the image. Then, image partitioning, feature extraction, and dictionary training with sparse coding are applied to the OCT images. A linear SVM is utilized to classify images. | D1,V – AMD Acc: 100%, Normal Acc: 100%, DME Acc: 95.13%
CNN Structure | Key Components | Relative Merits | When It Performs Best |
---|---|---|---|
Standard CNN | Residual or Inception Unit → Feature Extraction → Fully Connected Layer | Simple and efficient; good for baseline performance | Suitable for balanced datasets with clear disease features |
Ensemble CNN | Multiple CNNs → Independent FC Layers → Voting Mechanism | Increases robustness and reduces model variance | Effective when the dataset is diverse or noisy; improves generalization |
Multi-Scale CNN | Multi-Scaler → Multiple Resolutions → Feature Extraction | Captures features at different scales; enhances spatial awareness | Best for detecting lesions of varying sizes (e.g., drusen, edema) |
Attention-Based CNN | CNN → Attention Network (e.g., segmentation map) → FC | Focuses on clinically important regions; improves interpretability | Ideal when critical regions are small or when guided attention improves accuracy |
CNN With Augmentation | Augmenter Network → CNN | Addresses class imbalance; improves training data diversity | Performs well when the dataset is imbalanced or small; boosts underrepresented class accuracy |
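As an illustration of the multi-scale pattern summarized above, the sketch below runs parallel convolution branches with different receptive fields and concatenates their outputs; it is a minimal, generic example (layer widths and kernel sizes are arbitrary assumptions, not any cited architecture).

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Parallel branches at different receptive fields, concatenated channel-wise."""
    def __init__(self, in_ch=1, ch=16):
        super().__init__()
        self.b3 = nn.Conv2d(in_ch, ch, kernel_size=3, padding=1)  # fine detail (e.g., drusen)
        self.b5 = nn.Conv2d(in_ch, ch, kernel_size=5, padding=2)  # mid-scale lesions
        self.b7 = nn.Conv2d(in_ch, ch, kernel_size=7, padding=3)  # large structures (e.g., edema)

    def forward(self, x):
        return torch.relu(torch.cat([self.b3(x), self.b5(x), self.b7(x)], dim=1))

block = MultiScaleBlock()
print(block(torch.randn(1, 1, 224, 224)).shape)  # torch.Size([1, 48, 224, 224])
```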
Refs | Method | Method’s Description | Results |
---|---|---|---|
[20] | Hybrid Retinal Fine-Tuned Convolutional Neural Network (R-FTCNN) | R-FTCNN is employed with principal component analysis (PCA) used concurrently within this methodology. PCA converts the fully connected layers of the R-FTCNN into principal components, and the Softmax function is then applied to these components to create a new classification model. | D1FC1 + PCA: Acc: 1.0000, Sen.: 1.0000, Spec.: 1.0000, Prec.: 1.0000, F1: 1.0000, AUC: 1.0000 D4FC1 + PCA: Acc: 0.9970, Sen.: 0.9970, Spec.: 0.9990, Prec.: 0.9970, F1: 0.9970, AUC: 0.99999 (61mil-parameters) |
[21] | Complementary Mask Guided Convolutional Neural Network (CM-CNN) | CM-CNN classifies OCT B-scans by using masks generated from a segmentation task. A Class Activation Map Guided UNet (CAM-UNet) segments drusen and CNV lesions, utilizing CAM output from the CM-CNN. | D3AUC, Sen, Spe, Class Acc D3CNV: 0.9988, 0.9960, 0.9680, 0.9773 D3Drusen 0.9874, 0.9120, 0.9980, 0.9693 D3Normal 0.9999, 1, 0.9880, 0.9920 D3Overall Acc: 0.9693 |
[22] | CNN Iterative ReliefF + SVM | DeepOCT employs multilevel feature extraction using 18 pre-trained networks combined with tent maximal pooling, followed by feature selection using ReliefF. | D1Acc: 1.00, Pre: 1.00, F1: 1.00, Rec: 1.00, MCC: 1.00 4*Acc: 0.9730, Pre: 0.9732, F1: 0.9730, Rec: 0.9730, MCC: 0.9641 |
[23] | Inception V3–Custom Fully Connected layers | Eliminating the final layers of a pre-trained Inception V3 model and using the remaining part as a fixed feature extractor. | D1,VAMD 15/15 = 100%, DME 15/15 = 100%, NOR 15/15 = 100% |
[24] | AOCT-NET | Utilizes a Softmax classifier to distinguish between five retinal conditions: AMD, CNV, DME, drusen, and typical cases. | 4+5AMD: 100%, 100%; CNV: 98.64%, 100%; DME: 99.2%, 0.96; Drusen: 97.84%, 0.92; Normal: 98.56%, 0.97 |
[25] | Iterative Fusion Convolutional Neural Network (IFCNN) | Employs iterative fusion for merging features from the current convolutional layer with those from all preceding layers in the network. | D4Sensitivity., Specificity, Accuracy Drusen 76.8 ± 7.2, 94.9 ± 1.9, 93 ± 1.7 87.3 ± 2.2; CNV 87.9 ± 4.3, 96 ± 1.7, 92.4 ± 1.3, DME 81.9 ± 6.8, 96.3 ± 2, 94.4 ± 1, Normal 92.2 ± 4.7 96 ± 1.6 94.8 ± 1.2. |
[26] | IoT OCT Deep Net2 | Expands from 30 to 50 layers and features a dense architecture with three recurrent modules. | D4Precision, Recall, F1-Score, Acc. 0.97 Normal: 0.99, 0.93, 0.96, CNV: 0.95, 0.98, 0.98, DME: 0.96, 0.99, 0.98, Drusen: 0.99, 1.00, 0.99 |
[27] | Capsule Network | Composed of neuron groups representing different attributes, utilizes vectors to learn positional relationships between image features. | D4Sensitivity, Specificity, Precision, F1 CNV: 1.0, 0.9947, 1.0, 1.0, DME: 0.992, 0.9973, 0.992, 0.992, Drusen: 0.992, 0.9973, 0.992, 0.992, Normal: 1.0, 1.0, 1.0, 1.0 |
[28] | Dictionary Learning Informed Deep Neural Network (DLI-DNN) | Downsampling by utilizing DAISY descriptors and Improved Fisher kernels to extract features from OCT images. | D4Accuracy: 97.2%, AUC: 0984, Sensitivity: 97.1%, Specificity: 99.1% |
[29] | S-DDL–4 classes Wavelet Scattering Transform (WST)–5 classes | S-DDL addresses the vanishing gradient problem and shortens training time (Figure 4) -------------------------------------------- WST employs the Wavelet Scattering Transform using predefined filters within the network layers (Figure 6). | D9CSR-Acc: 45.5%, AMD-Acc: 64.3%, MH-Acc: 56.0%, NO-Acc: 90.7% OA: 72.0% ------------------------------------ D9AMD-Acc: 9.1%, CSR-Acc: 90%, DR-Acc: 71.4%, MH-Acc: 75%, NO-Acc: 100% OA: 79.6% |
[30] | Multiple Instance Learning (UD-MIL) | Employs an instance-level classifier trained iteratively under a deep multiple instance learning scheme; a recurrent neural network (RNN) then uses the features from the selected instances to make the final predictions. | D5 (Accuracy, F1, AUC): μ = 0.1: 0.971 ± 0.010, 0.980 ± 0.007, 0.955 ± 0.020; μ = 0.2: 0.979 ± 0.018, 0.986 ± 0.012, 0.970 ± 0.027; μ = 0.3: 0.979 ± 0.018, 0.986 ± 0.012, 0.970 ± 0.027; μ = 0.4: 0.979 ± 0.011, 0.986 ± 0.007, 0.975 ± 0.020; μ = 0.5: 0.979 ± 0.011, 0.986 ± 0.007, 0.975 ± 0.020 |
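Several of the CNN entries above (e.g., [20,23]) follow the same fixed-feature-extractor recipe: strip the classification head from a pre-trained backbone, optionally compress the pooled features with PCA, and fit a shallow softmax-style classifier. The sketch below illustrates only that general pattern; the backbone choice, PCA dimensionality, and the dummy data are assumptions for demonstration, not the published configurations.

```python
# Minimal sketch of the "fixed feature extractor + PCA + shallow classifier"
# pattern; all hyperparameters here are illustrative assumptions.
import numpy as np
import torch
import torchvision.models as models
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression  # softmax-style classifier

# Pre-trained backbone with the classification head removed
# (downloads ImageNet weights on first use).
backbone = models.inception_v3(weights=models.Inception_V3_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()   # expose the 2048-D pooled features
backbone.eval()

@torch.no_grad()
def extract_features(batch: torch.Tensor) -> np.ndarray:
    """batch: (N, 3, 299, 299) grayscale B-scans replicated to 3 channels."""
    return backbone(batch).numpy()

# Dummy stand-ins for real OCT data; replace with actual B-scans and labels.
X = extract_features(torch.randn(8, 3, 299, 299))
y = np.array([0, 1, 2, 3, 0, 1, 2, 3])          # 4 hypothetical classes

pca = PCA(n_components=4).fit(X)                 # compress the FC features
clf = LogisticRegression(max_iter=1000).fit(pca.transform(X), y)
print(clf.predict(pca.transform(X[:2])))         # predicted class labels
```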
Refs. | Method | Method’s Descriptions | Results |
---|---|---|---|
[32] | Lesion-Aware Convolutional Neural Network (LACNN) | LACNN concentrates on local lesion-specific regions by utilizing a lesion detection network to generate a soft attention map over the entire OCT image. | D4 (Acc, Prec): Drusen: 93.6 ± 1.4, 70.0 ± 5.7; CNV: 92.7 ± 1.5, 93.5 ± 1.3; DME: 96.6 ± 0.2, 86.4 ± 1.6; Normal: 97.4 ± 0.2, 94.8 ± 1.1; D4 Overall Acc: 90.1 ± 1.4, Overall Sensitivity: 86.8 ± 1.3; D2: Overall Sensitivity: 99.33 ± 1.49, Overall Precision: 99.39 ± 1.36, F1: 99.33 ± 1.49, AUC: 99.40 ± 1.34 |
[33] | Multilevel Dual-Attention-Based CNN (MLDA-CNN) | A dual-attention mechanism is applied at multiple levels of the CNN, integrating multilevel feature-based attention and emphasizing high-entropy regions within the finer features. | D1: Acc: 95.57, Prec: 95.29, Recall: 96.04, F1: 0.996; D2: Acc: 99.62 (±0.42), Prec: 99.60 (±0.39), Recall: 99.62 (±0.42), F1: 0.996, AUC: 0.9997 |
[34] | Multilevel Perturbed Spatial Attention (MPSA) and Multidimension Attention (MDA) | MPSA emphasizes key regions in input images and intermediate network layers by perturbing the attention layers; MDA captures information across different channels of the extracted feature maps. | D1: Acc: 100%, Prec: 100%, Recall: 100%; D2: Acc: 99.79 (±0.43), Prec: 99.80 (±0.41), Recall: 99.78 (±0.43); D4: Acc: 92.62 (±1.69), Prec: 89.96 (±3.16), Recall: 88.53 (±3.26) |
[35] | One-Stage Attention-Based Framework with Weakly Supervised Lesion Segmentation | One-stage attention-based classification and segmentation, in which the classification network generates a heatmap through Grad-CAM and integrates the proposed attention block. | D4 (Acc, SE, Spec): CNV: 93.6 ± 1.9, 90.1 ± 3.8, 96.5 ± 1.4; DME: 94.8 ± 1.2, 86.5 ± 1.5, 96.4 ± 2.1; Drusen: 94.6 ± 1.4, 71.5 ± 4.8, 96.9 ± 1.2; Normal: 97.1 ± 1.0, 96.3 ± 1.5, 98.9 ± 0.3; OA: 90.9 ± 1.0, OS: 86.3 ± 1.8, OP: 85.5 ± 1.6 |
[36] | Efficient Global Attention Block (GAB) and Inception | GAB generates an attention map across three dimensions for any intermediate feature map and computes adaptive feature weights by multiplying the attention map with the input feature map (a generic attention-block sketch follows this table). | D4*: Accuracy: 0.914, Recall: 0.9141, Specificity: 0.9723, F1: 0.915, AUC: 0.9914 |
[37] | B-Scan Attentive Convolutional Neural Network (BACNN) | BACNN employs a self-attention module to aggregate extracted features based on their clinical significance, producing a high-level feature vector for diagnosis. | D1: Sen: 97.76 ± 2.07, Spec: 95.61 ± 4.35, Acc: 97.12 ± 2.78; D2 (Sens., Spec., Acc.): AMD: 92.0 ± 4.4, 95.0 ± 0.1, 93.2 ± 2.7; DME: 100.0 ± 0.0, 98.9 ± 2.4, 99.3 ± 1.5; Normal: 87.8 ± 4.3, 93.2 ± 2.3, 92.2 ± 2.3 |
[38] | 6G-Enabled IoMT Method–MobileNetV3 | Leverages transfer learning for feature extraction and optimizes through feature selection using the Hunger Games Search algorithm. | D4 (Acc., Recall, Prec.): SVM: 99.69, 99.69, 99.69; XGB: 99.38, 99.38, 99.4; KNN: 99.59, 99.59, 99.59; RF: 99.38, 99.38, 99.4 |
[39] | Deep Ensemble CNN + SVM, Naïve Bayes, Artificial Neural Network | A secondary layer within the CNN model extracts key feature descriptors, which are concatenated and fed into supervised hybrid classifiers (SVM, naïve Bayes, and ANN). | D4 (Sensitivity, Specificity, Accuracy): ANN: 0.96, 0.90, 0.93; SVM: 0.94, 0.91, 0.91; NB: 0.93, 0.90, 0.91; Ensemble: 0.97, 0.92, 0.94 |
[40] | Multi-scale Deep Feature Fusion (MDFF) CNN | The MDFF technique captures inter-scale variations in the images, providing the classifier with discriminative information (a multi-scale fusion sketch follows this table). | D4 (Sens., Spec., Acc.): CNV: 96.6, 98.73, 97.78; DME: 94.14, 98.97, 98.33; DR: 90.49, 98.32, 97.52; NO: 96.9, 89.26, 97.85 |
[41] | Multi-scale and Multipath CNN with Six Convolutional Layers | Captures variations across different scales along multiple paths and feeds the fused features into a classifier. | (Precision, Recall, Acc.): D1-2C: 0.969, 0.967, 0.9666; D2-2C: 0.99, 0.99, 0.9897; D4-2C: 0.998, 0.998, 0.9978 |
[42] | Multi-scale CNN with Seven Convolutional Layers | The architecture consists of a multi-scale CNN with seven convolutional layers, allowing for the generation of numerous local structures with various filter sizes. | (Precision, Recall, F1-score, Accuracy, AUC): D1-2C: 0.9687, 0.9666, 0.9666, 0.9667, 1.0000; D2-2C: 0.9803, 0.9795, 0.9795, 0.9795, 0.9816; D4-2C: 0.9973, 0.9973, 0.9973, 0.9973, 0.9999; D9-2C: 0.9810, 0.9808, 0.9809, 0.9808, 0.9971 |
[43] | Multi-scale CNN Based on the Feature Pyramid Network | Combines a feature pyramid network (FPN) with multi-scale receptive fields, providing end-to-end training. | (Accuracy (%), Sensitivity (%), Specificity (%)): D2 FPN-VGG16: 92.0 ± 1.6, 91.8 ± 1.7, 95.8 ± 0.9; D2 FPN-ResNet50: 90.1 ± 2.9, 89.8 ± 2.8, 94.8 ± 1.4; D2 FPN-DenseNet: 90.9 ± 1.4, 90.5 ± 1.9, 95.2 ± 0.7; D2 FPN-EfficientNetB0: 87.8 ± 1.3, 86.6 ± 1.8, 93.3 ± 0.8; D4 FPN-VGG16: 98.4, 100, 97.4 |
[44] | Multi-scale (Pyramidal) Feature Ensemble Architecture (MSPE) | A multi-scale feature ensemble architecture in which a scale-adaptive neural network generates multi-scale inputs for feature extraction and ensemble learning. | D1: Acc: 99.69%, Sen: 99.71%, Spec.: 99.87%; D4: Acc: 97.79%, Sen: 95.55%, Spec.: 99.72% |
[45] | Multi-scale Convolutional Mixture of Experts (MCME) Ensemble Model | The MCME model applies CNNs at multiple scales and learns features with a cost function that maximizes a likelihood over the training data and ground truth under a Gaussian mixture model. | D2: Precision: 99.39 ± 1.21, Recall: 99.36 ± 1.33, F1: 99.34 ± 1.34, AUC: 0.998 |
[46] | Deep Multi-scale Fusion CNN (DMF-CNN) | DMF-CNN uses multiple CNNs with varying receptive fields to extract scale-specific features, which are then used to extract cross-scale features; a joint scale-specific and cross-scale multi-loss optimization strategy is employed. | D2 (Sensitivity (%), Precision (%), F1 score, OS/OP/OF1): AMD: 99.62 ± 0.27, 99.54 ± 0.17, 99.58 ± 0.16, 99.58 ± 0.23; DME: 99.45 ± 0.59, 99.45 ± 0.38, 99.45 ± 0.35, 99.59 ± 0.20; Normal: 99.68 ± 0.22, 99.75 ± 0.41, 99.71 ± 0.20, 99.60 ± 0.22; OA: 99.60 ± 0.21, AUC: 0.997 ± 0.002; D4 (Sensitivity (%), Precision (%), F1 score): CNV: 97.33 ± 1.05, 97.05 ± 1.19, 97.18 ± 0.32; DME: 93.22 ± 3.22, 96.26 ± 2.17, 94.65 ± 1.09; Drusen: 89.29 ± 3.59, 87.73 ± 3.84, 88.34 ± 1.27; Normal: 97.62 ± 1.11, 97.49 ± 1.30, 97.55 ± 0.49; OS/OP/OF1/OA: 94.37 ± 1.16, 94.64 ± 0.90, 94.43 ± 0.59, 96.03 ± 0.43 |
[47] | Surrogate-assisted CNN | Denoising, thresholding, and morphological dilation are performed on images to create masks, which produce surrogate images for training the CNN model. | D1 Denoised: Acc: 95.09%, Sen.: 96.39%, Spec.: 93.60%; D1 Surrogate: Acc: 95.09%, Sen.: 96.39%, Spec.: 93.60% |
[48] | CNN and Semi-supervised GAN | | D2 (Sen (%), Spec (%), Acc (%)): AMD: 98.38 ± 0.69, 97.79 ± 0.68, 97.98 ± 0.61; DME: 96.96 ± 1.32, 99.23 ± 0.36, 98.61 ± 0.49; Normal: 96.96 ± 0.73, 99.12 ± 0.64, 98.26 ± 0.67; OS/OSp/OA: 97.43 ± 0.68, 98.71 ± 0.34, 97.43 ± 0.66 |
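Many of the attention-augmented CNNs above (e.g., [33,34,36,37]) share one core operation: compute a weight per feature channel or spatial location and multiply it back into the feature map. The sketch below shows a generic squeeze-and-excite-style channel attention block in that spirit; it is an illustrative stand-in, not the exact GAB design of [36], and all sizes are assumptions.

```python
# Generic channel attention block: pool global context, predict per-channel
# weights, and reweight the input feature map.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # global spatial context
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                          # adaptive feature weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (N, C, H, W)
        w = self.mlp(self.pool(x).flatten(1))      # (N, C) attention weights
        return x * w[:, :, None, None]             # reweighted feature map

feats = torch.randn(2, 64, 32, 32)                 # dummy OCT feature maps
print(ChannelAttention(64)(feats).shape)           # torch.Size([2, 64, 32, 32])
```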
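Likewise, the multi-scale entries ([40,41,42,44,46]) build on one shared idea: process the image at several resolutions and fuse the scale-specific features before classification. Below is a minimal sketch of that fusion pattern; the backbone, scales, and head are illustrative assumptions rather than any single published architecture.

```python
# Multi-scale feature fusion: one shared backbone, several input scales,
# concatenated features feeding a single classifier head.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class MultiScaleFusion(nn.Module):
    def __init__(self, num_classes: int = 4, scales=(224, 160, 112)):
        super().__init__()
        self.scales = scales
        m = models.resnet18(weights=None)                     # untrained for the demo
        self.backbone = nn.Sequential(*list(m.children())[:-1])  # drop the fc head
        self.head = nn.Linear(512 * len(scales), num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (N, 3, H, W)
        feats = []
        for s in self.scales:                                 # scale-specific features
            xs = F.interpolate(x, size=(s, s), mode="bilinear",
                               align_corners=False)
            feats.append(self.backbone(xs).flatten(1))        # (N, 512) per scale
        return self.head(torch.cat(feats, dim=1))             # fused prediction

logits = MultiScaleFusion()(torch.randn(2, 3, 224, 224))
print(logits.shape)                                           # torch.Size([2, 4])
```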
Refs. | Method | Method’s Descriptions | Results |
---|---|---|---|
[50] | Hybrid ConvNet-Transformer Network (HCTNet) | HCTNet extracts features via a residual dense block; two parallel branches, a Transformer and a ConvNet, then capture the global and local contexts of the OCT images, and a feature fusion module with an adaptive reweighting mechanism integrates the two (a hybrid ConvNet-Transformer sketch follows this table). | D1 (Acc. (%), Sen. (%), Prec. (%)): AMD: 95.94, 82.6, 95.08; DME: 86.61, 80.22, 85.29; Normal: 89.81, 93.39, 85.22; OA: 86.18%, OS: 85.40%, OP: 88.53%; D4 (Acc. (%), Sen. (%), Prec. (%)): CNV: 94.6, 92.23, 95.53; DME: 96.14, 87.96, 84.42; Drusen: 95.54, 77.36, 79.00; Normal: 96.84, 96.73, 93.5; OA: 91.56%, OS: 88.57%, OP: 88.11% |
[51] | Interpretable Swin-Poly Transformer Network | The Swin-Poly Transformer shifts window partitions and connects adjacent non-overlapping windows from the previous layer, flexibly capturing multi-scale features. The model refines cross-entropy by adjusting the importance of polynomial bases, improving retinal OCT classification accuracy. | D4 (Acc., Prec., Recall): CNV: 1.0000, 0.9960, 1.0000; DME: 0.9960, 1.0000, 0.9960; Drusen: 1.0000, 0.9960, 1.0000; Normal: 0.9960, 1.0000, 0.9960; Ave.: 0.9980, 0.9980, 0.9980; D6 (Acc., Prec., Recall): AMD: 1.0000, 1.0000, 1.0000; CNV: 0.9489, 0.9389, 0.9571; CSR: 1.0000, 1.0000, 1.0000; DME: 0.9439, 0.9512, 0.9457; DR: 1.0000, 0.9972, 1.0000; Drusen: 0.9200, 0.9580, 0.9114; MH: 1.0000, 1.0000, 0.9971; Normal: 0.9563, 0.9254, 0.9571; Ave.: 0.9711, 0.9713, 0.9711 |
[52] | Focused Attention Transformer | Focused Attention employs iterative conditional patch resampling to produce interpretable predictions through high-resolution attribution maps. | D4* (Acc. (%), Spec. (%), Recall (%)): T2T-ViT_14: 94.40, 98.13, 94.40; T2T-ViT_19: 93.20, 97.73, 93.20; T2T-ViT_24: 93.40, 97.80, 93.40 |
[53] | ViT with Logit Loss Function | Captures global features via a self-attention mechanism, reducing reliance on local texture features; adjusts the classifier’s logit weights and uses a logit cross-entropy loss with L2 regularization. | D7* (Acc (%), Sen. (%), Spec. (%)): Early DME: 90.87, 87.03, 93.02; Advanced DME: 89.96, 88.18, 90.72; Severe DME: 94.42, 63.39, 98.4; Maculopathy: 95.13, 89.42, 96.66; OA: 87.3% |
[54] | Model-Based ViT (MBT-ViT), Model-Based Swin Transformer (MBT-SwinT), and Multi-Scale Model-Based ViT (MBT-MViT) | Approximate sparse representation MBT uses ViT, Swin Transformer, and multi-scale ViT backbones for OCT video classification, estimating key features before performing the final classification. | D4 (Acc., Recall): MBT-ViT: 0.8241, 0.8138; MBT-SwinT: 0.8276, 0.8172; MBT-MViT: 0.9683, 0.9667 |
[55] | Structure-Oriented Transformer (SoT) | SoT employs a guidance mechanism that acts as a filter to emphasize the entire retinal structure, and a vote classifier that exploits all output tokens to generate the final grading results. | (B-acc, Sen, Spe): D1 SoT: 0.9935, 0.9925, 0.9955; D5 SoT: 0.9935, 0.9925, 0.9955 |
[56] | OCT Multihead Self-Attention (OMHSA) | OMHSA enhances the self-attention mechanism by incorporating local information extraction; the resulting network, OCTFormer, is built by repeatedly stacking convolutional layers and OMHSA blocks at each stage. | D4 (Acc, Prec., Sen.): OCTFormer-T: 94.36, 94.75, 94.37; OCTFormer-S: 96.67, 96.78, 96.68; OCTFormer-B: 97.42, 97.47, 97.43 |
[57] | Squeeze Vision Transformer (S-ViT) | S-ViT combines SqueezeNet and ViT to capture local and global features, enabling more precise classification at lower computational complexity. | D5: Acc.: 0.9990, Sen.: 0.9990, Prec.: 1.000 |
[14] | Deep Relation Transformer (DRT) | DRT integrates OCT and Visual Field (VF) data, incorporating a deep reasoning mechanism to identify pairwise relationships between OCT and VF. | D10 ablation study (Acc (%), Sen (%), Spec (%)): Light ResNet backbone: 88.3 ± 1.0, 93.7 ± 3.5, 82.4 ± 4.1; ResNet-18: 87.6 ± 2.3, 93.1 ± 2.4, 82.1 ± 4.3; ResNet-34: 87.2 ± 1.6, 90.4 ± 5.0, 83.9 ± 3.6 |
[58] | Conv-ViT (Inception-V3 and ResNet-50) | Integrates Inception-V3 and ResNet-50 to capture texture information by evaluating relationships between nearby pixels, while a Vision Transformer processes shape-based features by analyzing correlations between distant pixels. | D4: Feature-level concatenation: Acc.: 94.46%, Prec.: 0.94, Recall: 0.94, F1: 0.94; Decision-level concatenation: Acc.: 87.38%, Prec.: 0.87, Recall: 0.86, F1: 0.86 |
[59] | Multi-contrast Network | A ViT-based cross-modal multi-contrast network uses multi-contrast learning to extract features; a channel fusion head then aggregates them across different modalities. | D4 (Acc (%), SE (%), SP (%)): Normal: 99.5, 99.38, 100; CNV: 100, 100, 100; DR: 99.5, 100, 99.42; AMD: 100, 100, 100; All: 99.75, 99.84, 99.85 |
[60] | Swin Transformer V2 with Poly Loss Function | Swin Transformer V2 leverages self-attention within local windows while using a PolyLoss function. | D4 (Acc., Recall, Spec.): CNV: 0.999, 1.00, 0.996; DME: 0.999, 1.00, 1.00; Drusen: 1.00, 1.00, 1.00; Normal: 1.00, 1.00, 1.00; D6 (Acc., Recall, Spec.): AMD: 1.00, 1.00, 1.00; CNV: 0.989, 0.949, 0.995; CSR: 1.00, 1.00, 1.00; DME: 0.992, 0.977, 0.995; DR: 1.00, 1.00, 1.00; Drusen: 0.988, 0.934, 0.995; MH: 1.00, 1.00, 1.00; Normal: 0.991, 0.98, 0.992 |
[61] | Lesion-Localization Convolution Transformer (LLCT) | LLCT combines CNN-extracted feature maps with a self-attention network to capture both local and global image context. The model uses backpropagation to adjust weights, enhancing lesion detection by integrating global features from forward propagation. | D4 (Acc (%), Sens (%), Spec. (%)): CNV: 98.1 ± 1.9, 99.4 ± 0.3, 97.6 ± 2.7; DME: 99.6 ± 0.2, 99.6 ± 0.0, 99.5 ± 0.3; Drusen: 98.1 ± 2.3, 92.8 ± 8.5, 99.9 ± 0.2; Normal: 99.6 ± 0.6, 98.8 ± 1.7, 99.9 ± 0.2 |
[62] | Stitched MedViTs | The stitching approach combines two MedViT models to find an optimal architecture, inserting a linear layer between pairs of stitchable layers, each selected from one of the input models, to create a candidate model in the search space. | D4 (Spec., Acc.): micro MedViT: 0.928 ± 0.002, 0.828 ± 0.007; tiny MedViT: 0.933 ± 0.002, 0.841 ± 0.007; micro MedViT: 0.987 ± 0.001, 0.977 ± 0.002; tiny MedViT: 0.986 ± 0.002, 0.977 ± 0.004 |
[63] | Bidirectional Gated Recurrent Unit (GRU) | Combines a pre-trained Vision Transformer for slice-wise feature extraction with a bidirectional GRU to capture inter-slice spatial dependencies, enabling analysis of both local details and global structural integrity. | D4 (ACC, SEN, SPE): ResNet34 + GRU: 87.39 (±1.73), 92.03, 72.86; ViT-large + GRU: 90.27 (±1.44), 94.25, 78.18 |
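The hybrid designs above (e.g., HCTNet [50] and Conv-ViT [58]) reduce to a shared skeleton: convolutional layers extract local features, the feature map is flattened into tokens, and a Transformer encoder models global context before a classification head. The sketch below shows that skeleton under assumed sizes; positional encodings and the fusion modules of the published models are omitted for brevity, so this is an illustration of the pattern, not any specific architecture.

```python
# Hybrid ConvNet-Transformer skeleton: CNN for local features, Transformer
# encoder over flattened tokens for global context.
import torch
import torch.nn as nn

class ConvTransformerClassifier(nn.Module):
    def __init__(self, num_classes: int = 4, dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(                     # local feature extractor
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # global context
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, 1, H, W) B-scans
        f = self.conv(x)                              # (N, dim, H/4, W/4)
        tokens = f.flatten(2).transpose(1, 2)         # (N, H*W/16, dim) tokens
        z = self.encoder(tokens).mean(dim=1)          # pooled global feature
        return self.head(z)

print(ConvTransformerClassifier()(torch.randn(2, 1, 64, 64)).shape)  # (2, 4)
```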
Ref | Adversarial Samples Introduced | Modality | Technique Employed | Future Direction |
---|---|---|---|---|
[19] | Gaussian distributed noise with various noise levels | OCT images | MKW-LBP local descriptor with SVM and Random Forest classifiers. | Analyze how the application of Gaussian noise to OCT images affects classification performance metrics across various disease classes, including CNV, CSR, macular hole, and others. |
[83] | Pepper noise with various noise densities | Skin cancer images | OS-LBP encodes skin cancer images, and the codes are used to train CNN models; the trained models are then employed to identify potential skin cancer areas and mitigate the effects of image degradation. | Examine the effects of Salt and Pepper noise applied to OCT images across various diseases.
[84] | Contrast degradation | Endoscopic images | WCE images are encoded using CQ-MPP and used to train CNN models; experts are employed to identify lesion areas and mitigate the effects of contrast degradation. | Adjust the brightness and contrast of OCT images and observe the effects on performance.
[85] | Fast Gradient Sign Method (FGSM) | Skin cancer images, MRI | Adversarial training using Inception for skin cancer classification and brain tumor segmentation. | Apply FGSM to OCT images with other deep learning networks, such as ViT, Swin Transformer, and other DL techniques, and observe the effects on their performance metrics (a minimal FGSM sketch follows this table).
[86] | FGSM perturbations, Basic Iterative Method (BIM), Projected Gradient Descent (PGD), Carlini and Wagner (CW) attack | Eye fundus, lung X-rays, skin cancer images | Kernel density (KD) models normal samples within the same class as densely clustered on a data manifold, whereas adversarial samples are distributed more sparsely outside it; local intrinsic dimensionality (LID) is a metric describing the dimensional properties of adversarial subspaces in the vicinity of adversarial examples. | Apply BIM, PGD, and CW to OCT images with other deep learning networks, such as ViT, Swin Transformer, and other DL techniques; observe the effects on their performance metrics and modify the DL techniques to mitigate the adversarial effects.
[90] | Frequency constraint-based adversarial attack | 3D-CT, a 2D chest X-Ray image dataset, a 2D breast ultrasound dataset, and a 2D thyroid ultrasound | A perturbation constraint, known as the low-frequency constraint, is introduced to limit perturbations to the imperceptible high-frequency components of objects, thereby preserving the similarity between the adversarial and original examples. | Apply frequency constraint-based attacks to OCT images in the frequency domain and observe their impact on the images in the spatial domain. |
[91] | Model Ensemble Feature Fusion (MEFF) | Fundoscopy, chest X-ray, dermoscopy | The MEFF approach mitigates adversarial attacks in medical image applications by combining features extracted from multiple deep learning models and training machine learning classifiers on the fused features. | Apply MEFF to two or more DL techniques and observe how well it mitigates classification errors.
[92] | Multi-View Learning | Natural RGB images | A multi-view classification method with an adversarial sample uses the evidential dissonance measure in subjective logic to evaluate the quality of data views when subjected to adversarial attacks. | Generate multi-view representations of OCT images and assess their impact on classification performance. |
[93] | Medical morphological knowledge-guided | Lung CT scans | This approach trains a surrogate model with an augmented dataset using guided filtering to capture the model’s attention, followed by a gradient normalization-based prior knowledge injection module to transfer this attention to the main classifier. | Train the model on OCT images using a morphology-guided approach and evaluate its effectiveness in reducing classification errors. |
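To make the FGSM rows above concrete, the sketch below shows the standard one-step attack: take the gradient of the loss with respect to the input image, step in its sign direction, and clamp the result back to the valid pixel range. The tiny linear `model`, image size, and epsilon here are placeholders for illustration; any differentiable OCT classifier could be substituted, after which clean versus adversarial accuracy can be compared to quantify robustness degradation.

```python
# One-step FGSM attack: x_adv = clamp(x + eps * sign(d(loss)/dx)).
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_attack(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                epsilon: float = 0.01) -> torch.Tensor:
    """Return an adversarial copy of x for true labels y."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()                                   # gradient w.r.t. the input
    with torch.no_grad():
        x_adv = x_adv + epsilon * x_adv.grad.sign()   # worst-case linear step
        x_adv = x_adv.clamp(0.0, 1.0)                 # stay in valid pixel range
    return x_adv.detach()

# Dummy classifier standing in for any OCT model (e.g., a CNN or ViT).
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 4))
x = torch.rand(2, 1, 64, 64)          # two fake B-scans with values in [0, 1]
y = torch.tensor([0, 1])
x_adv = fgsm_attack(model, x, y)
print((x_adv - x).abs().max())        # perturbation magnitude, roughly epsilon
```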