Review

Artificial Intelligence in Decoding Ocular Enigmas: A Literature Review of Choroidal Nevus and Choroidal Melanoma Assessment

by Konstantina-Eleni Karamanli 1,†, Eirini Maliagkani 1,†, Petros Petrou 1, Elpiniki Papageorgiou 2,* and Ilias Georgalas 1
1 1st Department of Ophthalmology, G. Gennimatas Hospital, National and Kapodistrian University of Athens, 11527 Athens, Greece
2 Department of Energy Systems (ACTA Lab), University of Thessaly, Gaiopolis Campus, 41500 Larisa, Greece
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2025, 15(7), 3565; https://doi.org/10.3390/app15073565
Submission received: 25 February 2025 / Revised: 20 March 2025 / Accepted: 24 March 2025 / Published: 25 March 2025
(This article belongs to the Special Issue Robotics, IoT and AI Technologies in Bioengineering)

Abstract
This review examines the role of artificial intelligence (AI) in differentiating choroidal nevus (CN) from choroidal melanoma (CM), focusing on diagnosis, classification, segmentation, and prediction of malignant transformation. A literature search was performed in PubMed, Google Scholar, and Science Direct up to December 2024. Eight studies met all the inclusion criteria, evaluating machine learning (ML) and deep learning (DL) applications for CN and CM assessment using various ophthalmic imaging modalities. Performance varied across AI models. U-Net achieved 100% sensitivity and an AUC of 0.88, while DenseNet121 reached an AUC of 0.9781, LASSO logistic regression an AUC of 0.88, RETFound (a self-supervised learning model) had an AUCROC of 0.92, and ResNet50 an accuracy of 92.65% in classification tasks. DeepLabv3 achieved Dice scores of 0.87 (melanoma) and 0.81 (nevi) in lesion-based segmentation, while nnU-Net yielded a Dice score of 0.78 and a recall of 0.77 for pigmented lesion segmentation. SAINT (XGBoost-based) demonstrated a strong predictive performance (AUC: 0.910), confirming its effectiveness in ophthalmic imaging. These results highlight the potential of AI models as effective diagnostic tools in ocular oncology. However, further research is needed to improve model generalizability, enhance dataset diversity, and facilitate clinical integration.

1. Introduction

Choroidal nevus (CN) and choroidal melanoma (CM) represent a diagnostic challenge in ocular oncology. The World Health Organization (WHO) defines CN as a benign melanocytic tumor, characterized histologically by small oval or non-aggressive spindle cells, less than 5 mm in diameter and less than 2 mm in thickness [1]. In the United States, data from the National Health and Nutrition Examination Survey indicate that the prevalence of CN is higher in Whites (5.6%), compared with Blacks (0.6%), Hispanics (2.7%), and other ethnic groups (2.1%) [2]. In contrast, CM, although rare, has an incidence of approximately six per million among Caucasians and carries a significant risk of metastasis [3]. Despite its low incidence, early and accurate differentiation between CN and CM is crucial, as delays in diagnosis can significantly impact treatment outcomes and survival rates.
Several scoring systems have been developed to aid in CN classification, including the Collaborative Ocular Melanoma Study (COMS) and the American Joint Committee on Cancer (AJCC) staging system. In addition, various clinical tools help specialists recognize and manage suspicious lesions, such as the MOLES scoring system, and TFSOM-UHHD criteria (Table 1) [4,5,6].
Uveal melanomas (UM) arise from melanocytes originating from the neural crest. These cells are typically distinguished by their spindle-shaped nuclei and pigmented cytoplasm [7]. UM represents the most prevalent form of primary eye cancer in adults, with an estimated global incidence of over 7000 cases annually [8]. The development of UM has been linked to variations in eye color, as demonstrated by two single nucleotide polymorphisms (SNPs) (rs12913832 in HERC2 and rs12203592 in IRF4) [9]. From a genetic perspective, melanoma is a multifaceted disease involving alterations in genes that regulate key signaling pathways, including:
  • Cell proliferation (NRAS, BRAF, NF1).
  • Growth and metabolism (STK11, PTEN, KIT).
  • Reproductive capacity (TERT).
  • Cell cycle regulation (CDKN2A).
  • Resistance to apoptosis (TP53).
These genetic alterations are critical for the pathogenesis and progression of melanoma, affecting cell growth, proliferation, and survival [10].
The diagnosis of a choroidal nevus is usually made during an ophthalmological examination. Imaging techniques such as fundus photography, optical coherence tomography (OCT), and fluorescein angiography (FA) are used to visualize the nevus and can provide detailed imaging for monitoring over time [1]. Conventional ultrasound (US) is a valuable diagnostic tool for detecting larger tumors. Recently, high-frequency ultrasound biomicroscopy (UBM) has been recognized as an effective method for detecting and monitoring small ciliary body melanomas (less than 4 mm) that might otherwise be missed by other conventional techniques [5]. Posterior UM has a strong propensity for metastasis, with a relative survival rate of 60% at 15 years after diagnosis [11]. Since distinguishing CN from CM is challenging with traditional imaging, artificial intelligence (AI) offers a valuable tool to improve diagnosis and risk assessment. By leveraging machine learning (ML) or deep learning (DL) algorithms, AI models can analyze complex imaging features that may not be visible to the human eye, helping with earlier and more accurate detection.
AI has rapidly advanced in medical imaging, particularly in ophthalmology. Early applications of AI in eye diseases included the CASNET-based glaucoma consultation program in 1976, which demonstrated the potential of ML in clinical settings [12]. Today, ML techniques, especially those employing supervised models trained on fundus photographs, hold considerable promise in distinguishing CN from melanoma and assessing malignancy risk. DL algorithms, particularly convolutional neural networks (CNNs), can identify previously unobserved features that may indicate malignant transformation, surpassing conventional diagnostic methods [13,14].
CNNs play a crucial role in medical image analysis by extracting hierarchical features from images. A typical CNN architecture consists of an input layer that processes retinal images, followed by multiple convolutional layers that apply filters to highlight relevant patterns. Pooling layers reduce data dimensionality while preserving important features, and fully connected layers generate the final classification results. This layered structure allows the CNN to learn increasingly abstract features and to detect and classify patterns in images effectively [15].
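For illustration only, the following minimal PyTorch sketch mirrors this layer pattern (convolutional feature extraction, pooling, and a fully connected classification head). It is not a model from any of the reviewed studies; the layer sizes, input resolution, and the two-class output (nevus vs. melanoma) are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class MinimalFundusCNN(nn.Module):
    """Toy classifier illustrating the convolution -> pooling -> fully connected pattern."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # filters highlight local patterns
            nn.ReLU(),
            nn.MaxPool2d(2),                             # pooling reduces dimensionality
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 56 * 56, num_classes),        # fully connected layer -> class scores
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Example: a batch of four 224 x 224 RGB retinal images mapped to two-class logits.
logits = MinimalFundusCNN()(torch.randn(4, 3, 224, 224))
print(logits.shape)  # torch.Size([4, 2])
```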
Various AI models have been applied in the studies selected for this review. DeepLabv3 is an advanced semantic image segmentation model that improves feature extraction by incorporating atrous (dilated) convolution. This technique allows the model to control the field of view of convolutional filters without increasing the number of parameters, making it efficient for detecting objects at multiple scales [16]. Residual networks (ResNet) utilize residual learning to train deep networks beyond 100 layers [17]. SAINT (Self-Attention and Intersample Attention Transformer) represents a hybrid DL approach for tabular data, integrating the strengths of transformers and contrastive learning and offering an alternative to traditional gradient boosting models such as XGBoost, CatBoost, and the Light Gradient Boosting Machine (LGBM). SAINT employs self-attention and intersample attention mechanisms to improve performance on structured datasets [18]. For segmentation tasks, U-Net, a fully convolutional network, has demonstrated high accuracy in biomedical image segmentation, even with limited training data [19]. nnU-Net, a further refinement, optimizes key parameters, including network architecture, training, and post-processing techniques, making it widely regarded as the gold standard for medical imaging segmentation [20]. DenseNet, a densely connected convolutional network, enhances performance through feature reuse across layers in a feedforward architecture [21]. RETFound is a self-supervised learning (SSL) foundation model designed for retinal image recognition. It is pre-trained on 1.6 million unlabeled fundus images, allowing it to learn generalizable features that can be adapted to multiple ophthalmic applications through transfer learning [22]. Additionally, LASSO (Least Absolute Shrinkage and Selection Operator) is a widely used statistical technique that performs variable selection and regularization to improve the predictive accuracy and interpretability of high-dimensional data, including genomics and ML applications [23].
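As a small, self-contained illustration of the atrous convolution property described above (and not code from any of the reviewed studies), the PyTorch snippet below shows that increasing the dilation rate enlarges the receptive field of a 3 × 3 filter while the parameter count and output resolution stay unchanged.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 64, 64)  # dummy feature map

standard = nn.Conv2d(8, 8, kernel_size=3, padding=1)            # 3 x 3 receptive field
atrous = nn.Conv2d(8, 8, kernel_size=3, padding=2, dilation=2)  # effective 5 x 5 receptive field

# Both layers have the same number of learnable parameters ...
n_std = sum(p.numel() for p in standard.parameters())
n_atr = sum(p.numel() for p in atrous.parameters())
print(n_std == n_atr)  # True

# ... and, with matched padding, produce feature maps of the same resolution.
print(standard(x).shape, atrous(x).shape)  # both torch.Size([1, 8, 64, 64])
```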
Despite significant advancements, AI applications in CN and CM assessment still face challenges, including dataset limitations, model interpretability, and the need for external validation before clinical use. Addressing these barriers is essential for AI to transition from research settings to routine ophthalmological practice. This review evaluates the role of AI in accurate differentiation between CN and CM, focusing on diagnosis, classification, segmentation, and prediction of malignant transformation.

2. Materials and Methods

This research seeks to answer the following question: “How has AI been utilized for the detection, classification, and risk assessment of choroidal nevi and choroidal melanoma in ophthalmology?”. To address this question, a structured approach was employed to identify and evaluate studies that apply AI techniques in the diagnosis, segmentation, classification, and prediction of malignant transformation of CN into CM.

2.1. Search Strategy

A literature search was conducted in December 2024 using the PubMed, Google Scholar, and Science Direct databases. The following keywords were employed in the search: “artificial intelligence”, “machine learning”, “deep learning”, “choroidal melanoma”, “choroidal nevi”, “choroidal naevi”, “choroidal nevus”, and “uveal melanoma”.

2.2. Eligibility Criteria

Studies were considered eligible if they met the following inclusion criteria:
  • Observational studies, randomized clinical trials, and registry/database studies.
  • Published in English between 2018 and 2024.
  • Included human subjects aged 18 years or older.
  • Investigated the role of artificial intelligence, machine learning, or deep learning, in the detection, classification, segmentation, or prediction of malignant transformation of choroidal nevi into choroidal melanoma.
  • Utilized ophthalmic diagnostic tools, such as US, OCT, or fundus photography.
  • Reported measurable performance metrics, such as accuracy, sensitivity, specificity, Dice score, F1 score, or AUC (area under the curve).
Studies were excluded if they met any of the following exclusion criteria:
  • Systematic reviews, narrative reviews, scoping reviews, surveys, editorials, case reports, preprints, conference abstracts, or presentations.
  • Did not include human subjects.
  • Did not provide data specifically on choroidal nevi, choroidal melanoma, or uveal melanoma.
  • Relied solely on genome-based data from gene banking without incorporating ophthalmic imaging modalities.
  • Did not utilize artificial intelligence, machine learning, or deep learning techniques.
  • Did not utilize US, OCT, or fundus images.
  • Articles with full text unavailable.

2.3. Data Extraction

Data extraction was performed to collect relevant information from the selected studies. The extracted variables included author, year of publication, and disease type (CN, CM, or UM). The objective of the AI application was categorized into diagnosis, segmentation, classification, and prediction. For each study, details on the imaging modality used were recorded, including OCT, US, CFP (color fundus photography), UWF (ultra-wide-field), and FAF (fundus autofluorescence) images. Additional data included the number of images used for model training, the number of patients enrolled in each study, and whether XAI (explainable artificial intelligence) techniques were incorporated. The type of AI model (ML or DL) and the specific AI algorithms (e.g., U-Net, V-Net, LASSO) were documented. Finally, performance metrics used to evaluate the AI models on internal and external test sets were extracted, including accuracy, sensitivity, specificity, Dice score, F1 score, and AUC.

3. Results

3.1. Study Selection

A detailed literature search was conducted by two independent investigators (K.-E.K. and E.M.), yielding 1055 studies sourced from PubMed (n = 628), Science Direct (n = 60), and Google Scholar (n = 367). The studies underwent a two-stage exclusion process to ensure relevance and adherence to the study criteria. In the first stage, 355 manuscripts were excluded due to duplication or irrelevance. Of the remaining 700 studies, 628 were excluded after screening titles and abstracts, as they did not meet the predefined eligibility criteria. In the second stage, full-text reviews were conducted for the remaining 72 articles, resulting in the following additional exclusions: 3 studies involved experimental animals, 31 studies lacked data analysis or did not include information on CN or CM, 29 studies did not use ophthalmic imaging, and 1 study was a narrative review. After applying the established exclusion criteria, eight articles were identified that specifically addressed the use of AI in the analysis of CN and CM. The study selection process is visually represented in the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flowchart [24] to enhance transparency (Figure 1).

3.2. Study Characteristics

Table 2 summarizes the key details extracted from the selected studies and provides a structured overview of how AI has been utilized in the assessment of CN and CM, facilitating comparisons across methodologies and performance metrics.

3.3. Overview of Selected Studies

Zabor et al. [25] developed a ML model for the distinction between small choroidal melanoma (SCM) and CN. The training data came from a retrospective cohort of 123 patients in the USA with small choroidal melanocytic tumors, ranging from 5.0 to 16.0 mm in basal diameter and 1.0 to 2.5 mm in height. A total of 61 patients were classified as having SCM based on either documented growth or pathological confirmation. The remaining 62 lesions were categorized as nevi due to their stable nature. A distinct dataset comprising 240 patients from an alternative clinic was employed for external validation purposes, with 11 cases of confirmed melanoma based on observed growth. A clinical diagnosis of SCM was assigned to each patient via a standard examination comprising a slit lamp and fundus examination. A comprehensive fundus drawing was created to provide a visual representation of the lesion’s full extent, which was supplemented by color fundus photographs. The baseline variables were preselected and included age at diagnosis, gender, laterality of the eye, presence of symptoms at presentation, initial optimally corrected visual acuity, distances from the optic nerve and fovea, largest basal diameter, lesion thickness, presence of subretinal fluid (SRF), orange pigment, drusen, and retinal pigment epithelium (RPE) atrophy. In model development, the LASSO logistic regression method was employed to identify the most predictive variables for classifying melanoma by reducing the coefficients of the least significant variables to zero. The most significant features identified were SRF, tumor height, proximity to the optic disc, and orange pigment. Bootstrap resampling with 10-fold cross-validation was conducted for internal validation purposes in order to assess the robustness of the model and to avoid overfitting. The generalizability of the model was tested on an external dataset, with both discrimination AUC and calibration evaluated. Consequently, a web-based calculator was created, providing a probability estimate for melanoma, which may assist with clinical decision-making. The analysis demonstrated that an optic disc distance of ≥3 mm in comparison to <3 mm, in conjunction with the presence of drusen, was associated with a reduced risk of melanoma. Conversely, male gender, increased lesion height, SRF, and orange pigmentation were identified as factors that were linked to an elevated risk of melanoma. The model demonstrated high accuracy in distinguishing melanoma from nevus, with an AUC of 0.880 in the training data and an optimism-corrected AUC of 0.849. The external validation demonstrated an AUC of 0.861, confirming the model’s capacity for robust discrimination. However, the calibration exhibited slight inaccuracy, with the model exhibiting an overestimation of melanoma risk across the prediction range [25].
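To make the modeling step concrete, the scikit-learn sketch below shows the general form of an L1-penalized (LASSO) logistic regression evaluated with 10-fold cross-validated AUC. It is not the authors' code: the data are randomly generated placeholders standing in for the clinical predictors listed above, and the penalty strength is an arbitrary assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Placeholder design matrix: rows = lesions, columns = clinical predictors
# (e.g., thickness, distance to the optic disc, subretinal fluid, orange pigment, ...).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 12))
y = rng.integers(0, 2, size=120)  # 1 = small choroidal melanoma, 0 = nevus (random here)

# The L1 (LASSO) penalty shrinks uninformative coefficients to exactly zero,
# performing variable selection inside the logistic model.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
)
auc = cross_val_score(model, X, y, cv=10, scoring="roc_auc")
print(f"10-fold cross-validated AUC: {auc.mean():.3f}")
```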
Valmaggia et al. [26] evaluated the efficacy of automated image segmentation in identifying pigmented choroidal lesions (PCLs) at the pixel level. To achieve this, three distinct DL networks were employed: MD-GRU (multi-dimensional gated recurrent unit), V-Net and nnU-Net. The data were derived from 71 individuals in Switzerland, and a total of 121 OCT volumes were analyzed: 100 normal and 21 PCLs. Prior to OCT imaging, all subjects underwent a slit-lamp clinical examination and a fundus examination. The OCT data were recorded using a Triton DRI SS-OCT device, which produced volume scans with a resolution of 6 mm × 6 mm × 2.6 mm (256 B-scans, 512 × 992 pixels). Two independent human graders manually annotated OCT images to identify PCLs using the open-source FIJI imaging software (v2.1). The repeatability of the annotation was evaluated using Dice coefficients to assess consistency within and between graders. The MD-GRU, V-Net, and nnU-Net models were trained on both 2D and 3D variations and their performance was evaluated using accuracy, Dice coefficient, recall, precision, and the Hausdorff distance. The 3D nnU-Net outperformed the other models, demonstrating superior performance across key metrics. It achieved a higher recall (sensitivity) of 0.77 ± 0.22, accurately identifying approximately 77% of the lesion voxels. This exceeded the performance of the MD-GRU (0.60 ± 0.31) and V-Net (0.61 ± 0.25) models. The Dice coefficient for 3D nnU-Net was 0.78 ± 0.13, indicating a notable degree of overlap between the predicted and manually annotated PCLs, outperforming MD-GRU (0.62 ± 0.23) and V-Net (0.59 ± 0.24). In terms of the Hausdorff distance, the 3D nnU-Net exhibited the lowest average maximum Hausdorff distance (315 ± 172 μm), indicating a high degree of alignment with the manually annotated lesion boundaries in comparison to MD-GRU (1542 μm) and V-Net (2408 μm). Additionally, the 3D model demonstrated a superior performance to its 2D counterpart in nearly all the metrics, providing more consistent predictions and reducing the gaps observed in the 2D model. This research presents a notable achievement in the automated segmentation of PCLs using OCT data, highlighting the potential of DL in ophthalmic imaging [26].
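For readers unfamiliar with the segmentation metrics reported here, the short NumPy function below computes the voxel-wise Dice coefficient and recall between a predicted and a manually annotated binary mask; the toy volumes are invented and unrelated to the OCT data used in the study.

```python
import numpy as np

def dice_and_recall(pred: np.ndarray, truth: np.ndarray) -> tuple[float, float]:
    """Voxel-wise Dice coefficient and recall (sensitivity) for binary segmentation masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    dice = 2.0 * intersection / (pred.sum() + truth.sum())
    recall = intersection / truth.sum()
    return float(dice), float(recall)

# Toy 3D volumes: a ground-truth lesion and an imperfect prediction.
truth = np.zeros((16, 64, 64), dtype=bool)
truth[4:10, 20:40, 20:40] = True
pred = np.zeros_like(truth)
pred[5:10, 22:40, 20:42] = True
print(dice_and_recall(pred, truth))
```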
Ma et al. [27] developed and evaluated a ML model for automated segmentation of UM, CN, and congenital hypertrophy of the retinal pigmented epithelium (CHRPE) using UWF fundus photography (Optos). A retrospective chart review was conducted at a tertiary academic medical center, including 529 lesion images from 479 patients, alongside 30 healthy controls. The dataset was divided into training (396 images) and testing (90 images) sets, with images based on superior quality and manually delineated lesion boundaries. The segmentation model was built using the DeepLabv3 architecture, employing pre-processing techniques such as image resizing and data augmentation. The images were resized to 500 × 500 pixels with zero padding. Furthermore, data augmentation techniques, including horizontal flipping and slight rotation, were employed to enhance the dataset. The model was trained in over 400 epochs, with the final selection occurring at epoch 263 based on the lowest validation loss. The model’s performance was assessed using the Dice coefficient, sensitivity, specificity, and confusion matrices, with evaluation conducted through lesion-based segmentation and image-based classification. The model achieved Dice coefficients of 0.86 for UM, 0.81 for nevi, and 0.85 for CHRPE, indicating high image-based segmentation accuracy. The sensitivity for lesion detection per image was 1.00 for UM, 0.90 for nevi, and 0.87 for CHRPE, while the specificity for images without lesions was 0.93 [27].
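The sketch below is based on the publicly available DeepLabv3 implementation in torchvision rather than on the authors' code; it illustrates how such a model might be adapted for multi-class lesion segmentation. The class set (background, UM, nevus, CHRPE), input size, augmentations, and training step are assumptions drawn loosely from the description above.

```python
import torch
import torch.nn as nn
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50

# Assumed classes: 0 = background, 1 = uveal melanoma, 2 = nevus, 3 = CHRPE
num_classes = 4
model = deeplabv3_resnet50(weights="DEFAULT")          # backbone/head pre-trained on generic data
model.classifier[4] = nn.Conv2d(256, num_classes, 1)   # replace the final prediction layer

# Pre-processing in the spirit of the study (applied to PIL images when building a dataset):
train_tf = transforms.Compose([
    transforms.Resize((500, 500)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(5),
    transforms.ToTensor(),
])

# One dummy training step with a pixel-wise cross-entropy loss.
images = torch.randn(2, 3, 500, 500)                    # placeholder batch
masks = torch.randint(0, num_classes, (2, 500, 500))    # placeholder ground-truth masks
out = model(images)["out"]                              # shape (N, num_classes, H, W)
loss = nn.functional.cross_entropy(out, masks)
loss.backward()
print(float(loss))
```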
Dadzie et al. [28] examined the effect of different color fusion strategies on a DL model for the automated classification of UM and CN. The dataset was retrieved from the University of Illinois at Chicago Eye Clinic and included 798 UWF retinal images from 438 patients, of whom 157 had UM, 281 had CN, and 360 had control images. Images from the eye that did not show any lesions were used as controls. The study used CNN architecture based on DenseNet121 with pre-trained weights derived from the ImageNet dataset for image classification. The model was trained by augmenting the data with horizontal/vertical flips and rotations to improve its generalization. The collected data were classified in two steps; in step 1, lesions were distinguished from normal images, and in step 2, the blebs were classified as either UM or CN. The study used color fusion techniques with monochrome models, where the individual color channels (red, green, blue) were evaluated independently for their performance. Specifically, multi-color fusion strategies can be categorized as follows: early fusion, in which color channel combination is performed at the input level; intermediate fusion, in which each color channel is processed separately before merging the extracted features for classification; and finally, late fusion, in which the final predictions from each color channel are combined separately. The models were trained and evaluated using 5-fold cross-validation, with 80% of the data used for training and 20% for validation. Model performance was evaluated in terms of accuracy, specificity, sensitivity, F1 score, and AUC. The results show that all fusion strategies outperformed single-channel approaches. Among them, intermediate fusion showed the highest efficiency, achieving 83.31% accuracy, 0.8523 F1 score, and 0.8999 AUC in step 1. In step 2, its performance improved further, achieving an accuracy of 92.24%, an F1-score of 0.8788, and an AUC of 0.9781. Similarly, for multi-class classification, the intermediate fusion strategy proved to be the most effective, yielding an accuracy of 89.31%, an F1 score of 0.8471, and an AUC of 0.9467. Intermediate fusion had the fewest misclassifications between UM and CN. Among the single-color channels, the red channel showed the highest accuracy (88.12%), F1 score (0.8218), and AUC (0.9069) at step 2 among all the channels tested. The red channel provided the most detailed image of the retinal pigment epithelium and choroid, where UM and CN are usually located. In comparison, the green and blue channels showed inferior performance, indicating the presence of less useful information for the detection of UM and CN [28].
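To clarify what intermediate fusion means architecturally, the toy PyTorch sketch below encodes each color channel with its own branch and merges the extracted features before a shared classifier. The tiny stand-in encoders are assumptions used for brevity; the study itself used DenseNet121 branches.

```python
import torch
import torch.nn as nn

class ChannelBranch(nn.Module):
    """Small stand-in encoder for a single color channel."""
    def __init__(self, feat_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class IntermediateFusion(nn.Module):
    """Each channel is processed separately; features are merged before classification."""
    def __init__(self, num_classes: int = 2, feat_dim: int = 8):
        super().__init__()
        self.branches = nn.ModuleList(ChannelBranch(feat_dim) for _ in range(3))
        self.head = nn.Linear(3 * feat_dim, num_classes)

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        feats = [branch(rgb[:, i:i + 1]) for i, branch in enumerate(self.branches)]  # split R, G, B
        return self.head(torch.cat(feats, dim=1))                                    # feature-level fusion

logits = IntermediateFusion()(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 2])
```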
Hoffmann et al.’s study [29] was conducted at the Charité University Hospital in Berlin and included subjects with CN, untreated CM, and irradiated CM. The study collected a total of 762 CFPs of choroidal lesions during the period from 2010 to 2023. Imaging methods used included SD-OCT (spectral-domain optical coherence tomography), FAF, US, and biomicroscopy. The choice of imaging device was based on the location of the choroidal lesion, with CFPs obtained using either the Optos or Clarus system. The DL HyperTumorEyeNet model, a CNN based on ResNet50, was run with pre-trained weights using HSA KIT version 1.5.13.10 to classify images into three categories (nevi, untreated melanoma, and irradiated melanoma) and two categories (nevi vs. untreated melanoma). Training and validation of the models were conducted using 90% and 10% of the dataset, respectively, with additional testing performed on an independent test set. Lesion classification was based on the MOLES score, and the models were developed using CNNs, specifically HyperTumorEyeNet based on ResNet50, with comparisons to EfficientNet B4, Vision Transformer, and ConvNextV2. ResNet50 achieved the highest accuracy of 92.65%, outperforming other models. The superior performance of ResNet50 is likely attributed to its pre-trained weights and the relatively small dataset, which makes composite methods less effective. The binary classification (nevi vs. melanoma) achieved an accuracy of 90.9%, with a misclassification rate of 12.1% for malignant lesions. The multi-class classification performed with Optos and Clarus achieved a higher accuracy (84.8%) and AUC (0.96) compared to the multi-class classification performed with Clarus only. Classification accuracy was highest for irradiated CM (94.0%). The average AUC confirmed the strong performance of the model, with the highest accuracy (95.8%) being achieved in binary classification using Clarus images. The MOLES score outperformed the DL classification, incorrectly classifying 12.1% of melanomas as benign. The findings suggest that DL models have the potential to serve as a promising tool for the screening of choroidal tumors, although improvements are needed for borderline cases [29].
Tailor et al. [30] developed and validated ML models to predict the transformation of CN to melanoma using multimodal imaging. A retrospective multicenter analysis was used, including 2870 nevi cases from Wills Eye Hospital and Mayo Clinic. Various imaging techniques were used, including fundus photography, FAF, SD-OCT, and B-scan US. Fundus images were acquired using a Zeiss camera at Wills Eye Hospital and a Topcon camera at Mayo Clinic. Several tree-based ML algorithms were tested, including XGBoost, LGBM, Random Forest, and Extra Tree models, with SAINTS using XGBoost as the kernel. The Wills cohort was split into an 80% training set and 20% test set, with 5-fold cross-validation, while the Mayo cohort served as an external validation set. The metrics used to assess model performance were the area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). The ML models XGBoost (SAINTS), LGBM, Random Forest, and Extra Tree were optimized for AUROC. The SAINTS model achieved the highest performance, with an AUROC of 0.864 in the test cohort and 0.931 in external validation. Other models showed slightly lower AUROC values: LGBM (0.831/0.815), Random Forest (0.812/0.866), and Extra Tree (0.826/0.915) in the test and external validation cohorts, respectively. Similarly, SAINTS outperformed the other models in AUPRC, achieving 0.244 in the test cohort and 0.533 in external validation. The AUPRC values for the other models were LGBM (0.171/0.277), Random Forest (0.122/0.418), and Extra Tree (0.119/0.511) in the test and external validation cohorts, respectively. Key prognostic factors identified by SHAP (Shapley additive explanations) included tumor thickness, largest basal diameter, tumor shape, distance from the optic nerve, and amount of subretinal fluid. These findings highlight the accuracy and generalizability of ML in predicting choroidal nevus transformation, potentially aiding in clinical decision-making [30].
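The sketch below, built on the open-source XGBoost and scikit-learn libraries with synthetic placeholder data, illustrates the general workflow of training a gradient-boosted classifier for a rare binary outcome and reporting AUROC and AUPRC; it is not the SAINTS pipeline, and the features, class prevalence, and hyperparameters are assumptions.

```python
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score

# Placeholder tabular features (standing in for thickness, basal diameter, distance to disc, ...)
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 10))
# Rare outcome weakly linked to the first two features (transformation is an uncommon event).
y = ((X[:, 0] + X[:, 1] + rng.normal(scale=1.0, size=2000)) > 2.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
model = XGBClassifier(n_estimators=300, max_depth=3, learning_rate=0.05, eval_metric="auc")
model.fit(X_tr, y_tr)

prob = model.predict_proba(X_te)[:, 1]
print("AUROC:", roc_auc_score(y_te, prob))
print("AUPRC:", average_precision_score(y_te, prob))  # more informative for imbalanced outcomes
```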
Sabazade and colleagues [31] developed and validated a DL algorithm for the differentiation of SCM from nevi based on the analysis of wide-field and standard-field fundus photographs. The fundus images were sourced from several institutions, with a particular focus on those pertaining to SCM. A total of 802 images from 688 patients at St. Erik Eye Hospital in Stockholm were included in the study, comprising both wide-field (200°) (WF-FP) and standard-field (45°) (SF-FP) retinal photographs. The images were collected retrospectively, with a particular focus on small melanotic choroidal lesions, and were obtained from 2010 onwards, with diagnoses confirmed by subspecialist ophthalmic oncologists. In the case of lesions diagnosed as nevi, a minimum of five years of follow-up was required to confirm their benign nature. Images of insufficient quality or those with partially masked lesions were excluded, resulting in 802 valid images. The dataset was divided into three groups: a training group (495 images), a validation group (168 images), and a test group for internal and external cohorts (86 and 53 images, respectively). During the image preprocessing stage, fundus images were resized to a resolution of 1024 × 1536 pixels, normalized for brightness, and scaled to a pixel range of (0, 1). A U-Net architectural model comprising three downsampling layers and eight base filters, with the ReLU (rectified linear unit) as the activation function, was employed for the analysis of nevi and melanomas. The model demonstrated enhanced interpretability and a heightened focus on specific regions of the lesion. Two distinct U-Net networks were incorporated in Expligence’s Explipipe training pipeline: one for lesion segmentation and another for melanoma classification. The first model was designed to identify the lesion region, utilizing categorical cross-entropy as the loss function and evaluation metric. At the output of the model, a center point was calculated for a bounding box around the lesion (488 × 488 pixels), which then served as input to the second model. The second U-Net model was employed to categorize lesions as either melanoma or non-melanoma, utilizing categorical cross-entropy as the loss function and evaluating the performance using the AUC metric. For AUC calculation, the model used the pixel with the highest melanoma probability within the segmentation region. The model demonstrated optimal performance with an AUC score of 83.4% after 1189 epochs of training. The incorporation of a random forest classifier served to enhance the model’s output by assigning a weight of 10 to melanoma images, thereby focusing on enhancing specificity and reducing false negatives. This approach, which considered all probability data (not just the highest probability pixel), resulted in an elevated AUC score of 88.5% in the validation set. Moreover, an analysis of the external test cohort revealed that the model attained an AUC value of 0.88, equivalent to the result obtained in the internal test cohort. The disparities between the internal and external test results were evident in the analysis of sensitivity and specificity. To validate the algorithm, the main performance measures were sensitivity and specificity, with the AUC additionally used to select the optimal network iteration.
Additionally, the algorithm’s performance was evaluated in comparison to that of 12 ophthalmologists (resident ophthalmologists, consultant ophthalmologists, and ocular oncologists) and traditional classification systems, including the MOLES and TFSOM-UHHD scores. The Kruskal–Wallis and Mann–Whitney U tests were employed to ascertain any statistically significant discrepancies between the human and AI classifications. The DeLong test was used to compare the AUC values between the algorithm and the MOLES and TFSOM-UHHD scores. The algorithm demonstrated 100% sensitivity in the detection of melanomas, indicating that it accurately identified all instances of small melanomas without any false negatives. Among the 12 ophthalmologists, sensitivity was 85% for the three resident ophthalmologists, 83% for the six consultant ophthalmologists, and 98% for the three ocular oncologists, the best result among the experts and very close to the algorithm. In general, the algorithm exhibited superior sensitivity (p = 0.006) in the assessment of melanomas when compared with that of human experts. Nevertheless, its specificity was 74%, indicating that 74% of nevi were correctly classified as non-melanoma. Among the human experts, resident ophthalmologists demonstrated the lowest specificity at 63%, followed by ocular oncologists at 70%. The consultant ophthalmologists, however, achieved a specificity of 78%, which was higher than that of the algorithm. In comparison with established methodologies, the algorithm demonstrated an AUC of 0.88, outperforming traditional risk scoring systems such as MOLES and TFSOM-UHHD, which incorporate additional imaging modalities (OCT, US). Statistically significant improvements (p < 0.001) reinforce the model’s reliability, even when using fundus images alone. The algorithm demonstrated a markedly elevated sensitivity compared to that of human evaluators, particularly ophthalmology residents and consultants, while its specificity was comparable to that achieved by specialists across experience levels [31].
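As a minimal illustration of how such non-parametric comparisons can be run (not the study's own analysis), the SciPy sketch below applies the Kruskal–Wallis and Mann–Whitney U tests to hypothetical per-rater sensitivities; the numbers are invented placeholders.

```python
import numpy as np
from scipy.stats import kruskal, mannwhitneyu

# Hypothetical per-rater sensitivities (fraction of melanomas detected) for three grader groups.
residents = np.array([0.84, 0.85, 0.86])
consultants = np.array([0.80, 0.82, 0.83, 0.84, 0.85, 0.84])
oncologists = np.array([0.97, 0.98, 0.99])

# Kruskal-Wallis: do the three groups differ overall?
print(kruskal(residents, consultants, oncologists))

# Mann-Whitney U: pairwise comparison, e.g., consultants vs. ocular oncologists.
print(mannwhitneyu(consultants, oncologists, alternative="two-sided"))
```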
Jackson et al. [32] investigated a self-supervised DL model, RETFound, for differentiating uveal (choroidal) melanoma from nevus. The study involved data from the Liverpool Ocular Oncology Centre, comprising fundus images of 4255 patients acquired with an Optos UWF camera. The study defined several criteria for the exclusion of poor-quality images, including images that were blurred, those with artifacts, images with significant occlusion of the eye, and images of patients with specific ocular conditions (e.g., vitreous hemorrhage, dense cataracts, or tumors more than 50% out of view). The remaining 27,181 images were classified as 18,510 UM images, 8671 nevus images, and 1192 healthy eye images with a clinical diagnosis. Diagnosis was made using Optos images, autofluorescence, OCT scans, and US. The RETFound DL model, which was pretrained on more than 900,000 fundus images for various conditions, was used. The model was first fine-tuned for binary classification (UM vs. nevi) and then for three-class classification (UM, nevi, and healthy eyes). The data were split into training (70%), test (20%), and validation (10%) sets. The images were resized to 224 × 224 pixels and converted to grayscale. The RETFound model used two stages: self-supervised pre-training and fine-tuning for disease classification. The training process was performed on an NVIDIA RTX 4090 GPU with a batch size of 16 and a learning rate of 0.0005. The effectiveness of the model was evaluated through metrics such as accuracy, sensitivity, specificity, and F1 score. For binary classification, the model achieved an accuracy of 0.83, a specificity of 0.87, a sensitivity of 0.79, an F1 score of 0.84, and an AUCROC (area under the receiver operating characteristic curve) of 0.90, demonstrating a strong differentiation between UM and nevi. In multi-class classification, the model achieved a mean accuracy of 0.82, a mean sensitivity of 0.73, a mean specificity of 0.85, a mean F1 score of 0.72, and an AUCROC of 0.92. The multi-class model showed a similar success rate for identifying UM and nevi as the binary classification, but it often misclassified healthy eyes and frequently confused nevi with healthy eyes. The study concludes that RETFound is a promising tool for automated differentiation of UM and nevi, but further validation with external datasets is needed to refine its accuracy, particularly in differentiating small nevi from healthy eyes [32].

4. Discussion

This literature review explores the role of AI in the accurate differentiation of CN from CM, focusing on diagnosis, classification, segmentation, and the prediction of malignant transformation. Eight studies were selected, evaluating AI models on accuracy, sensitivity, specificity, Dice score, F1 score, and AUC. AUC values across studies ranged from 0.8800 to 0.9781, with the best performance attributed to the DenseNet121 model (0.9781) using intermediate fusion for the classification of UM and nevi. The studies by Tailor [30] and Jackson [32] evaluated AUCROC, achieving high values of 0.931 and 0.920, respectively. The heterogeneity of the studies is evident in sample size, patient numbers, imaging methods used to visualize choroidal lesions, and architectural models employed. As presented in Table 2, the models demonstrated an accuracy range of 83 to 92.65%. The study by Hoffmann [29] yielded the highest accuracy (92.65%), identifying ResNet50 as the top-performing model among those evaluated. The models by Dadzie [28] and Tailor [30] also attained remarkably high accuracies of 92.24% and 91%, respectively. The U-Net and DeepLabv3 algorithms demonstrated high sensitivity for UM segmentation in fundus images, achieving a 100% success rate for lesion detection. Meanwhile, nnU-Net, used for OCT-based segmentation, outperformed MD-GRU and V-Net, achieving the highest recall (0.77) and Dice coefficient (0.78). Another performance metric utilized in several studies is the F1 score [28,30,32], with the Dadzie study [28] attaining the maximum value of 0.8788. The F1 score is a crucial evaluation metric in ML, particularly for imbalanced datasets where accuracy alone may provide misleading results. By combining precision and recall through their harmonic mean, the F1 score ensures a balanced assessment of a model’s predictive performance. Since precision and recall exist in a trade-off, the F1 score provides a single, comprehensive metric that encourages optimization of both.
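A small worked example (invented counts, scikit-learn metrics) makes this concrete: on an imbalanced test set, accuracy can look reassuring while the F1 score exposes the missed melanomas.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Imbalanced toy test set: 90 nevi (label 0) and 10 melanomas (label 1).
y_true = [0] * 90 + [1] * 10
# A model that finds 6 of the 10 melanomas but raises 4 false alarms among the nevi.
y_pred = [0] * 86 + [1] * 4 + [1] * 6 + [0] * 4

print(accuracy_score(y_true, y_pred))   # 0.92 -- looks strong despite 4 missed melanomas
print(precision_score(y_true, y_pred))  # 6 / 10 = 0.60
print(recall_score(y_true, y_pred))     # 6 / 10 = 0.60
print(f1_score(y_true, y_pred))         # harmonic mean of precision and recall = 0.60
```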
Notably, the study by Sabazade et al. [31] was the only one to compare the performance of the AI model with 12 ophthalmologists, who classified small choroidal melanomas and choroidal nevi based on the MOLES and TFSOM-UHHD systems. The AI model showed superior sensitivity (p = 0.006) in melanoma assessment compared to human experts, particularly outperforming resident and consultant ophthalmologists, but it did not significantly surpass ocular oncologists in specificity (p > 0.99). As part of ongoing efforts to enhance AI capabilities, Tailor’s study [30] was the only one to employ XAI techniques, using SHAP values to analyze the most significant features in predicting nevus progression to melanoma. SHAP values are a technique used in ML to explain model predictions by quantifying the contribution of each feature to the final outcome. By calculating the effect of each feature across all possible feature combinations, SHAP values provide a detailed and consistent measure of feature importance for any given prediction. This approach helps to clarify which factors drive predictions, thereby improving the interpretability, transparency, and trust in AI systems, which is an essential goal of XAI, particularly in critical healthcare applications.
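For readers unfamiliar with the technique, the sketch below shows a typical SHAP workflow with the open-source shap library on a synthetic tabular problem; the feature names are hypothetical stand-ins for the clinical predictors mentioned above, and the code is not taken from the Tailor study.

```python
import numpy as np
import shap
from xgboost import XGBClassifier

# Hypothetical feature names echoing the kinds of predictors discussed above.
feature_names = ["thickness", "basal_diameter", "distance_to_disc", "subretinal_fluid"]
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=500) > 1).astype(int)  # synthetic labels

model = XGBClassifier(n_estimators=100, max_depth=3).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles: each value is one
# feature's additive contribution to one individual prediction.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

mean_abs = np.abs(shap_values).mean(axis=0)  # average magnitude -> global importance ranking
for name, score in sorted(zip(feature_names, mean_abs), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```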
Despite the promising results demonstrated by AI models, several limitations and challenges must be addressed before full clinical integration. Most studies relied on limited datasets, potentially compromising model robustness. A notable exception was Jackson’s study, which used a large dataset of 27,181 images from 4255 patients; however, even with a larger dataset, a risk of overfitting remains. Additionally, external validation datasets were imbalanced, a factor that may impede the generalizability of the models. Most studies were conducted on a dataset from a single institution, introducing potential bias due to patient demographics, imaging protocols, and equipment variations. Furthermore, the datasets consisted predominantly of white patients, which limits the ability to generalize across diverse populations. Another limitation is the lack of histopathological or genetic confirmation, raising concerns about diagnostic accuracy. While all studies employed specific imaging device models for model training, cross-device performance was not evaluated, potentially limiting adaptability to different imaging systems. The majority of studies lacked longitudinal data, with the exception of Tailor’s study, as their primary focus was on initial diagnosis rather than long-term progression of lesions. Despite internal and external validation, models may still be affected by biases in training data. Furthermore, poor-quality images (e.g., blurry, overexposed, or dark) were excluded, which may have biased the models toward high-quality images. Consequently, their performance on lower-quality real-world images remains uncertain, posing a significant risk. Another key limitation is the human annotation of choroidal lesions in each study, a process that introduces potential variability and impacts model training. Future studies should include multiple independent raters to improve annotation reliability.
Additionally, model calibration remains a significant challenge. The Zabor model [25] exhibited a tendency to overestimate the risk of melanoma in the validation dataset, suggesting potential issues with model calibration. Furthermore, the examination of a limited set of clinical characteristics in this study may have resulted in the exclusion of other potential prognostic factors. Measurements such as tumor size and the presence of pigment were based on clinical examination, which may be subject to interobserver variability. In the Valmaggia study [26], three neural networks were tested, yet their hyperparameters remained unoptimized, leaving room for further refinement; this limitation was identified in all the studies reviewed. Furthermore, a paucity of research was identified in the field of 2D versus 3D segmentation, a gap explored in studies such as Valmaggia, Ma, and Sabazade. The Ma study [27] also faced issues with image quality, including blurry images, lesions located at image boundaries, and small lesions overlapping the fovea. These factors adversely affected the model’s accuracy and misclassification rates. The study was based on 2D fundus images, which means that lesion thickness, an important diagnostic factor, was not considered. Additionally, the Ma model was designed to detect lesions of a minimum size of 24 pixels, likely excluding very small lesions that may be of clinical significance.
Similarly, in the Dadzie model [28], no image preprocessing techniques, such as contrast enhancement or cropping, were applied, which could have improved the model’s performance. In the Hoffmann study [29], some borderline cases, especially those with moderate thickness and internal reflectivity, were misclassified by the DL model. A significant limitation of the Tailor study [30] was the inclusion of only the largest nevus per eye in patients with multiple nevi, resulting in the loss of crucial data. In that study, there was a strong concern about the risk of false negatives and false positives, as the failure to diagnose a melanoma could have serious consequences, while the erroneous diagnosis of a melanoma could lead to unnecessary interventions. The Sabazade study [31] highlighted another important consideration: the model’s accuracy depends on the clinical expertise of the ophthalmologists who initially diagnosed the lesions. Moreover, lesions classified as nevi may still have undetected malignant potential. The study used separate segmentation and classification techniques rather than an end-to-end DL model, which might have enhanced performance. Lastly, the Jackson model [32] misclassified 11% of UM cases as nevi, which has significant clinical implications. It also overdiagnosed nevi as UM, increasing the risk of unnecessary referrals. The model attempted to differentiate nevi from healthy eyes; however, the limited number of healthy eye images and the small size of the nevi contributed to the reduced accuracy of the model. Similar to Zabor, Valmaggia, Ma, and Tailor, no enhancement techniques were applied to the dataset. Finally, no study has validated the AI models on larger multicenter datasets, which is crucial for confirming their reliability and clinical utility.
Despite these limitations, the potential of AI models to contribute to the early detection of melanoma, the reduction in physician workload, and the standardization of diagnostic accuracy is significant. However, it is essential to emphasize that AI should serve as a complement to, rather than a replacement for clinical judgment. Physicians should play an active role in AI training, validation, and real-world implementation to ensure confidence in its applicability in clinical settings. Nevertheless, it is imperative to address several challenges before these models can be fully integrated into clinical practice. A significant number of studies were based on single-center data, thereby limiting the generalizability of their results, and raising concerns about potential bias in AI predictions. The conduction of real-world validation trials is therefore essential in order to confirm AI performance beyond controlled research settings. Future studies should incorporate diverse, multicenter datasets to improve robustness and generalizability. Furthermore, most studies relied on retrospective data; although useful for model development, this does not guarantee real-world applicability. Prospective trials in diagnostic centers specializing in the management of choroidal lesions are necessary to evaluate AI performance in routine clinical practice.
Another key future direction is the integration of imaging data with each patient’s genetic profile to enhance diagnostic accuracy. Genetic databases of choroidal melanomas have already been used in studies aiming to personalize treatment and identify risk factors for malignant transformation through various algorithmic models. AI-driven multi-omics fusion networks could analyze patterns across imaging and genetic data, improving early detection, risk prediction, and personalized treatment strategies, paving the way for precision medicine in ocular oncology.
Additionally, future research should focus on expanding data diversity, incorporating multimodal imaging (e.g., US, OCT, and UWF), and utilizing explainability tools to enhance transparency and improve interpretability. Additionally, advanced DL models, including Vision Transformers and generative adversarial networks (GANs), have shown promise in medical imaging applications, particularly for rare diseases where data availability is a challenge. Exploring these architectures could further improve melanoma detection and risk assessment, enhancing the robustness and clinical applicability of AI-driven models.
Beyond algorithmic improvements, AI’s integration into clinical workflows must address practical implementation challenges, such as cost, interoperability with existing systems, clinician acceptance, and training requirements. While AI-based tools can be valuable for ophthalmologists, their practical deployment remains unexplored. Understanding how these models can be widely adopted, from automated screening to decision-support systems, is essential for their successful implementation. AI-driven solutions should not only improve diagnostic accuracy but also be accessible, cost-effective, and seamlessly integrated into clinical practice.
To ensure the successful integration of these AI-driven approaches into clinical practice, regulatory and ethical considerations must also be addressed. The recent EU AI Act regulations introduced new compliance requirements for AI systems in healthcare, emphasizing transparency, accountability, and patient safety [33]. Another key consideration is patient data privacy and security. AI algorithms rely on large datasets, so strict adherence to data protection regulations (e.g., GDPR) is necessary to prevent unauthorized access and misuse of patient information [34].

5. Conclusions

AI has demonstrated significant potential in the early diagnosis of choroidal melanoma and its differentiation from choroidal nevus. AI models, particularly CNNs, have shown promising results in classification, segmentation, and risk assessment. AI models have achieved high accuracy, sensitivity, and AUC values, while AI-driven segmentation techniques have been highly effective in detecting choroidal lesions. These advancements highlight AI’s capability to enhance diagnostic accuracy and support clinical decision-making. Despite these promising results, challenges remain before AI can be fully integrated into clinical practice. Many studies rely on small, single-center datasets, limiting generalizability, while the lack of longitudinal data restricts AI’s ability to track disease progression. Additionally, the use of a single imaging modality, the absence of genetic confirmation, and the lack of external validation raise concerns about model robustness. Furthermore, the use of explainability techniques remains extremely limited, making AI-driven decisions less transparent.
Future research should address these limitations through larger (>1000 samples) and multicenter datasets, integrating multimodal imaging (e.g., US, OCT, UWF), incorporating genetic data, and leveraging advanced architectures such as Vision Transformers or GANs. Overcoming regulatory, ethical, and implementation challenges—such as cost, interoperability, and clinician training—will be critical for real-world adoption. While AI is a transformative tool, it should complement, rather than replace, clinical expertise to improve diagnostic accuracy and patient outcomes.

Author Contributions

Conceptualization, K.-E.K. and E.M.; methodology, K.-E.K. and E.M.; validation, K.-E.K., E.M. and E.P.; investigation, K.-E.K., E.M. and P.P.; data curation, K.-E.K. and E.M.; writing—original draft preparation, K.-E.K. and E.M.; writing—review and editing, K.-E.K. and E.M.; visualization, K.-E.K. and E.M.; supervision, E.P. and I.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial intelligence
AJCC: American Joint Committee on Cancer
AUC: Area under the curve
AUROC/AUCROC: Area under the receiver operating characteristic curve
AUPRC: Area under the precision-recall curve
CFP: Color fundus photography
CHRPE: Congenital hypertrophy of the retinal pigmented epithelium
CN: Choroidal nevus
CNN: Convolutional neural network
CM: Choroidal melanoma
COMS: Collaborative Ocular Melanoma Study
DenseNet: Densely connected convolutional network
DL: Deep learning
EU: European Union
FA: Fluorescein angiography
FAF: Fundus autofluorescence
GAN: Generative adversarial network
GDPR: General Data Protection Regulation
LASSO: Least absolute shrinkage and selection operator
LGBM: Light Gradient Boosting Machine
MD-GRU: Multi-dimensional gated recurrent unit
ML: Machine learning
NR: Not reported
OCT: Optical coherence tomography
PCL: Pigmented choroidal lesion
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
ReLU: Rectified linear unit
ResNet: Residual network
RPE: Retinal pigment epithelium
SAINTS: Simple AI Nevus Transformation System
SCM: Small choroidal melanoma
SD-OCT: Spectral-domain optical coherence tomography
SF-FP: Standard-field fundus photography
SHAP: Shapley additive explanations
SNP: Single nucleotide polymorphism
SRF: Subretinal fluid
SSL: Self-supervised learning
UBM: Ultrasound biomicroscopy
UM: Uveal melanoma
US: Ultrasound
UWF: Ultra-wide-field
WF-FP: Wide-field fundus photography
WHO: World Health Organization
XAI: Explainable artificial intelligence
XGBoost: eXtreme Gradient Boosting

References

  1. Gelmi, M.C.; Jager, M.J. Uveal melanoma: Current evidence on prognosis, treatment and potential developments. Asia-Pac. J. Ophthalmol. 2024, 13, 100060. [Google Scholar] [CrossRef] [PubMed]
  2. Brouwer, N.J.; Marinkovic, M.; Bleeker, J.C.; El Filali, M.; Stefansson, E.; Luyten GP, M.; Jager, M.J. Retinal Oximetry is Altered in Eyes with Choroidal Melanoma but not in Eyes with Choroidal Nevi. Retina 2020, 40, 2207–2215. [Google Scholar] [CrossRef] [PubMed]
  3. Marous, C.L.; Shields, C.L.; Yu, M.D.; Dalvin, L.A.; Ancona-Lezama, D.; Shields, J.A. Malignant transformation of choroidal nevus according to race in 3334 consecutive patients. Indian J. Ophthalmol. 2019, 67, 2035–2042. [Google Scholar] [CrossRef] [PubMed]
  4. Singh, A.D.; Grossniklaus, H.E. What’s in a Name? Large Choroidal Nevus, Small Choroidal Melanoma, or Indeterminate Melanocytic Tumor. Ocul. Oncol. Pathol. 2021, 7, 235–238. [Google Scholar] [CrossRef]
  5. Kaliki, S.; Shields, C.L. Uveal melanoma: Relatively rare but deadly cancer. Eye 2017, 31, 241–257. [Google Scholar] [CrossRef]
  6. Cheung, A.; Scott, I.U.; Murray, T.G.; Shields, G.L. Distinguishing a choroidal nevus from a choroidal melanoma. EyeNet Magazine, 1 February 2012. [Google Scholar]
  7. Stålhammar, G.; Gill, V.T. Digital morphometry and cluster analysis identifies four types of melanocyte during uveal melanoma progression. Commun. Med. 2023, 3, 60. [Google Scholar] [CrossRef]
  8. Stålhammar, G.; Herrspiegel, C. Long-term relative survival in uveal melanoma: A systematic review and meta-analysis. Commun. Med. 2022, 2, 18. [Google Scholar] [CrossRef]
  9. Santos-Buitrago, B.; Santos-Garcia, G.; Hernández-Galilea, E. Artificial intelligence for modeling uveal melanoma. Artif. Intell. Cancer 2020, 1, 51–65. [Google Scholar] [CrossRef]
  10. Lee, C.H.; Lee, H.; Lee, S.M.; Choi, E.Y.; Lee, J.; Kim, M. Clinical and Multimodal Imaging Features of Choroidal Nevi in the Korean Population. J. Clin. Med. 2022, 11, 6666. [Google Scholar] [CrossRef]
  11. Jager, M.J.; Shields, C.L.; Cebulla, C.M.; Abdel-Rahman, M.H.; Grossniklaus, H.E.; Stern, M.H.; Carvajal, R.D.; Belfort, R.N.; Jia, R.; Shields, J.A.; et al. Uveal melanoma. Nat. Rev. Dis. Primers 2020, 6, 24. [Google Scholar] [CrossRef]
  12. Badillo, S.; Banfai, B.; Birzele, F.; Davydov, I.I.; Hutchinson, L.; Kam-Thong, T.; Siebourg-Polster, J.; Steiert, B.; Zhang, J.D. An Introduction to Machine Learning. Clin. Pharmacol. Ther. 2020, 107, 871–885. [Google Scholar] [CrossRef] [PubMed]
  13. Honavar, S.G. Artificial intelligence in ophthalmology—Machines think! Indian J. Ophthalmol. 2022, 70, 1075–1079. [Google Scholar] [CrossRef] [PubMed]
  14. Chawla, B.; Ganesh, K.B. Applications of Artificial Intelligence in Ocular Oncology. Adv. Ophthalmol. Optom. 2023, 8, 111–122. [Google Scholar] [CrossRef]
  15. Lundervold, A.S.; Lundervold, A. An overview of deep learning in medical imaging focusing on MRI. Z. Med. Phys. 2019, 29, 102–127. [Google Scholar] [CrossRef]
  16. Chen, L.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar] [CrossRef]
  17. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, Nevada, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
  18. Somepalli, G.; Goldblum, M.; Schwarzschild, A.; Bruss, C.B.; Goldstein, T. SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training. arXiv 2021, arXiv:2106.01342. [Google Scholar] [CrossRef]
  19. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597. [Google Scholar] [CrossRef]
  20. Isensee, F.; Wald, T.; Ulrich, C.; Baumgartner, M.; Roy, S.; Maier-Hein, K.H.; Jaeger, P.F. nnU-Net Revisited: A Call for Rigorous Validation in 3D Medical Image Segmentation. arXiv 2024, arXiv:2404.09556. [Google Scholar] [CrossRef]
  21. Huang, G.; Liu, Z.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar] [CrossRef]
  22. Zhang, J.; Lin, S.; Cheng, T.; Xu, Y.; Lu, L.; He, J.; Yu, T.; Peng, Y.; Zhang, Y.; Zou, H.; et al. RETFound-enhanced community-based fundus disease screening: Real-world evidence and decision curve analysis. npj Digit. Med. 2024, 7, 108. [Google Scholar] [CrossRef]
  23. Jiang, Y.; He, Y.; Zhang, H. Variable Selection with Prior Information for Generalized Linear Models via the Prior LASSO Method. J. Am. Stat. Assoc. 2016, 111, 355–376. [Google Scholar] [CrossRef]
  24. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71. [Google Scholar] [CrossRef] [PubMed]
  25. Zabor, E.C.; Raval, V.; Luo, S.; Pelayes, D.E.; Singh, A.D. A Prediction Model to Discriminate Small Choroidal Melanoma from Choroidal Nevus. Ocul. Oncol. Pathol. 2022, 8, 71–78. [Google Scholar] [CrossRef] [PubMed]
  26. Valmaggia, P.; Friedli, P.; Hörmann, B.; Kaiser, P.; Scholl, H.P.N.; Cattin, P.C.; Sandkühler, R.; Maloca, P.M. Feasibility of Automated Segmentation of Pigmented Choroidal Lesions in OCT Data with Deep Learning. Transl. Vis. Sci. Technol. 2022, 11, 25. [Google Scholar] [CrossRef] [PubMed]
  27. Ma, J.; Iddir, S.P.; Ganesh, S.; Yi, D.; Heiferman, M.J. Automated segmentation for early detection of uveal melanoma. Can. J. Ophthalmol. 2024, 59, e784–e791. [Google Scholar] [CrossRef]
  28. Dadzie, A.K.; Iddir, S.P.; Abtahi, M.; Ebrahimi, B.; Le, D.; Ganesh, S.; Son, T.; Heiferman, M.J.; Yao, X. Colour fusion effect on deep learning classification of uveal melanoma. Eye 2024, 38, 2781–2787. [Google Scholar] [CrossRef]
  29. Hoffmann, L.; Runkel, C.B.; Künzel, S.; Kabiri, P.; Rübsam, A.; Bonaventura, T.; Marquardt, P.; Haas, V.; Biniaminov, N.; Biniaminov, S.; et al. Using Deep Learning to Distinguish Highly Malignant Uveal Melanoma from Benign Choroidal Nevi. J. Clin. Med. 2024, 13, 4141. [Google Scholar] [CrossRef]
  30. Tailor, P.D.; Kopinski, P.K.; D’Souza, H.S.; Leske, D.A.; Olsen, T.W.; Shields, C.L.; Shields, J.A.; Dalvin, L.A. Predicting Choroidal Nevus Transformation to Melanoma Using Machine Learning. Ophthalmol. Sci. 2024, 5, 100584. [Google Scholar] [CrossRef]
  31. Sabazade, S.; Lumia Michalski, M.A.; Bartoszek, J.; Fili, M.; Holmström, M.; Stålhammar, G. Development and Validation of a Deep Learning Algorithm for Differentiation of Choroidal Nevi from Small Melanoma in Fundus Photographs. Ophthalmol. Sci. 2024, 5, 100613. [Google Scholar] [CrossRef]
  32. Jackson, M.; Kalirai, H.; Hussain, R.N.; Heimann, H.; Zheng, Y.; Coupland, S.E. Differentiating Choroidal Melanomas and Nevi Using a Self-Supervised Deep Learning Model Applied to Clinical Fundoscopy Images. Ophthalmol. Sci. 2024, 5, 100647. [Google Scholar] [CrossRef]
  33. European Parliament. EU AI Act: First Regulation on Artificial Intelligence; European Parliament: Strasbourg, France, 2023; Available online: https://www.europarl.europa.eu/topics/en/article/20230601STO93804/eu-ai-act-first-regulation-on-artificial-intelligence (accessed on 25 February 2025).
  34. European Union General Data Protection Regulation (GDPR). Available online: https://gdpr.eu/ (accessed on 25 February 2025).
Figure 1. PRISMA flowchart.
Table 1. Clinical Tools—Scoring Systems.
MOLES | TFSOM-UHHD
M: Mushroom shape | T: Thickness > 2 mm
O: Orange pigment | F: Fluid (subretinal)
L: Large size | S: Symptoms
E: Enlargement | O: Orange pigment
S: Subretinal fluid | M: Margin ≤ 3 mm to disc
 | UH: Ultrasonographic hollowness
 | H: Halo absence
 | D: Drusen absence
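Both mnemonics are, in essence, checklists: the more risk features a lesion shows, the greater the suspicion of growth or malignant transformation. The short Python sketch below is purely illustrative and is not taken from any of the reviewed studies; the field names and the simple feature count are assumptions for demonstration only, and it does not reproduce a validated scoring rule.

# Illustrative only: counts the TFSOM-UHHD risk features present in a lesion.
# Field names and the count-based interpretation are assumptions for this sketch.
from dataclasses import dataclass

@dataclass
class LesionFindings:
    thickness_mm: float          # T: thickness > 2 mm
    subretinal_fluid: bool       # F: fluid (subretinal)
    symptoms: bool               # S: symptoms
    orange_pigment: bool         # O: orange pigment
    margin_to_disc_mm: float     # M: margin <= 3 mm to optic disc
    ultrasound_hollow: bool      # UH: ultrasonographic hollowness
    halo_absent: bool            # H: halo absence
    drusen_absent: bool          # D: drusen absence

def tfsom_uhhd_count(f: LesionFindings) -> int:
    """Return the number of TFSOM-UHHD risk features present (0-8)."""
    return sum([
        f.thickness_mm > 2.0,
        f.subretinal_fluid,
        f.symptoms,
        f.orange_pigment,
        f.margin_to_disc_mm <= 3.0,
        f.ultrasound_hollow,
        f.halo_absent,
        f.drusen_absent,
    ])

# Example: a thin pigmented lesion with subretinal fluid and orange pigment
lesion = LesionFindings(1.8, True, False, True, 5.0, False, False, False)
print(tfsom_uhhd_count(lesion))  # -> 2 risk features present

Structured checklists of this kind are also the type of tabular input that the ML models in Table 2 (for example, the LASSO regression of Zabor et al. [25] and the SAINTS model of Tailor et al. [30]) learn from, alongside imaging features.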
Table 2. Descriptive overview of studies included in the literature review. For each study, the following fields are reported: author (year), disease type, objective, imaging modality, number of images, number of patients, XAI, AI type, AI algorithm, and performance metrics.

Zabor (2021) [25]
Disease type: Small choroidal melanoma, Choroidal nevi
Objective: Diagnosis, Prediction, Risk factor identification
Imaging modality: CFP, US
No. of images: NR; No. of patients: 123
XAI: NR; AI type: ML; AI algorithm: LASSO logistic regression
Performance metrics:
Training data: AUC: 0.880, Optimism-corrected AUC: 0.849
External data: AUC: 0.861

Valmaggia (2022) [26]
Disease type: Pigmented choroidal lesions
Objective: Segmentation
Imaging modality: OCT
No. of images: 121; No. of patients: 71
XAI: NR; AI type: DL; AI algorithm: MD-GRU, V-Net, nnU-Net
Performance metrics:
MD-GRU (3D): Recall: 0.60 ± 0.31, Dice: 0.62 ± 0.23
V-Net (3D): Recall: 0.61 ± 0.25, Dice: 0.59 ± 0.24
nnU-Net (3D): Recall: 0.77 ± 0.22, Dice: 0.78 ± 0.13

Ma (2024) [27]
Disease type: Uveal melanoma, Choroidal nevi, CHRPE
Objective: Segmentation
Imaging modality: UWF
No. of images: 516; No. of patients: 479
XAI: NR; AI type: DL; AI algorithm: DeepLabv3
Performance metrics:
Lesion-based segmentation: Dice (UM): 0.87, Dice (CN): 0.81, Dice (CHRPE): 0.85
Image-based segmentation: Dice (UM): 0.86, Dice (CN): 0.81, Dice (CHRPE): 0.85
Lesion detection per image: Sensitivity (UM): 1.00, Sensitivity (CN): 0.90, Sensitivity (CHRPE): 0.87
Detection without lesions per image: Sensitivity (UM): 0.77, Sensitivity (CN): 0.67, Sensitivity (CHRPE): 0.77
Specificity: 0.93

Dadzie (2024) [28]
Disease type: Uveal melanoma, Choroidal nevi
Objective: Classification
Imaging modality: UWF
No. of images: 798; No. of patients: 438
XAI: NR; AI type: DL; AI algorithm: DenseNet121
Performance metrics:
Best models for 2-step pipeline:
Red only (control vs. lesion): Accuracy: 80.45%, F1 score: 0.7984, AUC: 0.8548
Red only (UM vs. nevi): Accuracy: 88.12%, F1 score: 0.8218, AUC: 0.9069
Intermediate fusion (control vs. lesion): Accuracy: 83.31%, F1 score: 0.8523, AUC: 0.8999
Intermediate fusion (UM vs. nevi): Accuracy: 92.24%, F1 score: 0.8788, AUC: 0.9781
Best models for multi-class classification:
Red only: Accuracy: 83.03%, F1 score: 0.7364, AUC: 0.8605
Intermediate fusion: Accuracy: 89.31%, F1 score: 0.8471, AUC: 0.9467

Hoffmann (2024) [29]
Disease type: Choroidal nevi, Untreated choroidal melanoma, Irradiated choroidal melanoma
Objective: Diagnosis
Imaging modality: CFP
No. of images: 762; No. of patients: NR
XAI: NR; AI type: DL; AI algorithm: ResNet50, EfficientNet B4, Vision transformer (SAM weights), ConvNext Base
Performance metrics:
Multi-class classification:
ResNet50: Accuracy: 92.65%
EfficientNet B4: Accuracy: 86.67%
Vision transformer (SAM weights): Accuracy: 79.41%
ConvNext Base: Accuracy: 77.94%
Final binary classification: Accuracy: 90.9%, AUC: 0.99
Final multi-class classification: Accuracy: 84.8%, AUC: 0.96

Tailor (2024) [30]
Disease type: Choroidal nevi, Choroidal melanoma
Objective: Prediction of transformation
Imaging modality: CFP, FAF, SD-OCT, B-scan US
No. of images: NR; No. of patients: 2870
XAI: SHAP; AI type: ML; AI algorithm: XGBoost (SAINTS), LGBM, Random Forest, Extra Tree
Performance metrics:
XGBoost (SAINTS): AUROC (test): 0.864 (95% CI: 0.864–0.865), AUPRC (test): 0.244 (95% CI: 0.243–0.246), AUROC (validation): 0.931 (95% CI: 0.930–0.931), AUPRC (validation): 0.533 (95% CI: 0.531–0.535)
LGBM: AUROC (test): 0.831 (95% CI: 0.831–0.832), AUPRC (test): 0.171 (95% CI: 0.169–0.172), AUROC (validation): 0.815 (95% CI: 0.814–0.815), AUPRC (validation): 0.277 (95% CI: 0.276–0.279)
Random Forest: AUROC (test): 0.812 (95% CI: 0.811–0.813), AUPRC (test): 0.122 (95% CI: 0.121–0.123), AUROC (validation): 0.866 (95% CI: 0.866–0.867), AUPRC (validation): 0.418 (95% CI: 0.417–0.420)
Extra Tree: AUROC (test): 0.826 (95% CI: 0.826–0.827), AUPRC (test): 0.119 (95% CI: 0.118–0.119), AUROC (validation): 0.915 (95% CI: 0.915–0.916), AUPRC (validation): 0.511 (95% CI: 0.509–0.513)
Predictive performance of SAINTS:
Testing data: AUC: 0.910 (0.910–0.910), F1 score: 0.344 (0.342–0.345), Sensitivity: 0.635 (0.633–0.638), Specificity: 0.981 (0.978–0.982)
External validation: AUC: 0.889 (0.889–0.890), F1 score: 0.535 (0.534–0.536), Sensitivity: 0.869 (0.868–0.870), Specificity: 0.98 (0.982–0.986)

Sabazade (2024) [31]
Disease type: Small choroidal melanoma, Choroidal nevi
Objective: Classification, Segmentation
Imaging modality: WF-FP, SF-FP
No. of images: 802; No. of patients: 688
XAI: NR; AI type: DL; AI algorithm: U-Net
Performance metrics:
Testing data: AUC: 0.88 (95% CI, 0.82–0.95), Sensitivity: 100%, Specificity: 74%
External validation: AUC: 0.88 (95% CI, 0.74–1.00), Sensitivity: 80%, Specificity: 81%

Jackson (2024) [32]
Disease type: Uveal (choroidal) melanoma, Choroidal nevi
Objective: Classification
Imaging modality: CFP
No. of images: 27181; No. of patients: 4255
XAI: NR; AI type: DL; AI algorithm: RETFound
Performance metrics:
Binary classification: Accuracy: 0.83, Specificity: 0.87, Sensitivity: 0.79, F1 score: 0.84, AUCROC: 0.90
Multi-class classification: Accuracy: 0.82, Specificity: 0.85, Sensitivity: 0.73, F1 score: 0.72, AUCROC: 0.92

AUC (area under the curve); AUCROC (area under the receiver operating characteristic curve); AUROC (area under the receiver operating characteristic curve); AUPRC (area under the precision-recall curve); CFP (color fundus photography); CHRPE (congenital hypertrophy of the retinal pigment epithelium); CN (choroidal nevi); DenseNet (densely connected convolutional network); DL (deep learning); FAF (fundus autofluorescence); LASSO (Least Absolute Shrinkage and Selection Operator); LGBM (Light Gradient Boosting Model); MD-GRU (multi-dimensional gated recurrent unit); ML (machine learning); NR (not reported); OCT (optical coherence tomography); ResNet (residual network); SD-OCT (spectral-domain optical coherence tomography); SAINTS (Simple AI Nevus Transformation System); SF-FP (standard-field fundus photography); SHAP (Shapley additive explanations); UM (uveal melanoma); US (ultrasonography); UWF (ultra-wide-field fundus photography); WF-FP (wide-field fundus photography); XAI (explainable artificial intelligence); XGBoost (extreme gradient boosting).
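Two metrics dominate the performance columns of Table 2: the Dice score, which measures the pixel or voxel overlap between a predicted and a reference segmentation mask, and the AUROC, which summarizes a classifier's ranking performance across all decision thresholds. The minimal Python sketch below is illustrative only; it assumes NumPy and scikit-learn are available, and all input arrays are toy values rather than data from the reviewed studies.

# Minimal, illustrative computation of the two metrics most often reported
# in Table 2: the Dice score (segmentation overlap) and the AUROC (classification).
# All arrays below are toy examples.
import numpy as np
from sklearn.metrics import roc_auc_score

def dice_score(pred_mask: np.ndarray, true_mask: np.ndarray) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    denom = pred.sum() + true.sum()
    return 2.0 * intersection / denom if denom > 0 else 1.0

# Toy 4x4 segmentation masks (1 = lesion, 0 = background)
pred = np.array([[0, 1, 1, 0],
                 [0, 1, 1, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
true = np.array([[0, 1, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
print(f"Dice: {dice_score(pred, true):.2f}")  # 2*3 / (4+3) = 0.86

# Toy melanoma-vs-nevus classification: true labels and predicted probabilities
y_true = np.array([0, 0, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])
print(f"AUROC: {roc_auc_score(y_true, y_prob):.2f}")  # 8 of 9 positive-negative pairs ranked correctly -> 0.89

Read against these definitions, a Dice score of 0.78 for nnU-Net [26] corresponds to roughly 78% overlap between the predicted and reference lesion masks, while an AUROC of 0.92 for RETFound [32] means that a randomly chosen melanoma image receives a higher predicted probability than a randomly chosen nevus image about 92% of the time.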
