Author Contributions
The authors confirm contribution to the paper as follows: X.D. and Y.H. supervised the study; W.Y. (Wenhan Yang) and Y.H. designed the study; X.D., L.S., L.H., S.L., X.L., Y.J., W.S., W.Y. (Wenjia Yan) and J.L. collected the data; W.Y. (Wenhan Yang) analyzed the data and conducted the main experiments; H.Z., Y.Z. and J.D. conducted parts of the experiments; W.Y. (Wenhan Yang) and Y.H. prepared and wrote the manuscript; Z.X. reviewed the manuscript. All authors have read and agreed to the published version of the manuscript.
Figure 1.
The workflow for automatic assessment of the severity level of retinopathy of prematurity (ROP): (A) data collection, model training and prediction, and lesion stitching, ending with the predicted stage and zone of ROP; (B) data collection and plus-disease prediction, yielding the final result of whether each eye has plus disease. The severity grade is ultimately inferred from the stage, the zone, and the presence of plus disease according to clinical guidelines. Z I denotes zone I; Z II S II+ denotes zone II, stage II with plus disease.
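Because the final grade is a deterministic function of stage, zone, and plus status, the guideline mapping can be written as a small rule table. Below is a minimal, hypothetical Python sketch; the cut-offs loosely follow ICROP/ETROP-style conventions and are assumptions, not the exact guideline rules used in the paper.

```python
from dataclasses import dataclass

@dataclass
class EyeFindings:
    stage: int   # 0 (no ROP lesions) to 4, the highest stage found in the eye
    zone: int    # 1, 2, or 3 (innermost zone containing disease)
    plus: bool   # presence of plus disease

def severity_grade(f: EyeFindings) -> str:
    """Map stage/zone/plus status to a coarse severity grade.

    Illustrative rules only (loosely ICROP/ETROP-style); the paper's
    guideline-based mapping may differ in its exact cut-offs.
    """
    if f.stage == 0:
        return "no ROP"
    # Zone I disease, plus disease, or stage >= 3 is treated as severe here.
    if f.zone == 1 or f.plus or f.stage >= 3:
        return "severe"
    return "mild"

# Example from the caption: "Z II S II+" = zone II, stage II with plus disease.
print(severity_grade(EyeFindings(stage=2, zone=2, plus=True)))  # -> severe
```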
Figure 2.
The flowchart of pre-training [53]. Two retinal images were fed into a ResNet50-based feature extraction module and an image registration prediction module, producing a registered image. The model weights generated during this process were used for downstream task training.
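A minimal sketch of what such a registration-based pre-training model could look like, assuming PyTorch, a shared ResNet50 encoder, and an affine registration head. The transform family, loss, and input sizes are assumptions for illustration; the design in [53] may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class RegistrationPretrainNet(nn.Module):
    """Sketch of the Figure 2 pre-training idea: a shared ResNet50 encodes
    two retinal images, and a small head predicts a 2x3 affine transform
    that warps the moving image onto the fixed one."""

    def __init__(self):
        super().__init__()
        backbone = resnet50(weights=None)
        # Keep everything up to (and including) global average pooling.
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])
        self.head = nn.Linear(2 * 2048, 6)  # 6 affine parameters

    def forward(self, fixed, moving):
        f_fixed = self.encoder(fixed).flatten(1)    # (B, 2048)
        f_moving = self.encoder(moving).flatten(1)  # (B, 2048)
        theta = self.head(torch.cat([f_fixed, f_moving], dim=1)).view(-1, 2, 3)
        grid = F.affine_grid(theta, moving.shape, align_corners=False)
        return F.grid_sample(moving, grid, align_corners=False)

# A photometric loss (here L1) between the registered and fixed images could
# drive pre-training; the encoder weights then initialize downstream models.
net = RegistrationPretrainNet()
fixed, moving = torch.rand(1, 3, 224, 224), torch.rand(1, 3, 224, 224)
loss = F.l1_loss(net(fixed, moving), fixed)
```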
Figure 3.
The performance of three training strategies for lesion detection: (A) AUC on the training set; (B) AUC on the validation set.
Figure 4.
The flowchart of domain adaptation. The target-domain ZOC images and their cropped patches are fed into a CycleGAN-based domain adaptation pipeline. The blue parts denote the processing modules and results for whole ZOC fundus images, while the red parts denote those for the cropped ZOC patches. The source-domain PY vessel segmentation model and a feature style alignment module constrain the translation, so the final outputs are images whose style resembles the source-domain PY data.
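For illustration, the combined constraints described above could be expressed as a single loss function. This is a hedged sketch assuming PyTorch; the module names (G_t2s, G_s2t, D_s, seg_model, feat_extractor), the Gram-matrix style statistic, and the loss weights are hypothetical stand-ins, not the paper's exact CycleGAN components.

```python
import torch
import torch.nn.functional as F

def gram(feat):
    """Gram matrix of a feature map, used as a simple style statistic."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def adaptation_losses(G_t2s, G_s2t, D_s, seg_model, feat_extractor, x_zoc, x_py):
    """One illustrative loss computation for the Figure 4 pipeline."""
    fake_py = G_t2s(x_zoc)            # ZOC image rendered in the PY style
    rec_zoc = G_s2t(fake_py)          # cycle reconstruction back to ZOC
    loss_cyc = F.l1_loss(rec_zoc, x_zoc)

    logits = D_s(fake_py)             # source-domain discriminator on fakes
    loss_adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

    # Vessel structure should survive the translation: the frozen PY
    # segmentation model should see the same vessels before and after.
    with torch.no_grad():
        vessels = torch.sigmoid(seg_model(x_zoc))
    loss_seg = F.binary_cross_entropy_with_logits(seg_model(fake_py), vessels)

    # Feature style alignment: match style statistics of real PY images.
    loss_style = F.mse_loss(gram(feat_extractor(fake_py)),
                            gram(feat_extractor(x_py)))

    return loss_adv + 10.0 * loss_cyc + loss_seg + loss_style

# Tiny stand-in modules, only to show that the call shapes run end to end.
G_t2s = G_s2t = torch.nn.Conv2d(3, 3, 3, padding=1)
D_s = torch.nn.Conv2d(3, 1, 3, padding=1)          # patch-discriminator logits
seg_model = torch.nn.Conv2d(3, 1, 3, padding=1)    # vessel-segmentation logits
feat_extractor = torch.nn.Conv2d(3, 8, 3, padding=1)
x_zoc, x_py = torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128)
total = adaptation_losses(G_t2s, G_s2t, D_s, seg_model, feat_extractor, x_zoc, x_py)
```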
Figure 5.
The confusion matrices of our method and clinical doctors on the ROP staging task: (A) our system; (B) clinical doctor A; (C) clinical doctor B; (D) clinical doctor X; (E) clinical doctor Y; (F) clinical doctor Z. The horizontal axis of each confusion matrix ranges from 0 to 4, representing the predicted stages from stage 0 (no ROP lesions) to stage IV.
Figure 6.
Performance of various methods on the ROP staging task: (A) comparison of the methods on the kappa index; (B) comparison of the methods on the accuracy index. Method 1 is our method; method 2, random initialization with domain adaptation; method 3, ImageNet initialization with domain adaptation; method 4, homologous pre-training; method 5, random initialization; method 6, ImageNet initialization.
Figure 7.
The confusion matrices of our method and clinical doctors on the ROP zoning task: (A) our system; (B) clinical doctor A; (C) clinical doctor B; (D) clinical doctor X; (E) clinical doctor Y; (F) clinical doctor Z.
Figure 8.
Performance of various methods on the ROP zoning task: (A) comparison of the methods on the kappa index; (B) comparison of the methods on the accuracy index.
Figure 9.
Performance of our system and ophthalmologists in assessing the severity level of ROP.
Figure 10.
Performance of six methods in assessing the severity level of ROP: the red line is method 1, our method; the blue line is method 2, random initialization with domain adaptation; the green line is method 3, ImageNet initialization with domain adaptation; the yellow line is method 4, homologous pre-training; the black line is method 5, random initialization; the purple line is method 6, ImageNet initialization.
Figure 11.
The visualization of our method. Box outlines in (A–D) indicate the type and sites of lesions: (A) stage I, demarcation line; (B) stage II, ridge; (C) stage III, ridge with extraretinal fibrovascular proliferation; (D) stage IV, subtotal retinal detachment. The red circle in the middle represents zone I; the region between the purple and red circles represents zone II; the area between the green and purple circles represents zone III. Yellow rectangles mark the lesion areas predicted by the model, and red rectangles mark the areas annotated by the doctor; likewise, yellow and red letters give the lesion type predicted by the model and annotated by the doctor, respectively.
Table 1.
Composition of the training and validation data for PY.
| Data | Stage I Lesion | Stage II Lesion | Stage III Lesion | Stage IV Lesion |
|---|---|---|---|---|
| training data | 77 | 177 | 39 | 36 |
| validation data | 11 | 35 | 10 | 10 |
Table 2.
Comparison of our method with clinical doctors on the ROP staging task.
| Methods | Acc | Kappa |
|---|---|---|
| our system | 0.69 | 0.62 |
| clinical doctor A | 0.57 | 0.52 |
| clinical doctor B | 0.37 | 0.28 |
| clinical doctor X | 0.47 | 0.47 |
| clinical doctor Y | 0.51 | 0.45 |
| clinical doctor Z | 0.45 | 0.36 |
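Acc and Kappa in Tables 2 and 3 are standard accuracy and Cohen's kappa over the five stage classes (stage 0 through stage IV). A minimal sketch with scikit-learn, using hypothetical labels:

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

# Hypothetical per-eye stage labels: ground truth versus a grader's prediction.
y_true = [0, 1, 2, 2, 3, 4, 0, 1]
y_pred = [0, 1, 2, 1, 3, 4, 0, 2]

print("Acc:  ", accuracy_score(y_true, y_pred))
print("Kappa:", cohen_kappa_score(y_true, y_pred))  # chance-corrected agreement
print(confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3, 4]))  # cf. Figure 5
```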
Table 3.
Comparison of our method with clinical doctors on the ROP zoning task.
| Methods | Acc | Kappa |
|---|---|---|
| our system | 0.74 | 0.55 |
| clinical doctor A | 0.61 | 0.51 |
| clinical doctor B | 0.61 | 0.42 |
| clinical doctor X | 0.62 | 0.54 |
| clinical doctor Y | 0.68 | 0.59 |
| clinical doctor Z | 0.73 | 0.64 |
Table 4.
Comparison of our method (using domain adaptation) with clinical doctors in detecting plus disease in ROP.
| Methods | Acc | F1 |
|---|---|---|
| our system | 0.96 | 0.70 |
| clinical doctor A | 0.92 | 0.52 |
| clinical doctor B | 0.93 | 0.64 |
| clinical doctor X | 0.91 | 0.65 |
| clinical doctor Y | 0.94 | 0.67 |
| clinical doctor Z | 0.90 | 0.58 |
Table 5.
Comparison of I-ROP ASSIST with and without domain adaptation in detecting plus disease in ROP.
| Methods | Acc | F1 |
|---|---|---|
| I-ROP ASSIST with domain adaptation | 0.96 | 0.70 |
| I-ROP ASSIST | 0.92 | 0.35 |
Table 6.
Performance of our method and comparison methods in assessing the severity level of ROP.
| Methods | AUC (95% CI) | Recall | Specificity |
|---|---|---|---|
| domain adaptation with homologous pre-training | 0.95 (0.90–0.98) | 1.00 | 0.70 |
| domain adaptation with random initialization | 0.92 (0.86–0.96) | 1.00 | 0.43 |
| domain adaptation with ImageNet | 0.93 (0.88–0.98) | 1.00 | 0.45 |
| homologous pre-training | 0.93 (0.88–0.98) | 1.00 | 0.68 |
| random initialization | 0.92 (0.87–0.97) | 1.00 | 0.54 |
| ImageNet | 0.88 (0.81–0.94) | 1.00 | 0.46 |