Peer-Review Record

Automatic Screening of the Eyes in a Deep-Learning–Based Ensemble Model Using Actual Eye Checkup Optical Coherence Tomography Images

by Masakazu Hirota 1,2,*, Shinji Ueno 3, Taiga Inooka 4, Yasuki Ito 5, Hideo Takeyama 6, Yuji Inoue 2, Emiko Watanabe 2 and Atsushi Mizota 2
Appl. Sci. 2022, 12(14), 6872; https://doi.org/10.3390/app12146872
Submission received: 29 May 2022 / Revised: 2 July 2022 / Accepted: 5 July 2022 / Published: 7 July 2022

Round 1

Reviewer 1 Report

1. This paper presents a Deep-Learning–Based Ensemble Model Using Actual Eye Checkup (low-quality) Optical Coherence Tomography Images. The model employs convolutional neural networks (CNNs) and random forest models. This paper showed high screening performance on the single-shot OCT images captured during the actual eye checkups using the machine-learning (ML) model and algorithms. However, there are concerns that the screening performance will be degraded because this study excluded OCT images in which the ophthalmologists had difficulty determining the disease by reading the images alone. Therefore, the accuracy of the ensemble model during actual eye checkups would require further study. The manuscript is well written in general. In the introduction, the authors are suggested to comment on other ML models used for OCT image analysis and why they chose this particular model. Appropriate references should be added. Further, there are some suggestions and corrections that this reviewer recommends for the authors' consideration.

2. Line 116: The ellipsoid zone (EZ) luminance needs a definition or reference to explain its meaning and significance.

3. Line 202: The words after Table 1 should be deleted. These were instructions about the placement of the table from the template. On the other hand, a table caption should be given.

4. Line 139: Some of the contents of the paragraph following line 139, which is a figure caption, should be incorporated into the caption, specifically the summary of the contents of Fig. 2 (A) and (B). This is not an isolated case. Similar comments apply to Lines 266−268, which give an explanation of Fig. 4. However, the significance of each of panels (A), (B), and (C) was not touched upon. I suspected that these are images of different eyes. This is confirmed somewhat in the discussion following Fig. 5. Also, lines 271−272 and lines 300−301 explain the meanings of OCT and CNN, which should preferably be in the figure captions. Figure 5 shows typical OCT images wrongly diagnosed by the CNN models. The authors are suggested to comment on the statistical significance of this outcome. That is, out of N images, how many are wrongly diagnosed?

5. Screening accuracy is reported to be 0.999 at 0.025 image/s. This is quite impressive.  Does the time needed include postprocessing?

Author Response

Reviewer #1

  1. This paper presents a Deep-Learning–Based Ensemble Model Using Actual Eye Checkup (low-quality) Optical Coherence Tomography Images. The model employs convolutional neural networks (CNNs) and random forest models. This paper showed high screening performance on the single-shot OCT images captured during the actual eye checkups using the machine-learning (ML) model and algorithms. However, there are concerns that the screening performance will be degraded because this study excluded OCT images in which the ophthalmologists had difficulty determining the disease by reading the images alone. Therefore, the accuracy of the ensemble model during actual eye checkups would require further study. The manuscript is well written in general. In the introduction, the authors are suggested to comment on other ML models used for OCT image analysis and why they chose this particular model. Appropriate references should be added. Further, there are some suggestions and corrections that this reviewer recommends for the authors' consideration.

 

Response:

  • We sincerely thank you for your comments, as they have helped us improve our paper. We agree with each of your comments, and we have made revisions in the manuscript accordingly. Our point-by-point responses to your comments are as follows.

 

  • We described the selection of the CNN models in the Introduction (Lines 56−69). We selected ResNet-152 as the baseline. DenseNet-201 was selected to evaluate the classification accuracy when the number of parameters was varied while the ImageNet Top-1 accuracy remained nearly unchanged. EfficientNet-B7 was selected to evaluate the classification accuracy when the ImageNet Top-1 accuracy was increased while the number of parameters remained nearly unchanged.

Lines 57−69.

The development speed of ML is quite fast, and new models are proposed yearly to improve discriminability.[9, 16-19] Generally, newer models are better for image classification tasks. However, newer models are not necessarily better when performing transfer learning with OCT images.[20] The convolutional neural network (CNN) is a machine-learning technique that provides superior image classification. A CNN model is evaluated by both the number of parameters in the network and the accuracy of the model. We selected three CNN models, ResNet-152,[16] DenseNet-201,[18] and EfficientNet-B7,[19] to classify OCT images as “Abnormal” or “Normal” in this study. ResNet-152 has 60 million parameters and an ImageNet Top-1 accuracy of 77.8%. DenseNet-201 reduces the number of parameters to one-third of that of ResNet-152 (20 million) while maintaining an ImageNet Top-1 accuracy close to that of ResNet-152 (77.42%). EfficientNet-B7 is close to ResNet-152 in the number of parameters (66 million) but has a significantly higher ImageNet Top-1 accuracy (84.3%) than ResNet-152.
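For concreteness, a minimal transfer-learning sketch along the lines described above is given below. It assumes PyTorch/torchvision (the framework used in the study is not stated in this excerpt), and binary_head is a hypothetical helper; the parameter counts and ImageNet Top-1 figures in the comments are the ones quoted in this response, not values measured by this code.

```python
# Hedged sketch: building the three backbones with a two-class
# ("Abnormal"/"Normal") head for transfer learning. Assumes
# PyTorch/torchvision; not the authors' actual code.
import torch.nn as nn
from torchvision import models

def binary_head(name: str) -> nn.Module:
    if name == "resnet152":          # ~60 M params, ImageNet Top-1 77.8%
        m = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
        m.fc = nn.Linear(m.fc.in_features, 2)
    elif name == "densenet201":      # ~20 M params, Top-1 77.42%
        m = models.densenet201(weights=models.DenseNet201_Weights.IMAGENET1K_V1)
        m.classifier = nn.Linear(m.classifier.in_features, 2)
    elif name == "efficientnet_b7":  # ~66 M params, Top-1 84.3%
        m = models.efficientnet_b7(weights=models.EfficientNet_B7_Weights.IMAGENET1K_V1)
        m.classifier[-1] = nn.Linear(m.classifier[-1].in_features, 2)
    else:
        raise ValueError(name)
    return m

cnns = {n: binary_head(n) for n in ("resnet152", "densenet201", "efficientnet_b7")}
```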

  • Moreover, we mentioned the software developed to prevent dataset shift in eye images (Lines 77−80). A previous study prevented dataset shift by adding a model that determines whether the image quality satisfies the model's analysis conditions and prompts re-measurement when it does not. Although this retake approach may be effective in a clinical setting, it is not suitable for a health checkup in which a large number of people are examined in a short period. Therefore, we added the purpose of our study: an ML model for screening retinal disease from low-quality OCT images without requesting retakes.

Lines 77−84

In order to manage dataset shift, software has been developed to evaluate the image quality after image acquisition and to retake the image if the quality is poor.[24] However, if retakes are required, the examination takes more than twice as long, making it unsuitable for health checkups, where examination speed is critical.

Thus, in this study, we used single-shot OCT images captured during actual eye checkups to prevent a dataset shift and investigated an ML model for screening retinal disease from low-quality OCT images without requesting retakes.

 

 

 

  2. Line 116: The ellipsoid zone (EZ) luminance needs a definition or reference to explain its meaning and significance.

Response:

  • Thank you for your constructive comment. We explained that the ellipsoid zone (EZ) corresponds to the inner/outer segment (IS/OS) of the photoreceptor cells. We also mentioned that the EZ luminance is reduced by ocular diseases (Lines 135−137).

Lines 135−137

The ellipsoid zone (EZ), the inner/outer segment (IS/OS) of the photoreceptors, is the second hyper-reflective band on an OCT image.[25, 26] The EZ luminance in OCT images is reduced when ocular diseases impair the photoreceptor cells.[27, 28]

 

  3. Line 202: The words after Table 1 should be deleted. These were instructions about the placement of the table from the template. On the other hand, a table caption should be given.

Response:

  • Thank you for pointing out our mistake. We deleted the template sentence and added a caption to Table 1.

Table 1 (line 228)

Detailed overview of the abnormal OCT images.

  4. Line 139: Some of the contents of the paragraph following line 139, which is a figure caption, should be incorporated into the caption, specifically the summary of the contents of Fig. 2 (A) and (B). This is not an isolated case. Similar comments apply to Lines 266−268, which give an explanation of Fig. 4. However, the significance of each of panels (A), (B), and (C) was not touched upon. I suspected that these are images of different eyes. This is confirmed somewhat in the discussion following Fig. 5. Also, lines 271−272 and lines 300−301 explain the meanings of OCT and CNN, which should preferably be in the figure captions. Figure 5 shows typical OCT images wrongly diagnosed by the CNN models. The authors are suggested to comment on the statistical significance of this outcome. That is, out of N images, how many are wrongly diagnosed?

Response:

  • Thank you for pointing out our mistake. We made a mistake in the figure captioning method: we wrote the title of the figure followed by a new line and a description of the figure. We apologize for any confusion caused by the lack of explanation. Figure 4 shows representative OCT images of the diseases that the CNN models answered correctly. We have replaced Fig. 4C in the revision because the old Fig. 4C showed macular edema, the same disease as Fig. 4B. The new Fig. 4C shows an epiretinal membrane.

Figure 4 (lines 287−293)

Figure 4. Representative visual explanations of the feature maps of the CNN model for correctly classified OCT images. The heat maps in (A) macular hole, (B) macular edema, and (C) epiretinal membrane indicate the relative activation intensity when predicting abnormalities in the OCT images. Warm colors indicate areas of high attention for classification; the CNN model made its decision based on the location of the warm colors. The CNN model focused on the retinal structural changes in the abnormal eyes. OCT, optical coherence tomography; CNN, convolutional neural network.

Figure 5 (lines 313−320)

Figure 5. Representative visual explanations of the feature maps of the CNN model in the misrecognized OCT images. The heat maps in (A)−(E) indicate the relative activation intensity when predicting abnormalities in the OCT images; the CNN model made its decision based on the location of the warm colors. The CNN model classified (A) an abnormal eye (retinitis pigmentosa) as normal and (B−E) normal eyes as abnormal.

ResNet-152 and DenseNet-201 each misclassified 5/100 images, EfficientNet-B7 misclassified 4/100 images, and the ensemble model misclassified 2/100 images. OCT, optical coherence tomography; CNN, convolutional neural network.
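The heat maps in Figures 4 and 5 are gradient-based visual explanations; the response to comment 5 below notes that they were produced with Grad-CAM. Purely as an illustration of how such a map can be computed, here is a minimal Grad-CAM-style sketch in PyTorch (an assumed reconstruction, not the authors' implementation):

```python
# Minimal Grad-CAM-style heat map for a CNN classifier (assumed PyTorch
# reconstruction). For the screening model, class_idx would be the
# "Abnormal" class of a fine-tuned two-class network.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet152(weights=models.ResNet152_Weights.DEFAULT).eval()
target_layer = model.layer4                     # last conv block
acts, grads = {}, {}
target_layer.register_forward_hook(lambda m, i, o: acts.update(feat=o))
target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

def grad_cam(image: torch.Tensor, class_idx: int) -> torch.Tensor:
    """image: (1, 3, H, W); returns an (H, W) heat map scaled to [0, 1]."""
    logits = model(image)
    model.zero_grad()
    logits[0, class_idx].backward()             # gradients for the target class
    w = grads["g"].mean(dim=(2, 3), keepdim=True)   # channel weights (GAP)
    cam = F.relu((w * acts["feat"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                        align_corners=False)
    cam = cam - cam.min()
    return (cam / cam.max().clamp(min=1e-8)).squeeze().detach()
```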

Lines 214−217.

The accuracies of the CNN models with ResNet-152, DenseNet-201, and EfficientNet-B7, and the ensemble model were 95.0% (abnormal 49/50, normal 46/50), 95.0% (abnormal 49/50, normal 46/50), 96.0% (abnormal 48/50, normal 48/50), and 98.0% (abnormal 50/50, normal 48/50), respectively (Fig. 2).

 

  • We added information on how many images were wrongly screened (Lines 213−216). We also described the result of the statistical analysis of the screening performance of each model (Lines 219−221).

Lines 220−222.

Although the screening performance did not differ significantly between the models, the ensemble model showed the highest screening performance among the four models.

 

 

  5. Screening accuracy is reported to be 0.999 at 0.025 image/s. This is quite impressive. Does the time needed include postprocessing?

Response:

  • Thank you for your constructive comment. The screening time is the time to output the results after soft voting on the data analyzed by the CNNs and the RFC, averaged over the number of images. The initial run, which loads the data onto the GPU, was slow; on average, the analysis took 0.025 s per image. When outputting images with Grad-CAM in a single CNN model, an additional 0.12 s per image is required on average. Since the image output time may change depending on functions to be added in the future, only the prediction output time is reported in this study.
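As a rough illustration of the soft-voting step this response refers to, the sketch below averages the class probabilities from the three CNNs and the random forest classifier and thresholds the result. All names and shapes are illustrative assumptions, not the authors' code:

```python
# Hedged sketch of soft voting over CNN and RFC outputs.
import numpy as np

def soft_vote(cnn_probs: list, rfc_probs: np.ndarray) -> np.ndarray:
    """cnn_probs: list of (n_images, 2) arrays, one per CNN;
    rfc_probs: (n_images, 2) array from the random forest classifier.
    Columns are [P(normal), P(abnormal)]; returns 1 for "Abnormal"."""
    stacked = np.stack(cnn_probs + [rfc_probs])  # (n_models, n_images, 2)
    mean_probs = stacked.mean(axis=0)            # soft voting = mean probability
    return (mean_probs[:, 1] >= 0.5).astype(int)
```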

Author Response File: Author Response.docx

Reviewer 2 Report

Below are some comments/suggestions that may help to improve the article.

 

  1. In Section I, the authors should highlight the contributions of the study.
  2. Experiment I: Would it be possible to use ML techniques for the automatic classification of ocular diseases?
  3. Please check the caption of Table 1.
  4. Some comments related to the results shown in Figures 2 and 3 will be welcome.
  5. In Experiment 2, please add a reference to the Mann–Whitney test with Bonferroni correction.
  6. Maybe some references in the discussion section can be moved to the introduction to complete the state of the art.
  7. The conclusion is a little bit poor. Please improve this important section. Maybe the authors can join the Discussion and Conclusion sections into one.

Author Response

Reviewer #2

  1. In Section I, the authors should highlight the contributions of the study.

Response:

  • We sincerely thank you for your comments, which have helped us improve our paper. We agree with each of your comments and have accordingly made revisions to the manuscript. Our point-by-point responses to your comments are as follows.
  • We mentioned the software developed to prevent dataset shift in eye images (Lines 77−80). A previous study prevented dataset shift by adding a model that determines whether the image quality satisfies the model's analysis conditions and prompts re-measurement when it does not. Although this retake approach may be effective in a clinical setting, it is not suitable for a health checkup in which many people are examined in a short time. Therefore, we added the purpose of our study: an ML model for screening retinal disease from low-quality OCT images without requesting retakes.

Lines 77−84

In order to manage dataset shift, software has been developed to evaluate the image quality after image acquisition and to retake the image if the quality is poor.[24] However, if retakes are required, the examination takes more than twice as long, making it unsuitable for health checkups, where examination speed is critical.

Thus, in this study, we used single-shot OCT images captured during actual eye checkups to prevent a dataset shift and investigated an ML model for screening retinal disease from low-quality OCT images without requesting retakes.

 

  2. Experiment I: Would it be possible to use ML techniques for the automatic classification of ocular diseases?

Response:

  • Thank you for your constructive comment. Although it is theoretically possible to use ML techniques for the automatic classification of ocular diseases from actual eye checkup OCT images, we consider it challenging to accumulate enough images of specific diseases from eye checkups alone, since the majority of people who undergo eye checkups are healthy (normal, 6,050/7,703 images; abnormal, 655/7,703 images; difficult to determine the ocular disease [subnormal], 998/7,703 images). There is also a limit to the range of diseases that can be determined from OCT images alone.

 

  3. Please check the caption of Table 1.

Response:

  • Thank you for pointing out our mistake. We deleted the template sentence and added a caption to Table 1.

Table 1 (line 228)

Detailed overview of the abnormal OCT images.

  4. Some comments related to the results shown in Figures 2 and 3 will be welcome.

Response:

  • Thank you for your comment. We added a discussion related to Figures 2−4 (Lines 550−562).

Lines 548−560

The CNN models focused on the structural changes in the retina in abnormal eyes, with accuracies from 95% to 96%, and the screening performance did not differ between the models (Figs. 2−4). Our finding is consistent with earlier studies reporting classification accuracies of single-CNN models of about 70%−95%; such models may also miss some diseases in OCT images.[10, 11, 13-15] Our findings also support the earlier report that the latest CNN model is not necessarily better when performing transfer learning with OCT images.[20] We consider that the number of classification categories relates to our findings. We developed the CNN models to classify two categories, “Abnormal” and “Normal”, and did not develop models to classify each individual disease in this study. Binary classification is the simplest task for a CNN classifier. Therefore, we expect the differences between CNN models to be more significant for multi-class classification, such as classifying retinal diseases.

 

 

 

  5. In Experiment 2, please add a reference to the Mann–Whitney test with Bonferroni correction.

Response:

  • Thank you for your comment. We added a reference for the Bonferroni correction (Line 373). This study compares the sum of the retinal and choroidal areas between abnormal and normal eyes in five segments. With five comparisons, the familywise type I error rate rises to 0.226 (1 − 0.95^5). Thus, we corrected the p-values using the Bonferroni correction.
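For concreteness, the familywise error arithmetic and the corrected test can be sketched as follows (a hedged illustration using SciPy's mannwhitneyu; the per-segment arrays are hypothetical placeholders):

```python
# Familywise error across five uncorrected tests, and Mann-Whitney U
# tests with Bonferroni correction. Illustrative sketch only.
from scipy.stats import mannwhitneyu

m, alpha = 5, 0.05
fwer = 1 - (1 - alpha) ** m          # 1 - 0.95**5 = 0.226, as cited above

def bonferroni_mwu(abnormal_segments, normal_segments):
    """Each argument: list of 5 arrays, one per retinal/choroidal segment.
    Returns Bonferroni-corrected p-values, capped at 1.0."""
    p_corrected = []
    for a, n in zip(abnormal_segments, normal_segments):
        _, p = mannwhitneyu(a, n, alternative="two-sided")
        p_corrected.append(min(p * m, 1.0))      # multiply by number of tests
    return p_corrected
```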

 

  6. Maybe some references in the discussion section can be moved to the introduction to complete the state of the art.

Response:

  • Thank you for your constructive comment. We mentioned that the accuracy of previous ML models is about 70%−95% (Lines 54−55). As covered in comment 1, we also mentioned that the retake strategy for avoiding dataset shift is difficult to apply to eye checkups (Lines 77−81).

Lines 54−55

The classification accuracy of ML models is about 70%−95%.[10, 11, 13-15]

Lines 77−81

In order to manage dataset shift, software has been developed to evaluate the image quality after image acquisition and to retake the image if the quality is poor.[24] However, if retakes are required, the examination takes more than twice as long, making it unsuitable for health checkups, where examination speed is critical.

 

 

  7. The conclusion is a little bit poor. Please improve this important section. Maybe the authors can join the Discussion and Conclusion sections into one.

Response:

  • We agree with your comment. We deleted the Conclusion section and wrote the conclusions and prospects at the end of the Discussion section.

Lines 592−601

In this study, the ensemble model, in which the random forest model complements the weaknesses of the CNNs, showed high screening performance on the single-shot OCT images captured during actual eye checkups. These findings suggest that our ensemble model can screen for retinal diseases without requiring retakes at actual eye checkups. On the other hand, we are concerned that the screening performance may be degraded when our ensemble model is applied at actual eye checkup sites, because we excluded OCT images in which the ophthalmologists had difficulty determining the disease by reading the images alone. Therefore, the accuracy of our ensemble model during actual eye checkups will need to be confirmed in a future investigation.

Author Response File: Author Response.docx
