Article
Peer-Review Record

EndoNet: A Model for the Automatic Calculation of H-Score on Histological Slides

Informatics 2023, 10(4), 90; https://doi.org/10.3390/informatics10040090
by Egor Ushakov 1,*, Anton Naumov 1, Vladislav Fomberg 1, Polina Vishnyakova 2,3, Aleksandra Asaturova 2, Alina Badlaeva 2, Anna Tregubova 2, Evgeny Karpulevich 1, Gennady Sukhikh 2 and Timur Fatkhudinov 3,4
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 31 August 2023 / Revised: 24 November 2023 / Accepted: 27 November 2023 / Published: 12 December 2023

Round 1

Reviewer 1 Report (Previous Reviewer 2)

Comments and Suggestions for Authors

While the revised manuscript demonstrates progress, it requires significant revisions before it can be considered for publication. The following is my review report:

 

The manuscript introduces EndoNet, which uses deep learning to predict the H-Score in histological images. The revisions have improved the manuscript; however, further refinement is essential.

1. The presentation of other methods is scattered across the method and discussion sections. I suggest the authors reorganize the introduction section to comprehensively review other methods.

2. The process for extracting keypoints is not clearly defined. Please provide a clear definition of the keypoints and detail the parameters used for their calculation. Also, is the H-score calculation solely based on these keypoints?

3. The mechanism by which EndoNet differentiates between stromal and epithelial cells remains ambiguous.

4. The comparison with QuPath is limited to only Slide. A broader comparison is necessary, encompassing all seven slides at the very least.

5. Slide 4 was annotated by two pathologists to measure consistency. Please provide a more in-depth description of this measurement process. Also, indicate which annotations were used for analysis.

 

In conclusion, while I commend the authors for their efforts, the manuscript would greatly benefit from enhanced clarity and a more streamlined presentation of the research findings.

Author Response

Dear Reviewer,

Thank you for appreciating our article. In the revised version of the manuscript, we added several text fragments and tried to resolve all issues:

  • The presentation of other methods is scattered across the method and discussion sections. I suggest the authors reorganize the introduction section to comprehensively review other methods.

Thank you for this comment. We have slightly reworked the Methods section and moved the descriptions of other methods to the Introduction and Discussion sections.

 

  • The process for extracting keypoints is not clearly defined. Please provide a clear definition of the keypoints and detail the parameters used for their calculation. Also, is the H-score calculation solely based on these keypoints?

We apologize for not describing these keypoints and their extraction process clearly enough. We now describe what keypoints are in more detail in lines 121-125. Furthermore, we added more information about the keypoint extraction and the parameters we used for it in lines 159-168. The H-score calculation is based on the keypoints and the image pixels around them. We described it in lines 222-224.

 

  • The mechanism by which EndoNet differentiates between stromal and epithelial cells remains ambiguous.

Thank you for this comment. We provided a detailed description of the classification of keypoints into two classes in lines 165-167.

 

  • The comparison with QuPath is limited to only Slide. A broader comparison is necessary, encompassing all seven slides at the very least.

We extended the comparison with QuPath in Table 4. Moreover, we added further analyses: the absolute error of our model and of QuPath (Figure 7) and a statistical analysis of the results (lines 308-317).

 

  • Slide 4 was annotated by two pathologists to measure consistency. Please provide a more in-depth description of this measurement process. Also, indicate which annotations were used for analysis.

Thank you for this note. We provided a detailed description of the measurement process in lines 104-108. Annotations of both experts were used for further analysis (line 110).

Thank you for these comments. Hope you find the changes satisfactory.

Reviewer 2 Report (New Reviewer)

Comments and Suggestions for Authors

The Authors developed a deep learning-based method called EndoNet to calculate the H-score on histological slides. The model was trained on endometrium slides from two datasets with annotated positions of nuclei and tissue type (stroma or epithelium). The developed method is interesting, and the study is well-designed, however, the overall presentation of the results is very poor, and the manuscript's organization is weak. It is really hard to read and follow the manuscript. Despite that, some descriptions are missing. Therefore, I suggest a major revision of the manuscript. Below, please find my specific comments:

·       The whole manuscript is very hard to follow. First, some paragraphs are only 1 or 2 sentences long (e.g. lines 58-66 or 72-78), so it’s hard to trace the logical flow of the manuscript. Next, in the Methods section, there are multiple descriptions of existing approaches that are not finally used by the authors (e.g. lines 107-115 or 206-210). These parts need to be moved to the Introduction or Discussion section. Also, some descriptions in the Methods section should be moved to Results (e.g. lines 187-193) as they show the results of experiments performed. Please include in the Methods section only a description of algorithms used in EndoNet together with the description of the dataset, evaluation methods, and others.

·       The description of the EndoNet model is not consistent, not straightforward, and not detailed enough. In Figure 2, the pipeline (not sure why Authors call it Architecture) of the method is shown, however, all components presented on the plot should be immediately described in the text in detail (e.g.  Histoscore module is described two subsections later). It is very important to describe well the method that you introduce.

·       The term “keypoints” is already introduced in the abstract, but in the end I’m not sure of the definition, since it is not described directly. Does it mean only the central pixel of each nucleus? Is the annotation as stromal nuclei or epithelial nuclei also included in the definition? The description shown in lines 116-130 is not clear. Please explain it carefully in the Methods section.

·       Please show the architectures of the tested models for nuclei detection (UNet and others) and describe the differences between them.

·       What about tuning other hyperparameters like learning rate, batch size, regularization parameters, or dropout values?

·       The results obtained with different deep architectures tested by the Authors should be provided at least in the supplement.

·       The resulting H-scores are not statistically compared between different methods or manual annotation. Please provide the statistical analysis.

·       The last paragraph of the Introduction section is not clear. Please re-write.

·       What markers were stained and what staining was used in both datasets?

·       What type of interpolation was used to resize images?

·       It is not enough to write that the dataset consists of manually annotated tiles. What was annotated? Placement of nuclei, type of tissue, H-score? Please re-write the description of the dataset more carefully.

·       Line 25: Why is “H-score” written twice?

·       Line 216: how were the distribution peaks found and how was the threshold obtained?

·       Line 245: what was the actual value of the threshold distance?

·       Line 263: by classes of nuclei, do you mean stroma or epithelium?

·       Formula 1: please explain all symbols in the equation.

·       Figure 3 could be moved into a supplement.

·       Tables 1 and 2 could be merged into one table.

·       Tables 6 and 7 could be merged into one table.

·       It is very good that the Authors provided the code for EndoNet, but it would be better to introduce it in the text, for example somewhere in the Discussion section.

·       In many places, there is no space between text and reference number, e.g. lines 28, 81, 112. 

·       The text needs moderate language correction.

Comments on the Quality of English Language

Some sentences and paragraphs are hard to follow (see examples in my comments).

Author Response

Dear Reviewer,

Thank you for appreciating our article. In the revised version of the manuscript, we added several text fragments and tried to resolve all issues:

1. First, some paragraphs are only 1 or 2 sentences long (e.g. lines 58-66 or 72-78), so it’s hard to trace the logical flow of the manuscript.

Thank you for your careful reading and evaluation of the manuscript. We accept your comments and hope that you find the changes in the new version of the Manuscript satisfactory. Lines 58-66 were reformulated. Paragraphs 72-78 were deleted. 

 

2. Next, in the Methods section, there are multiple descriptions of existing approaches that are not finally used by the authors (e.g. lines 107-115 or 206-210). These parts need to be moved to the Introduction or Discussion section

Thank you for this comment. We moved lines 107-115 into the Introduction section and 206-210 into the Discussion section.

 

3. Also, some descriptions in the Methods section should be moved to Results (e.g. lines 187-193) as they show the results of experiments performed. Please include in the Methods section only a description of algorithms used in EndoNet together with the description of the dataset, evaluation methods, and others.

We moved these lines into the Results section (lines 265-275).

 

4. The description of the EndoNet model is not consistent, not straightforward, and not detailed enough. In Figure 2, the pipeline (not sure why Authors call it Architecture) of the method is shown, however, all components presented on the plot should be immediately described in the text in detail (e.g.  Histoscore module is described two subsections later). It is very important to describe well the method that you introduce.

We apologize for not describing the architecture of the EndoNet clearly enough. We made a new subsection General architecture, which describes all components of our model.

 

5. The term “keypoints” is already introduced in the abstract, but in the end I’m not sure of the definition, since it is not described directly. Does it mean only the central pixel of each nucleus? Is the annotation as stromal nuclei or epithelial nuclei also included in the definition? The description shown in lines 116-130 is not clear. Please explain it carefully in the Methods section.

A keypoint is a vector of length 3, which contains the coordinates of the center of the object x and y and the class of the object. We added a description to the General architecture section in lines 121-125. 
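As a minimal illustration of this definition, a keypoint can be represented as a small three-field record. The field names and the 0/1 class encoding below are assumptions made for the example; they are not taken from the EndoNet code.

```python
from typing import NamedTuple

# Hypothetical class encoding: the response only states that there are two classes.
STROMA = 0
EPITHELIUM = 1


class Keypoint(NamedTuple):
    x: float    # x coordinate of the object (nucleus) center, in pixels
    y: float    # y coordinate of the object (nucleus) center, in pixels
    label: int  # class of the object (STROMA or EPITHELIUM)


def count_by_class(keypoints):
    """Count how many keypoints fall into each of the two classes."""
    counts = {STROMA: 0, EPITHELIUM: 0}
    for kp in keypoints:
        counts[kp.label] += 1
    return counts


kps = [
    Keypoint(10.0, 20.0, STROMA),
    Keypoint(30.5, 40.0, EPITHELIUM),
    Keypoint(5.0, 7.0, STROMA),
]
print(count_by_class(kps))
```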

 

6. Please show the architectures of the tested models for nuclei detection (UNet and others) and describe the differences between them.

We showed the tested architectures in Supplementary Materials (Figures S3, S4, S5) and added a description of the differences between them.

 

7. What about tuning other hyperparameters like learning rate, batch size, regularization parameters, or dropout values?

Thank you for the comment. We varied these parameters (except dropout) while trying different approaches to this problem. We also relied on experiments with the same models in other projects, where these parameters were very similar to those we used here. Moreover, parameters such as learning rate and batch size may be specific to the GPU used for training or to the implementation of the training process. Therefore, we decided not to mention these parameters.

 

8. The results obtained with different deep architectures tested by the Authors should be provided at least in the supplement.

We added a new table with the results to Supplementary Materials (Table S1).

 

9. The resulting H-scores are not statistically compared between different methods or manual annotation. Please provide the statistical analysis.

Thank you for this comment. We conducted a statistical analysis for the obtained results (lines 308-309). Moreover, we added a new graph of Absolute Error (Figure 7) of calculating H-score by different methods. We also conducted a statistical analysis for this graph (lines 309-317).

 

10. The last paragraph of the Introduction section is not clear. Please re-write.

Thank you. We rewrote this fragment.

 

11. What markers were stained and what staining was used in both datasets?

Thank you. Samples in the dataset were stained with antibodies to Progesterone Receptor and Estrogen Receptor. We added this information in the text (lines 81-85).

 

12. What type of interpolation was used to resize images?

We used bilinear interpolation. We added this information in the text (lines 93, 112).
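For readers unfamiliar with the technique, the following is a minimal pure-Python sketch of bilinear interpolation on a small grayscale image. It is illustrative only: the function name is hypothetical, and it uses a corner-aligned coordinate mapping chosen for simplicity, whereas image libraries such as OpenCV use a pixel-center convention, so their output values will generally differ slightly.

```python
def bilinear_resize(img, new_h, new_w):
    """Resize a 2D grayscale image (a list of rows) with bilinear interpolation."""
    old_h, old_w = len(img), len(img[0])
    out = []
    for i in range(new_h):
        # Map the output row to a fractional source row (corner-aligned mapping).
        y = i * (old_h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, old_h - 1)
        wy = y - y0
        row = []
        for j in range(new_w):
            x = j * (old_w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, old_w - 1)
            wx = x - x0
            # Interpolate along x on the two neighboring rows, then along y.
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


small = [[0.0, 10.0],
         [20.0, 30.0]]
big = bilinear_resize(small, 3, 3)
```

Each output pixel is a weighted average of its four nearest source pixels, which is why the corner values of `small` are preserved and the values in between vary linearly.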

 

13. It is not enough to write that the dataset consists of manually annotated tiles. What was annotated? Placement of nuclei, type of tissue, H-score? Please re-write the description of the dataset more carefully.

Thank you. During manual annotation of the dataset, nuclei stained with antibodies to the progesterone or estrogen receptor were marked on slides that had been scanned and loaded into the system, and each nucleus was assigned one of two localization classes: “stroma” or “epithelium”. Experts annotated tiles (small pieces of a slide), placing a keypoint on each nucleus in the stroma and epithelium on each tile. We added this information to the manuscript (lines 87-90).

 

14. Line 25: Why is “H-score” written twice?

Thank you, we fixed that.

 

15. Line 216: how were the distribution peaks found and how was the threshold obtained?

Thank you for your question. To find the thresholds with the best separating ability, we iterated over a list of candidate intervals, so there is no need to locate the centers of the distribution peaks. During our earlier experiments, when we were not yet sure about the nature of the distribution, we considered applying a smoothing algorithm (for example, kernel density estimation) and then searching for the maximum value. We did not describe this accurately in the previous text, and we apologize for that. We added up-to-date information to the text (lines 235-242).
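The kind of search described here can be sketched as a simple exhaustive loop over candidate threshold pairs. Everything specific in the sketch is an assumption made for illustration: the per-nucleus intensity values, the convention that larger values mean stronger staining, the candidate list, and the use of accuracy against expert labels as the selection criterion. The response does not state which criterion the authors actually used.

```python
from itertools import combinations


def classify(value, t_low, t_high):
    """Map a staining-intensity value to a class using two thresholds."""
    if value < t_low:
        return "weak"
    if value < t_high:
        return "medium"
    return "strong"


def search_thresholds(values, labels, candidates):
    """Try every ordered pair of candidate thresholds; keep the most accurate pair."""
    best_pair, best_acc = None, -1.0
    for t_low, t_high in combinations(sorted(candidates), 2):
        correct = sum(
            classify(v, t_low, t_high) == lab for v, lab in zip(values, labels)
        )
        acc = correct / len(values)
        if acc > best_acc:
            best_pair, best_acc = (t_low, t_high), acc
    return best_pair, best_acc


# Hypothetical toy data: intensity values and the classes assigned by an expert.
values = [0.10, 0.20, 0.45, 0.55, 0.80, 0.90]
labels = ["weak", "weak", "medium", "medium", "strong", "strong"]
best_pair, best_acc = search_thresholds(values, labels, [0.1, 0.3, 0.5, 0.7])
```

Because the candidates are enumerated directly, no peak detection or density smoothing is needed, which matches the point made in the response.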

 

16. Line 245: what was the actual value of the threshold distance?

The actual threshold value is 15.26 pixels for 512x512 image size. We added this information to the text (line 284).
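To show how a distance threshold of this kind is typically applied, the sketch below pairs predicted keypoints with annotated ones only when they lie within the threshold. It assumes the threshold decides whether a prediction counts as matching an annotation; the greedy nearest-neighbor strategy and the function name are illustrative choices, not the procedure described in the manuscript.

```python
import math

THRESHOLD_PX = 15.26  # value quoted in the response, for 512x512 tiles


def match_keypoints(predicted, annotated, threshold=THRESHOLD_PX):
    """Greedily pair each predicted (x, y) with the nearest unused annotated (x, y)
    lying within `threshold` pixels. Returns a list of (pred_idx, ann_idx) pairs."""
    used = set()
    pairs = []
    for pi, (px, py) in enumerate(predicted):
        best, best_d = None, threshold
        for ai, (ax, ay) in enumerate(annotated):
            if ai in used:
                continue
            d = math.hypot(px - ax, py - ay)
            if d <= best_d:
                best, best_d = ai, d
        if best is not None:
            used.add(best)
            pairs.append((pi, best))
    return pairs


annotated = [(100.0, 100.0), (200.0, 200.0)]
predicted = [(105.0, 103.0), (260.0, 260.0)]
pairs = match_keypoints(predicted, annotated)
```

In this toy example the first prediction is about 5.8 pixels from an annotation and is matched, while the second is about 85 pixels from the nearest annotation and is left unmatched.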

 

17. Line 263: by classes of nuclei, do you mean stroma or epithelium?

To calculate the H-score, we need to separate the nuclei by staining color. Here it means that each nucleus (in both the stroma and the epithelium) was assigned an additional class of color intensity (weak, medium or strong).
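For context, the H-score is conventionally computed as a weighted sum of the percentages of weakly, moderately and strongly stained cells (weights 1, 2 and 3), giving a value between 0 and 300. The sketch below applies that conventional formula to a list of per-nucleus intensity classes. The label strings and the optional "negative" class are assumptions for illustration and are not names from the EndoNet code.

```python
# Conventional H-score weights; "negative" (unstained) nuclei contribute 0.
WEIGHTS = {"negative": 0, "weak": 1, "medium": 2, "strong": 3}


def h_score(intensity_labels):
    """H-score = 1 * %weak + 2 * %medium + 3 * %strong, in the range 0..300."""
    if not intensity_labels:
        return 0.0
    total = len(intensity_labels)
    return sum(WEIGHTS[lab] for lab in intensity_labels) * 100.0 / total


# Hypothetical nuclei from one tissue class, e.g. all stromal nuclei on a slide.
stroma_nuclei = ["weak", "weak", "medium", "strong"]
stroma_h = h_score(stroma_nuclei)
```

With stroma and epithelium handled separately, the same function would be called once per tissue class on the nuclei assigned to it.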

 

18. Formula 1: please explain all symbols in the equation.

Thank you for this note. We added a description of the symbols in this equation (lines 193-194).

 

19. Figure 3 could be moved into a supplement.

We moved this figure to Supplementary Materials (Figure S1).

 

20. Tables 1 and 2 could be merged into one table.

We merged these tables into Table 1.

 

21. Tables 6 and 7 could be merged into one table.

We merged these tables into Table 4.

 

22. It is very good that the Authors provided the code for EndoNet, but it would be better to introduce it in the text, for example somewhere in the Discussion section.

Thank you for this comment. We mentioned it in the Discussion section (line 372).

 

23. In many places, there is no space between text and reference number, e.g. lines 28, 81, 112.

Thank you. Fixed in these and several other places.

 

24. The text needs moderate language correction.

Thank you, we have corrected the text in the manuscript.

 

Thank you for these comments. Hope you find the changes satisfactory.

Reviewer 3 Report (New Reviewer)

Comments and Suggestions for Authors

The article presents EndoNet i.e. the model for the automatic calculation of H-score on histological slides. The authors based the main body of the manuscript on endometrium tissue with stratification into stroma and epithelial area.

The use of English throughout the article is of an acceptable standard, and it is worth highlighting that all tables and figures are thoughtfully included, enhancing the clarity of the content.

Experimental design - The research is original and fills the gap in the literature.

Validity of the findings - Analysis and conclusions are supported by the data.

The discussion is clear and leads the readers to better understand the proposed H-score module.

It is worth noting that the authors compared the proposed module to QuPath.

In the discussion section literature is nicely cited.

There is, however, some confusion regarding the purpose behind the yellow highlighting of specific paragraphs. Providing clarification or removing the highlights would be beneficial if they serve no discernible function.

 

One minor suggestion for improvement would be to include a scale bar in Figure 1, as this would offer readers a more precise understanding of the presented data.

Comments on the Quality of English Language

Moderate editing of the English language is required, in my opinion. 

Author Response

Dear Reviewer,

Thank you for appreciating our article. In the revised version of the manuscript, we added several text fragments and tried to resolve all issues:

1. There is, however, some confusion regarding the purpose behind the yellow highlighting of specific paragraphs. Providing clarification or removing the highlights would be beneficial if they serve no discernible function.

Thank you for your comment. We apologize for the oversight. The highlighting remained after resubmission. Now new fragments are highlighted where the new changes were made.

 

2. One minor suggestion for improvement would be to include a scale bar in Figure 1, as this would offer readers a more precise understanding of the presented data.

Thank you for this comment. We have added scale intervals in Figure 1.

 

Thank you for these comments. Hope you find the changes satisfactory.

Round 2

Reviewer 1 Report (Previous Reviewer 2)

Comments and Suggestions for Authors

The revised manuscript has improved significantly and can be published after a minor revision. However, on lines 50-52, it is not appropriate to state that CNNs were invented later, as AlexNet is a convolutional neural network architecture.

Author Response

Dear Reviewer,

Thank you for your latest comments. In the revised version of the manuscript, we tried to resolve this issue:

On lines 50-52, it is not appropriate to state that CNNs were invented later, as AlexNet is a convolutional neural network architecture.

We apologize for the incorrect statement. We changed this sentence in lines 52-53.

Hope you find the changes satisfactory.

Reviewer 2 Report (New Reviewer)

Comments and Suggestions for Authors

The revised version of the manuscript is much better in terms of organization and clarity. However, there are still minor problems that I believe should be corrected:

·       Figure 1: In the description it is written that on the left side, the whole slide image (WSI) is presented. Is that true? Or is it only the fragment of WSI? First, I don’t see any borders of the tissue, and second, scanned slide images are rather rectangular with one border bigger than the other and a lot of background signal. 

·       Line 92: “… values. But. ..” – it looks like these two sentences should be connected.

·       Line 105: what is the name of the similarity measure?

·       Lines 169-172 describe the models that were used in the nuclei detection task, so this paragraph should be moved to line 157.

·       Lines 208-209: tiles or images? It is good to use the same terminology within the manuscript.

·       The description of the mAP metric could be moved to the Methods section.

·       Similarly, the description of statistical tests used in the work could be moved to Methods under the Statistical Testing section.

 

·       Why is the Conclusion section before the Discussion section?

Comments on the Quality of English Language

The English language has been improved.

Author Response

Dear Reviewer,

Thank you for your latest comments. In the revised version of the manuscript, we tried to resolve all issues:

  • Figure 1: In the description it is written that on the left side, the whole slide image (WSI) is presented. Is that true? Or is it only the fragment of WSI? First, I don’t see any borders of the tissue, and second, scanned slide images are rather rectangular with one border bigger than the other and a lot of background signal. 

Thank you for this comment. You are right, this is a cropped image, although it covers nearly the whole slide. We cropped it to remove a large amount of empty space on the slide. We indicated this in the caption to Figure 1.

 

  • Line 92: “… values. But. ..” – it looks like these two sentences should be connected.

Thank you, we corrected this.

 

  • Line 105: what is the name of the similarity measure?

The name of the similarity measure is Keypoint Similarity. We added a formula in the text in line 104.
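The formula added to the manuscript is not reproduced in this record. For orientation only, one widely used definition of keypoint similarity is the one from the COCO keypoint evaluation, where d is the Euclidean distance between the two keypoints being compared, s is a scale term for the object, and k is a constant controlling the falloff. Whether the manuscript uses exactly this form, and which values of s and k it uses, is an assumption not confirmed here.

```latex
\mathrm{KS} = \exp\!\left( -\frac{d^{2}}{2\, s^{2} k^{2}} \right)
```

Under this definition the similarity equals 1 when the two keypoints coincide and decreases toward 0 as the distance between them grows.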

 

  • Lines 169-172 describe the models that were used in the nuclei detection task, so this paragraph should be moved to line 157.

Thank you for the comment, we fixed this.

 

  • Lines 208-209: tiles or images? It is good to use the same terminology within the manuscript.

Thank you for this comment, we changed images to tiles.

 

  • The description of the mAP metric could be moved to the Methods section.

We moved the description to lines 184-189.

 

  • Similarly, the description of statistical tests used in the work could be moved to Methods under the Statistical Testing section.

Thank you, we created the Statistical Testing section and moved the description to this section.

 

  • Why is the Conclusion section before the Discussion section?

We apologize for this mistake, the order is correct now.

 

Thank you for these comments. Hope you find the changes satisfactory.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

In this paper, the authors introduce a method called EndoNet which uses neural networks to calculate H-score on histological slides. However, I do not see enough novelty in the proposed algorithm. The idea of using neural networks in the histological image context is not new, and several tools already exist. The proposed method is not compared with the state-of-the-art methods. Moreover, I could not see any significant contribution to this manuscript. The manuscript is not well structured as an original research paper.

Author Response

We appreciate the reviewer's valuable input regarding our paper. Thank you for highlighting your concerns regarding novelty, comparison, contribution, and structure. In the new version, we tried to take this into account and fix it.

Reviewer 2 Report

Comments and Suggestions for Authors

The authors propose a model, EndoNet, to calculate the H-Score on histological slides. The model includes a detection model and an H-score model. However, the manuscript lacks clarity in terms of data and method descriptions. Furthermore, the results section appears to be weak. Here are my comments on the manuscript.

 

1. The authors should provide more details about the dataset, including the number of slides used from each source and the number of tiles in the training, validation, and test sets. They should also clarify if the PathLab dataset is publicly available and provide a reference if possible.

 

2. The authors should explain how the tiles from EndoNuke are resized to 512*512 pixels and why this is not necessary for the tiles from PathLab.

 

3. Line 89. If the tiles from PathLab are not annotated, the authors should explicitly state how they were annotated. They should also provide information about the number of tiles in the combined dataset that are not annotated.

 

4. Line 158. If "Pathlib" is a typo of "PathLab," the authors should correct it.

 

5. The meaning of Table 2 should be clarified by the authors. Do the results mean that the pre-trained models are not significantly different from the baseline model?

 

6. The authors should explain why the results of the combined dataset are exactly the same as the EndoNuke dataset in Table 3. 

 

7. The authors should describe in more detail how the set of possible parameters was determined and how the threshold for the second H-score model was found. Providing information on the methodology used or any statistical analysis conducted would help address this issue.

 

8. The authors should explain why only 7 slides were selected for H-score calculation and how the calculation can be done automatically without relying on pathologists' annotations.

 

9. The authors should provide a more in-depth explanation of why deviations between predicted and manual H-scores are considered normal.

Author Response

We sincerely appreciate the reviewer's thoughtful assessment of our manuscript. We have carefully considered your comments and have made substantial revisions to address these concerns:

 

  1. We have described each dataset in more detail. Information about the availability of the dataset can be found in the Data Availability Statement section.
  2. We used standard interpolation methods provided by Python packages such as OpenCV.
  3. Thanks for the comment, we added this to the description of the datasets.
  4. Thanks, corrected it.
  5. We described this in more detail in the section about pre-training. This table is intended to show that the results obtained with pre-training are significant and that this approach helps to improve quality.
  6. Described it in the results section, lines 246-252.
  7. Thanks for this comment, we added a description of the choice of parameters (lines 140-143).
  8. Since the annotation process was very time-consuming, we managed to use only 7 slides for the dataset.
  9. We consider that, despite the presence of discrepancies in the calculation of H-score between experts and the model, such discrepancies do not lead to discrepancies in the interpretation of the expression class - weak, moderate or strong.

 
