A Bio-Inspired Visual Perception Transformer for Cross-Domain Semantic Segmentation of High-Resolution Remote Sensing Images
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This manuscript proposes a transformer-based strategy for cross-domain remote sensing image segmentation, which effectively acquires semantic information in high-resolution remote sensing images. The method is novel and described in detail in the manuscript. Furthermore, experimental results on benchmark remote sensing datasets demonstrate the effectiveness of the proposed module. However, there are some concerns that need to be addressed before possible publication:
1) The abstract lacks clarity in explaining the respective roles of the proposed modules and their relationship to semantic information extraction.
2) The utilization of experimental datasets in this study is considered inadequate; it is advisable to incorporate more datasets for robust evaluations.
3) Providing a more specific network structure for the proposed GSV-Trans would be helpful.
4) Although the backbone of the proposed algorithm utilizes transformers, it should be noted that transformer-based state-of-the-art (SOTA) methods are missing from Table II of the manuscript.
Author Response
Thank you for your invaluable suggestions. We have addressed each of your comments and made the necessary revisions to the manuscript.
Please see the attachment.
Once again, we sincerely appreciate your time and effort in reviewing our paper. Your valuable insights have been immensely helpful.
Author Response File: Author Response.docx
Reviewer 2 Report
Comments and Suggestions for Authors
Lines 114 to 117: This cannot be considered a contribution.
Line 201: be careful with the notation. In xt1 one must have t1 as a subscript.
Lines 325 and 326: information regarding R must be clarified. Is R a variable? The authors point out that R is a process. In this sense, in Equations (1) and (2), what should the readers understand regarding R?
Lines 425 to 430: How were the values chosen? Explain how the learning rate was chosen and how the search for the best values was carried out.
There are many problems related to formatting references:
- Journal names must be written with the first letter of each major word capitalized. Example: replace “IEEE transactions on pattern analysis and machine intelligence” with “IEEE Transactions on Pattern Analysis and Machine Intelligence”.
- The abbreviations need to be standardized: “Volume” versus “vol.”; “Pages” versus “pp.”
Author Response
Thank you for your invaluable suggestions. We have addressed each of your comments and made the necessary revisions to the manuscript.
Please see the attachment.
Once again, we sincerely appreciate your time and effort in reviewing our paper. Your valuable insights have been immensely helpful.
Author Response File: Author Response.docx
Reviewer 3 Report
Comments and Suggestions for Authors
Dear authors,
I enjoyed reading your paper. Your approach to simulating human eye movements to enhance semantic segmentation is particularly interesting, but there are two points that I think need to be explained further or fixed:
1. Figure 5 Clarity: Could you provide more detail in the text explaining the different elements of Figure 5? I have trouble understanding this AEMA attention mechanism.
2. Minor Typo: There appears to be a typo in Figure 7, where "Gaze" is spelled "Caze."
Kind regards,
Author Response
Thank you for your invaluable suggestions. We have addressed each of your comments and made the necessary revisions to the manuscript.
Please see the attachment.
Once again, we sincerely appreciate your time and effort in reviewing our paper. Your valuable insights have been immensely helpful.
Author Response File: Author Response.docx