Article
Peer-Review Record

Multi-Source Interactive Stair Attention for Remote Sensing Image Captioning

Remote Sens. 2023, 15(3), 579; https://doi.org/10.3390/rs15030579
by Xiangrong Zhang, Yunpeng Li, Xin Wang, Feixiang Liu, Zhaoji Wu, Xina Cheng * and Licheng Jiao
Submission received: 12 November 2022 / Revised: 5 January 2023 / Accepted: 11 January 2023 / Published: 18 January 2023

Round 1

Reviewer 1 Report

Please read the attached file. Thank you.

Comments for author File: Comments.pdf

Author Response

Please check the attached file.

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper proposes a novel stair attention mechanism for multi-source interaction. The stair attention is designed to highlight the most relevant image regions and weaken non-critical features. In addition, a CIDEr-based reward method is adopted to enhance the quality of long-range transitions and make the model more stable.

Although the model is novel and performs well, I have the following questions and suggestions:

1. What is the design basis of Formula 12? Why is it based on the max and min values?

2. Is there a visual result of stair attention?

3. The structure of Fig. 3 could be clearer and should include the detailed execution process.

Author Response

Please check the attached file.

Author Response File: Author Response.pdf

Reviewer 3 Report

The authors of this paper propose a multi-source interactive stair attention (MSISAM) framework for captioning remote sensing geo-images. The proposed algorithm mainly contains two parts: the MSIAM at the front selectively attends to the feature maps, and a stair attention network is then used to adjust the weights. Based on their validation, the method achieves competitive performance compared with other captioning methods. The work is interesting and meaningful. The overall quality of the manuscript is good, but the analysis of the experimental validation is not comprehensive enough. Therefore, I suggest that the authors revise the manuscript before formal publication.

To sum up,

-Advantages:

1. The performance of their framework is competitive.

2. The methodology presentation is clear.

3. The method in the paper may have good application prospects.

-Major disadvantages:

4. Could the authors provide an overall evaluation of the scores across the different types of evaluation metrics? Tables 1-6 contain 7 metrics, which is a little “messy” for readers. I suggest that the authors count how many metrics each method dominates and show that count in the last column.

5. The manuscript lacks a report and analysis of deficiencies. I suggest that the authors give some examples of captioning errors made by their framework and discuss them briefly.

6. What is the time consumption of the method? I suggest that the authors report the time consumption of the training and testing procedures; a comparison with other methods would be even better.

-Small flaws:

7. There are some typos and typesetting errors. E.g., in Table 2, there are two bold scores for Meteor and Rouge, and none for Bleu4. Please revise the table and proofread the manuscript carefully.

8. Could the authors report how large the GPU memory consumption of their method is?

9. If possible, please make the images in Figures 5 and 6 larger. They are a little blurry for me.

Author Response

Please check the attached file.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

The revision looks good! I don't have further suggestions.
