Technical Note
Peer-Review Record

Exploring Self-Supervised Learning for Multi-Modal Remote Sensing Pre-Training via Asymmetric Attention Fusion

Remote Sens. 2023, 15(24), 5682; https://doi.org/10.3390/rs15245682
by Guozheng Xu 1, Xue Jiang 1,*, Xiangtai Li 2, Ze Zhang 1 and Xingzhao Liu 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 14 October 2023 / Revised: 30 November 2023 / Accepted: 6 December 2023 / Published: 10 December 2023
(This article belongs to the Section AI Remote Sensing)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The authors propose an Asymmetric Attention Fusion (AAF) framework to explore the potential of multi-modal representation learning, comparing it against two simple fusion baselines: early fusion and late fusion. However, my major observations are as follows:

1. The abstract does not mention the experimental results, including the comparative analysis.

2. A figure should not appear before it is first referenced in the text. Check for this error throughout the manuscript.

3. Fig. 1 gives no information about the input images. What is the meaning of each layer in this model? Please elaborate on every point.

4. In the Introduction, some abbreviations are not defined, such as MoCo, BYOL, and GATE. Check this throughout the manuscript.

5. Each reference should be cited in the form "Kumar et al. [12]".

6. The objectives should be presented as a brief statement rather than in detail; it is recommended to state these points in the form of a paragraph.

7. Could you add a comparison table to the "Related Work" section representing the state-of-the-art SSL techniques?

8. In Fig. 5, add separators by placing borders around each sub-image.

9. In Table 5, what does "w/o" mean? Explain it in the table's footnotes, and check the same throughout all tables.

10. The conclusion needs to be enhanced to be more specific about the outcomes of the study.

 

11. Recheck reference number [45]: "Contributor"..?

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This paper proposes an augmentation-invariant self-supervised learning method for multi-modal remote sensing image representation learning. The training process consists of two stages: self-supervised learning with unlabeled data and fine-tuning with limited labeled data. An AAF module is proposed to fuse multi-modal remote sensing data asymmetrically. In addition, a TransferGATE module is devised to enhance downstream task performance.
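The asymmetric fusion described above can be illustrated with a generic cross-attention sketch. Everything here is an assumption for illustration only: the function name, the optical/SAR token split, and the residual form are not taken from the paper, whose actual AAF module may differ. The key idea shown is the asymmetry itself: one modality supplies queries, the other supplies keys and values, so information flows in one direction.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fusion(primary, auxiliary):
    """Asymmetric fusion: the primary modality provides queries, the
    auxiliary modality provides keys/values, so auxiliary information
    is injected into the primary stream (not vice versa)."""
    d_k = primary.shape[-1]
    q, k, v = primary, auxiliary, auxiliary   # each (n_tokens, d)
    scores = q @ k.T / np.sqrt(d_k)           # (n_q, n_kv)
    attn = softmax(scores, axis=-1)           # rows sum to 1
    return primary + attn @ v                 # residual fusion

rng = np.random.default_rng(0)
optical = rng.standard_normal((16, 32))   # hypothetical optical tokens
sar = rng.standard_normal((16, 32))       # hypothetical SAR tokens
fused = cross_attention_fusion(optical, sar)
print(fused.shape)  # (16, 32)
```

Swapping the two arguments yields the opposite asymmetry, which is what distinguishes this scheme from symmetric early or late fusion.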

 

The paper is well-written and easy to understand. However, I think there is still some room for improvement in this paper, which is mainly reflected in the following aspects:

 

1. In Section 2.1, I recommend that the authors incorporate additional discussions to analyze the limitations of current self-supervised learning methods in the field of remote sensing.

 

2. The motivation for the AAF is unclear. I suggest the authors add more description in the Introduction.

 

3. In the proposed GATE module, channel attention, spatial attention, and scale attention operate sequentially. Please add experiments to thoroughly analyze the performance when these three attention mechanisms are arranged in parallel.
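The sequential-versus-parallel distinction the reviewer raises can be sketched with toy gating functions. These simplified channel/spatial attentions and the averaging used for the parallel branch are assumptions for illustration, not the paper's GATE module, and scale attention is omitted for brevity; the point is only that composition order changes the output.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x):
    # x: (C, H, W); weight each channel by its global-average response
    w = sigmoid(x.mean(axis=(1, 2)))          # (C,)
    return x * w[:, None, None]

def spatial_attention(x):
    # weight each spatial position by its channel-average response
    w = sigmoid(x.mean(axis=0))               # (H, W)
    return x * w[None, :, :]

x = np.random.default_rng(1).standard_normal((8, 4, 4))

# Sequential: each attention refines the previous output
seq = spatial_attention(channel_attention(x))

# Parallel: both attentions see the same input; outputs are averaged
par = 0.5 * (channel_attention(x) + spatial_attention(x))

print(np.allclose(seq, par))  # False: the two arrangements differ
```

Because the arrangements are not equivalent, an ablation comparing them (as the reviewer requests) is informative rather than redundant.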

 

4. Several recently developed unsupervised multi-modal remote sensing data classification methods have not been discussed in the Introduction. These include the Spatial-Spectral Masked Auto-encoder and Nearest Neighbor-Based Contrastive Learning.

 

5. I suggest the authors add more recent multi-modal fusion methods to the comparative study in the experiments.

 

6. The proposed method has many critical parameters, such as the configuration of the shared MLP in the channel attention and the number of layers in the backbone. Please add experiments to analyze these parameters.

 

7. The bibliography should be updated with more recent works, and the manuscript would benefit from thorough spellchecking and proofreading, preferably by a native speaker.

Comments on the Quality of English Language

English language is fine.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This paper explores self-supervised learning for multi-modal remote sensing. An Asymmetric Attention Fusion (AAF) framework is proposed to explore the potential of multi-modal representation learning, and a TransferGATE module is designed to select more useful information from the fused features for different downstream tasks. The research topic is interesting, and the following issues should be addressed:


(1). In Section 3.3, there is some confusion between the symbols and their actual meanings: the notation does not agree with the interpretations given for channel attention and spatial attention. For example, in the part that is supposed to deal with channel attention, the notation for spatial attention is used.
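The notational distinction at issue can be made concrete by the shapes of the two attention maps. The CBAM-style convention used here (a channel map as a length-C vector, a spatial map as an H×W array) is an illustrative assumption, not necessarily the paper's notation:

```python
import numpy as np

C, H, W = 8, 4, 4
x = np.arange(C * H * W, dtype=float).reshape(C, H, W)

# Channel attention: one weight per channel -> M_c has shape (C,)
m_c = x.mean(axis=(1, 2))
# Spatial attention: one weight per position -> M_s has shape (H, W)
m_s = x.mean(axis=0)

print(m_c.shape, m_s.shape)  # (8,) (4, 4)
```

Keeping these shapes attached to the right symbols is exactly the consistency the reviewer asks the authors to restore in Section 3.3.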

(2). For a more intuitive understanding, the abbreviation SA (Self-Attention) in Figure 2 should be reconciled with SA (Soft Attention) in Section 3.2.

(3). The English expression of this paper should be polished further.

Comments on the Quality of English Language

The English expression of this paper should be polished further.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have made all the necessary corrections.
