Article
Peer-Review Record

GLF-Net: A Semantic Segmentation Model Fusing Global and Local Features for High-Resolution Remote Sensing Images

Remote Sens. 2023, 15(19), 4649; https://doi.org/10.3390/rs15194649
by Wanying Song 1,*, Xinwei Zhou 1, Shiru Zhang 1, Yan Wu 2 and Peng Zhang 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 8 August 2023 / Revised: 18 September 2023 / Accepted: 19 September 2023 / Published: 22 September 2023
(This article belongs to the Special Issue Remote Sensing Image Classification and Semantic Segmentation)

Round 1

Reviewer 1 Report

The paper proposes GLF-Net, a semantic segmentation model fusing global and local features for high-resolution remote sensing images, which uses three modules: a Covariance Attention Module, a Fine-Grained Extraction Module, and a wavelet self-attention module. The paper is interesting to read and follow. However, the paper has the following issues to be fixed.

1. What is the meaning of F in equation (1), and what do i, m, N, C, H, and W mean, respectively?

2. What do w, h, t, and D stand for in equation (3)? Do w and h have the same meaning as the w and h used for the global feature?

3. In the first paragraph of Section 3.1 (Global Feature Extraction), how do semantic information and location information relate to multi-scale contexts and different target scales?

4. Why do you extract the third, fourth, and fifth layers of ResNet for contextual information?

5. Figures 1 and 2 are a bit blurry; please increase their resolution.

The paper is written in understandable English.

Author Response

Dear Reviewer #1,

Thank you very much for your guidance on our manuscript. Your comments are all valuable and helpful for revising and improving our paper, as well as for guiding our research. We have carefully revised the manuscript according to your comments and give our specific answers in the attachment; we hope the revisions meet with your approval.

Kind regards,

Sincerely yours, Wanying.

Author Response File: Author Response.docx

Reviewer 2 Report

The paper introduces the GLF-Net architecture for performing semantic segmentation on high-resolution remote sensing images. GLF-Net enhances the processing of these images by integrating contextual data and refining detailed features. The architecture includes an encoder-decoder network based on ResNet50. The Covariance Attention Module captures features of different scales from ResNet stages, while the Local Fine-Grained Extraction Module refines feature maps by encoding semantic and spatial data. A wavelet self-attention module harmonizes high- and low-frequency information in the decoder stage. Experiments on two real datasets validate GLF-Net's effectiveness in improving the segmentation accuracy of high-resolution remote sensing images.
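For readers unfamiliar with covariance-style attention, the following is a minimal NumPy sketch of how a channel covariance matrix could be used to reweight feature channels. It is purely illustrative under assumed details (row-sum scoring, softmax weighting) and is not the paper's actual Covariance Attention Module.

```python
import numpy as np

def covariance_channel_attention(feat):
    """Toy sketch of covariance-style channel attention.

    feat: array of shape (C, H, W). Channels are reweighted by a softmax
    over the row sums of their covariance matrix. Hypothetical design,
    not the implementation described in the paper.
    """
    C, H, W = feat.shape
    X = feat.reshape(C, H * W)
    X = X - X.mean(axis=1, keepdims=True)   # center each channel
    cov = X @ X.T / (H * W - 1)             # (C, C) channel covariance
    scores = cov.sum(axis=1)                # aggregate dependency per channel
    w = np.exp(scores - scores.max())
    w = w / w.sum()                         # softmax attention weights
    return feat * w[:, None, None]          # reweight channels

rng = np.random.default_rng(0)
f = rng.standard_normal((4, 8, 8))
out = covariance_channel_attention(f)
```

The covariance matrix here plays the role of a second-order statistic summarizing inter-channel dependencies; any monotone scoring of its rows would serve the same illustrative purpose.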
The paper reads well and the experiments are convincing. I have the following comments that need to be addressed for the next round of reviews:

1. One of the important challenges in using UNet-based architectures for semantic segmentation is the need for large-scale data annotation. This challenge has been addressed by unsupervised domain adaptation:

a. Uncertainty reduction for model adaptation in semantic segmentation, 2021
b. Scale variance minimization for unsupervised domain adaptation in image segmentation. Pattern Recognition, 2021
c. Source-free domain adaptation for semantic segmentation, CVPR 2021
d. Domain Adaptation for the Segmentation of Confidential Medical Images, BMVC 2022
e. Unsupervised domain adaptation for semantic segmentation of high-resolution remote sensing imagery driven by category-certainty attention. IEEE Transactions on Geoscience and Remote Sensing, 2022

I think explaining the challenge of data annotation and the above work in the "related work" section can broaden the context for the reader.

2. Figure 14 is not clear to me. Please add more explanations about your observations and conclusions for this figure.

3. Please repeat the experiments and report both the average performance and the standard deviation to make your comparisons statistically meaningful. This is particularly important because the numbers in some tables are very close; for example, the numbers in Table 5 are too close to support a conclusive observation.

4. In the figures showing segmentation results, could you also add the quantitative metric values below each image? This addition would make both quantitative and qualitative comparisons possible.

5. Please release the code for this work in a public repository such as GitHub to make reproducing the results straightforward.

Author Response

Dear Reviewer #2,

Thank you very much for your guidance on our manuscript. Your comments are all valuable and helpful for revising and improving our paper, as well as for guiding our research. We have carefully revised the manuscript according to your comments and give our specific answers in the attachment; we hope the revisions meet with your approval.

Kind regards,

Sincerely yours, Wanying.

Author Response File: Author Response.docx

Reviewer 3 Report


Comments for author File: Comments.pdf

The authors are recommended to check the entire manuscript carefully for misspellings, unreadable descriptions, and chaotic logical expressions. We list some cases in the attachment above, but the issues are not limited to these.

Author Response

Dear Reviewer #3,

Thank you very much for your guidance on our manuscript. Your comments are valuable and help us revise and improve our paper and guide our research. We have carefully revised the manuscript based on your comments and give our specific responses in the attachment; we hope the revisions meet with your approval.

Kind regards,

Sincerely yours, Wanying.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The authors have resolved all my previous concerns. This paper is ready for publication now.

The English in this paper is easy to understand.

Author Response

Dear Reviewer #1,

Thank you very much for your guidance on and approval of our manuscript.

Kind regards,

Sincerely yours, Wanying.

Reviewer 2 Report

The authors have addressed my concerns well.

Author Response

Dear Reviewer #2,

Thank you very much for your guidance on and approval of our manuscript.

Kind regards,

Sincerely yours, Wanying.

Reviewer 3 Report

The authors have addressed all the comments, but I think there are still some problems that need to be revised before publication:

1. The authors categorize two kinds of attention mechanisms, but neither description is perfectly accurate.

(1) "One is to use the extracted global information to enhance the local regions or channels, rather than using the global information directly as a feature representation." Judging from the examples that follow, does this mean that one type learns attention weights from global statistical information to enhance key local regions or channels? Also, what is meant by using the global information directly as a feature representation? Please give an example.

(2) "The other is to simultaneously extract global and local features for feature enhancement." This one is even more ambiguous. From the subsequent examples, DANet is a self-attention mechanism that captures global information directly, and CBAM, like SENet, learns an attention map. What they have in common is that they combine the advantages of channel and spatial attention, not of global and local features.

2. An attention mechanism models the contextual dependencies over local and global features; it does not extract global and local features. The authors are advised to revise the wording throughout the manuscript. For example, in "Models that introduce both global and local attention also bring higher complexity and stronger training difficulty," the authors did not earlier mention models that combine global and local attention (such as the local attention model CoT [1]). Does this refer to channel and spatial attention as in DANet, where the self-attention mechanism consumes a large amount of computation to obtain the correlation matrix?

[1] Li Y., Yao T., Pan Y., et al. Contextual Transformer Networks for Visual Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 45(2): 1489-1500.
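To make the computational point in comment 2 concrete, here is a minimal NumPy sketch of DANet-style spatial self-attention: the affinity (correlation) matrix has shape N x N with N = H*W, which is the source of the quadratic cost the reviewer mentions. This is an illustrative simplification; the query/key/value projections used in practice are omitted.

```python
import numpy as np

def spatial_self_attention(feat):
    """Toy spatial self-attention over a (C, H, W) feature map.

    Builds an (N, N) affinity matrix with N = H*W, row-softmaxes it,
    and aggregates context per position. Simplified for illustration;
    learned projections are deliberately left out.
    """
    C, H, W = feat.shape
    N = H * W
    X = feat.reshape(C, N)
    affinity = X.T @ X                            # (N, N) correlation matrix
    affinity = affinity - affinity.max(axis=1, keepdims=True)
    A = np.exp(affinity)
    A = A / A.sum(axis=1, keepdims=True)          # row-wise softmax
    out = X @ A.T                                 # context-aggregated features
    return out.reshape(C, H, W), A

f = np.random.default_rng(1).standard_normal((2, 4, 4))
out, A = spatial_self_attention(f)
```

The N x N matrix is why such modules scale poorly with image resolution: doubling H and W multiplies the affinity matrix's size by sixteen.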

3. Reference [23] uses covariance matrices to model the dependencies over global and local cues, not to extract global and local features separately.

Comments for author File: Comments.docx

Authors are recommended to carefully check the manuscript.

Author Response

Dear Reviewer #3,

Thank you very much for your guidance on and approval of our manuscript. Your comments are valuable and help us revise and improve our paper and guide our research. We have carefully revised the manuscript based on your comments and give our specific responses in the attachment; we hope the revisions meet with your approval.

Kind regards,

Sincerely yours, Wanying.

Author Response File: Author Response.docx
