Article
Peer-Review Record

Semantic Segmentation of Very-High-Resolution Remote Sensing Images via Deep Multi-Feature Learning

Remote Sens. 2022, 14(3), 533; https://doi.org/10.3390/rs14030533
by Yanzhou Su 1, Jian Cheng 1,*, Haiwei Bai 1, Haijun Liu 2 and Changtao He 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 17 December 2021 / Revised: 12 January 2022 / Accepted: 18 January 2022 / Published: 23 January 2022
(This article belongs to the Section Remote Sensing Image Processing)

Round 1

Reviewer 1 Report

This paper proposes a multi-feature learning framework for very-high-resolution remote sensing images with the aim of improving feature learning. Specifically, contextual features, low-level features, and class-specific discriminative features are extracted from different branches and fused for classification. Experimental results on real datasets demonstrate superior performance over existing methods. In general, the paper is well organized and can be easily understood. My comments are:

1) The paper can be improved by thorough proofreading, as there are many grammatical errors and typos. I list some of them:

Line 209: “the convolution operation are only process”

Line 216: “a output feature”;

Line 219: “Expect the three…”;

Figure 5: “(b) prediction with” -> “(d) prediction with”:

Line 261: “is learning from”;

Line 471: “expensive experiments…”

 

2) Line 233: How the features F_{con}^{'} are grouped is not clear. Is this done by supervised classification or by some clustering algorithm? The authors need to provide additional details.

3) Formally define the losses L_{ce} and L_{aux}, and motivate why L_{aux} is used in the model.

4) The forms of titles in subsections, figures, and tables are inconsistent. The first letter of each caption should be capitalized.

5) Since the developed architecture consists of multiple branches, it would be interesting to see the complexity of the model. Thus, an analysis and comparison of time complexity/running time is necessary.

6) In Figure 8, what is the baseline?

Author Response

Manuscript remotesensing-1536555

Response to Reviewer

 

Dear reviewer,

 

Thank you for giving us the opportunity to submit a revised draft of the manuscript “Semantic Segmentation of Very-High-Resolution Remote Sensing Images via Deep Multi-Feature Learning” to Remote Sensing. We appreciate the time and effort that you dedicated to providing positive feedback on our manuscript and are grateful for the insightful comments on and valuable improvements to our paper. We have incorporated most of the suggestions made by the reviewer. Those changes are highlighted within the manuscript.

 

Appended to this letter is our point-by-point response to the comments raised by the reviewers. The comments are reproduced and our responses are given directly afterward in a different color (blue).

 

Comments from the Reviewer:

This paper proposes a multi-feature learning framework for very-high-resolution remote sensing images with the aim of improving feature learning. Specifically, contextual features, low-level features, and class-specific discriminative features are extracted from different branches and fused for classification. Experimental results on real datasets demonstrate superior performance over existing methods. In general, the paper is well organized and can be easily understood. My comments are:

1) The paper can be improved by thorough proofreading, as there are many grammatical errors and typos. I list some of them:

Line 209: “the convolution operation are only process”

Line 216: “a output feature”;

Line 219: “Expect the three…”;

Figure 5: “(b) prediction with” -> “(d) prediction with”:

Line 261: “is learning from”;

Line 471: “expensive experiments…”

Author response: Thank you for underlining this deficiency. We have checked and corrected the grammatical errors and typos in the revised manuscript.

2) Line 233: How the features F_{con}^{'} are grouped is not clear. Is this done by supervised classification or by some clustering algorithm? The authors need to provide additional details.

Author response: We are extremely grateful to the reviewer for pointing out this problem. We have added the detailed information in the corresponding place, which can be found in the revised version. To be clearer and to address the reviewer's concern, we have added a brief description as follows: we group the feature F_{con}^{'} into a set of groups F_i, i = 1, ..., c. In our implementation, we split the feature map along the channel axis into a total of c groups corresponding to the c classes. Each group F_i is a 2-channel feature map, one channel of which is F_i^{fg} and the other F_i^{bg}, where fg denotes the foreground feature and bg the background feature.
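The channel-wise grouping described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name, the (2c, H, W) layout, and the interleaved foreground/background channel order are assumptions.

```python
import numpy as np

def group_features(f_con, num_classes):
    """Split a (2*c, H, W) feature map along the channel axis into c
    two-channel groups, one (foreground, background) pair per class.
    The interleaved channel layout is a hypothetical choice."""
    assert f_con.shape[0] == 2 * num_classes
    groups = []
    for i in range(num_classes):
        fg = f_con[2 * i]       # foreground channel for class i
        bg = f_con[2 * i + 1]   # background channel for class i
        groups.append((fg, bg))
    return groups

# Example: 6 classes -> a 12-channel feature map yields 6 groups
f = np.random.rand(12, 64, 64)
groups = group_features(f, num_classes=6)
assert len(groups) == 6 and groups[0][0].shape == (64, 64)
```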

3) Formally define the losses L_{ce} and L_{aux}, and motivate why L_{aux} is used in the model.

Author response: We are extremely grateful to the reviewer for pointing out this problem. We have added the definitions of the losses L_{ce} and L_{aux} in the revised version. To be clearer and to address the reviewer's concern, we have also added a brief description of L_{aux}: the auxiliary loss is a common practice adopted by numerous advanced models, such as PSPNet [1], CCNet [2], and DANet [3]. It helps optimize the learning process, while L_{ce} takes the main responsibility.
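A rough sketch of the two losses, assuming both are pixel-wise cross-entropy terms combined with a weighted auxiliary head; the weight 0.4 follows the PSPNet convention and is an assumption here, not a value taken from the paper:

```python
import numpy as np

def pixel_cross_entropy(logits, labels):
    """Pixel-wise cross-entropy.
    logits: (C, H, W) unnormalized scores; labels: (H, W) integer class ids."""
    # numerically stable softmax over the class axis
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    probs = e / e.sum(axis=0, keepdims=True)
    h, w = labels.shape
    # probability assigned to the true class at every pixel
    picked = probs[labels, np.arange(h)[:, None], np.arange(w)[None, :]]
    return float(-np.log(picked + 1e-12).mean())

def total_loss(main_logits, aux_logits, labels, lam=0.4):
    """Main loss plus a down-weighted auxiliary loss (lam=0.4 is assumed)."""
    return (pixel_cross_entropy(main_logits, labels)
            + lam * pixel_cross_entropy(aux_logits, labels))
```

The auxiliary head only shapes the gradients of intermediate layers during training; at inference time it is discarded and only the main prediction is used.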

4) The forms of titles in subsections, figures, and tables are inconsistent. The first letter of each caption should be capitalized.

Author response: We are extremely grateful to the reviewer for pointing out this problem. We have checked and corrected the inconsistent forms in the revised manuscript.

5) Since the developed architecture consists of multiple branches, it would be interesting to see the complexity of the model. Thus, an analysis and comparison of time complexity/running time is necessary.

Author response: We are extremely grateful to the reviewer for pointing out this problem. We have added a subsection, Computational Complexity, comparing the complexity of our model with some state-of-the-art models. MACs (the number of multiply-accumulate operations), model parameters, and GPU inference speed (FPS) are adopted to evaluate the computational complexity of our model against state-of-the-art methods; detailed information can be found in Table 14 of the revised manuscript.
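The parameter-count and FPS measurements mentioned above can be sketched in a framework-agnostic way. This is an illustrative sketch only: the helper names and the dummy matrix-multiply "model" are hypothetical, and real MAC counting would use a profiling tool rather than hand-rolled code.

```python
import time
import numpy as np

def count_params(weights):
    """Total number of parameters across a list of weight arrays."""
    return sum(w.size for w in weights)

def measure_fps(model_fn, input_tensor, warmup=2, runs=10):
    """Average inference throughput (frames per second) of model_fn.
    Warm-up runs are excluded so one-time setup cost is not measured."""
    for _ in range(warmup):
        model_fn(input_tensor)
    start = time.perf_counter()
    for _ in range(runs):
        model_fn(input_tensor)
    elapsed = time.perf_counter() - start
    return runs / elapsed

# Example with a dummy "model": a single matrix multiplication
x = np.eye(64)
fps = measure_fps(lambda t: t @ t, x)
assert fps > 0
```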

6) In Figure 8, what is the baseline?

Author response: We are extremely grateful to the reviewer for pointing out this problem. The baseline is FPN. We have added it in Figure 8.

Sincerely,

Yanzhou Su

Reference:

[1] Zhao H, Shi J, Qi X, et al. Pyramid scene parsing network[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 2881-2890.

[2] Huang Z, Wang X, Huang L, et al. CCNet: Criss-cross attention for semantic segmentation[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 603-612.

[3] Fu J, Liu J, Tian H, et al. Dual attention network for scene segmentation[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 3146-3154.

Author Response File: Author Response.docx

Reviewer 2 Report

Solid work that is publication-worthy.

Proofreading is required, though, since some spelling and grammar mistakes were detected. For example, the caption of Fig. 3 contains a minor spelling mistake.

Author Response

Manuscript remotesensing-1536555

Response to Reviewer

 

Dear reviewer,

 

Thank you for giving us the opportunity to submit a revised draft of the manuscript “Semantic Segmentation of Very-High-Resolution Remote Sensing Images via Deep Multi-Feature Learning” to Remote Sensing. We appreciate the time and effort that you dedicated to providing positive feedback on our manuscript and are grateful for the insightful comments on and valuable improvements to our paper. We have incorporated most of the suggestions made by the reviewer. Those changes are highlighted within the manuscript.

 

Appended to this letter is our point-by-point response to the comments raised by the reviewers. The comments are reproduced and our responses are given directly afterward in a different color (blue).

 

Comments from the Reviewer:

Solid work that is publication-worthy.

Proofreading is required, though, since some spelling and grammar mistakes were detected. For example, the caption of Fig. 3 contains a minor spelling mistake.

Author response: Thank you for your suggestion. We agree and have incorporated it throughout the manuscript. We have done our best to check and correct the spelling and grammar mistakes.

Sincerely,

Yanzhou Su

Author Response File: Author Response.docx

Reviewer 3 Report

The manuscript develops new deep learning methods for the semantic segmentation of satellite images.

I have a few comments:

  1. I found that the text has too many misprints.
  2. The abstract is not informative: the goal is not stated, and no novelty is shown.
  3. The overview of the methods should be shortened; this is well-known information.
  4. All acronyms have to be spelled out.
  5. You should use additional datasets for a more thorough comparison.

Author Response

Manuscript remotesensing-1536555

Response to Reviewer

 

Dear reviewer,

 

Thank you for giving us the opportunity to submit a revised draft of the manuscript “Semantic Segmentation of Very-High-Resolution Remote Sensing Images via Deep Multi-Feature Learning” to Remote Sensing. We appreciate the time and effort that you dedicated to providing positive feedback on our manuscript and are grateful for the insightful comments on and valuable improvements to our paper. We have incorporated most of the suggestions made by the reviewer. Those changes are highlighted within the manuscript.

 

Appended to this letter is our point-by-point response to the comments raised by the reviewers. The comments are reproduced and our responses are given directly afterward in a different color (blue).

 

Comments from the Reviewer:

The manuscript develops new deep learning methods for the semantic segmentation of satellite images.

I have a few comments:

  1. I found that the text has too many misprints.

Author response: Thank you for underlining this deficiency. We have checked and corrected those errors in the revised version.

 

  2. The abstract is not informative: the goal is not stated, and no novelty is shown.

Author response: Thank you for pointing this out. We have rewritten the abstract in the revised version to highlight our goal and novelty.

 

  3. The overview of the methods should be shortened; this is well-known information.

Author response: Thank you for your suggestion. We have rewritten the overview in the revised version to remove unnecessary information which, as the reviewer said, is well known. The details can be checked in the revised manuscript.

 

  4. All acronyms have to be spelled out.

Author response: Thank you for your suggestion. We have checked and corrected this in the revised version.

 

  5. You should use additional datasets for a more thorough comparison.

Author response: Thank you for your suggestion. We have added the UAVid dataset to evaluate the proposed method. UAVid is an aerial image segmentation dataset of urban scenes captured at an oblique angle by a UAV (Unmanned Aerial Vehicle); detailed information is discussed in [1]. The corresponding experiments are listed in Table 13 of the revised manuscript.

 

Sincerely,

Yanzhou Su

 

Reference:

[1] Lyu Y, Vosselman G, Xia G S, et al. UAVid: A semantic segmentation dataset for UAV imagery[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2020, 165: 108-119.

 

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

I have no more questions.

Reviewer 3 Report

The manuscript has been improved and may satisfy the journal requirements. 
