Communication
Peer-Review Record

Multi-Scale Residual Deep Network for Semantic Segmentation of Buildings with Regularizer of Shape Representation

Remote Sens. 2020, 12(18), 2932; https://doi.org/10.3390/rs12182932
by Chengyi Wang 1,† and Lianfa Li 2,*,†
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 15 July 2020 / Revised: 24 August 2020 / Accepted: 31 August 2020 / Published: 10 September 2020

Round 1

Reviewer 1 Report

This paper proposes a semantic segmentation method for multi-scale remotely sensed images based on a residual autoencoder deep learning architecture, U-Net. The proposed model is used to segment buildings. The paper is organized into five sections. In the introduction, the authors criticize a few deep learning architectures for semantic segmentation of buildings. The authors should compare concrete results obtained with these architectures to the results of their proposed model. A table should summarize at least the results obtained with the global convolutional, DeepLab Version 3+, U-Net, and autoencoder networks and compare them to the results presented in Table 2. The authors could also add a column "Residual multi-scale model with shape regularizer" to better show the contribution of the shape representation. In addition, Section 4 (Results and Discussion) does not compare the proposed method with the similar architecture proposed by Lin et al. (Ref. 52). Some metrics should be added, such as the F-measure, recall, and the confusion matrix.
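As an illustration of the metrics requested above, the following is a minimal sketch, not the authors' evaluation code. It assumes binary building masks stored as nested lists of 0/1 values; the function name `binary_segmentation_metrics` and the toy masks are hypothetical.

```python
def binary_segmentation_metrics(truth, pred):
    """Confusion matrix, precision, recall and F-measure for binary masks.

    truth, pred: nested lists of 0/1 with identical shape
    (1 = building, 0 = background). Hypothetical helper for illustration.
    """
    tp = tn = fp = fn = 0
    for truth_row, pred_row in zip(truth, pred):
        for t, p in zip(truth_row, pred_row):
            if t and p:
                tp += 1      # building pixel correctly detected
            elif t and not p:
                fn += 1      # building pixel missed
            elif p:
                fp += 1      # background pixel labeled as building
            else:
                tn += 1      # background pixel correctly rejected

    # Guard against empty denominators (e.g. no predicted building pixels).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)

    # Rows are the true class, columns the predicted class,
    # both in the order (background, building).
    confusion = [[tn, fp], [fn, tp]]
    return {"confusion": confusion, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Toy 2x4 masks, made up purely for this example.
truth = [[1, 1, 0, 0],
         [1, 0, 0, 0]]
pred = [[1, 0, 0, 0],
        [1, 1, 0, 0]]
metrics = binary_segmentation_metrics(truth, pred)
print(metrics["confusion"])
```

Reporting these values for each compared architecture in the same table would make the comparison with Table 2 direct.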

Section 2, "Deep Residual Segmentation Method with shape representation and multi-scaling", is the main contribution of the paper. This section intends to describe the principal components of the proposed model. After reading this section, a few questions arise from the lack of detail, the lack of precision, and occasional contradictions:

1) Figure 1 does not illustrate the U-Net structure as stated in line 137. It is therefore important to first clarify the residual U-Net structure and then explain how a residual unit was enhanced. What are the differences between the traditional U-Net, the residual U-Net architecture, and the residual regularizer U-Net?

2) The ASPP is not detailed. It should be added as a subsection.

3) The ensemble learning of multi-scale models is not clear. One can imagine a generalization of atrous dilation that could be related to the scale, but here only 3 static scales are considered. How, then, do you consider your model a generalization to multi-scale building detection?

4) The concatenation operator of the different ASPP is not defined. So what kind of concatenation operator is used by your model?

5) In line 197, is the term "morphological feature of buildings" used to mean shape? Building morphology and shape are two different concepts! So what is the morphological feature?

6) In line 200, mask labels of the training samples are used for pre-training the shape representation autoencoder. How many masks are used? How are they constructed? And how many classes of masks are there?

7) What is the reason for using 3 models rather than a single one with scale adaptation? If we have a massive number of scales (N), should we create a model per scale?

8) Why is a filtration step used to remove boundaries? And how was the distance of 16 pixels chosen?

9) The paper seems to be a draft version that is still in progress, so it is hard to read!

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

The manuscript has been revised according to the comments. It is noted that the figures need to be revised so that the characters are more readable.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

All comments have been well addressed by the authors. Some minor corrections should be made to the introduction at line 60 and line 85: the term "Recently" is used to describe new methods, but deep learning is a kind of machine learning. The state of the art of methods could therefore be more structured.

The conclusion is short and could be organized into 3 parts: a recap of the problem and the proposed solutions, a recap of the results (pros and cons of the methodology), and the perspectives.

Author Response

Please see the attachment.

Author Response File: Author Response.docx
