Article
Peer-Review Record

An Improved SAR Image Semantic Segmentation Deeplabv3+ Network Based on the Feature Post-Processing Module

Remote Sens. 2023, 15(8), 2153; https://doi.org/10.3390/rs15082153
by Qiupeng Li * and Yingying Kong
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 7 February 2023 / Revised: 14 April 2023 / Accepted: 15 April 2023 / Published: 19 April 2023

Round 1

Reviewer 1 Report

This work proposes an improved Deeplabv3+ network to address the semantic segmentation problem of SAR images. The authors have combined several published modules into Deeplabv3+ and have demonstrated improved performance. The language is also acceptable. However, the authors still need more material and experiments to prove that such a combination of published modules is efficient and novel enough.

1. The author mentioned that the Coordinate Attention (CA) method can effectively capture the relationship between location information and channel information. To validate this, it would be better to compare it with more recently published attention mechanisms, such as SENet, GALA, and CBAM, as listed below. These works are open-source and should be compared against in the next version.

[1] CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 3–19.

[2] A Multi-Scale Feature Pyramid Network for Detection and Instance Segmentation of Marine Ships in SAR Images. Remote Sens. 2022, 14, 6312.

[3] Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7132–7141. doi:10.1109/CVPR.2018.00745.

[4] ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
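The directional pooling that gives Coordinate Attention its location sensitivity, in contrast to channel-only mechanisms such as SENet, can be sketched in a few lines of plain Python. The function names and the toy tensor below are illustrative only, not taken from the manuscript under review:

```python
# Minimal sketch of the direction-aware pooling step of Coordinate Attention
# (CA): instead of one global average pool, CA pools along each spatial
# direction separately, so positional information along the other axis is kept.

def x_avg_pool_h(x):
    """Pool over the width axis: (C, H, W) -> (C, H), one value per row."""
    return [[sum(row) / len(row) for row in channel] for channel in x]

def x_avg_pool_w(x):
    """Pool over the height axis: (C, H, W) -> (C, W), one value per column."""
    return [
        [sum(channel[h][w] for h in range(len(channel))) / len(channel)
         for w in range(len(channel[0]))]
        for channel in x
    ]

# Toy feature map: 1 channel, 2x2.
feat = [[[1.0, 3.0],
         [5.0, 7.0]]]
print(x_avg_pool_h(feat))  # [[2.0, 6.0]] -- row-wise context
print(x_avg_pool_w(feat))  # [[3.0, 5.0]] -- column-wise context
```

In the full CA block the two pooled maps are then concatenated, passed through a shared 1×1 convolution and nonlinearity, split again, and turned into per-direction sigmoid gates; SENet-style methods instead collapse both H and W into a single global average, discarding location.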

2. This work presented an improved ASPP module in which the 3×3 atrous (dilated) convolution is decomposed into 1D convolutions; this maintains the dilation rate, effectively reduces the module's computation, shortens training time, and improves the semantic segmentation results. To prove the efficiency of this modification, it is necessary to compare the performance of the original ASPP module against the modified one.
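If the decomposition in question is the usual factorisation of a 3×3 atrous convolution into a 3×1 followed by a 1×3 convolution (an assumption on our part; the manuscript's wording is ambiguous), the claimed computational saving is easy to verify with a back-of-envelope multiply-accumulate (MAC) count. All tensor sizes below are invented for illustration:

```python
# Why factorising a 3x3 atrous convolution into 3x1 + 1x3 reduces computation
# while keeping the dilation rate: the per-output MAC count drops from 9 to 3+3.

def conv_macs(h, w, c_in, c_out, kh, kw):
    """MACs for a (kh x kw) convolution producing an h x w x c_out map.
    Dilation changes which inputs are read, not how many MACs are done."""
    return h * w * c_in * c_out * kh * kw

H, W, C = 64, 64, 256                       # example ASPP feature-map size
full       = conv_macs(H, W, C, C, 3, 3)    # original 3x3 atrous branch
factorised = conv_macs(H, W, C, C, 3, 1) + conv_macs(H, W, C, C, 1, 3)

print(factorised / full)  # 0.666... -> roughly a one-third saving per branch
```

The ratio is independent of the feature-map size and channel counts, which is why the saving carries over to every dilation rate in the ASPP pyramid.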

3. The fonts used in some figures are inconsistent; see Figure 1, for example.

4. The language is generally acceptable but can still be improved.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

1)

In the introduction, the problems with existing networks that this paper aims to solve should be summarized in a single paragraph. Problems appearing in various networks are pointed out and various methods are proposed, but the problems are not matched to the methods; summarizing them together would make the paper easier to understand.

 

2)

Chapter 2 describes parts of the model constructed in this paper rather than providing a literature review. The introduction already reviews representative semantic segmentation models and raises issues of segmentation performance and computational resource efficiency for SAR images. Chapter 2 should therefore be merged into Chapter 3 to explain the improvements to "deeplabv3+".

 

3)

The deeplabv3+ architecture has been improved with various performance-enhancing modules and techniques, but the purpose of each modification needs to be clarified. It would be helpful to summarize the modifications to the existing deeplabv3+ in a table.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

I found the text difficult to read: there are many typing errors and confusing paragraphs, for instance. The structure of the text could also be adjusted; for example, important concepts and their reference papers could be presented before the explanation of the proposed solution. Some observations are listed below. A careful revision must be made.

 

- The term "Novel" is used in the title, but "improved" is used in the text. The title should be suitably chosen.

 

- In the "Abstract", the results are not presented.

 

- "Deeplabv3+" and "Deeplabv3 +" [space added] are used. One must be chosen and used. (see line 12 and Keywords, page 1, for example)

 

- Line 13. "this paper introduced ..."; maybe "this paper added ..." should be more suitable.

 

- Line 41. Does "video memory" refer to "GPU RAM"?

 

- Line 48. Is "codec" a synonym for "encoder-decoder"? If so, the latter is preferred because it appears in the illustrative figures.

 

- Line 53. Check whether "ZHANG Zejun" is the correct form for this kind of citation.

 

- "Introduction". The results and the document structure should be presented.

 

- Section 2 "Literature Review". Despite its title, only one reference ([12]) is cited.

 

- Line 91. "Mobilenet-v2 is a further improvement on ...". Is it necessary to provide this kind of information?

 

- Line 100. "... Coordinated Information" or "Coordinate Information"?

 

- Line 108. What does "remote spatial interaction with accurate location information" mean?

 

- Section 2. Formula (1) is mentioned in the text, but the other ones are not. Besides, the variable representing the sum in Formula (1) is written differently in Formulas (2) and (3).

 

- Line 122. "... height ..." -> "... width ..." 

 

- Lines 127–130. The statement should be detailed and supporting reference papers presented.

 

- Line 139. What is "C"?

 

- Line 143. "fw", no subscript. 

 

- Line 153. "Deeplabv3+is" (without space after "+"). 

 

- Figure 1. "rate" was not explained.

 

- Figure 2. "XC" is not explained. 

 

- Line 195. "...w, As shown..." (Capital A after comma).

 

- Figure 3. What is "GAP" an acronym for? (The same for NSS, line 227, and MVG, line 261.)

 

- Lines 237-238. Why N=96x96?

 

- Figure 4. Is "0.03" correct for the first orange bar?

 

- Lines 340–355. Difficult to understand; confusing.

 

- "Results". Similarity between the scenes in 2011 and 2017 is not clear. 

 

- Formula (20). What are P and T?

 

- Line 389. "... Section 3.4, ...". Check the section number.

 

- Figures on results are split into two pages.

 

- Line 586. MIOU?

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors have addressed most of my concerns, except the first one (see below). Not all of the works were properly compared, and at least one was cited incorrectly. I think the comparison with these recent works is important; please make sure to compare your results with these models.

1. The author mentioned that the Coordinate Attention (CA) method can effectively capture the relationship between location information and channel information. To validate this, it would be better to compare it with more recently published attention mechanisms, such as SENet, GALA, and CBAM, as listed below. These works are open-source and should be compared against in the next version.

[1] CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 3–19.

[2] A Multi-Scale Feature Pyramid Network for Detection and Instance Segmentation of Marine Ships in SAR Images. Remote Sens. 2022, 14, 6312.

[3] Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7132–7141. doi:10.1109/CVPR.2018.00745.

[4] ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Some adjustments were applied and information was included, but it seems a careful revision was not made. Some inconsistencies follow.

Title: "Deepalabv3+" ("a" after Deep, and "l" in lower case; "L" in upper case is used in the Abstract)

Abstract: DeepLab v3+network [space before v3] (line 16) (see line 84 too)

"focus loss function" (lines 18, 97) and "Focal loss function" (line 24, Keywords)

"3.3 Focus loss function" (line 312) and "focal loss function" (line 320) [Same subsection]

 

"added" (line 13, past tense) "uses" (line 18, present tense)

 

NSS (Natural image statistics) (line 260) [Is the second "S" for "image"?]

 

MVG (Model View Controller) (line 294) [Is "G" for "Controller"?]

 

Point 4 (12): after revision, the line number is referred to incorrectly (Cover Letter).

 

Point 6: "Chapter" is used in Cover Letter, "Section" is used in the manuscript

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
