Peer-Review Record

A Deep Residual U-Type Network for Semantic Segmentation of Orchard Environments

Appl. Sci. 2021, 11(1), 322; https://doi.org/10.3390/app11010322
by Gaogao Shang, Gang Liu *, Peng Zhu, Jiangyi Han, Changgao Xia and Kun Jiang
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 5 December 2020 / Revised: 21 December 2020 / Accepted: 26 December 2020 / Published: 31 December 2020

Round 1

Reviewer 1 Report

In this study, an automatic semantic segmentation method for the orchard environment is proposed based on a deep residual U-Net model. Overall, the paper is well-written. However, there are some concerns that should be clarified and addressed in the manuscript.

1. It is stated in the manuscript that this paper introduces residual structures to the U-Net model. I believe this should not be specified as a novelty since there are many applications of residual U-nets, for example, the below studies use residual U-Nets.

Khanna, Anita, et al. "A deep Residual U-Net convolutional neural network for automated lung segmentation in computed tomography images." Biocybernetics and Biomedical Engineering 40.3 (2020): 1314-1327.

Zhang, Zhengxin, Qingjie Liu, and Yunhong Wang. "Road extraction by deep residual u-net." IEEE Geoscience and Remote Sensing Letters 15.5 (2018): 749-753.

2. It would be great if the authors could provide the data along with the labelled segmentation maps in the submission. I believe that this is important for the justification of the experimental evaluations, because the ground truth is not constructed by manually labelling the images.

3. Experimental evaluations should be improved by comparing more deep model structures, for example, models that are reviewed in L50-73 or L127-132.

Author Response

Answers to the Reviewers’ Comments

Dear Professor,

Thank you very much for your meaningful comments. We believe the comments have raised our manuscript to a higher level, which benefits both our research and the readers' understanding. We have revised the original manuscript carefully in line with the comments and proofread it to minimize typographical, grammatical, and other errors. The changes have been highlighted in red in the revised manuscript. We hope that the revision is acceptable, and we look forward to hearing from you soon. Thanks again.

Sincerely yours,

Gaogao Shang, Gang Liu, Peng Zhu, Jiangyi Han, Changgao Xia, Kun Jiang

Responses to Reviewer #1

Reviewer #1: In this study, an automatic semantic segmentation method for the orchard environment is proposed based on a deep residual U-Net model. Overall, the paper is well-written. However, there are some concerns that should be clarified and addressed in the manuscript.

Q1: It is stated in the manuscript that this paper introduces residual structures to the U-Net model. I believe this should not be specified as a novelty since there are many applications of residual U-nets, for example, the below studies use residual U-Nets.

Khanna, Anita, et al. "A deep Residual U-Net convolutional neural network for automated lung segmentation in computed tomography images." Biocybernetics and Biomedical Engineering 40.3 (2020): 1314-1327.

Zhang, Zhengxin, Qingjie Liu, and Yunhong Wang. "Road extraction by deep residual u-net."IEEE Geoscience and Remote Sensing Letters 15.5 (2018): 749-753.

Answer1: Thank you for pointing this out. We have carefully read the two papers you provided, and they are indeed very helpful to our research. We have cited and acknowledged their work in the introduction. Let us explain the differences between this paper and the two papers mentioned above.

Firstly, the application scenarios are different. Paper 1 is applied in biomedicine, where it is of great help to the treatment of lung diseases; paper 2 is used to extract roads from aerial images and has achieved excellent results. In contrast, this paper is mainly applied to the segmentation of the orchard environment, and the images contain many categories, which makes the boundary information difficult to process.

Secondly, this paper adopts a nine-layer deep residual U-type network. A convolution kernel with a size of 1×1 and a stride of 1, together with a batch normalization layer, is used as the identity mapping function. The identity mapping and the residual unit are added before the ReLU activation function, so that the feature information in the image can be better integrated (an illustrative sketch is given after this list);

Thirdly, in the bottleneck layer, this paper adopts three convolution modules, consisting of 3×3, 1×1, and 3×3 convolution units, which makes the transition of the network model from the encoding layer to the decoding layer smoother and effectively reduces the loss of feature information;

Finally, the residual module is not used in the decoding layer, which reduces the number of network parameters and improves the training speed of the model.
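
For illustration, a minimal PyTorch-style sketch of the residual unit and bottleneck described above is given below. This is a simplified sketch rather than the exact implementation in the paper: the module names, channel arguments, and the precise placement of batch normalization are assumptions made only for this example, to show how a 1×1 identity-mapping branch can be summed with the residual branch before the ReLU activation, and how a 3×3 / 1×1 / 3×3 bottleneck can be stacked.

import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    # Residual unit: a 1x1 convolution (stride 1) followed by batch normalization
    # acts as the identity-mapping branch; the sum with the residual branch is
    # taken before the final ReLU activation.
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # Residual branch: two 3x3 convolutions with batch normalization.
        self.residual = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
        )
        # Identity-mapping branch: 1x1 convolution with stride 1 plus batch normalization.
        self.identity = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1),
            nn.BatchNorm2d(out_channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Addition of identity mapping and residual branch happens before the ReLU.
        return self.relu(self.residual(x) + self.identity(x))

class Bottleneck(nn.Module):
    # Bottleneck between encoder and decoder: three convolution modules
    # (3x3, 1x1, 3x3), each followed by batch normalization and ReLU.
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# Example usage with a random 3-channel patch.
x = torch.randn(1, 3, 128, 128)
print(ResidualUnit(3, 64)(x).shape)   # torch.Size([1, 64, 128, 128])
print(Bottleneck(3)(x).shape)         # torch.Size([1, 3, 128, 128])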


Q2: It would be great if the authors could provide the data along with the labelled segmentation maps in the submission. I believe that this is important for the justification of the experimental evaluations, because the ground truth is not constructed by manually labelling the images.

Answer2: Thank you for pointing this out. This is indeed something that was neglected during the writing of the paper. In the revised version, we have added the labelled segmentation maps, as shown in Figure 2 and Figure 7.


Q3: Experimental evaluations should be improved by comparing more deep model structures, for example, models that are reviewed in L50-73 or L127-132.

Answer3: Thank you for pointing this out. It does make sense to compare multiple models. This paper has already provided the fully convolutional neural network (FCN) reviewed in L127-132, as shown in Figure 5, Figure 7(c), and Table 2. We have also added the Front-end+Large network model of reference 12, mentioned in L50-73, as a comparative experiment and analyzed its results in detail, as shown in Figures 7 and 8 and Table 2. In future research, we will continue to follow your guidance and select multiple models for comparative analysis.

Author Response File: Author Response.docx

Reviewer 2 Report

The paper compares three possible network architectures to segment images in orchard environments.
In particular, each pixel is classified as Background, Road, Trees or Debris (Sundry).
The compared network architectures are: fully convolutional, U-Net, and U-Net with residuals.
The networks were tested on images acquired in orchard environments using a GoPro4 camera.
U-Net with residuals gives the best results.

The writing is understandable, but grammar and syntax are awkward at times. The authors should proofread their paper.

The introduction of the Residual Network in Section 3.1 is too abrupt. An introduction should be added before this paragraph, to explain the general structure of the proposed network.

Please explain explicitly the differences between U-Net with residuals and the other two approaches. For example, a subsection could be added at the beginning of Section 4.

Page 11: "The main work of this paper includes the following points:" these should be listed in the introduction.

In Fig. 7, please use a different shade of gray for trees and road. Also, please add a legend in this image with the meaning of each color.
Table 1 should be removed, as your images are not in color anyway.

Minor errors:
Page 2: "the application scene is relatively single" clarify
Page 3: "environment the acquisition tool"
Page 4: "The error" capitalization
Page 4: "according to this problem, He" who is He?
Page 5: "The residual network consists of a series of stacking residual units." duplicate sentence
Page 5, Fig. 2b: please clarify that the line F(X_i) is not an additional connection (e.g. use a curly bracket?)
Page 7: "occupancy ration and accuracy" ration?
Page 9: "lead to insensitive to details"

Author Response

Answers to the Reviewers’ Comments

Dear Professor,

Thank you very much for your meaningful comments. We believe the comments have raised our manuscript to a higher level, which benefits both our research and the readers' understanding. We have revised the original manuscript carefully in line with the comments and proofread it to minimize typographical, grammatical, and other errors. The changes have been highlighted in red in the revised manuscript. We hope that the revision is acceptable, and we look forward to hearing from you soon. Thanks again.

Sincerely yours,

Gaogao Shang, Gang Liu, Peng Zhu, Jiangyi Han, Changgao Xia, Kun Jiang

Responses to Reviewer #2

The paper compares three possible network architectures to segment images in orchard environments.
In particular, each pixel is classified as Background, Road, Trees or Debris (Sundry).
The compared network architectures are: fully convolutional, U-Net, and U-Net with residuals.
The networks were tested on images acquired in orchard environments using a GoPro4 camera.
U-Net with residuals gives the best results.

The writing is understandable, but grammar and syntax are awkward at times. The authors should proofread their paper.

Q1: The introduction of the Residual Network in Section 3.1 is too abrupt. An introduction should be added before this paragraph to explain the general structure of the proposed network.

Answer1: Thank you for pointing this out. This was indeed an oversight in our writing, and we have now added an introduction in Section 3.1 to explain the general structure of the proposed network, as shown in lines 165-175.


Q2: Please explain explicitly the differences between U-Net with residuals and the other two approaches. For example, a subsection could be added at the beginning of Section 4.

Answer2: Thank you for pointing this out. We have added “The fully convolutional neural network ... of the above four networks, as follows:” in Section 4 to explain explicitly the differences between the U-Net with residuals and the other three approaches, as shown in lines 252-266.


Q3: Page 11: "The main work of this paper includes the following points:" these should be listed in the introduction.

Answer3: Thank you for pointing this out. We have listed the main work of this paper in the introduction, as shown in lines 93-102, and the conclusion has been modified accordingly, as shown in lines 371-390.


Q4: In Fig. 7, please use a different shade of gray for trees and road. Also, please add a legend in this image with the meaning of each color.

Answer4: This was indeed our negligence in the writing of the paper. We have added a legend to the image giving the meaning of each color and used the original color results to replace the grayscale results, as shown in Figure 2 and Figure 7.


Q5: Table 1 should be removed, as your images are not in color anyway.

Answer5: This was indeed an oversight in our writing. We have added the color images with legends, as shown in Figure 2 and Figure 7.


Q6: Minor errors:
1) Page 2: "the application scene is relatively single" clarify
2) Page 3: "environment the acquisition tool"
3) Page 4: "The error" capitalization
4) Page 4: "according to this problem, He" who is He?
5) Page 5: "The residual network consists of a series of stacking residual units." duplicate sentence
6) Page 5, Fig. 2b: please clarify that the line F(X_i) is not an additional connection (e.g. use a curly bracket?)
7) Page 7: "occupancy ration and accuracy" ration?
8) Page 9: "lead to insensitive to details"

Answer 6: Sorry for these mistakes, and thank you for the suggestions. We have gone through the whole revised manuscript to correct such issues.

Answer 1): We have clarified the above issue, as shown in lines 80-83: “The environment recognition based on ... relatively single.”

Answer 2): We have changed “environment the acquisition tool” to “the orchard environment acquisition tool”, as shown in lines 115-116.

Answer 3): We have changed “... contextual information, so that The error” to “... obtain more contextual information. The error will not ...”, as shown in lines 162-163.

Answer 4): ‘He’ in the paper refers to Dr. Kaiming He, one of the authors of reference 24. We have changed “He proposed a deep residual network ...” to “He et al. [24] proposed a deep residual network ...”, as shown in lines 161-162.

Answer 5): We have removed the duplicate sentence, as shown in lines 174-175.

Answer 6): F(X_i) is not an additional connection; it represents the residual function. We have modified the original figure and attached the new one, as shown in Figure 3(b).
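
For clarity, under the standard residual formulation (a simplified sketch whose notation may differ slightly from the paper), the unit in Figure 3(b) computes

x_{i+1} = ReLU( h(x_i) + F(x_i) ),

where h(x_i) denotes the identity-mapping branch (1×1 convolution with batch normalization) and F(x_i) denotes the residual function computed by the stacked convolutions.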

Answer 7): We have changed “memory occupancy ration and accuracy” to “memory footprint and accuracy”, as shown in lines 280-281.

Answer 8): We have modified the original statement; the revised sentence is “When recognizing the complex ... and the training time is long.”, as shown in lines 318-322.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

I thank the authors for adequately addressing my concerns and suggestions in the revised version of the manuscript.
