Article
Peer-Review Record

Recognizing Zucchinis Intercropped with Sunflowers in UAV Visible Images Using an Improved Method Based on OCRNet

Remote Sens. 2021, 13(14), 2706; https://doi.org/10.3390/rs13142706
by Shenjin Huang 1,2, Wenting Han 1,2,3,*, Haipeng Chen 1,2, Guang Li 1,2 and Jiandong Tang 4
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 31 May 2021 / Revised: 6 July 2021 / Accepted: 6 July 2021 / Published: 9 July 2021

Round 1

Reviewer 1 Report

Overall, I believe this paper is well written, and the research aim and objectives have been well defined to complete the task of recognizing zucchinis intercropped with sunflowers using a UAV. The analysis is also robust. However, before its acceptance, I believe the authors should make some minor revisions.

  1. Fig. 1 should be improved, as its contents are presented again in the subsequent analysis. The authors should develop a general flow chart that could be useful to guide others' studies.
  2. The authors should present a cross section (real image) showing how the two species are intercropped. Please provide a photograph or on-site image to illustrate this; a 2D image taken from above alone is not enough to support the results.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

The article entitled "Recognizing Zucchinis Intercropped with Sunflowers in UAV Visible Images Using an Improved Method Based on OCRNet" presents the methodology used to semantically segment several aerial orthophotos, captured by a multispectral camera on a drone, to distinguish intercropped zucchini and sunflower crops in the Hetao Irrigation District, Inner Mongolia, China, with a high success rate.

The authors have experimented with several convolutional neural network (CNN) frameworks for semantic segmentation, namely OCRNet, PSPNet, DeepLabV3+, and DNLNet. For training the segmentation networks, they defined three scenarios covering an approximate area of 65.4 ha, and for validation, two scenarios covering another 40 ha. The training dataset consisted of 27,130 images of 256 × 256 pixels, and the validation dataset consisted of 16,800 images. The way these training sets were generated is original: the same surface is reused to generate several displaced tiles, i.e., there is a large surface overlap between neighbouring training tiles (75%), which could be considered a data augmentation technique. Had they used non-overlapping tiles, they would only have had ~1,770 tiles for training, which is a very small number. Therefore, a weakness of the methodology, and of the scope or significance of the results, is the small size of the training and validation datasets. Comparing the number of tiles in the two datasets shows that they represent 62% and 38%, respectively. Furthermore, the decision to reserve some areas for training and others for validation also reduces the generalisability of the network learning.
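For illustration, the overlapping-tile generation described above can be sketched as follows. This is a minimal sketch assuming the orthomosaic is available as a NumPy array; the 256-pixel tile size and 75% overlap are taken from the review, while the array shape and function name are hypothetical.

```python
import numpy as np

def extract_tiles(mosaic, tile=256, overlap=0.75):
    """Cut overlapping square tiles from an orthomosaic array (H, W, C).

    With overlap=0.75 the stride is tile/4 = 64 px, so neighbouring tiles
    share 75% of their surface, which acts as a form of data augmentation.
    """
    stride = int(tile * (1 - overlap))          # 64 px for a 256-px tile
    h, w = mosaic.shape[:2]
    tiles = []
    for y in range(0, h - tile + 1, stride):
        for x in range(0, w - tile + 1, stride):
            tiles.append(mosaic[y:y + tile, x:x + tile])
    return tiles

# Example: a synthetic 3-band mosaic stands in for the real orthophoto.
mosaic = np.zeros((2048, 2048, 3), dtype=np.uint8)
print(len(extract_tiles(mosaic)))               # 841 tiles with 75% overlap
print(len(extract_tiles(mosaic, overlap=0.0)))  # 64 tiles (8 x 8) without overlap
```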

All the tiles could have been pooled, and from this pool three datasets (training, test, and validation) could have been generated by randomly choosing a percentage of tiles for each subset.
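A pooled random split of the kind suggested here could look like the following minimal sketch; the split ratios and function name are illustrative assumptions, not taken from the paper.

```python
import random

def split_tiles(tiles, train=0.7, val=0.15, seed=42):
    """Randomly split a pooled list of tiles into train / validation / test subsets."""
    rng = random.Random(seed)                    # fixed seed for reproducibility
    order = list(range(len(tiles)))
    rng.shuffle(order)
    n_train = int(train * len(tiles))
    n_val = int(val * len(tiles))
    train_set = [tiles[i] for i in order[:n_train]]
    val_set = [tiles[i] for i in order[n_train:n_train + n_val]]
    test_set = [tiles[i] for i in order[n_train + n_val:]]  # remaining ~15%
    return train_set, val_set, test_set
```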

 

The structure of the article is correct, although references to the use of semantic segmentation and transfer learning techniques applied to other types of feature extraction could have been identified in the state of the art. If the authors want to reinforce this section, I suggest they check the following references: identifying tourists in images (https://doi.org/10.3390/ijgi10030137), identifying roads (https://doi.org/10.3390/app10207272), or searching for lost persons (https://doi.org/10.3390/ijgi10020080), among many others that can be found in the Web of Science.

I would also invite the authors to describe the extent of the areas in metric units, not only in pixels, for example in Table 1.
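Converting a pixel count to hectares only requires the ground sampling distance (GSD), as in the sketch below; the 5 cm GSD is a hypothetical value for illustration, not the one reported in the paper.

```python
def pixels_to_hectares(n_pixels, gsd_m=0.05):
    """Convert a pixel count into hectares given the ground sampling distance (m/pixel)."""
    return n_pixels * gsd_m ** 2 / 10_000.0      # 1 ha = 10,000 m^2

# A 10,000 x 10,000-pixel area at a 5-cm GSD covers 25 ha.
print(pixels_to_hectares(10_000 * 10_000))
```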

It would be nice if they compared the training results and mIoU obtained with the current configuration (three whole areas for training and two for testing) against those obtained by pooling all five areas and then randomly generating two or three datasets: train, test, and validation. In the current configuration there is no validation set.
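For reference, mIoU can be computed from a class confusion matrix as in the following sketch; the three-class toy matrix is invented for illustration and does not correspond to the paper's results.

```python
import numpy as np

def mean_iou(conf_matrix):
    """Mean intersection-over-union from a confusion matrix (rows = ground truth, cols = prediction)."""
    conf = conf_matrix.astype(np.float64)
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    iou = intersection / np.maximum(union, 1e-12)   # avoid division by zero for absent classes
    return iou.mean()

# Toy 3-class example (background, zucchini, sunflower) with made-up counts.
cm = np.array([[50, 2, 3],
               [4, 40, 1],
               [2, 1, 45]])
print(mean_iou(cm))
```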

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

The authors do not have to include a thank-you for every comment received. Some comments may be more accurate than others, and some may simply need to be put in context.

The authors have taken into account some of the suggestions for improvement: improving the references related to the state of the art of semantic segmentation and including the extent of each zone in the table.

As for the rest of the suggestions, concerning the justification of the method used to divide the dataset into training, validation, and testing, the authors have limited themselves to defending their criteria in their response to my comments, but they have not attempted to justify it in the methodology of the paper. That is, they provide some references that justify their choice so as not to change their approach, but they do not introduce those references in the article to support that approach.

I do not understand, and as far as I can see it is not justified, how the overlap rate used when capturing images with the drone can explain the fact that the training, validation, and test image sets are unbalanced.

Correct me if I have misunderstood: the images were taken with a drone, I assume using mission-planning software of the Pix4Dcapture type or similar, in which the % overlap and the final image resolution are set. From these images, the orthomosaic will have been generated, and the images of the training, validation, and test datasets will have been cut out from that mosaic. Is this not the case? If not, the way in which the dataset was generated from the drone frames should be indicated, as my misinterpretation would be shared by any reader looking at Figures 8 and 9.

If the authors have used this strategy and justified it in previous works, they should indicate it in the article, including the corresponding references.

If the dataset-splitting strategy offered by openly available tools does not meet the requirements of this type of use case, the authors can modify it or develop their own algorithm that generates the three subsets from all the image tiles.

 

Author Response

Please see the attachment.

Author Response File: Author Response.docx
