Detection of Maize Tassels from UAV RGB Imagery with Faster R-CNN
Round 1
Reviewer 1 Report
Overall Paper:
This paper focuses on the detection of maize tassels from UAV images and mobile images using Faster R-CNN. The authors used VGGNet and ResNet as feature extraction networks for Faster R-CNN. The research article is within the scope of the journal, but it lacks a clear motivation for the research. Please address the following comments to improve the quality of the manuscript, and also improve the quality of the English:
Introduction:
The overall motivation behind the research is somewhat weak; it can be improved by adding more literature studies to the introduction section.
Materials and Methods:
Line 62: Does “5280×2979 resolution” mean “with a resolution of 5280×2979 pixels”?
Line 57 and Lines 62-63: Please add an illustrative diagram to make this clear for the reader. Not all readers will be familiar with the terms you used.
Section 2.2.2: At the beginning, please explain clearly why only these two networks (VGGNet and ResNet) were used in this study. Why were other networks (AlexNet, SqueezeNet, GoogleNet, InceptionV3, DenseNet201, MobileNetV2, etc.) not used and tested as feature extractors?
Lines 102-111: Which network did you use, VGG16 or VGG19? Please provide a complete description of the network you used (i.e., kernel, stride, padding, activation function, etc.).
Lines 112-119: Several pretrained ResNet networks (ResNet18, ResNet50, etc.) are available, with different numbers of convolutional layers. (I know you provided the network names in Section 3, but this information is critical in the Materials and Methods section.) Which one did you use? Moreover, please provide a complete description of the network, including kernel, stride, padding, activation function, etc. at each layer.
Lines 122-123: Please also provide information about the loss function you used.
Section 2: How did you label your images? How did you train your network (from scratch or from a pretrained model)? From which weights did you initialize your network? How did you divide your data into training and testing datasets (randomly or by manual selection)? How did you select the network training parameters? Please provide this information in the Materials and Methods section. I know you provided a citation, but the manuscript needs a complete description so that readers do not have to go back and forth to find the required information.
Results and Discussion:
Lines 165-168: How did you determine the anchor sizes? What were the criteria? Moreover, the determination of the anchor sizes and the criteria for it should be part of the Materials and Methods section.
Author Response
The reply is in the Word file.
Author Response File: Author Response.docx
Reviewer 2 Report
I enjoyed reading this paper; it is well referenced and nicely put together. However, there are a few issues that should be addressed before this paper is ready for publication.
The paper emphasizes that “the application of the UAV on image detection of maize tassels was not seen yet, because it is still challenging in natural environment due to light conditions, possible occlusions, and different maize genotypes.” I was surprised to learn this. Is this correct? I have read at least 5 papers this year that addressed detection of maize tassels using UAV and some type of deep learning/data mining method.
If the authors really want to make an impact, this paper needs to incorporate at least a cost-benefit analysis of implementing this new method in current practice, for example processing cost, training cost, time cost, data cost, and scaling issues (i.e., implementing this method on large databases). It should also compare the method the authors used to the current state of the art, and discuss areas and applications where the new method performs better.
Furthermore, I would be very cautious about using MAE and RMSE to compare performance between two datasets, especially for volatile, non-randomly distributed variables, as your dataset appears to contain. Additionally, RMSE is known to be very sensitive to learning rate, and if the authors still decide to use this method, a sensitivity analysis should be included.
Author Response
The reply is in the Word file.
Author Response File: Author Response.docx
Reviewer 3 Report
Dear Authors
The topic of tassel detection is important, probably mainly for breeders, who need to know how long plants take to reach this development stage. The tassel is an organ; the day it appears might be a trait. The importance of tassel detection should be better described. In commercial fields it might be complicated to cover the entire field, as some farms are very large. In that case it might be better to detect the color change of the tassel relative to the development stages prior to tasseling, which can be done at a coarser resolution than that used in the current study.
Proofreading is suggested to correct minor errors (e.g., missing spaces) and to make the language more scientific.
Specific comments:
L23 – what are the units of RMSE?
L25 – which tassel traits were calculated here, given the statement that other traits will be calculated in the future?
L29 – the tassel is not a trait but an organ of the plant – please rephrase this sentence.
L45 – this is dependent on the field size.
L47 – it is better to present more examples than to use “etc.”
L47-49 – regarding detection of the tassel stage in maize from a UAV, at least one paper (Herrmann et al. 2019) has done it.
L102 – VGGNet is not explained before this point.
Equations 4 & 5 – what is the big difference between MAE and RMSE? Is there a possibility that the MSE will show relatively high and the RMSE will show relatively low error for the same sample? Is it needed to show both of them?
L145 – “and tk is the mean reference count” might be redundant.
Figure 3 – there are numbers in the figure; what do they mean? Please elaborate in the caption. “maize tassel: 99” “maize tassel: 100” – is this the probability that the object is a tassel? If so, please mention it in the caption.
Figure 4 – what does the hue of the green mean? After (b) there should be a space.
L178 – cannot – could not – make sure that this is not a mistake.
L191 – the pixel size of 400 and 1000 was in what units? (cm, mm ….?)
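On the question about Equations 4 & 5, the contrast between MAE and RMSE can be illustrated with a minimal sketch. It assumes the standard definitions of the two metrics (the manuscript's own equations are not reproduced in this record), and the tassel counts below are hypothetical values chosen only for illustration. Under these definitions both metrics carry the units of the counted quantity, and RMSE is never smaller than MAE on the same sample.

```python
import math

def mae(reference, predicted):
    """Mean absolute error: average magnitude of the count errors."""
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / len(reference)

def rmse(reference, predicted):
    """Root mean squared error: squares each error before averaging,
    so large individual errors weigh more heavily."""
    mean_sq = sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)
    return math.sqrt(mean_sq)

# Hypothetical per-image tassel counts (not data from the manuscript).
reference = [10, 12, 8, 15]
even_errors = [12, 10, 10, 13]   # every image is off by 2 tassels
one_outlier = [10, 12, 8, 23]    # three exact counts, one image off by 8

for name, predicted in [("even errors", even_errors), ("one outlier", one_outlier)]:
    print(name, "MAE =", mae(reference, predicted), "RMSE =", rmse(reference, predicted))
```

Both hypothetical predictions have the same MAE, but the RMSE is larger when the error is concentrated in a single image, which is the information reporting both metrics can add.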
References
Herrmann, I., Bdolach, E., Montekyo, Y., Rachmilevitch, S., Townsend, P.A., & Karnieli, A. (2019). Assessment of maize yield and phenology by drone-mounted superspectral camera. Precision Agriculture
Author Response
The reply is in the Word file.
Author Response File: Author Response.docx
Round 2
Reviewer 1 Report
Thank you. The authors made a good effort in revising the paper. However, Figure 2 still needs to be edited: the boundaries of the bounding boxes are not clearly visible. Moreover, there is no need to show the box from the software (LabelImg).
Author Response
We sincerely thank you for your constructive comments and suggestions. We have changed Figure 2 to make the bounding boxes more visible, and we have checked the English language and style carefully. Revised portions are marked in red in the modified version.
Author Response File: Author Response.docx
Reviewer 2 Report
Authors addressed my concerns.
However, before this paper is ready for publication, I highly recommend incorporating the authors' comment #2 into the Conclusion section. It seems very relevant and pertains to the overall goal of the study.
Thanks.
Comment #2 "Yes, we want to reduce the labor-intensive with UAV in monitoring agricultural field conditions. The purpose of this paper is to detect maize tassel with Faster R-CNN. We used UAV to get images of maize in 10500 m2 field and it took 0.5 hours. We got 1125 maize images with 600 × 600 resolution, 89 maize images with 4000 × 2250 resolution. We annotated the train dataset for two weeks. Each model took one day to train. For the same size field, more than 10 people will be needed to observe the situation of maize. It is very time consuming. Furthermore, different people have different evaluation criteria, the results won’t be objective. "
Author Response
We sincerely thank you for your constructive comments and suggestions. We have added Comment #2 to the Conclusion section and checked the English language and style carefully. Revised portions are marked in red in the modified version.
The details are in the Word file.
Author Response File: Author Response.docx