Pumpkin Yield Estimation Using Images from a UAV
Round 1
Reviewer 1 Report
In the earlier revision round, this reviewer judged the manuscript interesting in itself, with small language inaccuracies. Those inaccuracies appear to have been emended; therefore, the manuscript can be accepted for publication.
Author Response
Thank you for the kind review. In response to comments from other reviewers, we have expanded the paper with a ‘Threshold Sensitivity Analysis’ subsection containing an in-depth analysis of all assumed thresholds. We have also added some descriptions and references to the introduction.
We hope that the revised paper will remain acceptable to you.
Reviewer 2 Report
This study presented a color-transform-based (HSV) approach to pumpkin counting. A new dataset was presented with ground-truthing patches and UAV data, and strong performance of the method was shown. Such studies are definitely valuable for specialty crops such as pumpkin. However, I believe more work should be spent scientifically exploring the various choices in the algorithm and their effect on performance, to give readers more insight into designing a good counting algorithm, from an imagery perspective, for pumpkin and similar specialty crops. In addition, more literature review is needed; for example, it would be nice to compare and contrast this detailed pumpkin study that I found while reviewing this study: Wittstruck, L.; Kühling, I.; Trautz, D.; Kohlbrecher, M.; Jarmer, T. UAV-Based RGB Imagery for Hokkaido Pumpkin (Cucurbita max.) Detection and Yield Estimation. Sensors 2021, 21, 118. https://doi.org/10.3390/s21010118
L90-L91: Very quick jump into using HSV as the best transform for detecting pumpkins; did the literature (even for other crops, if not pumpkins) discuss other transforms or analysis methods simpler than CNNs? There is some good literature review in L25-L45, but the paper seems to argue that a simpler method such as an HSV transform should be used (and there is a good argument for that, in that it requires less training data unless you are doing something like one-shot learning), and it feels like more discussion of the pros and cons from the literature, and perhaps even analysis with data from this study, should be performed. Also, no reference or evidence is given for "Pumpkins’ natural high response in the Saturation band of HLS (Hue, Lightness, Saturation) color space is used here."
L93: What is the size of the pumpkin relative to the size of the superpixel? How does the performance change when this ratio is changed?
L96: What evidence or literature review is used to support the 3 times higher than saturation threshold for pumpkins?
L108: HLS has highly non-linear interactions between the three components. Why does a linear correlation distance metric (Mahalanobis) make sense?
L122-L125: Why should I use these morphological parameters and not some others?
L129: Why not 1.2 standard deviations or 1.31 standard deviations?
Author Response
Thank you for the kind review. We have extended the manuscript by adding a subchapter, Threshold Sensitivity Analysis, in which all arbitrarily chosen values are addressed. We have also slightly expanded the Introduction.
This study presented a color transform based (HSV) approach to Pumpkin Counting. A new dataset was presented with ground-truthing patches and UAV data, and strong performance in the method was shown. Such studies are definitely valuable for specialty crops such as pumpkin. However, I believe more work should be spent scientifically exploring the various choices in the algorithm and their effect on the performance to provide readers with more insight into designing a good counting algorithm for pumpkin and specialty crops similar to it from an imagery perspective. In addition, more literature review is needed; for example, it would be nice to compare and contrast this detailed pumpkin study that I found while reviewing this study: Wittstruck, L.; Kühling, I.; Trautz, D.; Kohlbrecher, M.; Jarmer, T. UAV-Based RGB Imagery for Hokkaido Pumpkin (Cucurbita max.) Detection and Yield Estimation. Sensors 2021, 21, 118. https://doi.org/10.3390/s21010118
Thank you for pointing this publication out. We have included it in the introduction. We have also found some other papers of relevance. A subchapter on thresholds has also been added.
L90-L91: Very quick jump into using HSV as the best transform for detecting pumpkins; did the literature (even for other crops, if not pumpkins) discuss other transforms or analysis methods simpler than CNNs? There is some good literature review in L25-L45, but the paper seems to argue that a simpler method such as an HSV transform should be used (and there is a good argument for that, in that it requires less training data unless you are doing something like one-shot learning), and it feels like more discussion of the pros and cons from the literature, and perhaps even analysis with data from this study, should be performed. Also, no reference or evidence is given for "Pumpkins’ natural high response in the Saturation band of HLS (Hue, Lightness, Saturation) color space is used here."
Subchapter 2.2.1 was added to discuss the choice of color space (lines 93-101 and Figure 2).
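To illustrate the color-space argument, the following minimal Python sketch converts 8-bit RGB pixels to HLS with the standard-library colorsys module and keeps pixels whose saturation exceeds a threshold. The sample colors and the threshold of 0.6 are hypothetical values chosen for illustration, not values from the manuscript.

```python
import colorsys


def hls_saturation(rgb):
    """Return the HLS saturation (0..1) of an 8-bit (R, G, B) triple."""
    r, g, b = (c / 255.0 for c in rgb)
    _hue, _lightness, saturation = colorsys.rgb_to_hls(r, g, b)
    return saturation


def segment_by_saturation(pixels, threshold=0.6):
    """Return True for each pixel whose HLS saturation exceeds the threshold.

    The default threshold of 0.6 is a hypothetical value for illustration.
    """
    return [hls_saturation(p) > threshold for p in pixels]


# Hypothetical sample colors, not measurements from the study.
samples = {
    "orange pumpkin": (230, 120, 20),
    "green leaf": (60, 110, 50),
    "brown soil": (120, 100, 80),
}
for name, rgb in samples.items():
    print(f"{name}: S = {hls_saturation(rgb):.2f}")
print(segment_by_saturation(list(samples.values())))
```

With these sample colors the orange pixel has a saturation of about 0.84, while the leaf and soil samples stay below 0.4, so only the orange pixel passes the threshold.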
L93: What is the size of the pumpkin relative to the size of the superpixel? How does the performance change when this ratio is changed?
The pumpkin varieties grown in the surveyed fields have a diameter of approximately 25 cm. Given the size of the reference tile (2.5 × 2.5 m), about 120 model pumpkins would fit in its field of view. We have also tested different superpixel sizes to justify this choice (lines 170-185 and Figures 5 and 6).
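A quick sanity check of the "about 120" figure, using only the 25 cm diameter and the 2.5 × 2.5 m tile stated in the response. The packing assumptions (square grid, hexagonal packing of equal circles, no edge effects) are ours, not the authors'.

```python
import math

TILE_SIDE_M = 2.5           # reference tile side length, from the response
PUMPKIN_DIAMETER_M = 0.25   # approximate pumpkin diameter, from the response

tile_area = TILE_SIDE_M ** 2
disc_area = math.pi * (PUMPKIN_DIAMETER_M / 2) ** 2  # circular footprint of one pumpkin

# Upper bound: tile area divided by one footprint (ignores gaps between pumpkins).
upper_bound = tile_area / disc_area
# Square grid: one pumpkin per 25 cm x 25 cm cell.
square_grid = (TILE_SIDE_M / PUMPKIN_DIAMETER_M) ** 2
# Hexagonal packing of equal circles covers pi / (2 * sqrt(3)) of the plane.
hex_estimate = math.pi / (2 * math.sqrt(3)) * upper_bound

print(f"square grid: {square_grid:.0f}, "
      f"hexagonal packing: {hex_estimate:.0f}, "
      f"area upper bound: {upper_bound:.0f}")
```

The three estimates (100, about 115 and about 127) bracket the stated figure, so roughly 120 pumpkins per reference tile is of the right order.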
L96: What evidence or literature review is used to support the 3 times higher than saturation threshold for pumpkins?
We have no supporting literature, only empirical evidence. We therefore tested different threshold values, compared the results, and chose the appropriate threshold (lines 186-197, Figure 7).
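One way to make such an empirical threshold choice reproducible is a sweep that scores each candidate threshold against labelled pixels. The sketch below assumes that a set of pixels with known pumpkin/background labels is available; the function name, data and thresholds in the usage example are hypothetical.

```python
def sweep_thresholds(values, labels, thresholds):
    """Score the rule `value > threshold` for each candidate threshold.

    values: per-pixel scores (e.g. saturation); labels: True for pumpkin pixels.
    Returns a list of (threshold, precision, recall) tuples.
    """
    results = []
    for t in thresholds:
        predicted = [v > t for v in values]
        tp = sum(p and l for p, l in zip(predicted, labels))
        fp = sum(p and not l for p, l in zip(predicted, labels))
        fn = sum((not p) and l for p, l in zip(predicted, labels))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        results.append((t, precision, recall))
    return results


# Hypothetical labelled pixels.
values = [0.9, 0.8, 0.7, 0.5, 0.4, 0.2]
labels = [True, True, True, False, True, False]
for t, precision, recall in sweep_thresholds(values, labels, [0.3, 0.6]):
    print(f"threshold {t}: precision {precision:.2f}, recall {recall:.2f}")
```

A lower threshold trades precision for recall; plotting both against the threshold makes the chosen operating point explicit.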
L108: HLS has highly non-linear interactions between the three components. Why does a linear correlation distance metric (Mahalanobis) make sense?
We used the Mahalanobis distance because the color values in the Hue and Saturation bands follow a multivariate normal distribution. During earlier development, the Mahalanobis distance was applied directly to the RGB pixel values, which also gave decent results (lines 198-204, Figure 8).
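For reference, the distance in question is d(x) = sqrt((x − µ)ᵀ S⁻¹ (x − µ)), where µ is the mean vector and S the covariance matrix of a set of reference pixels. The NumPy sketch below computes it for (Hue, Saturation) pairs; the reference samples are hypothetical. It treats hue as an ordinary linear value, so hues close to the 0/1 wrap-around (reds) would need special handling, which is one concrete form of the non-linearity raised in the comment.

```python
import numpy as np


def fit_reference(samples):
    """Estimate the mean vector and inverse covariance matrix of reference pixels.

    samples: array of shape (N, 2) holding (hue, saturation) pairs in 0..1.
    """
    mu = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)  # sample covariance S
    return mu, np.linalg.inv(cov)        # S^-1


def mahalanobis(x, mu, cov_inv):
    """Mahalanobis distance sqrt((x - mu)^T S^-1 (x - mu)) for each row of x.

    Hue is treated as a linear value here; hues near the 0/1 wrap-around
    would need special handling.
    """
    d = np.atleast_2d(x) - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", d, cov_inv, d))


# Hypothetical reference (hue, saturation) samples of pumpkin pixels.
reference = np.array([[0.07, 0.80], [0.08, 0.85], [0.06, 0.78],
                      [0.09, 0.88], [0.07, 0.83]])
mu, cov_inv = fit_reference(reference)
candidates = np.array([[0.075, 0.82], [0.30, 0.40]])  # pumpkin-like vs leaf-like
print(mahalanobis(candidates, mu, cov_inv))
```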
L122-L125: Why should I use these morphological parameters and not some others?
The most important operation here is noise reduction by median blur. Since the segmented image is binary, it seemed the most appropriate operation for removing single-pixel noise. Dilation was chosen to compensate for the low response of pumpkin edge pixels. We have added a threshold sensitivity analysis for both parameters (lines 205-210, Figure 9).
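The two operations can be sketched in NumPy as follows, assuming a boolean segmentation mask and a 3 × 3 square neighborhood (the kernel size and the helper names are assumptions, not the manuscript's parameters). In the example, the median filter removes an isolated pixel, and dilation then grows the remaining region by one pixel in every direction.

```python
import numpy as np


def _neighborhoods(mask, size):
    """Stack every size x size neighborhood of a 2-D array (zero-padded at borders)."""
    pad = size // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=0)
    h, w = mask.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(size) for dx in range(size)])


def median_blur(mask, size=3):
    """Binary median filter: a pixel stays on only if most of its neighborhood is on."""
    return np.median(_neighborhoods(mask.astype(np.uint8), size), axis=0).astype(bool)


def dilate(mask, size=3):
    """Binary dilation with a square structuring element."""
    return _neighborhoods(mask.astype(np.uint8), size).max(axis=0).astype(bool)


mask = np.zeros((7, 7), dtype=bool)
mask[1, 1] = True        # single-pixel noise
mask[3:6, 3:6] = True    # a small segmented blob
cleaned = median_blur(mask)
grown = dilate(cleaned)
print(int(mask.sum()), int(cleaned.sum()), int(grown.sum()))
```

Note that the median filter also trims the blob's corners, which is one reason a dilation step afterwards can help recover object extent.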
L129: Why not 1.2 standard deviations or 1.31 standard deviations?
A threshold of 1.2 would be ideal, as the results of the threshold sensitivity analysis show. However, we used a threshold of 1.0, based on a less thorough threshold analysis at an earlier stage (lines 211-215, Figure 10).
Round 2
Reviewer 2 Report
I appreciate the efforts the authors have put into improving the paper; it's a mark of scientific quality when authors respond to a review with evidence in tables and figures, as these authors have done with Figures 5, 6, 7 and 8 and Table 2.
I have listed comments for minor revision below, which I encourage the authors to consider. Minor revision is a stage where the authors have shown the scientific quality to be good, but the study would benefit from improvements to the manuscript so that it is better presented to the community; you always want to put your best foot forward in the final phase of describing your hard work to the community in a journal publication. The ROC curve (Table 2) comment and the discussion section are probably the most important of these.
Fig 2 and Section 2.2.2 - The new figure does a great job of visually explaining why HLS was chosen. Perhaps a sentence or two describing Figure 2 in the text would be helpful, just so readers don't miss the impact of the figure.
Table 2 - Very good to have this data, but this is the type of data for an ROC curve; it should be replaced with a plot of the ROC curve, and the y=x line should be drawn on it for comparison with a random classifier.
Figures 5, 6, 7 and 8 - excellent qualitative presentation; perhaps a quantitative metric would be nice too, but this gets the point across.
Discussion/Conclusion - I note this paper goes directly from results to conclusion, without a discussion section, and the conclusion is very short. I would add a 2-4 paragraph discussion section comparing and contrasting the methods and results with some of the references discussed in the introduction (I appreciate the improvement in the introduction).
Author Response
Thank you again for all your valuable input. We truly believe it has helped us improve the manuscript significantly.
I appreciate the efforts the authors have put into improving the paper; it's a mark of scientific quality when authors respond to a review with evidence in tables and figures, as these authors have done with Figures 5, 6, 7 and 8 and Table 2.
I have listed comments for minor revision below, which I encourage the authors to consider. Minor revision is a stage where the authors have shown the scientific quality to be good, but the study would benefit from improvements to the manuscript so that it is better presented to the community; you always want to put your best foot forward in the final phase of describing your hard work to the community in a journal publication. The ROC curve (Table 2) comment and the discussion section are probably the most important of these.
Fig 2 and Section 2.2.2 - The new figure does a great job of visually explaining why HLS was chosen. Perhaps a sentence or two describing Figure 2 in the text would be helpful, just so readers don't miss the impact of the figure.
We have added more description in section 2.2.1 and one more sentence in section 2.2.2.
Table 2 - Very good to have this data, but this is the type of data for an ROC curve; it should be replaced with a plot of the ROC curve, and the y=x line should be drawn on it for comparison with a random classifier.
We have replaced the table with a plot. Our data are not well suited to an ROC curve, as we have neither a binary classifier nor true positive values for this set. However, we have created a figure that visualizes our results so that they are easier to read.
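For readers who do have labelled samples and a continuous score, the ROC points requested in the comment can be computed as in the sketch below. It assumes distinct scores and binary labels, and the data in the usage example are hypothetical; a random classifier corresponds to the y = x line, whose area under the curve is 0.5.

```python
def roc_points(scores, labels):
    """Return the (FPR, TPR) points of an ROC curve.

    A threshold is swept from high to low over the scores; labels are booleans
    (True = positive). Scores are assumed to be distinct.
    """
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _score, label in sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True):
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores and labels.
pts = roc_points([0.9, 0.8, 0.7, 0.6], [True, False, True, False])
print(pts, auc(pts))
```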
Figures 5, 6, 7 and 8 - excellent qualitative presentation; perhaps a quantitative metric would be nice too, but this gets the point across.
Yes, we agree that quantitative metrics would be nice too. However, the time needed to compute them would greatly outweigh the additional benefit. Thus, we have decided not to pursue them.
Discussion/Conclusion - I note this paper goes directly from results to conclusion, without a discussion section and the conclusion is very short. I would add a 2-4 paragraph discussion section comparing/contrasting the methods/results to some of the references discussed in introduction (I appreciate the improvement in the introduction)
We have added a Discussion subsection covering the results and the differences between our method and the methods described in the introduction. We have also slightly expanded the Conclusions.
This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.
Round 1
Reviewer 1 Report
Dear Authors,
The paper looks interesting, but it reads more like an application than a scientific study. The methods used are not very innovative and not particularly challenging, as they are well known and the data are clear (the color of the pumpkins differs significantly from the background). The 2% error claimed for the method should be clarified further (see the notes below).
Here are some notes from my side for improving the manuscript.
lines 4-5: The last sentence suggests that the manual harvest count also indicates an error of less than 2%. According to your results, the error relative to the manual harvest count is less than 6%. Consider stating both numbers here for clarity.
line 86: It would be better to define µ and S here as well. They are defined in line 111, but they first appear in this equation.
Figure 1: Add the labels 1), 2) and 3) to the figure panels as well; at present they appear only in the caption.
line 112 (equation 3): Some spacing seems to be missing.
line 115: The word "number" appears twice; it should probably appear only once.
line 120: It would be helpful to describe more precisely how you obtained the 2% error. Did you simply inspect the result and estimate that the error is below 2%, or is there a more rigorous basis? Since you present it in the abstract as one of the highlights, it deserves more explanation. If it is not possible to count the pumpkins manually over the whole image, you could use subsets.
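A minimal sketch of the subset-based check suggested above: compare the automatic count with a manual count on each subset and report both per-subset and pooled relative errors. The function names and the counts in the usage example are hypothetical.

```python
def relative_count_error(predicted, manual):
    """Relative error |predicted - manual| / manual of an automatic count."""
    if manual <= 0:
        raise ValueError("manual count must be positive")
    return abs(predicted - manual) / manual


def subset_errors(pairs):
    """Per-subset relative errors and the pooled error over all subsets.

    pairs: list of (automatic_count, manual_count) tuples, one per subset.
    """
    per_subset = [relative_count_error(p, m) for p, m in pairs]
    pooled = relative_count_error(sum(p for p, _ in pairs), sum(m for _, m in pairs))
    return per_subset, pooled


# Hypothetical counts for two manually counted subsets.
per_subset, pooled = subset_errors([(98, 100), (51, 50)])
print(per_subset, pooled)
```

Reporting the per-subset errors alongside the pooled value matters because over- and under-counts in different subsets can cancel in the pooled figure.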
Best Regards
Author Response
Please see attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
General comments
The paper entitled “Counting pumpkins using images from a UAV” addresses a topic of the highest interest in remote sensing applied to agriculture. The title is appropriate and represents the actual content of the manuscript. The abstract briefly summarizes the content and outlines the aims of the contribution well. Keywords are appropriate. The highlights are missing; therefore, this reviewer cannot judge them. The Introduction contains a brief but complete review of the state of the art in machine learning methodologies for recognizing discrete objects in agriculture. References are abundant and very recent. Figures and graphics are clear and of good quality. The manuscript content therefore falls without doubt within the journal scope. The language is overall fluent and grammatically correct. In substance, the Authors’ contribution focuses on counting the pumpkin harvest by means of machine learning tools applied to images acquired by UAV. The topic is inherently interesting because it addresses the issue of identifying particular objects within a complex image.
The manuscript contains some typos that this reviewer invites the Authors to amend. Some of them are reported in the specific comments section. In conclusion, this reviewer recommends considering the paper for publication after “minor revisions”.
Specific comments
Line 69: Please change “boudnaries” to “boundaries”;
Line 84: The Mahalanobis distance is not fully explained: the matrix S⁻¹ (the inverse of the covariance matrix) is not defined.
Line 99: Why is the parameter named “pmode” if it is a “median” and not a “mode”?
Author Response
Please see attachment.
Author Response File: Author Response.pdf