Grape Bunch Detection at Different Growth Stages Using Deep Learning Quantized Models
Round 1
Reviewer 1 Report
My comments are attached in a separate file.
Comments for author File: Comments.pdf
Author Response
Dear reviewer,
We thank you for the reviews, which truly helped to improve the manuscript quality.
We answered each question in the attached file.
Author Response File: Author Response.pdf
Reviewer 2 Report
Aguiar et al. employ images to automatically detect grape bunches at two developmental stages. The images were acquired by a robot in the field. Please address the following remarks in the manuscript itself (rather than only in the response letter).
(1) Lines 1–5 & 23–29: you cannot use the exact same wording in these two sections.
(2) Lines 1–9: shorten this section to two lines at most.
(3) Lines 1–20: you need to specify which two developmental stages you refer to, as well as the variation in accuracy when detecting them.
(4) Introduction: one of the key features of the present work is that you employ cameras operating in the visible portion of the electromagnetic spectrum (400–700 nm). In this way, costly equipment and trained personnel are not required (Fanourakis et al., 2021 Agronomy 11, 795). In previously conducted studies, however, not only is a specific orientation of the object of interest relative to the camera required, but also defined illumination conditions (Fanourakis et al., 2021 Agronomy 11, 795; Taheri-Garavand et al., 2021 Acta Physiol Plant 43, 78). The requirement for specific illumination limits the method's applicability to controlled-light environments (Taheri-Garavand et al., 2021 Acta Physiol Plant 43, 78). The method presented in this paper is independent of the ambient light environment, making this solution affordable/cost-effective, portable (thus usable in situ), and rapid (Taheri-Garavand et al., 2021 Acta Physiol Plant 43, 78).
(5) Lines 48–49: In recent years, CNNs have been increasingly incorporated into plant phenotyping concepts (Taheri-Garavand et al., 2021 Industrial Crop Prod 171, 113985). They have been very successful in modeling complicated systems, owing to their ability to distinguish patterns and extract regularities from data. Examples further extend to variety identification in seeds (Taheri-Garavand et al., 2021 Plants 10, 1406) and in intact plants by using leaves (Nasiri et al., 2021 Plants 10, 1628).
(6) From the Introduction, it is not clear why distinguishing the two developmental stages is important.
(7) Lines 73–133: you need to shorten this section by removing the details. Mention the examples together and stress the similarities/dissimilarities of these studies rather than presenting them one by one.
(8) Lines 153–154: what type of cameras were these? You need to provide the model and company. Are the remaining sensors relevant? If not, why list them here?
(9) Lines 182–183: you need to specify exactly what these two developmental stages are, for instance in terms of berry size or color. Also include an image of representative grapes at these two stages.
(10) Lines 355–356: this is important information that cannot be omitted from the abstract/discussion sections. Is color the primary trait on which classification is performed?
(11) Lines 404–406/437: on which dates were the visits? At what time of day were the images obtained? Did you image the same plants as developmental changes progressed over time, OR were these two developmental stages present on the plants at the same time?
(12) You need to reinforce the discussion section. Below I give you some ideas:
- strength of the method: low-cost, portable, independent of light conditions (Fanourakis et al., 2021 Agronomy 11, 795; Taheri-Garavand et al., 2021 Acta Physiol Plant 43, 78)
- weakness: low precision
- Does the low-cost concept include the robot? Does the low-cost concept include the computational power running the model?
On a commercial scale, evidently, a capital investment is initially required for adopting the employed approach (Taheri-Garavand et al., 2021 Industrial Crop Prod 171, 113985). Nevertheless, wide-ranging large-scale commercial applications can provide high returns through considerable process enhancement and cost reduction.
- Why is the precision different between the two stages? Why did you not include fully grown grapes? How far are the employed developmental stages from harvest? Are there any decisions to be made by growers/agronomists at that stage?
- Ideas to improve the precision of the model. Compare the obtained precision with those of earlier studies (e.g., Taheri-Garavand et al., 2021 Plants 10, 1406; Nasiri et al., 2021 Plants 10, 1628).
- What is the ideal time of day to obtain images? Is it midday on a sunny day, or an overcast day? Overcast days give diffuse light and reduce shadows.
- Is the problem with occlusions expected to diminish in fully developed grapes? Can the occlusion problem be partially solved by a different pruning strategy?
- How robust is the obtained model in measuring the same trait in another cultivar?
Author Response
Dear reviewer,
We thank you for the reviews, which truly helped to improve the manuscript quality.
We answered each question in the attached file.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
This reviewer thanks the authors for their thorough answers. However, as a last remark, I still believe the need for an energy-efficient HW platform in this specific case remains insufficiently highlighted. The authors have stated in general terms that it would be useful to use a low-power platform because the idea is to physically place it on a robot, but why? Why do you need to perform the inference online? What are your real-time constraints, and why are they in place for this specific case? In other words, what is the need for online processing in precision agriculture? I think it is important to clarify this point, since it would help to understand your rationale for going with a low-power platform. Additionally, since energy consumption is a topic that seems relevant to this research, a comparison with the state of the art in terms of energy consumption would be interesting.
Author Response
The authors thank the reviewer for all the comments that helped to improve the manuscript.
We answer each comment below.
This reviewer thanks the authors for their thorough answers. However, as a last remark, I still believe the need for an energy-efficient HW platform in this specific case remains insufficiently highlighted. The authors have stated in general terms that it would be useful to use a low-power platform because the idea is to physically place it on a robot, but why? Why do you need to perform the inference online? What are your real-time constraints, and why are they in place for this specific case? In other words, what is the need for online processing in precision agriculture? I think it is important to clarify this point, since it would help to understand your rationale for going with a low-power platform.
We agree on the relevance of further detailing the rationale behind the low-power and high-speed requirements. To do so, we added the following explanation to the manuscript:
"This requirement is important since our main goal is to have this solution running on our robotic platform (Fig. 2). Thus, power consumption is taken into consideration so that the grape detection solution requires as less power as possible, and robot autonomy does not be highly affected by it. With this low-power solution, the robot will operate autonomously for a larger time without needing to charge. On the other hand, high-power solutions can decrease the autonomy time of the platforms, which is essential for long-term operations.
In addition, since this is intended to be a solution that runs online on the robot, runtime requirements are important so that the detection can be performed in a time-effective manner. In this way, mobile agricultural robots can perform tasks that depend on the grape detection algorithm in an online fashion. For example, SLAM algorithms, which usually run at high frequency, can use the grape detections to build prescription maps for later processing and other agricultural applications. Also, harvesting procedures require the correct location of the grape bunches relative to the moving robotic arm. Thus, a high detection frequency is essential to obtain a precise location of the grapes with respect to the arm's gripper."
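To make the real-time constraint concrete, consider a worked example with illustrative numbers (the speed and rate below are assumptions for the sake of argument, not values from the manuscript). If the gripper moves at $v = 0.5$ m/s and detections arrive at $f = 10$ Hz, the most recent detection can be stale by up to

$\Delta x = \dfrac{v}{f} = \dfrac{0.5\ \text{m/s}}{10\ \text{Hz}} = 0.05\ \text{m},$

i.e., 5 cm of positional error between consecutive detections. Doubling the detection frequency halves this bound, which is why inference speed matters for manipulation tasks.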
Additionally, since energy consumption is a topic that seems relevant to this research, a comparison with the state of the art in terms of energy consumption would be interesting.
We agree that this should be detailed in the manuscript. We added references in the discussion section to two papers that also deploy object detection models on low-power devices. The paragraph is the following:
" One of the main goals of this work was to achieve a low-power solution.
The device used operates at high inference rate with a requirement of 5V and 500mA. This result is aligned with the state-of-the-art works that propose advanced solutions for object detection using accelerator devices. . Kaarmukilan et al. [53] use Movidius Neural Compute Stick 2 that, similarly to the TPU used in this work, is connected to the host device by USB and is capable of 4 TOPS with a 1.5W power consumption. Dinelli et al. [54] compare several field-programmable gate array families by Xilinx and Intel for object detection. From all the evaluated devices, the authors achieved a minimum power consumption of 0.969W and maximum power consumption of 4.010W."
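For context, a quick back-of-the-envelope check (assuming the quoted 5 V and 500 mA rating corresponds to the device's maximum draw) gives an implied peak power of

$P = V \times I = 5\ \text{V} \times 0.5\ \text{A} = 2.5\ \text{W},$

which sits between the 1.5 W of the Neural Compute Stick 2 [53] and the 0.969–4.010 W range reported for the FPGAs in [54].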
Once again, we thank the reviewer for helping to improve the manuscript quality.
Reviewer 2 Report
The authors did excellent work in addressing my comments.
The quality of the manuscript was improved.
On this basis, I recommend the present manuscript for publication in Agronomy.
Author Response
The authors would like to thank the reviewer for all the help in improving the manuscript quality.