A Performance Analysis of a Litchi Picking Robot System for Actively Removing Obstructions, Using an Artificial Intelligence Algorithm
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The article:
Performance Analysis of Actively Removing Occlusion for Litchi Picking Robot using Artificial Intelligence Algorithm
Has been reviewed with the following comments:
Lines 33 and 40 miss references.
The sentence in line 34 is not well written.
Throughout all the introduction the authors talk about obstruction and occlusion, but none of them is explained in an understandable way. An obstruction can be a branch;
I would like that you explain how litchi fruit grow in a tree. Which is the percentage of fruit in the surface. Does it grow in the interior of the tree?
Which agronomic treatments are given for simplifying and for getting a better yield. For example, pruning, there are studies with LIDAR of trees morphology.
At the end of the introduction where you add your objectives you should clarify the previous spots. What is occlusion and what is obstruction.
Some works done with occlusion removal units, such as the one that you mention in lines 112 and 113, should be included in the introduction.
In line 120 add and C++ …. To join both sentences.
Which is the working range of the robot in line 125.
All lines from 125 to 135 comment on the occlusions but as mentioned before it is missing at least one paragraph of it in the introduction.
In heading 2.2. (line 156) Let only: ROIs Segmentation of Litchi fruit and branches.
How was the standard deviation of line 197 found?
Lines 198 and 199 are not clear with the for example words.
Add a reference for the Bouguet algorithm in line 217
Rewrite paragraph from 242 and 246 as the words “24-pixel field” are not clear
Letters in figures 7a and 7c are too small for the reader.
Add points and make the sentence shorter, so that the word coordinate in its different aspects is understood.
Change ‘as’ to ‘at’ in line 310.
The phrase “subsequently annotated manually” should be clarified and substituted in line 335.
What is the name of the X-axis in Figure 11.
Use a point before first situation and another before second situation (lines 361 and 362).
In line 367 it starts with fourth experiment. Where is the third experiment?
Figure 13 misses letters a and b in the image.
There is an image 13 in line 411 and another in line 421. Which is the good one?
In figure 14a the blue letters appear over the red ones and have to be fixed.
Figure 14 should be larger. So please put 14a in series and both images of 14b below.
In Table 6, what is the meaning of occlusion letter and what T means in the accuracy. What is a, b and c?
In line 461 it should be are instead of were.
In figure 15 it should be written in the heading what is each group? As well in the previous text it says that the two motion traces are clearly separated and it is not so, lines 464.
In line 474 the word where should be eliminated and a full stop added.
REFERENCES
Reference 3: Ieee should be IEEE.
In reference 7 the place and date of the conference should be included.
Comments on the Quality of English Language
There are a small number of English errors and unclear phrases.
Author Response
1) Lines 33 and 40 miss references.
Response: Thank you for your good comment. References 1 and 2 are the references for the corresponding lines.
2) The sentence in line 34 is not well written.
Response: Thank you for your good comment. We have rewritten the sentence as shown in line 34 in the revised manuscript.
3) Throughout all the introduction the authors talk about obstruction and occlusion, but none of them is explained in an understandable way. An obstruction can be a branch. I would like that you explain how litchi fruit grow in a tree. Which is the percentage of fruit in the surface. Does it grow in the interior of the tree?
Response: Thank you for your excellent comment. First, we have changed every instance of the word ‘occlusion’ in the manuscript to ‘obstruction’. Second, we have explained what ‘obstruction’ means in our study: a litchi fruit blocks the picking point that the robot is to cut. In this situation, if the obstruction is not removed, the robot will damage the litchi fruit, so the picking operation fails and economic losses occur. We acknowledge that obstructions can include many objects, such as branches, leaves, and fruits, but in our study we only consider the case where the obstruction at the picking point is a litchi fruit. Litchi fruits grow on the surface of the tree crown. However, because branches grow randomly, the mother branch containing the picking point is usually obstructed by other branches, leaves, or fruits. We also acknowledge that our use of the word ‘occlusion’ was inaccurate and ambiguous.
4) Which agronomic treatments are given for simplifying and for getting a better yield. For example, pruning, there are studies with LIDAR of trees morphology.
Response: We are very sorry, but we did not understand this comment.
5) At the end of the introduction where you add your objectives you should clarify the previous spots. What is occlusion and what is obstruction.
Response: Thank you for your excellent comment. We have clarified our objectives following your suggestion, and the word ‘occlusion’ has been changed to ‘obstruction’.
6) Some works done with occlusion removal units, such as the one that you mention in lines 112 and 113, should be included in the introduction.
Response: Thank you for your excellent comment. We have added the corresponding explanations in the introduction.
7) In line 120 add and C++ …. To join both sentences.
Response: Thank you for your good comment. We have added the word ‘and’.
8) Which is the working range of the robot in line 125.
Response: Thank you for your good comment. We have added the working range of the robot.
9) All lines from 125 to 135 comment on the occlusions but as mentioned before it is missing at least one paragraph of it in the introduction.
Response: Thank you for your good comment. We have added the corresponding paragraph in the introduction.
10) In heading 2.2. (line 156) Let only: ROIs Segmentation of Litchi fruit and branches.
Response: Thank you for your good comment. We have corrected it.
11) How was the standard deviation of line 197 found?
Response: Thank you for your excellent comment. Statistics obtained by processing a large number of images showed that the average branch width at the picking position was 5 pixels and the average branch width at the equidistant line below the picking area was 15 pixels, so the average difference was 10 pixels.
12) Lines 198 and 199 are not clear with the for example words.
Response: Thank you for your excellent comment. We have rewritten the sentences.
13) Add a reference for the Bouguet algorithm in line 217.
Response: Thank you for your good comment. We have added the reference [28].
14) Rewrite paragraph from 242 and 246 as the words “24-pixel field” are not clear.
Response: Thank you for your good comment. We have rewritten the paragraph.
15) Letters in figures 7a and 7c are too small for the reader.
Response: Thank you for your good comment. We have redrawn the figure.
16) Add points and make the sentence shorter, so that the word coordinate in its different aspects is understood.
Response: Thank you for your good comment. We have rewritten the paragraph.
17) Change ‘as’ to ‘at’ in line 310.
Response: Thank you for your good comment. We have corrected it.
18) The phrase “subsequently annotated manually” should be clarified and substituted in line 335.
Response: Thank you for your good comment. We have rewritten the sentence.
19) What is the name of the X-axis in Figure 11.
Response: Thank you for your good comment. We have added the name of the X-axis.
20) Use a point before first situation and another before second situation (lines 361 and 362).
Response: Thank you for your good comment. We have rewritten the paragraph following your suggestion.
21) In line 367 it starts with fourth experiment. Where is the third experiment?
Response: Thank you for your good comment. The third experiment is described in the text below Fig. 11.
22) Figure 13 misses letters a and b in the image.
Response: Thank you for your good comment. We have added letters a and b.
23) There is an image 13 in line 411 and another in line 421. Which is the good one?
Response: Thank you for your good comment. There were two figures 13. We have corrected them.
24) In figure 14a the blue letters appear over the red ones and have to be fixed.
Response: Thank you for your good comment. We have fixed it.
25) Figure 14 should be larger. So please put 14a in series and both images of 14b below.
Response: Thank you for your good comment. We have fixed them following your suggestion.
26) In Table 6, what is the meaning of occlusion letter and what T means in the accuracy. What is a, b and c?
Response: Thank you for your good comment. We have added a paragraph to explain them in the revised manuscript.
27) In line 461 it should be are instead of were.
Response: Thank you for your good comment. We have corrected it.
28) In figure 15 it should be written in the heading what is each group? As well in the previous text it says that the two motion traces are clearly separated and it is not so, lines 464.
Response: Thank you for your excellent comment. We have added headings to the figure. The dashed line in Fig. 16 represents the distance between the occlusion object and the picking point, and the change in the length of the dashed line shows that the two motion traces are clearly separated.
29) In line 474 the word where should be eliminated and a full stop added.
Response: Thank you for your excellent comment. We have corrected it.
30) REFERENCES Reference 3: Ieee should be IEEE. In reference 7 the place and date of the conference should be included.
Response: Thank you for your good comment. We have corrected them.
Reviewer 2 Report
Comments and Suggestions for Authors
In Line 173 “If ROIs of the litchi fruits and the branches overlapped, it was considered that the recognized branch belonged to the recognized fruit” Please elaborate this context. If the overlapped fruits originated from the same branch or the branch occlusion Then how do you classify or what measures do you undergo?
In line 177 it is mentioned that “If the center distance was smaller than the threshold 500 pixels” is there any reference or exact justification about fixing the threshold for varied sizes and shapes of litchi.
In Line 198 “Pixel difference on equidistant lines” The equidistant lines may be obstructed by branches, leaves, or other obstructions, which makes it difficult for automated systems to recognize and select the litchi fruits.
The accuracy of disparity estimation decreases when objects are positioned too near or far away from the stereo camera system. This restriction may affect the method's suitability in scenes with significant depth variations.
Many factors, including the disparity range, matching metric, and window size, can affect how accurate disparity maps are. It can be difficult to choose the right parameters, and you may need to adjust them manually, which can be time-consuming and may not produce the best results for all image pairs.
How do you justify the Issues with the spray gun for visual localization to remove occlusion using the air-blowing method provided by the gas pump. Is there proper, proven reasoning for this process.
It would be better if the author presented certain ablation experiments and comparative studies with other existing models.
Author Response
1) In Line 173 “If ROIs of the litchi fruits and the branches overlapped, it was considered that the recognized branch belonged to the recognized fruit” Please elaborate this context. If the overlapped fruits originated from the same branch or the branch occlusion Then how do you classify or what measures do you undergo?
Response:
We appreciate your good comment to improve our manuscript. The membership relationship was not determined simply by whether the recognition boxes overlapped, but by combining the overlap percentage of the recognition boxes with whether the bounding box of the branch was located above the bounding box of the litchi. The average overlap percentage between the fruits and branches identified in our YOLOv8-Seg results was 9.39%, and the overlap percentage threshold we set was 12%. In the experiment, the overlap percentage in the situation you mentioned was greater than 12%.
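The rule described in this response can be sketched as follows. This is only one possible reading, not the authors' implementation: the response does not define how the overlap percentage is normalised or which side of the 12% threshold counts as membership. The sketch assumes that the overlap is the intersection area divided by the smaller box area, that a pair is accepted when the overlap is positive and at most 12%, and that "above" means a smaller centre y in image coordinates. The names `Box`, `overlap_ratio`, and `branch_belongs_to_fruit` are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in image pixels (y grows downwards)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

    def center_y(self) -> float:
        return (self.y1 + self.y2) / 2.0


def overlap_ratio(fruit: Box, branch: Box) -> float:
    """Intersection area divided by the smaller box area (assumed definition)."""
    width = min(fruit.x2, branch.x2) - max(fruit.x1, branch.x1)
    height = min(fruit.y2, branch.y2) - max(fruit.y1, branch.y1)
    if width <= 0 or height <= 0:
        return 0.0
    smaller = min(fruit.area(), branch.area())
    return (width * height) / smaller if smaller > 0 else 0.0


def branch_belongs_to_fruit(fruit: Box, branch: Box, max_overlap: float = 0.12) -> bool:
    """Assumed rule: boxes overlap by at most max_overlap and the branch lies above the fruit."""
    ratio = overlap_ratio(fruit, branch)
    branch_is_above = branch.center_y() < fruit.center_y()
    return 0.0 < ratio <= max_overlap and branch_is_above


if __name__ == "__main__":
    fruit = Box(100, 200, 300, 400)
    stem = Box(180, 50, 220, 220)        # touches the top of the fruit box
    crossing = Box(180, 150, 220, 320)   # runs deep into the fruit box
    print(branch_belongs_to_fruit(fruit, stem))
    print(branch_belongs_to_fruit(fruit, crossing))
```

Under these assumptions, a branch box that only touches the top of a fruit box is paired with it, while a branch that crosses deep into the fruit box exceeds the threshold and is left for separate handling.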
2) In line 177 it is mentioned that “If the center distance was smaller than the threshold 500 pixels” is there any reference or exact justification about fixing the threshold for varied sizes and shapes of litchi.
Response:
Thank you for your good comment. Statistical results obtained by processing a large number of images showed the average radius of a single litchi fruit was 400 pixels, and the maximum radius was 500 pixels.
3) In Line 198 “Pixel difference on equidistant lines” The equidistant lines may be obstructed by branches, leaves, or other obstructions, which makes it difficult for automated systems to recognize and select the litchi fruits.
Response:
Thank you for your excellent comment. There were indeed situations where the equidistant line you mentioned was interfered with by other factors, which prevented the normal selection of branch picking points. This is also reflected in Section 3.1, Results Discussion. However, there were not many positioning failures in our experiment.
4) The accuracy of disparity estimation decreases when objects are positioned too near or far away from the stereo camera system. This restriction may affect the method's suitability in scenes with significant depth variations.
Response:
Thank you for your excellent comment. The ZED 2i binocular camera we used can measure depth over a range of 0.2-20 m in 1080p 30 FPS mode, and the 3D point cloud produced by binocular localization is best at 0.45-20 m. In the experiments in our manuscript, the localization distance for the litchi picking operation was within 0.7-1 m, and the localization error within this range is reported in Table 5. The error caused by depth variation within this range can be ignored.
5) Many factors, including the disparity range, matching metric, and window size, can affect how accurate disparity maps are. It can be difficult to choose the right parameters, and you may need to adjust them manually, which can be time-consuming and may not produce the best results for all image pairs.
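As a rough illustration of why a short, narrow working range limits depth error, the standard rectified-stereo relation Z = f·B/d implies that a fixed disparity error produces a depth error that grows with Z². The sketch below is not the authors' method and does not use the camera SDK. The focal length, baseline, and disparity error are illustrative assumptions, not values from the manuscript.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth of a point in a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m: float, focal_px: float, baseline_m: float,
                disparity_error_px: float) -> float:
    """First-order depth error for a given disparity error: dZ ~= Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


# Assumed, illustrative values (not taken from the manuscript).
FOCAL_PX = 1000.0
BASELINE_M = 0.12
DISPARITY_ERROR_PX = 0.1

if __name__ == "__main__":
    for z in (0.7, 1.0, 5.0, 20.0):
        err_mm = 1000.0 * depth_error(z, FOCAL_PX, BASELINE_M, DISPARITY_ERROR_PX)
        print(f"Z = {z:5.1f} m -> approximate depth error {err_mm:8.2f} mm")
```

Because the error scales with the square of the depth, moving from 0.7 m to 1 m roughly doubles it, whereas the far end of the camera's range would be several hundred times worse for the same disparity error.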
Response:
Thank you for your excellent comment. The ZED 2i camera we use comes with its own SDK, which can be called directly on Ubuntu. The problem you mentioned above can be handled by the adaptive algorithm in the SDK.
6) How do you justify the Issues with the spray gun for visual localization to remove occlusion using the air-blowing method provided by the gas pump. Is there proper, proven reasoning for this process.
Response:
Thank you for your excellent comment. The main purpose of the spray gun is to remove the occlusion, leaving a certain space redundancy between the picking point and the occlusion object. The end effector can enter this space redundancy and wait to cut at the picking point. Therefore, we believe that as long as the Euclidean distance between the picking point and the occlusion object allows the end effector to enter the space redundancy successfully, this process is reasonable.
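The criterion in this response amounts to a distance check, which can be sketched as below. This is a minimal sketch rather than the authors' code: the function names and the `required_clearance_m` parameter are hypothetical, and the response gives no numeric clearance, so the example values are purely illustrative.

```python
import math
from typing import Sequence


def clearance_m(picking_point: Sequence[float], obstruction_point: Sequence[float]) -> float:
    """Euclidean distance between the picking point and the obstruction, both as (x, y, z) in metres."""
    return math.dist(picking_point, obstruction_point)


def end_effector_can_enter(picking_point: Sequence[float],
                           obstruction_point: Sequence[float],
                           required_clearance_m: float) -> bool:
    """True when the gap is at least the clearance the end effector is assumed to need."""
    return clearance_m(picking_point, obstruction_point) >= required_clearance_m


if __name__ == "__main__":
    picking_point = (0.00, 0.00, 0.80)
    obstruction = (0.03, 0.04, 0.80)   # illustrative position after air blowing
    print(clearance_m(picking_point, obstruction))
    print(end_effector_can_enter(picking_point, obstruction, required_clearance_m=0.04))
```

In practice the check would be evaluated on the localized 3D points while the fruit is displaced, since the gap changes over time as the fruit swings.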
7) It would be better if the author presented certain ablation experiments and comparative studies with other existing models.
Response:
Thank you for your excellent comment. The focus of this study is to verify the feasibility of the proposed system for actively removing occlusion in litchi picking. We fully agree with your suggestion of ablation experiments and comparison experiments with other models. In future research, we will focus on studying the clamping posture of the end effector after it enters the space redundancy between the shaking picking point and the shaking fruit.
Reviewer 3 Report
Comments and Suggestions for Authors
The gripper specifications need to be explained in more detail within the 'System Composition' Section.
Please explain why YOLOv8 is preferred over other object recognition software packages?
More explanation about how the mobile part of the system is controlled needs to be given.
Some minor comments are as follows.
The stereo rectification algorithm used in the paper, its name and references need to be given.
The caption of Fig. 15 is not underneath the figure.
The rectifying equation (1) requires a citation.
The forward kinematics equations (5) and (6) require citations.
Table 5 and Table 6 need to be adjusted to fit on a single page.
I can observe some problems within your references within the paper. The paper reference numbers appear as '[Error! Reference source not found.]'. You would need to correct these references.
Author Response
1) The gripper specifications need to be explained in more detail within the 'System Composition' Section.
Response:
We appreciate your good comment to improve our manuscript. We have added the gripper specifications in the revised manuscript.
2) Please explain why YOLOv8 is preferred over other object recognition software packages?
Response:
Thank you for your excellent comment. The main purpose of this study is to verify the feasibility of the system for actively removing occlusion in litchi picking. The YOLOv8 model is used only for semantic segmentation, to extract the ROIs of litchi fruits and branches. In future studies, for example on litchi fruit recognition in unstructured environments, we will conduct a comprehensive study and experiments on the selection of deep learning models.
3) More explanation about how the mobile part of the system is controlled needs to be given.
Response:
Thank you for your excellent comment. We have added a paragraph about the mobile part explanation in the revised manuscript.
4) The stereo rectification algorithm used in the paper, its name and references need to be given.
Response:
Thank you for your excellent comment. We have added the corresponding reference.
5) The caption of Fig. 15 is not underneath the figure.
Response:
Thank you for your good comment. We have moved the caption underneath Fig. 15.
6) The rectifying equation (1) requires a citation.
Response:
Thank you for your excellent comment. We have added the corresponding reference.
7) The forward kinematics equations (5) and (6) require citations.
Response:
Thank you for your excellent comment. We have added the corresponding reference.
8) Table 5 and Table 6 need to be adjusted to fit on a single page.
Response:
Thank you for your excellent comment. We have adjusted Table 5 and Table 6 to fit on a single page.
9) I can observe some problems within your references within the paper. The paper reference numbers appear as '[Error! Reference source not found.]'. You would need to correct these references.
Response:
Thank you for your good comment. We have corrected the link for each reference so that every reference can be opened successfully.