Article
Peer-Review Record

Beyond Trade-Off: An Optimized Binocular Stereo Vision Based Depth Estimation Algorithm for Designing Harvesting Robot in Orchards

Agriculture 2023, 13(6), 1117; https://doi.org/10.3390/agriculture13061117
by Li Zhang 1, Qun Hao 1,2,3, Yefei Mao 4, Jianbin Su 5 and Jie Cao 1,2,*
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 20 April 2023 / Revised: 12 May 2023 / Accepted: 19 May 2023 / Published: 25 May 2023
(This article belongs to the Special Issue Agricultural Automation in Smart Farming)

Round 1

Reviewer 1 Report

You have made significant improvements on the existing image matching algorithms, but there are still some issues:

(1) In Section 2, the authors hold the view that active ranging sensors have certain drawbacks and limitations, and that passive ranging sensors are superior to active ones. In fact, with structured-light devices, active vision sensors greatly improve the accuracy and efficiency of stereo image matching; 3D scanners, for example, can achieve the measurement accuracy required for machinery manufacturing. In addition, vision devices based on infrared stereo matching (such as the RealSense R200 and RealSense D435) also have good applications. See, for example, the paper titled “Detection method for table grape ears and stems based on a far-close-range combined vision system and hand-eye-coordinated picking test”, published in Computers and Electronics in Agriculture in 2022.

(2) In lines 432-435, the manuscript states that the proposed algorithm was compared with the sum of squared differences (SSD), normalized cross-correlation (NCC), and adaptive support window (ASW) algorithms, with the corresponding quantitative results shown in Table 5. However, the results of the other three algorithms are not given in Table 5. Moreover, the data in Table 5 should include the calculation time. In addition, the depth accuracy obtained by the proposed matching algorithm should be compared with the results of an active vision system such as RealSense.

(3) The content illustrated by the multiple sets of images in Figure 12 should be explained in detail.

In general, the manuscript draft is clear and well-written. 

Author Response

Response to Reviewer 1

 

Our point-to-point response to the comments

We sincerely appreciate the editors' and reviewers' careful work and hope that the corrections will meet with approval. Revised parts are marked in yellow in the manuscript. Please feel free to contact us with any questions; we look forward to your consideration. The main corrections in the paper and the responses to the reviewer's comments are as follows:

Responses to the reviewer's comments:

 

Reviewer #1:

(1) In Section 2, the authors hold the view that active ranging sensors have certain drawbacks and limitations, and that passive ranging sensors are superior to active ones. In fact, with structured-light devices, active vision sensors greatly improve the accuracy and efficiency of stereo image matching; 3D scanners, for example, can achieve the measurement accuracy required for machinery manufacturing. In addition, vision devices based on infrared stereo matching (such as the RealSense R200 and RealSense D435) also have good applications. See, for example, the paper titled “Detection method for table grape ears and stems based on a far-close-range combined vision system and hand-eye-coordinated picking test”, published in Computers and Electronics in Agriculture in 2022.

Response: Thank you very much for your positive and valuable comments. We are very sorry for our inappropriate statements about active ranging sensors. We carefully read the high-quality paper “Detection method for table grape ears and stems based on a far-close-range combined vision system and hand-eye-coordinated picking test”, which gave us great inspiration, and we have added it as a reference.

Jin, Y.; Yu, C.; Yin, J.; Yang, S.X. Detection method for table grape ears and stems based on a far-close-range combined vision system and hand-eye-coordinated picking test. Computers and Electronics in Agriculture 2022, 202, 107364.

Also, since our present research focuses on passive ranging sensors, we revised the content of the Introduction and Related Work sections on active ranging sensors. We will try to carry out research on active ranging sensors such as RealSense in future work, to further discuss which approach is more suitable for designing the visual system of a harvesting robot in wild orchards.

“In future work, we will focus on some active vision systems such as RealSense for further discussion on which way is more suitable for the design of the visual system for harvesting robots in wild orchards.”

(2) In lines 432-435, the manuscript states that the proposed algorithm was compared with the sum of squared differences (SSD), normalized cross-correlation (NCC), and adaptive support window (ASW) algorithms, with the corresponding quantitative results shown in Table 5. However, the results of the other three algorithms are not given in Table 5. Moreover, the data in Table 5 should include the calculation time. In addition, the depth accuracy obtained by the proposed matching algorithm should be compared with the results of an active vision system such as RealSense.

Response: Thank you very much for your positive and valuable comments. We have added a detailed description of the multiple sets of images.

“In order to more comprehensively verify the feasibility of the algorithm proposed in this paper, experiments were carried out on image samples of different depth ranges: short distance (S, 0.6~0.75 m), medium distance (M, 1~1.2 m), and far distance (F, 1.6~1.9 m). The corresponding quantitative results are shown in Table 5.”

“Under these three different distance conditions, we compared our proposed algorithm with the sum of squared differences (SSD) [33], normalized cross-correlation (NCC) [34], and adaptive support window (ASW) [35] algorithms. We visualized the results such that, as the depth goes from far to near, the color changes from cold to warm, as shown in Figure 12.”
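As a side note for readers who wish to reproduce this cold-to-warm rendering, the following is a minimal Python/OpenCV sketch, assuming holes are encoded as non-positive disparity values; it is not the authors' actual visualization code.

```python
import cv2
import numpy as np

def colorize_disparity(disp):
    """Render a disparity map with a cold-to-warm colormap.

    Large disparities (near points) map to warm colors and small
    disparities (far points) to cold colors, matching the convention
    described above. Non-positive values are treated as holes.
    """
    d = disp.astype(np.float32)
    valid = d > 0
    lo, hi = d[valid].min(), d[valid].max()
    norm = np.zeros(d.shape, dtype=np.uint8)
    norm[valid] = (255 * (d[valid] - lo) / max(hi - lo, 1e-6)).astype(np.uint8)
    color = cv2.applyColorMap(norm, cv2.COLORMAP_JET)  # blue (cold) -> red (warm)
    color[~valid] = 0  # paint holes black
    return color
```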

Regarding your valuable comment that the depth accuracy obtained by the proposed matching algorithm should be compared with the results of an active vision system such as RealSense:

In our present research, we focus on 3D depth estimation based on stereo cameras, trying to improve both accuracy and time cost. We will focus on active vision systems for depth estimation for harvesting robots in future work, and we have added this to Section 6 (future work) of the revised manuscript: “In future work, we will focus on some active vision systems such as RealSense for further discussion on which way is more suitable for the design of the visual system for harvesting robots in wild orchards.”

(3) Figure 12 should be explained in detail the content illustrated by the multiple sets of images.

Response: We are very grateful for your comments on the manuscript. We have added a detailed description of the multiple sets of images.

“Figure 12. Visualization of the compared disparity maps. From top to bottom are the rectified left image and the experimental results obtained by SSD, NCC, ASW, and our proposed Completion-BiPy-Disp method, respectively.”

 

Thanks to all reviewers for the thoughtful and thorough review. Hopefully we have addressed all of your concerns.

Author Response File: Author Response.docx

Reviewer 2 Report

This paper proposes a disparity completion algorithm based on binocular stereo vision. Bilateral filtering and pyramid fusion are added to the traditional method of obtaining disparity maps. The bilateral filtering algorithm completes the disparity map, which contains many holes, and the pyramid fusion model significantly reduces the time cost of the proposed method. Experiments at three different distances show that the proposed method can effectively complete disparity maps and estimate depth with less time consumption. Here are some concerns and suggestions for further improvement.

Main comments:

1. The explanation in lines 40-41 for selecting the binocular camera is not clear or specific.

2. The introduction and related work are somewhat repetitive in some respects. For example, the sentence in lines 161-164 is similar to the sentence in lines 30-32. I suggest that these two sections be appropriately combined and reduced.

3. The logical structure of Section 3.2.1 is not very well organized, and the principle of the SGM algorithm is not very clearly explained.

4. The description in lines 239-242 looks like a summary, which does not match the title of the section. I suggest that the description be removed.

5. It would be better if the condition in line 262 were introduced earlier.

6. Most of subsection 3.3.3 describes existing problems but does not introduce the proposed improvement in detail. I suggest that the process in lines 296-297 be described in more detail.

7. The title of Figure 3 is the same as the title of Figure 4; please check whether any changes are needed.

8. The title of Section 4.5 is the same as the title of Section 4.4; please check whether any changes are needed.

9. The statement in line 422 is not accurate; “the relative error” should be changed to “the average relative error”.

This paper has some grammatical problems. For example, the sentence in line 39 lacks a subject. Please check it carefully.

Author Response

Response to Reviewer 2

 

Our point-to-point response to the comments

We sincerely appreciate the editors' and reviewers' careful work and hope that the corrections will meet with approval. Revised parts are marked in yellow in the manuscript. Please feel free to contact us with any questions; we look forward to your consideration. The main corrections in the paper and the responses to the reviewer's comments are as follows:

Responses to the reviewer's comments:

 

Reviewer #2:

1. The explanation in lines 40-41 for selecting the binocular camera is not clear or specific.

Response: We are grateful to the reviewer for pointing this out. Following your advice, we have added the detailed information in the revised manuscript: “Therefore, we applied a baseline-variable stereo camera (LenaCV USB3.0), and the baseline was set to 60 mm.”

2. The introduction and related work are somewhat repetitive in some respects. For example, the sentence in lines 161-164 is similar to the sentence in lines 30-32. I suggest that these two sections be appropriately combined and reduced.

Response: Thank you very much for your positive and valuable comments. We have revised this part and removed the duplicated content.

3. The logical structure of Section 3.2.1 is not very well organized, and the principle of the SGM algorithm is not very clearly explained.

Response: Thank you very much for your positive and valuable comments. Following your advice, we have reorganized Section 3.2.1 and explained the SGM algorithm in detail: “……based on the idea of pixelwise matching of mutual information and approximating a global 2D smoothness constraint by combining many 1D constraints. The SGM method calculates the matching cost hierarchically using mutual information. To optimize pathwise from all directions through the image, it uses an approximation of a global energy function known as cost aggregation. Then, disparity computation is performed by winner-takes-all and supported by disparity refinements such as consistency checking and sub-pixel interpolation. So, given……”
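For illustration only, the SGM pipeline quoted above corresponds closely to OpenCV's semi-global block matcher; the sketch below shows one plausible configuration. All parameter values are assumptions for the sketch, not the settings used in the paper.

```python
import cv2

# Illustrative settings only; the authors' actual parameters are not given here.
block_size = 5
sgm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,              # must be divisible by 16
    blockSize=block_size,
    P1=8 * block_size ** 2,          # penalty for disparity changes of +/-1
    P2=32 * block_size ** 2,         # larger penalty enforcing smoothness
    disp12MaxDiff=1,                 # left-right consistency tolerance
    uniquenessRatio=10,
    mode=cv2.STEREO_SGBM_MODE_SGBM,
)

left = cv2.imread("left_rect.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right_rect.png", cv2.IMREAD_GRAYSCALE)
# compute() returns fixed-point disparities scaled by 16.
disp = sgm.compute(left, right).astype("float32") / 16.0
```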

4. The description in lines 239-242 looks like a summary, which does not match the title of the section. I suggest that the description be removed.

Response: Thank you for your comments. We have removed lines 239-242.

5. It would be better if the condition in line 262 were introduced earlier.

Response: Thank you very much for your kind comments. Following your advice, we have revised it in the manuscript: “where x is the pixel coordinate, σr and σs adjust the intensity and spatial similarity, respectively, and, following common recommendations, the values of σr and σs are set to 10 and 30, respectively.”
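A rough sketch of how bilateral filtering with these sigma values can be used to fill holes in a disparity map is given below; the paper's completion step may use a joint or weighted formulation, so treat this as an approximation under that caveat.

```python
import cv2
import numpy as np

def fill_holes_bilateral(disp, sigma_color=10, sigma_space=30):
    """Fill disparity holes (non-positive pixels) with bilateral-filtered values.

    sigma_color / sigma_space follow the values quoted in the response;
    with a diameter of -1, OpenCV derives the window size from sigma_space.
    """
    d = disp.astype(np.float32)
    smoothed = cv2.bilateralFilter(d, -1, sigma_color, sigma_space)
    holes = d <= 0
    d[holes] = smoothed[holes]  # replace only the hole pixels
    return d
```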

6. Most of subsection 3.3.3 describes existing problems but does not introduce the proposed improvement in detail. I suggest that the process in lines 296-297 be described in more detail.

Response: Thank you very much for your positive and valuable comments. Following your advice, we have described it in more detail in the manuscript: “……Specifically, the disparity maps at different resolutions were first processed by bilateral filtering and up-sampling. Then, the outputs were fed into a multi-scale pyramid model, from which results at six scales (1/32, 1/16, 1/8, 1/4, 1/2, and 1) were obtained. Finally, the multi-resolution results were fused from low to high resolution in a certain proportion, each added to the result at the next scale up. ……”
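The following is a minimal sketch of this coarse-to-fine fusion, assuming a simple fixed blending weight per level; the scales match the six listed above, but the actual fusion proportions in the paper are not reproduced.

```python
import cv2
import numpy as np

def pyramid_fuse(disp, levels=6, alpha=0.5):
    """Fuse disparity maps from 1/32 scale up to full resolution.

    alpha is an assumed blending weight between the up-sampled coarse
    result and the finer-scale map at each level.
    """
    d = disp.astype(np.float32)
    pyr = [d]
    for _ in range(levels - 1):          # build scales 1, 1/2, ..., 1/32
        pyr.append(cv2.pyrDown(pyr[-1]))
    fused = pyr[-1]                      # start at the coarsest (1/32) level
    for finer in reversed(pyr[:-1]):     # fuse from low to high resolution
        up = cv2.pyrUp(fused, dstsize=(finer.shape[1], finer.shape[0]))
        fused = alpha * up + (1 - alpha) * finer
    return fused
```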

7. The title of Figure 3 is the same as the title of Figure 4; please check whether any changes are needed.

Response: We are grateful to the reviewer for pointing this out; we are very sorry for this mistake. We have revised the caption of Figure 4 to “Prototype equipment and experimental environment”.

8. The title of Section 4.5 is the same as the title of Section 4.4; please check whether any changes are needed.

Response: We are very sorry for this mistake, and we have revised it in the manuscript. The title of Section 4.4 is “The qualitative completion results”, and the title of Section 4.5 was revised to “The quantitative results analysis”.

9. The statement in line 422 is not accurate; “the relative error” should be changed to “the average relative error”.

Response: We are very sorry for this mistake, and we have revised it in the manuscript: “The average absolute error of the depth value is 7.2375 mm, and the average relative error is no more than 1.2%.”
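For completeness, the two reported metrics can be computed as in the small sketch below; array names are illustrative, not taken from the paper.

```python
import numpy as np

def depth_errors(estimated_mm, ground_truth_mm):
    """Return (average absolute error in mm, average relative error)."""
    est = np.asarray(estimated_mm, dtype=float)
    gt = np.asarray(ground_truth_mm, dtype=float)
    abs_err = np.abs(est - gt)
    return abs_err.mean(), (abs_err / gt).mean()
```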

Thanks to all reviewers for the thoughtful and thorough review. Hopefully we have addressed all of your concerns.

 

Author Response File: Author Response.docx

Reviewer 3 Report

I would like to congratulate you on the preparation of this article. Overall, I am pleased to report that I found the article satisfactory. The paper is well structured and details all the stages and sub-stages of the study. However, there are a few remarks I would like to make about the paper:

- Regarding the formatting:

o In line 26 replace "ability[2]" by "ability [2]" with space. Do the same for line 47.

o In line 30 replace "[4][5][6][7]" with "[4-7]". According to the authors' guide, a hyphen should be used if the citations are consecutive. Do the same for lines 38, 99, 146, 149.

o In line 30 replace "[3][13][14]" with "[3,13,14]". According to the authors' guide this is the correct format if the citations are not consecutive. Do the same for lines 140, 141, 142, 272.

o In line 101 replace "[27]studied" with "Gongal et al. [27] studied". Do the same for line 125, 128, 130, 131, 132.

o In line 103 replace "Gené-Mola et al published" with "Gené-Mola et al. [28] published".

o In line 198 replace "for a point p" with "for a point p". The "p" should be in italics, to be consistent with the formatting of the rest. Do the same for lines 219, 232, 262, 268.

o In line 199 it says "as equation:", it would be more correct to put "as Equation (1):" or "as equation (1):" as it is written in line 230. Same for the rest (lines 202, 204, 207, 215, 218, 219, 254, 257, 274).

o Rewrite sentences with punctuation errors in line 269, 294, 315.

o Put a space between the digit and its unit (line 310, 419, 422, 431, 465, 466).

o Replace in line 311 "in Figure 5" by "in Figure 5.", adding a full stop at the end of the paragraph. Same for lines 320, 329, 406, 414, 443.

o When in a figure there are subfigures within it, either left or right or top and bottom, etc., it does not follow the guidelines defined by the journal's author guidelines. Revise and adapt the format. In addition, the reference in the text to these subfigures is not appropriate.

o In Table 4, in the first column "ID", put a space between the text and the parenthesis. In addition, start the first letter in capital letters. Same for the heading of Table 5.

o Try to ensure that all units follow the international system of measurement, using metres (m) or millimetres (mm) if the measurement is very small (line 419, 431, etc.).

o When defining a range of a variable, use a hyphen between the minimum and maximum values instead of a space. For example, "1-1.2 m" instead of "1 1.2 m" (line 431, etc.).

- Regarding the content:

o I think you should review the citations in the introduction and related works and check if all citations are necessary.

o There are some ideas about the comparison between different methods such as monocular, binocular and multi-camera system. This idea is stated at the beginning of the introduction and repeated in the related works. It would be good to condense this information and fit it into the section that you think is most appropriate.

o I would have liked to find a more in-depth discussion of the results obtained and their comparison with the work of other authors.

o I would like to know if you have tried to install the perception systems close to the end-effector or used a hybrid between the two, as other authors have done this process to minimise coordination errors between the perception system and the actuator.

Thank you very much for your contribution.
Best regards.

Author Response

Response to Reviewer 3

 

Our point-to-point response to the comments

We sincerely appreciate the editors' and reviewers' careful work and hope that the corrections will meet with approval. Revised parts are marked in yellow in the manuscript. Please feel free to contact us with any questions; we look forward to your consideration. The main corrections in the paper and the responses to the reviewer's comments are as follows:

Responses to the reviewer's comments:

 

Reviewer #3:

  1. In line 26 replace "ability[2]" by "ability [2]" with space. Do the same for line 47.

Response: We are very sorry for this mistake, and we have revised it in the manuscript. “……Exploiting an RGB camera to achieve passive ranging has the advantages of high resolution and strong resistance to lighting interference [3].”

2. In line 30 replace "[4][5][6][7]" with "[4-7]". According to the authors' guide, a hyphen should be used if the citations are consecutive. Do the same for lines 38, 99, 146, 149.

Response: We are very sorry for this mistake, and we have revised it in the manuscript.

 “……and also avoid the small field of view and difficult matching issues of stereo matching [5-8].”

“In addition, matching multiple captured images usually incurs a high time cost, which makes it difficult to apply to harvesting robots in practical applications [9-13].”

“The theory of depth calculation using a multi-camera system is similar to that of a binocular camera: images captured from different cameras are generally used to calculate the 3D position of detected objects [9-13].”

“In summary, the use of passive sensors for depth computing has the advantages of strong environmental adaptability, flexible implementation, and low cost, so they are being used in more and more practical fields [25-27].”

3. In line 30 replace "[3][13][14]" with "[3,13,14]". According to the authors' guide, this is the correct format if the citations are not consecutive. Do the same for lines 140, 141, 142, 272.

Response: We are very sorry for this mistake, and we have revised it in the manuscript.

“Although binocular stereo vision has many great advantages, there are many challenges that have to be faced when designing an applicable depth estimation algorithm for harvesting robots in orchard environments [4,14,15].”

 “In recent years, more and more research works have adopted binocular stereo vision-based methods to realize 3D position estimation for fruits [15,18].”

“Furthermore, binocular cameras were used as image acquisition equipment for many agricultural robots, such as for strawberries [19,20], tomatoes [21,22], cucumbers [23], and oranges [24].”

“The pyramid model is widely used as a very efficient algorithm [31,32].”

 

4. In line 101 replace "[27]studied" with "Gongal et al. [27] studied". Do the same for lines 125, 128, 130, 131, 132.

Response: We are very sorry for this mistake, and we have revised it in the manuscript.

“Roy and Isler [5] first proposed a motion estimation method to realize the registration and reconstruction of apple orchards, extract the number and diameter information of apples collected by a monocular camera, and realize the estimation of apple yield.”

“Liu et al. [6] obtained continuous image data through a monocular camera and used the FCN model to segment fruits from background regions; they then calculated the 3D position and size of the fruit by combining the segmentation results with a motion estimation algorithm.”

“Hani et al. [7] used motion estimation on consecutive images obtained by a monocular camera to calculate the depth information of the fruit.”

“Roy et al. [8] proposed a global feature-constrained solution for predicting depth values for the 3D reconstruction of orchard tree rows.”

5. In line 103 replace "Gené-Mola et al published" with "Gené-Mola et al. [28] published".

Response: We are very sorry for this mistake, and we have revised it in the manuscript.

6. In line 198 replace "for a point p" with "for a point p". The "p" should be in italics, to be consistent with the formatting of the rest. Do the same for lines 219, 232, 262, 268.

Response: We are very sorry for this mistake, and we have revised it in the manuscript.

7. In line 199 it says "as equation:", it would be more correct to put "as Equation (1):" or "as equation (1):" as it is written in line 230. Same for the rest (lines 202, 204, 207, 215, 218, 219, 254, 257, 274).

Response: We are very sorry for this mistake, and we have revised it in the manuscript.

 

8. Rewrite the sentences with punctuation errors in lines 269, 294, 315.

Response: We are very sorry for this mistake, and we have revised it in the manuscript.

“In fact, highly precise location estimation and low time cost are always required at the same time in practical applications.”

“Therefore, we applied bilateral filtering and up-sampling (US) algorithms to the disparity images of different resolutions, and the obtained multi-scale disparity maps were fused to achieve highly accurate and dense final outputs.”

“Specifically, fx and fy are the focal lengths expressed in pixels, and cx and cy describe the coordinates of the image center.”
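These intrinsics enter the standard rectified-stereo relation Z = fx · B / d. A small sketch follows, with the 60 mm baseline taken from the response to Reviewer 2 and fx as a placeholder for the calibrated value.

```python
import numpy as np

def disparity_to_depth(disp_px, fx_px, baseline_m=0.06):
    """Convert a disparity map (pixels) to depth (metres) for a rectified pair."""
    d = np.asarray(disp_px, dtype=np.float32)
    depth = np.zeros_like(d)
    valid = d > 0                         # zero disparity = invalid / at infinity
    depth[valid] = fx_px * baseline_m / d[valid]
    return depth
```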

9. Put a space between the digit and its unit (lines 310, 419, 422, 431, 465, 466).

Response: We are very sorry for this mistake, and we have revised it in the manuscript.

“Our experiments used a zoom binocular camera as the image acquisition device and a black-and-white checkerboard as the calibration target; the side length of each square is 20 mm.”

“From the results shown in Table 4, in the depth range 0.6~0.9 m, the average vertical of the left and right……”

“The average absolute error of the depth value is 7.2375 mm ……”

“In order to more comprehensively verify the feasibility of the algorithm proposed in this paper, experiments were carried out on image samples of different depth ranges, short distance (S, 0.6~0.75 m), medium distance (M, 1~1.2 m) and far distance (F, 1.6~1.9 m).”

“Finally, the qualitative and quantitative experiments were carried out on three different ranges of distance, such as S (0.6~0.75 m), M (1~1.2 m), and F (1.6~1.9 m). And the experimental results showed the average absolute error of our proposed method is 3.2 mm,……”

10. Replace in line 311 "in Figure 5" with "in Figure 5.", adding a full stop at the end of the paragraph. Same for lines 320, 329, 406, 414, 443.

Response: We are very sorry for this mistake, and we have revised it in the manuscript.

11. When in a figure there are subfigures within it, either left or right or top and bottom, etc., it does not follow the guidelines defined by the journal's author guidelines. Revise and adapt the format. In addition, the reference in the text to these subfigures is not appropriate.

Response: We are grateful to the reviewer for reminding us of this point, and we have revised it in the manuscript.

“Figure 7. The acquisition of the initial disparity map. (1) represents the disparity map achieved by SGM. (2) is the disparity map after left-right consistency check processing. (3) presents the result after removing small connected regions. (4) is the final disparity map obtained by median filtering.”

“Figure 10. Experimental results of pyramid-based disparity fusion. (A) and (B) are the Gaussian sampling results at the original resolutions of  and , respectively. (C) shows the fusion results of (A) and (B) at different scales.”
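To make the four steps of Figure 7 concrete, here is a hedged sketch of that post-processing chain (left-right consistency check, small-region removal, median filtering); the tolerance and area thresholds are illustrative assumptions, not the paper's values.

```python
import cv2
import numpy as np

def postprocess_disparity(disp_l, disp_r, lr_tol=1.0, min_region=100):
    h, w = disp_l.shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    # Step (2): left-right consistency check — compare d_L(x) with d_R(x - d_L(x)).
    xr = np.clip((cols - disp_l).astype(int), 0, w - 1)
    lr_diff = np.abs(disp_l - disp_r[rows, xr])
    disp = np.where(lr_diff <= lr_tol, disp_l, 0).astype(np.float32)
    # Step (3): remove small connected regions of valid pixels.
    mask = (disp > 0).astype(np.uint8)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    for i in range(1, n):
        if stats[i, cv2.CC_STAT_AREA] < min_region:
            disp[labels == i] = 0
    # Step (4): median filtering for the final map.
    return cv2.medianBlur(disp, 5)
```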

12. In Table 4, in the first column "ID", put a space between the text and the parenthesis. In addition, start the first letter in capital letters. Same for the heading of Table 5.

Response: We are very sorry for this mistake, and we have revised it in the manuscript.

13. Try to ensure that all units follow the international system of measurement, using metres (m) or millimetres (mm) if the measurement is very small (lines 419, 431, etc.).

Response: We are grateful to the reviewer for reminding us of this point, and we have revised it in the manuscript.

“From the results shown in Table 4, in the depth range 0.6~0.9 m, the average vertical of the left and right cameras is 4.5 pixels, which proves that the calibration parameters of the binocular camera are accurate and satisfy the binocular vision system under ideal conditions.”

“In order to more comprehensively verify the feasibility of the algorithm proposed in this paper, experiments were carried out on image samples of different depth ranges, short distance (S, 0.6~0.75 m), medium distance (M, 1~1.2 m) and far distance (F, 1.6~1.9 m).”

14. When defining a range of a variable, use a hyphen between the minimum and maximum values instead of a space. For example, "1-1.2 m" instead of "1 1.2 m" (line 431, etc.).

Response: We are very sorry for this mistake, and we have revised it in the manuscript.

“In order to more comprehensively verify the feasibility of the algorithm proposed in this paper, experiments were carried out on image samples of different depth ranges, short distance (S, 0.6~0.75 m), medium distance (M, 1~1.2 m) and far distance (F, 1.6~1.9 m).”

15. I think you should review the citations in the introduction and related works and check whether all citations are necessary.

Response: We are very grateful for your comments on the manuscript. We have checked all the references and formatted them strictly according to the Guide for Authors.

16. There are some ideas about the comparison between different methods such as monocular, binocular, and multi-camera systems. This idea is stated at the beginning of the introduction and repeated in the related works. It would be good to condense this information and fit it into the section that you think is most appropriate.

Response: We are grateful to the reviewer for reminding us of this point. We have checked all citations and reorganized the Related Work section.

17. I would have liked to find a more in-depth discussion of the results obtained and their comparison with the work of other authors.

Response: Thank you very much for your positive and valuable comments. According to your advice, we revised the manuscript.

“It can be seen from the results that there are many incorrect points in the disparity results obtained by the SSD, NCC, and ASW matching algorithms, which may be caused by the lack of further post-processing steps, while the method proposed in this paper can obtain smooth, dense, and continuous results, reflecting the feasibility of the proposed algorithm. Overall, the proposed model achieves excellent qualitative and quantitative results at different distances, which verifies the potential of the proposed algorithm for depth estimation in automated picking equipment.”

 

18. I would like to know if you have tried to install the perception systems close to the end-effector or used a hybrid between the two, as other authors have done this process to minimise coordination errors between the perception system and the actuator.

Response: We are grateful to the reviewer for reminding us of this point. In fact, we have tried installing the perception system close to the end-effector. However, considering that the manipulator may be affected by branches and leaves during operation, we installed the perception system far from the end-effector. We fully agree with your suggestion to install the perception system close to the end-effector or to use a hybrid of the two. Although we are limited by the current robot hardware configuration and have not yet carried out further experiments on this part, we will try to improve the configuration of the harvesting robot and test this in future work.

“In future work, we will try to install the perception systems close to the end-effector or use a hybrid between the two, to minimize coordination errors between the perception system and the actuator. Also, we will focus on some active vision systems such as RealSense for further discussion on which way is more suitable for the design of the visual system for harvesting robots in wild orchards.”

 

Thanks to all reviewers for the thoughtful and thorough review. Hopefully we have addressed all of your concerns.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Following the reviewers' feedback, you have made good modifications and improvements to the article, improving the quality of the paper.

Reviewer 2 Report

The article has been revised according to the review comments.

Minor editing of English language is required.
