Anthropomorphic Grasping of Complex-Shaped Objects Using Imitation Learning
Round 1
Reviewer 1 Report
This study is very interesting from a practical perspective and offers highly relevant practical contributions. However, these contributions are not explicitly addressed by the authors. The theoretical contextualization of the work is relatively weak; its scientific quality should be raised to match the technical quality of the project implementation.
Improvement suggestions:
- I recommend that the authors include a Literature Review section.
- Authors note “To imitate human hand motions, we have used the commercial Allegro robotic hand, which has four fingers with 4 DOF each.” Are there other alternatives on the market?
- The same applies to the RGBD camera. The selection criteria should be made more explicit.
- Before presenting the simulation results, the authors should clarify the methods used to analyze the data.
- Authors state “In the simulation environment, we used ten objects that we can see in Figure 13 a screwdriver, cup, carafe, telephone, valve, hammer, wrench, skillet, scissors, and lamp. The result shows 97% accuracy.” Where are these results presented?
- Authors should better explain the results presented in Table 1. What constitutes a success case and a failure case? Where is the specification for each test run?
- The results are not discussed in light of those obtained by other authors in the field using alternative approaches. I consider it essential to include a Discussion section.
- The Conclusions section must present the theoretical and practical contributions of this work.
- The number of references should be increased.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 2 Report
The paper concerns a high-DOF anthropomorphic robotic hand for autonomously perceiving and grasping complex-shaped objects using a number of machine learning approaches. This work utilizes 3D point-cloud data and RGB imaging to estimate the grasping pose. It is an interesting contribution, and hence it could be published after some revisions. See the points below.
1. The efficiency of the high-DOF gripper is not quantified.
2. In section 5.2, the transformation matrices should differ for different candidate objects. Also, the authors should explain how K is determined when KNN is used. What metric determines the granularity of the clustering algorithm? The authors should explain these key issues related to the transformation matrices.
3. It would be interesting to know how long the whole process takes, from image reconstruction to a successful grasp. Whether it can be completed in real time, or at least in a short time, remains open for discussion.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Round 2
Reviewer 1 Report
Minor improvements are suggested:
1. Authors note “Compared to other studies that conduct similar work with different approaches [39,40]…” Please clarify the characteristics of these different approaches addressed in studies [39] and [40].
2. Authors note “Besides, using inverse reinforcement learning, we will find a faster path from default pose to grasping than human experts suggested.” Explain how you can do that using inverse reinforcement learning.
Author Response
Please see the attachment.
Author Response File: Author Response.docx