Article
Peer-Review Record

Robotic Grasping of Novel Objects Based on Deep-Learning Based Feature Detection

Sensors 2024, 24(15), 4861; https://doi.org/10.3390/s24154861
by Kai Sherng Khor 1, Chao Liu 2 and Chien Chern Cheah 1,*
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 29 June 2024 / Revised: 23 July 2024 / Accepted: 24 July 2024 / Published: 26 July 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Although robot navigation has progressed well, research on intelligent grasping of unknown objects lags behind, which has limited the use of robots and manipulators in manufacturing and production industries. From that perspective, the reviewed article is a step forward in this area.

A well-written article of 21 pages, including 23 highly appropriate references, it offers a new machine-learning approach for recognizing unknown objects and grasping them with two gripper jaws in a balanced position. In this work, the developed algorithm is trained to recognize common object features such as edges and corners; after image processing using deep learning, the system can identify grasping surfaces for two gripper jaws (though that is a limitation at this stage). Accordingly, the grasping surfaces are parallel, and the line between the jaw contacts is intended to pass through the centroid of the object. The claimed 98.25% success rate of appropriate grasping is higher than that of a couple of other works in this area. The work is a step forward for mobile robots in unknown-object grasping. Thank you for providing the algorithm, which will be helpful for other researchers.

However, the major question remains open, and that is the estimation of an appropriate grasping force for unknown objects! The other limitation of the work is that the algorithm can process data only for a 2-jaw gripper, while experience shows that 2-jaw grasping is unstable in many cases, which limits its application in machine manufacturing, where most parts are cylindrical.

Although the article is extensively well written, I would recommend the following:

Improve the Conclusion section; please use standard content, such as the methodology connected to the problem and then the output of the work. Please indicate planned future work.

It would also be good to explore the problem and research question appropriately in the Introduction section.

It would be good to use the term "unknown object" throughout the article instead of "novel object", and to add keywords such as "unknown object" and "robotic grasping" (if you agree).

It was my pleasure to read the article (many thanks to the editor for giving me that opportunity and trusting me); there are good learning points!

Author Response

The authors would like to express their sincere gratitude to the Editor and the three reviewers for their valuable and constructive comments. We have thoroughly revised our manuscript according to the comments and suggestions. In the following, we provide point-by-point responses to the comments and describe the revisions made in the manuscript. The major changes are highlighted in red in the revised manuscript, as per the Editor’s requirement and to facilitate the reviewers’ reference. We hope the responses clarify the reviewers’ concerns and that the corresponding revisions help improve the manuscript’s quality to meet the publication requirements of Sensors.

Response to Reviewer 1:
- “Although robot navigation has progressed well, research on intelligent grasping of unknown objects lags behind, which has limited the use of robots and manipulators in manufacturing and production industries. From that perspective, the reviewed article is a step forward in this area. A well-written article of 21 pages, including 23 highly appropriate references, it offers a new machine-learning approach for recognizing unknown objects and grasping them with two gripper jaws in a balanced position. In this work, the developed algorithm is trained to recognize common object features such as edges and corners; after image processing using deep learning, the system can identify grasping surfaces for two gripper jaws (though that is a limitation at this stage). Accordingly, the grasping surfaces are parallel, and the line between the jaw contacts is intended to pass through the centroid of the object. The claimed 98.25% success rate of appropriate grasping is higher than that of a couple of other works in this area. The work is a step forward for mobile robots in unknown-object grasping. Thank you for providing the algorithm, which will be helpful for other researchers.”
Reply: The authors are grateful for the reviewer’s appreciation of the contribution of this work.


- “However, the major question remains open, and that is the estimation of an appropriate grasping force for unknown objects! ”
Reply: The authors acknowledge that grasp force estimation is a limitation of this work. Currently, as the focus is purely on the visual aspects of the object, other physical properties such as malleability and texture are not considered in the grasp derivation. Section 3.1 on page 14 explains how much grasp force is used. This is certainly a limitation, especially for softer and more flexible objects. Some of the experimental failures, such as with the toy slinky (elaborated on page 19), occurred mainly because of the malleable nature of the object and the use of an insufficient grasping force. This has been added to future work in the Conclusion section on page 20.


- “The other limitation of the work is that the algorithm can process data only for a 2-jaw gripper, while experience shows that 2-jaw grasping is unstable in many cases, which limits its application in machine manufacturing, where most parts are cylindrical.”
Reply: The authors thank the reviewer for highlighting this point. The proposed algorithm in this work is indeed developed for 2-jaw (finger) parallel grippers. The main motivation for devoting this study to this type of gripper is that 2-jaw grippers are the most commonly used grippers in practical applications, compared to other types such as multi-finger grippers, vacuum grippers, and magnetic grippers. It is acknowledged that different types of grippers have their own benefits, but 2-jaw grippers represent the most popular grippers for robotic manipulation in practice by offering several key advantages, including “versatility” (capable of handling a wide range of object shapes and sizes with the help of finger-tip changers/adaptors), “precision” (in generating precise gripping force when handling delicate items), “simplicity and reliability” (in mechanical design, control design, and maintenance), “operation speed” (especially compared with complex multi-finger grippers), and “cost-effectiveness” (simpler design, fewer components, and widespread use) [R1] [R2].
It is true that, if only gripping stability is concerned, multi-finger grippers could perform better, but the overall cost may be greatly increased compared to 2-jaw grippers because of the aforementioned practical considerations. In fact, the experimental studies of this work covered a large number of objects, including some with challenging shapes such as a cylindrical mug and a round ball, yet a high grasp success rate of 98.25% was still achieved with a general-purpose 2-jaw gripper, which further justifies the effectiveness and robustness of the proposed algorithm and thus verifies the contribution of this work.
We have added a “Remark” paragraph addressing this point on page 14 of the revised manuscript.
[R1] L. Birglen, T. Schlicht, “A statistical review of industrial robotic grippers”, Robotics and Computer-Integrated Manufacturing, Vol. 49, pp. 88-97, 2018.
[R2] Z. Hu, W. Wan, and K. Harada, “Designing a Mechanical Tool for Robots with 2-Finger Parallel Grippers”, IEEE Robotics and Automation Letters, Vol. 4, No. 3, pp. 2981-2988, 2019.


- “Although the article is extensively well written, I would recommend the following:
Improve the Conclusion section; please use standard content, such as the methodology connected to the problem and then the output of the work. Please indicate planned future work. ”
Reply: Following the reviewer’s suggestion, the authors have rewritten the Conclusion section, explaining the algorithm development and the experimental studies in more detail. Also, planned future work has been added as the second paragraph of this section.


- “It would also be good to explore the problem and research question appropriately in the Introduction section. ”
Reply: A detailed survey of relevant works in the literature was provided in the Introduction of the first version of this paper. Following the reviewer’s suggestion, we have modified the last paragraph of the Introduction section on page 3 of the revised manuscript to explicitly point out the open problem and further highlight the research challenge.


- “It would be good to use the term “unknown object” throughout the article instead of “novel object”, and to add keywords such as “unknown object” and “robotic grasping” (if you agree). ”
Reply: Following the reviewer’s suggestion, the authors have gone through the manuscript and replaced all instances of “novel object” with “unknown object”. Also, the keywords have been updated to “robotics, robotic grasping, unknown objects”.


- “It was my pleasure to read the article (many thanks to the editor for giving me that opportunity and trusting me); there are good learning points! ”
Reply: The authors would like to thank the reviewer again for his/her
appreciation of this work’s contribution.

 

Attached are also the authors' responses to all three reviewers as a whole.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The research on grasping novel objects presented in this paper is quite significant. Compared to traditional deep learning-based grasping methods, it does not require the construction of extensive datasets and can better cope with open environments. While the paper can be accepted after minor revisions, there are a few issues that need to be discussed with the authors:

(1) Although the paper mentions identifying object grasping features through edges, corners, and centroids to address grasping novel objects, is there any overlap between the objects selected from the Cornell dataset and those used in subsequent experiments? Additionally, what is the difficulty level of annotating such data? How were the edges and corners of objects like "teddy bears" in the subsequent experiments handled?

(2) The explanations regarding P (perimeter) and A (area) in the paper are quite confusing. It would be helpful to include illustrations alongside the explanations to make them clearer and easier to understand.

(3) Conducting only 10 experiments in the subsequent experiments may not be sufficiently convincing. Is the sample size sufficient to draw robust conclusions?

(4) When objects have multiple grasping points, such as scissors and charging cables shown in the experiments, how is the unique grasping point determined? Furthermore, water cups are circular, and the handling of circular objects is not mentioned in the previous theoretical section. How does the proposed algorithm address this?

(5) The connection between the grasping points of an object ideally passes through its centroid. However, in practical grasping scenarios, the centroid of a novel object may not align with the centroid derived from the captured image. How does this affect the grasping success rate? Moreover, the material properties of novel objects are unknown. Are there any conditions under which the reported 98% grasping success rate for novel objects holds? For instance, do smooth, transparent, or fragile objects fall within the scope of "novel objects" considered in this study?

Author Response


Response to Reviewer 2:
- “The research on grasping novel objects presented in this paper is quite
significant. Compared to traditional deep learning-based grasping methods,
it does not require the construction of extensive datasets and can better cope
with open environments. ”
Reply: The authors are grateful for the reviewer’s appreciation of the contribution of this work.


- “While the paper can be accepted after minor revisions, there are a few
issues that need to be discussed with the authors:
(1) Although the paper mentions identifying object grasping features through
edges, corners, and centroids to address grasping novel objects, is there any
overlap between the objects selected from the Cornell dataset and those used
in subsequent experiments? Additionally, what is the difficulty level of annotating such data? How were the edges and corners of objects like “teddy
bears” in the subsequent experiments handled? ”
Reply: There were no overlaps between the objects selected from the Cornell dataset and those used in our experiments. Since no existing datasets were labeled for the purpose of this study, the labeling of edges and corners was done manually. The objects chosen for training had clear edges and corners, so data annotation was not difficult. Two examples are illustrated in Figure 4 on page 6 of the revised paper, showing a glasses case and a bag clip, neither of which was used in the subsequent experiments. The sharp corners on the bag clip are very prominent compared to the curved corners on the glasses case. However, these curved corners are also labeled as corners to account for the fact that many objects are deliberately manufactured with rounded corners for safety purposes (e.g., cellphones). We have clarified the above on page 6 of the revised paper. This labeling strategy improved the robustness of the algorithm when dealing with objects like the “teddy bear” and has been proven highly effective through the experiments. After training, the algorithm automatically treated parts like the bear’s ears and hands as corners, providing sufficient features for the algorithm to derive a grasp position. It should be noted that the teddy bear was also not part of the training dataset.


- “(2) The explanations regarding P (perimeter) and A (area) in the paper
are quite confusing. It would be helpful to include illustrations alongside
the explanations to make them clearer and easier to understand. ”
Reply: A new Figure 2 has been added on page 4 of the revised paper
to better illustrate the explanations for the P (perimeter), A (area), and
(xc, yc) (object centroid).
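For concreteness, the relationship among P (perimeter), A (area), and (xc, yc) (centroid) can be sketched for a polygonal object contour using the shoelace formula. This is an illustrative computation only, not the authors’ implementation, and the function name is our own:

```python
import numpy as np

def polygon_properties(vertices):
    """Compute perimeter P, area A, and centroid (xc, yc) of a simple
    polygon given as an (N, 2) sequence of (x, y) vertices in order."""
    v = np.asarray(vertices, dtype=float)
    nxt = np.roll(v, -1, axis=0)                     # each vertex's successor
    edges = nxt - v
    P = np.sum(np.hypot(edges[:, 0], edges[:, 1]))   # perimeter: sum of edge lengths
    cross = v[:, 0] * nxt[:, 1] - nxt[:, 0] * v[:, 1]
    A = 0.5 * np.sum(cross)                          # signed area (shoelace formula)
    xc = np.sum((v[:, 0] + nxt[:, 0]) * cross) / (6.0 * A)
    yc = np.sum((v[:, 1] + nxt[:, 1]) * cross) / (6.0 * A)
    return P, abs(A), (xc, yc)
```

For a unit square, for example, this yields P = 4, A = 1, and centroid (0.5, 0.5); in practice the contour would come from an image-segmentation step.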


- “(3) Conducting only 10 experiments in the subsequent experiments
may not be sufficiently convincing. Is the sample size sufficient to draw
robust conclusions? ”
Reply: We would like to clarify that a total of 400 experiments were conducted. The 10 attempts refer to the number of trials per object, and 40 objects in total were involved in the experimental validation. As illustrated in Table 1 on page 16 of the revised paper, the total number of attempts is significantly higher than in the existing methods [18, 19, 20] in the literature. These three studies conducted 4, 5, and 10 attempts per object, respectively, and the numbers of objects were also smaller than in our experiments. Therefore, the authors believe that the sample size of the experiments is sufficiently large to draw meaningful comparisons and conclusions.
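As a quick sanity check of these figures (our own back-of-the-envelope calculation, not from the paper):

```python
objects = 40             # objects in the experimental validation
trials_per_object = 10   # grasp attempts per object
total_attempts = objects * trials_per_object   # 400 attempts in total
successes = round(0.9825 * total_attempts)     # 98.25% success rate -> 393 grasps
print(total_attempts, successes)
```

so the reported 98.25% rate corresponds to 393 successful grasps out of 400 attempts.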


- “(4) When objects have multiple grasping points, such as scissors and
charging cables shown in the experiments, how is the unique grasping point
determined? Furthermore, water cups are circular, and the handling of circular objects is not mentioned in the previous theoretical section. How does
the proposed algorithm address this? ”
Reply: As illustrated in Figure 17 on page 16 of the revised paper, this
algorithm does not have a unique grasping point for objects with varying
configurations, such as Swiss knives and scissors. The algorithm determines
where to grasp the object automatically and entirely based on what it sees
in the current configuration before grasping. This highlights the algorithm’s
strength in responsiveness to various situations. The algorithm does not
search for a fixed grasping point when the position, orientation, and shape
of the object change, as demonstrated by the differing grasp points on the
many configurations of the Swiss Army Knife. This flexibility accounts
for its high grasp success rate despite the diverse situations the algorithm
encounters during experiments. For circular objects, please refer to Section
2.7 “Objects without straight edge” on page 11 of the manuscript, which has
been dedicated to addressing this issue. In this case, the algorithm searches
for the shortest distance across the object centroid from one side of the
perimeter to the other to obtain the grasping pose. It should be noted that, in real sensor-based measurements, no two lines across the object centroid will be measured to have exactly the same length, given the finite measurement resolution and noise, so the shortest line can always be determined from the sensor measurements, even for a circular object.


- “(5) The connection between the grasping points of an object ideally
passes through its centroid. However, in practical grasping scenarios, the
centroid of a novel object may not align with the centroid derived from the
captured image. How does this affect the grasping success rate? Moreover,
the material properties of novel objects are unknown. Are there any conditions
under which the reported 98% grasping success rate for novel objects
holds? For instance, do smooth, transparent, or fragile objects fall within
the scope of “novel objects” considered in this study? ”
Reply: The proposed algorithm does not rely heavily on the object centroid to determine the grasping points, but rather on the overall geometric properties of the object (edges and corners), similarly to how humans grasp. As illustrated in Figure 8 and the results in Figure 19 of the revised paper, the centroid is not necessary for a good grasp, as the grasp depends on the features and shapes of the objects. The geometrical estimate is therefore sufficient for the algorithm to function effectively. The object centroid is needed only when no edge is detected for the object, as explained in Section 2.7 “Objects without straight edge” on page 11 of the manuscript and in the previous reply. In such situations, a misalignment between the true object centroid and the image-derived centroid may indeed exist. However, it is very unlikely to affect the grasping success rate, because the optical sensors currently used for robot manipulation usually provide good imaging quality, so the misalignment will not be severe.
The authors thank the reviewer for pointing out the material issue and acknowledge that other physical properties, such as flexibility and texture, are not considered in the grasp derivation, which is a current limitation of this work. Section 3.1 on page 14 explains how much grasp force is used, highlighting a limitation, especially for softer and more flexible objects. Some experimental failures, such as with the toy slinky (elaborated on page 19), occurred mainly due to the flexible nature of the object and the use of insufficient grasping force. Meanwhile, it should be noted that flexible-object grasping and manipulation is currently a stand-alone and very active research topic and was thus not specifically explored in this work.
Since the proposed algorithm only uses an optical sensor (camera) for object detection (and accordingly grasping-point selection) and a force sensor for simple grasping-force control, objects with special physical properties beyond the detection capability of optical sensors will certainly affect the grasping success rate. As the reviewer mentioned, objects with slippery or smooth surfaces and high transparency are expected to cause difficulty in detection and hence in grasping-point selection. Multimodal sensors, such as tactile sensors and laser scanners, would be needed to solve this problem, which is beyond the research scope of this work. Figure 19 on page 18 shows all 40 objects used in this study. Most of the objects were chosen to facilitate comparisons with works in the literature [18, 19, 20]. Nevertheless, among the 40 experimental objects, 18 further objects were chosen, including highly flexible objects such as the slinky toy (object 11, Figure 18) and objects with varying configurations such as the scissors (object 8, Figure 18), wine opener (object 28, Figure 18), and Swiss Army Knife (object 31, Figure 18). The objects tested also include smooth objects such as the metal bowl (object 16, Figure 18), slightly transparent objects such as the translucent box (object 29, Figure 18), and fragile objects such as the glass jar (object 18, Figure 18). Even with such challenging objects, the grasping performance was still remarkable, which again verifies the effectiveness of the proposed algorithm. The consideration of physical properties beyond geometrical ones has been added to future work in the Conclusion section on page 20.

 


Reviewer 3 Report

Comments and Suggestions for Authors

The paper presents a novel method for the synthesis of effective robotic grasps, based on the detection of object features from camera images and the use of deep-learning algorithms. In particular, YOLOv5-OBB is used here as the deep-learning algorithm. The method presented seems simple and generalizable to different kinds of objects, unlike methods based on training with specific sets of objects. The features detected in images are labelled as edges or corners, and the proposed algorithm uses the positions and/or inclinations of these features to select an appropriate grasping posture with a two-finger pinch.

The paper describes the method well and clearly, uses adequate images to support it, and describes the algorithms with clear pseudocode.

The method is also demonstrated for different sets of objects, with a total of 40 objects and 10 grasp attempts with different positions and orientations for each object. A UR5e robotic arm with a commercial pinch gripper was used. The overall success rate is higher than 98%, improving on previous results in similar studies for 3 similar object sets.

 

Overall, the paper is considered solid and a good contribution to robotic grasping methodology. As possible improvements before paper publication, I suggest:

 

-       Providing more information about the control method used in the robot for the experiments. How is it decided that the grasp is completed before moving the robotic arm? Is any pressure sensor information used for this?

-       Sharing the algorithms used for the study in a public repository.

Author Response


Response to Reviewer 3:
- “The paper presents a novel method for the synthesis of effective robotic grasps, based on the detection of object features from camera images and the use of deep-learning algorithms. In particular, YOLOv5-OBB is used here as the deep-learning algorithm. The method presented seems simple and generalizable to different kinds of objects, unlike methods based on training with specific sets of objects. The features detected in images are labelled as edges or corners, and the proposed algorithm uses the positions and/or inclinations of these features to select an appropriate grasping posture with a two-finger pinch.
The paper describes the method well and clearly, uses adequate images to support it, and describes the algorithms with clear pseudocode.
The method is also demonstrated for different sets of objects, with a total of 40 objects and 10 grasp attempts with different positions and orientations for each object. A UR5e robotic arm with a commercial pinch gripper was used. The overall success rate is higher than 98%, improving on previous results in similar studies for 3 similar object sets.
Overall, the paper is considered solid and a good contribution to robotic grasping methodology. ”
Reply: The authors are grateful for the reviewer’s appreciation of the contribution of this work.


- “As possible improvements before paper publication, I suggest:
Providing more information about the control method used in the robot for the experiments. How is it decided that the grasp is completed before moving the robotic arm? Is any pressure sensor information used for this? ”
Reply: We have clarified that the robot commands are executed by the built-in controller of the UR5e, which has a closed architecture, so no external feedback controller is required. Please refer to Section 3.1 on page 14 of the revised paper. A grasp is considered successful if the robot is able to grasp the object and return to its end position without dropping it during the motion; please refer to the sentence at the end of Section 3.1. Pressure sensor information is used, but the grasping force is set as a fixed threshold, since other physical properties such as malleability and texture are not considered in the grasp derivation. We have also clarified this in Section 3.1 and added a paragraph on future work in the Conclusion section on page 20.


- “Sharing the algorithms used for the study in a public repository.”
Reply: Following the reviewer’s suggestion, a link to a public repository
(Github) holding the code has been added after the pseudo-code on page
13.

 

