Article
Peer-Review Record

The Control Method of Twin Delayed Deep Deterministic Policy Gradient with Rebirth Mechanism to Multi-DOF Manipulator

Electronics 2021, 10(7), 870; https://doi.org/10.3390/electronics10070870
by Yangyang Hou, Huajie Hong *, Zhaomei Sun, Dasheng Xu and Zhe Zeng
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 8 March 2021 / Revised: 29 March 2021 / Accepted: 30 March 2021 / Published: 6 April 2021
(This article belongs to the Special Issue Advances in Robotic Mobile Manipulation)

Round 1

Reviewer 1 Report

The topic is interesting; however, the quality of the current version of the paper is generally between average and low. A list of comments follows to help the authors improve the paper's quality.

Perhaps the title could be shortened a little bit.

In general, I am not very sure how relevant the proposed method is to deep reinforcement learning.

In the Introduction, it seems that some material belongs in the following Related Work section and should therefore be moved there.

The terms Q learning and Q value are not explained.
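For reference, a brief definition along the following lines would suffice: the Q value Q(s,a) estimates the expected cumulative reward of taking action a in state s, and Q learning updates it by

$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$,

where $\alpha$ is the learning rate, $\gamma$ the discount factor, $r$ the immediate reward, and $s'$ the successor state.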

On page 3, the reference to Table 3 needs to be corrected to “Table 3”.

The title in section 3 does not follow the journal’s guidelines.

In section 3 the references to Figure 1 and Figure 2 are missing.

The title in section 4 needs correction (“Rebrith”).

The parameters in equation 15 are not explained.

On page 6, the references to Figures 3 and 4 do not have the correct syntax.

In section 6, further details are needed to clarify the UR5 manipulator model used.

Further explanations are needed for the contents of Tables 3 and 4, particularly for the software infrastructure used in the experiments.

Author Response

Thank you very much for your review. In view of your suggestions, we have made careful revisions. The specific modification feedback has been included in the uploaded Word version; please review.

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper proposes a new Twin Delayed Deep Deterministic Policy Gradient with Rebirth Mechanism (RTD3) algorithm, useful for better learning of the manipulator's motion capability without using a kinematic model.
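For readers unfamiliar with the TD3 backbone that the proposed rebirth mechanism builds on, a minimal sketch of its clipped double-Q target computation is given below. This is a generic illustration (here in PyTorch); the function and parameter names are assumptions for exposition, not the authors' implementation.

import torch

def td3_target(critic1_target, critic2_target, actor_target,
               next_state, reward, done,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    """Compute the shared TD3 learning target for both critics."""
    # Target policy smoothing: perturb the target action with clipped noise.
    action = actor_target(next_state)
    noise = (torch.randn_like(action) * noise_std).clamp(-noise_clip, noise_clip)
    next_action = (action + noise).clamp(-max_action, max_action)
    # Clipped double-Q learning: take the minimum of the twin target critics
    # to curb overestimation bias.
    q1 = critic1_target(next_state, next_action)
    q2 = critic2_target(next_state, next_action)
    return reward + gamma * (1.0 - done) * torch.min(q1, q2)

Both critics regress onto this single target, and the actor is updated less frequently than the critics (the "delayed" part of TD3).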

My comments are:

  • page 1, line 40: In my opinion, "...solution of the rotation angle of each joint of the manipulator" would be more appropriate.
  • page 2, line 49: Even if the working environment is known, all modern robots/manipulators use closed-loop control based on a kinematic and/or dynamic model (see the sketch after this list).
  • page 2, line 85: A reference is missing.
  • page 4, line 146: "... the end executor position of the manipulator" - "end-effector" is the term most often used, even by the authors of this paper.
  • First line of page 5: in order to be comparable, the angles must be different.
  • page 5, line 150: In kinematic and dynamic theory, this is how the angular velocity is denoted. If you are referring to a certain coefficient, another notation should probably be used to avoid confusion.
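To make the closed-loop point above concrete, here is a minimal sketch of a resolved-rate (Jacobian-based) control step of the kind such model-based controllers use. All names (fk, jacobian, gain) are placeholders assumed for illustration; this is a generic scheme, not the paper's controller or any specific library API.

import numpy as np

def closed_loop_step(q, target_pos, fk, jacobian, gain=1.0, dt=0.01):
    """One resolved-rate control step driving the end-effector toward target_pos.

    q          : current joint angles, shape (n,)
    target_pos : desired end-effector position, shape (3,)
    fk         : forward kinematics, q -> end-effector position (3,)
    jacobian   : q -> 3 x n position Jacobian
    """
    error = target_pos - fk(q)  # task-space position error (the feedback term)
    J = jacobian(q)
    # Damped least-squares pseudo-inverse keeps the step bounded near singularities.
    J_dls = J.T @ np.linalg.inv(J @ J.T + 1e-4 * np.eye(J.shape[0]))
    q_dot = gain * J_dls @ error  # joint-velocity command
    return q + q_dot * dt         # integrate one control step

The feedback through fk(q) is what closes the loop: the command is recomputed from the measured joint state at every step, which is the comparison point the reviewer raises against purely learned, model-free control.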

In the Conclusion, the authors state that the proposed RTD3 algorithm achieves higher learning efficiency and thus the multi-DOF manipulator obtains better motion ability. In my opinion, for this statement to be correct, a comparison would be useful, not necessarily with another learning algorithm (TD3) but with a method currently used in the theory and practice of manipulators.

For the article to be published, I believe that the research design and methods need to be improved.

Author Response

Thank you very much for your review. In view of your suggestions, we have made careful revisions. The specific modification feedback has been included in the uploaded Word version; please review.

Author Response File: Author Response.pdf

Reviewer 3 Report

The current paper proposes a deep reinforcement learning method that reflects the learning characteristics of the multi-DOF manipulator and enables it to obtain better motion ability. The theory was validated through experiments.

 

Comments to author:

- Please define the sampling time.

- Please add more details on how the theory from the previous sections is applied in the results section.

- Please add the units of measurement on both the abscissa and the ordinate in all figures.

- The authors could add a paragraph with the advantages and the disadvantages of the proposed method.

- The state of the art should be improved with more references; maybe the authors could add the following publications:

o Hybrid Data-Driven Fuzzy Active Disturbance Rejection Control for Tower Crane Systems, European Journal of Control, vol. 58, pp. 373-387, 2021.

o Multi-Agent-Based Data-Driven Distributed Adaptive Cooperative Control in Urban Traffic Signal Timing, Energies, vol. 12, no. 7, pp. 1–19, 2019.

- Please add more details regarding the obtained results.

Author Response

Thank you very much for your review. In view of your suggestions, we have made careful revisions. The specific modification feedback has been included in the uploaded Word version; please review.

Author Response File: Author Response.pdf

Reviewer 4 Report

  • Very good manuscript describing interesting research (one of the best I’ve reviewed this year). Nonetheless, the reviewer offers suggestions for improvements.
  • Please elaborate the respective details the reader may find in the doubly-cited [11,12] or otherwise reduce the citation to the one necessary to validate the author’s point.
  • Please elaborate the respective details the reader may find in the triply-cited [13-15] or otherwise reduce the citation to the one necessary to validate the author’s point.
  • Please elaborate the respective details the reader may find in the doubly-cited [16,17] or otherwise reduce the citation to the one necessary to validate the author’s point.
  • Figures 3, 4, 5, 6, 7, 8, and 9 are rendered essentially useless, since the text in the figures is illegible. As a recommended technique, the reviewer advises keeping text inside figures no smaller than the figure caption (the smallest text size permissible in the manuscript template). Please remember that in many/most instances the manuscript (if published) will be used in printed form (not on an electronic device with significant zoom abilities).
  • The use of tables for algorithm pseudo-code is very effective.
  • The conclusion section is a bit weak, lacking any summary statement of results (like those included in the abstract) drafted in the broadest possible terminology. Some of the paragraph in lines 323-331 seems more appropriately placed in Section 7. Also, the reviewer would like to see the authors' opinions on the direction of future research. The reviewer offers a desire to see the proposed methods compared to the novel so-called whiplash control of multi-DOF manipulators: Sands, T. Optimization Provenance of Whiplash Compensation for Flexible Space Robotics. Aerospace 2019, 6(9), 93.

Author Response

Thank you very much for your review. In view of your suggestions, we have made careful revisions. The specific modification feedback has been included in the uploaded Word version; please review.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

No further comments.

Reviewer 2 Report

The authors addressed all my concerns. No other comments.

Reviewer 3 Report

The current paper has been significantly improved since the last version, and the authors answered all my concerns. From my point of view, the paper can be accepted for publication in the Electronics journal.
