Peer-Review Record

Reinforcement Learning Control of Hydraulic Servo System Based on TD3 Algorithm

Machines 2022, 10(12), 1244; https://doi.org/10.3390/machines10121244
by Xiaoming Yuan 1,*, Yu Wang 1, Ruicong Zhang 1, Qiang Gao 2, Zhuangding Zhou 1, Rulin Zhou 3 and Fengyuan Yin 1
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 28 November 2022 / Revised: 14 December 2022 / Accepted: 15 December 2022 / Published: 19 December 2022
(This article belongs to the Topic Designs and Drive Control of Electromechanical Machines)

Round 1

Reviewer 1 Report

The paper presents a reinforcement learning approach to control a servo system, based on a gradient algorithm training an artificial neural network (ANN). In the introductory part of the paper the authors identify the need to obtain control systems with high precision. Secondly, they raise the need to use reinforcement learning, but without clear support for doing so. Are robust control techniques, optimal control policies, and standard optimization methods insufficient? Does that happen due to the lack of a precise model? Please support the use of an RL approach more thoroughly.

What happens when no model is available and you wish to tune the controller for optimal performance? What solutions can you adopt here?

Next, the authors provide a wide overview of various approaches to optimizing system performance and increasing the effectiveness of the control system. Reinforcement learning is defined at the next stage, with the details of the DDPG algorithm given. Please describe the methodology behind selecting the specific topology of the ANN from Table 2. Why do you expect this specific number of layers and neurons to be sufficient? Has any research been done on different configurations? What were the results?
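
For context on the topology question, a TD3 agent pairs an actor network with twin critic networks. The minimal PyTorch sketch below uses placeholder dimensions (two hidden layers of 256 units each, a common TD3 default), not the configuration actually reported in the paper's Table 2:

    import torch
    import torch.nn as nn

    class Actor(nn.Module):
        # Maps state -> bounded action; layer widths here are assumptions.
        def __init__(self, state_dim, action_dim, max_action, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, action_dim), nn.Tanh(),
            )
            self.max_action = max_action

        def forward(self, state):
            return self.max_action * self.net(state)

    class Critic(nn.Module):
        # Maps (state, action) -> scalar Q-value; TD3 trains two such critics.
        def __init__(self, state_dim, action_dim, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, state, action):
            return self.net(torch.cat([state, action], dim=-1))

The reviewer's question amounts to asking whether depths and widths like these were chosen by experiment or by convention.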


Why is the reward negative in Figure 12?
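
As background for this question: tracking tasks are often posed with a penalty-style reward such as the negative squared tracking error, which is at most zero, so cumulative episode rewards are naturally negative and "less negative" means better tracking. A minimal sketch of that form (the paper's actual reward function may differ):

    def tracking_reward(reference, measured):
        # Penalty reward: <= 0 everywhere, zero only under perfect tracking.
        error = reference - measured
        return -(error ** 2)

    print(tracking_reward(1.0, 0.9))  # approximately -0.01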


How was the LQG controller tuned, and why an LQG controller rather than an LQR or LQI one? Are the results really comparable when such different control strategies are selected?
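
For reference on the distinction raised here: LQR assumes full state feedback, while LQG combines LQR with a Kalman estimator (LQE) to handle noisy, partial measurements. The sketch below, using the python-control package on a generic second-order plant (not the paper's hydraulic servo model; all weights and covariances are illustrative), shows the two design steps:

    import numpy as np
    import control

    # Generic second-order plant (assumed, purely for illustration).
    A = np.array([[0.0, 1.0], [-2.0, -3.0]])
    B = np.array([[0.0], [1.0]])
    C = np.array([[1.0, 0.0]])   # only position is measured
    G = np.eye(2)                # process-noise input matrix

    # LQR step: state and control weights are tuning choices.
    Q = np.diag([10.0, 1.0])
    R = np.array([[1.0]])
    K, _, _ = control.lqr(A, B, Q, R)

    # LQE step: noise covariances are assumptions; LQG = LQR gain + Kalman filter.
    QN = 0.1 * np.eye(2)
    RN = np.array([[0.01]])
    L, _, _ = control.lqe(A, G, C, QN, RN)

    print("LQR gain K:", K)
    print("Kalman gain L:", L.ravel())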


Table 4 - what is meant by the "time of stability"?
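
Presumably "time of stability" denotes the settling time: the first instant after which the response stays within a tolerance band (commonly 2% or 5%) of its final value. A minimal sketch of that definition, assuming a 2% band and a toy first-order step response:

    import numpy as np

    def settling_time(t, y, y_final, tol=0.02):
        # First time after which |y - y_final| stays within tol * |y_final|.
        outside = np.abs(y - y_final) > tol * abs(y_final)
        if not outside.any():
            return t[0]
        last_out = np.nonzero(outside)[0][-1]   # last sample outside the band
        return t[last_out + 1] if last_out + 1 < len(t) else None

    t = np.linspace(0.0, 1.0, 1001)
    y = 1.0 - np.exp(-10.0 * t)                 # toy first-order step response
    print(settling_time(t, y, 1.0))             # ~0.392 s (= ln(50)/10)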


How long did the training take? How large is the training set? And the verification set? How did you eventually select the stopping criteria?
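
As background for the stopping-criteria question: a common heuristic in RL training is to stop once a moving average of episode rewards stops improving for some number of episodes. A sketch of such a rule (window, patience, and threshold values are assumptions, not taken from the manuscript):

    from collections import deque

    def make_stopper(window=50, patience=20, min_improvement=1e-3):
        # Stop when the windowed average reward has not improved for
        # `patience` consecutive episodes.
        rewards = deque(maxlen=window)
        best_avg = float("-inf")
        stale = 0

        def should_stop(episode_reward):
            nonlocal best_avg, stale
            rewards.append(episode_reward)
            if len(rewards) < window:
                return False
            avg = sum(rewards) / window
            if avg > best_avg + min_improvement:
                best_avg, stale = avg, 0
            else:
                stale += 1
            return stale >= patience

        return should_stop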

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

In view of the nonlinear, time-varying and parameter-coupling characteristics of the hydraulic servo system, this paper constructs a reinforcement learning environment to realize on-line self-tuning of the controller parameters. The comparison results between the TD3, DDPG and LQG algorithms show that the TD3 algorithm effectively improves trajectory tracking performance, thus verifying the proposed control strategy. This paper fits the scope of the journal, and I would like to recommend it for publication if the authors could consider the following issues:

1. The authors are advised to illustrate the reasons for selecting the reinforcement learning algorithm in the introduction section.

2. For the statement in Section 3, "It can be seen from Table 4 that, compared with DDPG, the overshoot of TD3 step signal is reduced by 24%, and the stabilization time reaches 0.056s, which is twice as short as that of LQG, and the overshoot time reaches 7%.": the figures "24%" and "0.056 s" may need further explanation in conjunction with the data in Table 4 (one possible reading is sketched after this list).

3. For Figure 15, "Comparison diagram of control method effect", in Section 3, it is suggested to check whether the caption corresponds to the content of the figure, e.g. in Figures 15(b) and 15(e).

4. For the statement in Section 1, "Lv, Y. et al. [9] used an RL-based ADP structure to learn the optimal tracking control input of the servo mechanism, where unknown system dynamics are approximated with a three-layer NN identifier.", it is suggested to spell out the abbreviation "ADP" at its first use.

5. There are some grammar mistakes; please check for them throughout the manuscript.
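
One possible reading of the figures queried in point 2, with placeholder Table 4 values (only 0.056 s and 7% appear in the quoted sentence; the remaining numbers are assumed purely for illustration):

    overshoot_ddpg = 0.31          # assumed DDPG overshoot
    overshoot_td3 = 0.07           # 7%, from the quoted text
    ts_td3, ts_lqg = 0.056, 0.112  # settling times in seconds; LQG value assumed

    reduction = overshoot_ddpg - overshoot_td3   # 0.24 -> a 24-point reduction
    ratio = ts_lqg / ts_td3                      # 2.0 -> "twice as short"
    print(f"overshoot reduced by {reduction:.0%}; settling-time ratio {ratio:.1f}x")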

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
