Peer-Review Record

Reinforcement Learning Control with Deep Deterministic Policy Gradient Algorithm for Multivariable pH Process

Processes 2022, 10(12), 2514; https://doi.org/10.3390/pr10122514
by Chanin Panjapornpon *, Patcharapol Chinchalongporn, Santi Bardeeniz, Ratthanita Makkayatorn and Witchaya Wongpunnawat
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 7 October 2022 / Revised: 10 November 2022 / Accepted: 21 November 2022 / Published: 26 November 2022

Round 1

Reviewer 1 Report

There are some issues; it would be good if the authors could revise their manuscript accordingly:

1. Line 244: please check whether Equations (9)-(11) are correct. Also, Equation (13) is presented much earlier than Equation (12);

2. Please add neural network structure information in Section 2; this will make it clearer for readers to implement the method in the future;

3. Line 282: in some control problems, the tuned system performance needs to penalize positive and negative errors differently. Since the synthesis process typically starts from some value and trends toward a fixed value, a negative error is sometimes much more unfavorable. Could the authors discuss this? (A sketch of one possible asymmetric penalty is given after this list.)

4. Line 319: please correct the reference error.

5. Line 376: is it a drawback of RL that, as the number of training episodes increases, the performance varies without any clear pattern? Could the authors share any tuning experience with others?

6. Line 401: instability of that kind would not be produced by any reasonable PID design, so please revise the stated drawbacks of PID control. Furthermore, why not compare the RL method against PID, which works extraordinarily well in industrial applications? (A minimal PID baseline sketch is given after this list.)

7. The pH and level controls are separated. I cannot understand why this process cannot be controlled simultaneously (there are only two variables); or do the two loops not interact with each other significantly?

8. The conclusion is a little short; the authors should expand and polish it.
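Regarding point 3, a minimal sketch of an asymmetric quadratic penalty is shown below; the function name and the weights w_pos and w_neg are illustrative placeholders, not taken from the manuscript.

```python
def asymmetric_reward(error, w_pos=1.0, w_neg=2.0):
    """Quadratic penalty that weights negative tracking errors more
    heavily than positive ones; w_pos and w_neg are placeholders."""
    weight = w_pos if error >= 0 else w_neg
    return -weight * error ** 2
```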
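Regarding the PID comparison suggested in point 6, a minimal discrete PID loop of the kind that could serve as an industrial baseline is sketched below; the class, gains, and sample time are illustrative and not part of the manuscript.

```python
class PID:
    """Minimal discrete PID controller usable as a baseline; the gains
    and the sample time dt are placeholders to be tuned for each loop."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```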

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

This study presents the development of reinforcement learning (RL) control with a deep deterministic policy gradient (DDPG) algorithm to handle coupled pH and liquid level control in a continuous stirred tank reactor with a strong acid-base reaction.

In order to continue evaluating the manuscript, the authors need to clarify a few issues:

1. What do Figures 7 and 10 depict?

2. What are the actions (outputs) of the two policies (liquid level control and pH control)? As far as I understand, they are the h and pH values; is that correct?

3. Please state the dimension of each layer (fully connected layer, output layer, …). Which hidden layers (Figure 5, Figure 6) are represented by the numbers of nodes shown in Tables 2 and 6?

4. The authors created two reward functions for each control system and concluded which reward function was best after training with them (L 384). This does not make sense: there are numerous reward functions they could create, so they cannot compare all of them; they should only present the reward function that gave the highest performance in their experiments.

5. I do not think using a grid search to find the optimal episode is suitable, because in Figure 10 the episodes around 350 appear to give better results. In addition, the RL training process tries to maximize the reward (i.e., the model tries to reduce the error between the setpoints and the outputs), yet the evaluation criteria (ITAE, ISE, and IAE) are still high. Why? (A sketch of how these indices are typically computed is given after this list.)

6. Figure 8: the mean value of the noise model used appears to be around (3, 8). Is this too large to add to the actions (assuming the actions are h and pH), since it will affect the episode reward and the average reward? (A sketch of typical DDPG exploration noise is given after this list.)

7. Figure 12: the diagram suggests that the trained model has not been fully optimized.

8. Why are the ITAE, ISE, and IAE values in Table 6 so high (sample time)?
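Regarding points 5 and 8, a minimal sketch of how the ITAE, ISE, and IAE indices are typically computed from a sampled error trajectory is shown below; the function name and the use of trapezoidal integration are assumptions, not taken from the manuscript.

```python
import numpy as np

def control_indices(t, error):
    """Compute IAE, ISE, and ITAE from sampled time and error arrays
    by trapezoidal integration; the function name is illustrative."""
    t, error = np.asarray(t, dtype=float), np.asarray(error, dtype=float)
    iae = np.trapz(np.abs(error), t)
    ise = np.trapz(error ** 2, t)
    itae = np.trapz(t * np.abs(error), t)
    return iae, ise, itae
```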
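Regarding the noise question in point 6, a minimal sketch of the Ornstein-Uhlenbeck exploration noise commonly used with DDPG is shown below; the parameter values are illustrative, and whether the manuscript uses this exact noise model should be confirmed by the authors. The key point is that the noise magnitude relative to the action range determines how strongly it affects the episode reward.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck exploration noise of the kind commonly added
    to DDPG actions; mu, theta, sigma, and dt are placeholder values."""

    def __init__(self, mu=0.0, theta=0.15, sigma=0.2, dt=1.0):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.x = mu

    def sample(self):
        dx = self.theta * (self.mu - self.x) * self.dt
        dx += self.sigma * np.sqrt(self.dt) * np.random.randn()
        self.x += dx
        return self.x
```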

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
