Article
Peer-Review Record

Learning to Tune a Class of Controllers with Deep Reinforcement Learning

Minerals 2021, 11(9), 989; https://doi.org/10.3390/min11090989
by William John Shipman
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 13 July 2021 / Revised: 13 August 2021 / Accepted: 19 August 2021 / Published: 9 September 2021
(This article belongs to the Special Issue The Application of Machine Learning in Mineral Processing)

Round 1

Reviewer 1 Report

The paper discusses an interesting topic: using DRL to improve the traditional PI controller. The PPO algorithm was used to train the agent, and hyper-parameter optimization was performed to select the optimal agent neural network size and training algorithm parameters. The paper is well written and easy to follow. One suggestion is to include related work on using RL to optimize PID controllers in the robotics field.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

This work presents a system based on Deep Reinforcement Learning to train an agent to calibrate the parameters of a control system in a mining application. A simulation of a first-order process with a time delay, controlled by a PID controller, was used as the environment. The agent was based on the Proximal Policy Optimization (PPO) algorithm.
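For readers less familiar with this kind of setup, a minimal sketch of such an environment is given below, using a Gym-style reset/step interface. The class name, process constants, gain bounds and reward are illustrative assumptions for the sketch, not the exact configuration used in the paper: the agent observes the process output, setpoint and tracking error, and its continuous action sets the PI gains applied to a simulated first-order-plus-dead-time process.

```python
import numpy as np


class FOPDTPIEnv:
    """Illustrative environment: an RL agent proposes PI gains for a controller
    regulating a first-order-plus-dead-time (FOPDT) process.  All constants,
    bounds and the reward below are assumptions made for this sketch."""

    def __init__(self, K=1.0, tau=10.0, theta=2.0, dt=1.0, horizon=200):
        self.K, self.tau, self.theta = K, tau, theta    # process gain, time constant, dead time
        self.dt, self.horizon = dt, horizon
        self.delay_steps = int(round(theta / dt))
        self.reset()

    def reset(self):
        self.y = 0.0                                    # process output
        self.integral = 0.0                             # PI integrator state
        self.setpoint = np.random.uniform(0.5, 1.5)     # random setpoint each episode
        self.u_buffer = [0.0] * self.delay_steps        # models the dead time
        self.t = 0
        return self._obs()

    def _obs(self):
        # Continuous observation: process output, setpoint and tracking error.
        return np.array([self.y, self.setpoint, self.setpoint - self.y], dtype=np.float32)

    def step(self, action):
        # Continuous action: the agent proposes the PI gains (Kp, Ki).
        kp, ki = np.clip(action, 0.0, 10.0)
        error = self.setpoint - self.y
        self.integral += error * self.dt
        u = kp * error + ki * self.integral             # PI control law
        self.u_buffer.append(u)
        u_delayed = self.u_buffer.pop(0)                # control input delayed by theta
        # Euler step of the first-order dynamics: tau * dy/dt = -y + K * u(t - theta)
        self.y += self.dt / self.tau * (-self.y + self.K * u_delayed)
        self.t += 1
        reward = -abs(error)                            # penalise tracking error
        done = self.t >= self.horizon
        return self._obs(), reward, done, {}


if __name__ == "__main__":
    # Roll out one episode with fixed gains to check the simulation runs.
    env = FOPDTPIEnv()
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done, _ = env.step([1.0, 0.1])
        total += reward
    print("episode return:", total)
```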

The article is very well written, the theoretical support is strong, the methodology is sound, and the application is very interesting. Its contribution is important, since calibrating controller parameters is still an open research topic. In addition, the results are very well presented, and it can be observed that the RL method outperforms the conventional hyper-parameters in reducing the error of the controllers.

I just have a few observations for the authors.

- In the Materials and Methods section, I suggest showing a block diagram of the proposed system so that the problem can be visualized more clearly.

- A figure representing the Markov decision process is recommended, showing the agent, the states or observations, and how the agent receives the reward.

- Is this problem a Markov decision process or a partially observable Markov decision process? Please comment on this.

- It should be made clear that both the input and the output are in continuous spaces, and that this is why the PPO agent is used.

Author Response

Please see the attachment.

Author Response File: Author Response.docx
