Article
Peer-Review Record

Learning to Tune a Class of Controllers with Deep Reinforcement Learning

Minerals 2021, 11(9), 989; https://doi.org/10.3390/min11090989
by William John Shipman
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 13 July 2021 / Revised: 13 August 2021 / Accepted: 19 August 2021 / Published: 9 September 2021
(This article belongs to the Special Issue The Application of Machine Learning in Mineral Processing)

Round 1

Reviewer 1 Report

The paper discusses an interesting topic: using DRL to improve the traditional PI controller. The PPO algorithm was used to train the agent, and hyper-parameter optimization was performed to select the optimal agent neural network size and training algorithm parameters. The paper is well written and easy to follow. One suggestion is to include related work on using RL to optimize PID controllers in the robotics field.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

This work presents a system based on Deep Reinforcement Learning to train an agent to calibrate the parameters of a control system in a mining application. A simulation of a first-order process with a time delay, controlled by a PID controller, was used as the environment. The agent was based on the Proximal Policy Optimization (PPO) algorithm.
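For readers less familiar with this kind of setup, a minimal sketch of such an environment is given below, using a Gym-style reset/step interface. The class name, process constants, gain bounds and reward are illustrative assumptions for the sketch, not the exact configuration used in the paper: the agent observes the process output, setpoint and tracking error, and its continuous action sets the PI gains applied to a simulated first-order-plus-dead-time process.

```python
import numpy as np


class FOPDTPIEnv:
    """Illustrative environment: an RL agent proposes PI gains for a controller
    regulating a first-order-plus-dead-time (FOPDT) process.  All constants,
    bounds and the reward below are assumptions made for this sketch."""

    def __init__(self, K=1.0, tau=10.0, theta=2.0, dt=1.0, horizon=200):
        self.K, self.tau, self.theta = K, tau, theta    # process gain, time constant, dead time
        self.dt, self.horizon = dt, horizon
        self.delay_steps = int(round(theta / dt))
        self.reset()

    def reset(self):
        self.y = 0.0                                    # process output
        self.integral = 0.0                             # PI integrator state
        self.setpoint = np.random.uniform(0.5, 1.5)     # random setpoint each episode
        self.u_buffer = [0.0] * self.delay_steps        # models the dead time
        self.t = 0
        return self._obs()

    def _obs(self):
        # Continuous observation: process output, setpoint and tracking error.
        return np.array([self.y, self.setpoint, self.setpoint - self.y], dtype=np.float32)

    def step(self, action):
        # Continuous action: the agent proposes the PI gains (Kp, Ki).
        kp, ki = np.clip(action, 0.0, 10.0)
        error = self.setpoint - self.y
        self.integral += error * self.dt
        u = kp * error + ki * self.integral             # PI control law
        self.u_buffer.append(u)
        u_delayed = self.u_buffer.pop(0)                # control input delayed by theta
        # Euler step of the first-order dynamics: tau * dy/dt = -y + K * u(t - theta)
        self.y += self.dt / self.tau * (-self.y + self.K * u_delayed)
        self.t += 1
        reward = -abs(error)                            # penalise tracking error
        done = self.t >= self.horizon
        return self._obs(), reward, done, {}


if __name__ == "__main__":
    # Roll out one episode with fixed gains to check the simulation runs.
    env = FOPDTPIEnv()
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done, _ = env.step([1.0, 0.1])
        total += reward
    print("episode return:", total)
```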

The article is very well written, the theoretical support is strong, the methodology is sound, and the application is very interesting. Its contribution is important, since calibrating controller parameters is still an open research topic. In addition, the results are very well presented, and it can be observed that the RL method outperforms the conventional hyper-parameters in reducing the error of the controllers.

I just have a few observations for the authors.

- In the Materials and Methods section, I suggest showing a block diagram of the proposed system so that the problem can be visualized more clearly.

- A figure representing the Markov decision process is recommended, showing the agent, the states or observations, and how the agent receives the reward.

- Is this problem a Markov decision process or a partially observable Markov decision process? Please comment on this.

- It should be made clear that both the input and the output are in continuous spaces, and that this is why the PPO agent is used.

Author Response

Please see the attachment.

Author Response File: Author Response.docx
