Article
Peer-Review Record

Event-Triggered Single-Network ADP for Zero-Sum Game of Unknown Nonlinear Systems with Constrained Input

Appl. Sci. 2023, 13(4), 2140; https://doi.org/10.3390/app13042140
by Binbin Peng 1, Xiaohong Cui 1,2,*, Yang Cui 3 and Wenjie Chen 1
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 17 December 2022 / Revised: 3 February 2023 / Accepted: 6 February 2023 / Published: 7 February 2023

Round 1

Reviewer 1 Report

1. What is the main question addressed by the research? In this manuscript, the H∞ problem with unknown dynamics and constrained input is studied using event-triggered adaptive dynamic programming. A single-critic network structure based on event-triggered ADP is proposed to approximate the solution of the HJI equation, and simulations of a continuous-time linear system and a continuous-time nonlinear system are then presented. The topic is relevant to the field and helps address systems with unknown dynamics and constraints. The manuscript provides an event-triggered ADP algorithm to solve the locally unknown zero-sum game problem with constrained input. The method is clearly presented and the results convey the concept well.

Author Response

Dear Reviewer,

         Thank you for your constructive suggestions, which are highly appreciated. We have carefully scrutinized the manuscript and made the corresponding revisions, including correcting grammatical errors and adding more detailed descriptions. This paper proposes an event-triggered adaptive dynamic programming method to deal with the H∞ control problem with unknown dynamics and constrained input. First, the H∞ control problem is transformed into the more widely studied two-player zero-sum game. Then, the control law is obtained by constructing the event-triggered Hamilton-Jacobi-Isaacs (HJI) equation to reduce the influence of external disturbances on the stability of the system. Because it is difficult to obtain the exact solution of the HJI equation, a single-critic neural network structure based on the ADP method is constructed to approximate the solution of the HJI equation. Two simulation examples are used to verify the effectiveness of the algorithm in solving such problems.
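For readers less familiar with this formulation, a generic sketch of the zero-sum game value and the associated HJI condition is given below in notation common in the ADP literature, assuming control-affine dynamics ẋ = f(x) + g(x)u + k(x)w; the symbols follow convention and may differ from the paper's exact equations.

```latex
% Generic two-player zero-sum game value with a non-quadratic input penalty
% M(u) for the constrained input and an L2-bounded disturbance w:
V^{*}(x) = \min_{u}\,\max_{w} \int_{t}^{\infty}
  \Big( Q(x) + M(u) - \gamma^{2}\, w^{\top} w \Big)\, d\tau
% At the saddle point (u^{*}, w^{*}) the (time-triggered) HJI condition reads:
0 = Q(x) + M(u^{*}) - \gamma^{2}\, {w^{*}}^{\top} w^{*}
  + \nabla V^{*}(x)^{\top} \big( f(x) + g(x)\, u^{*} + k(x)\, w^{*} \big)
```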

Reviewer 2 Report

The article entitled “Event-Triggered Single-Network ADP for Zero-Sum Game of Unknown Nonlinear Systems with Constrained Input” is well written and, from my point of view, would be of interest to the readers of Applied Sciences. Nevertheless, before its publication, the authors should make the following changes:

Line 120, notation: I am not sure whether it would be better to introduce it as an appendix to the manuscript. Please think about it.

Some paragraphs contain statements such as “Combining (25) - (27)”. From my point of view, the meaning is difficult for readers to grasp; please explain it in more depth.

Line 257, in matrix A, please use the same number of decimal places for all the numbers.

Author Response

Dear reviewer,

       Thank you very much for your valuable comments and professional advice. The corresponding modifications have been made in the paper according to your suggestions.

      1.   The relevant symbol definitions and abbreviations appear many times in this paper. The 'Notation' paragraph (Line 121) is used to explain the symbols and abbreviations, which avoids repeated descriptions.

     2.    The original intention of using 'combining' (after Equations (27) and (57)) was to combine the preceding formulas to obtain the subsequent conclusions. The wording of these sentences has now been revised.

     3. Matrix A in Line 252 has been revised so that all numbers have the same number of decimal places.

Reviewer 3 Report

 

>’where M(u) is a non-quadratic function, and R is a diagonal matrix. tanh^{-T}(·) is an inverse hyperbolic tangent function treated as the constrained input.’

‘is the activation function vector. ε_V(x) is the reconstruction error of the critic NN.’

This is only a minor technical comment: the wording of these sentences should be improved.

>’ Therefore, integral reinforcement learning is introduced to relax the reliance on known information of f( x ).’ (Line 144) The sentence should be revised.

>Do you mean ‘Policy Improvement’ (the third step of the IRL)?

>’The present value function and Its derivative can be expressed as’ Please improve the sentence.

>Additional explanations related to the training constant are needed (equation 47).

>The α_c parameter is higher than 1 (line 264). How was the adaptation process implemented in the simulation software? Have you used integral elements (or a direct sum of the actual update and the previous weight)?

>Have you tested a non-sinusoidal disturbance waveform (line 264)?

>How were the initial values of the weights selected (section 5.1)?

>There is a minor technical problem with the text (line 266).

>Please use a grid (figures).

>The details of the simulation should be described (sampling time, implementation, software, etc.).

>The introduction should present more information about the relation between the described theoretical considerations and real-life problems.

Author Response

Dear reviewer,

     Thank you very much for your rigorous comments and constructive suggestions. We have studied the comments carefully and made the corresponding corrections. The main corrections in the paper are as follows:

     1.  These sentences have been revised (Line 133 and Line 204).

     2. The sentence has been revised to 'Thus, we prefer to introduce the IRL technique to obtain the solution to the HJI equation without requiring the system dynamics f(x)' (Line 144).

     3. The main goal of the optimal control problem is to find a control law that makes the system asymptotically stable. Policy Improvement is one of the learning steps of reinforcement learning. In Policy Evaluation, the value function V(x) is obtained by solving the Bellman equation; in Policy Improvement, the control law is then updated from that value function. The value function and control law are updated once per iteration until the optimal value function and optimal control law are obtained, as sketched below.
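For illustration only, the following minimal Python sketch shows this Policy Evaluation / Policy Improvement cycle; evaluate_policy and improve_policy are hypothetical stand-ins for solving the Bellman equation and updating the control law, not functions from the paper.

```python
import numpy as np

def policy_iteration(evaluate_policy, improve_policy, initial_policy,
                     max_iterations=100, tol=1e-6):
    """Alternate Policy Evaluation and Policy Improvement until the value converges."""
    policy = initial_policy
    value = None
    for _ in range(max_iterations):
        new_value = evaluate_policy(policy)      # Policy Evaluation: solve the Bellman equation for V
        new_policy = improve_policy(new_value)   # Policy Improvement: update the control law from V
        if value is not None and np.linalg.norm(new_value - value) < tol:
            break                                # value function has converged
        value, policy = new_value, new_policy
    return policy, value
```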

     4. The sentence has been revised (before Equation (41)).

     5. An explanation of the squared denominator has been added (after Equation (47)).

     6. In theory, the learning rate of a neural network should not be higher than 1. During the simulation, in order to tune the parameters and obtain satisfactory experimental results, the learning rate was amplified by a factor of 100 without changing the formula; the value greater than 1 reported in the article was due to carelessness. In the simulation, the initial NN weights, the initial control law, and the relevant parameters are given at the start of the learning period of the neural network (100 s). A new weight vector is obtained at each iteration and is updated throughout the learning period until the optimal value function and optimal control strategy are obtained. The 'integral elements' correspond to the integral interval, with T taken as 0.1 in the experiment; a rough sketch of this adaptation loop is given below.
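The sketch below is only a hypothetical illustration of such an adaptation loop; phi, bellman_residual, and the normalized update rule are placeholder names and an assumed form chosen to show the roles of the learning rate, the integral interval T = 0.1, the unit initial weights, and the 100 s learning period, not the paper's exact update law.

```python
import numpy as np

def train_critic(phi, bellman_residual, x_samples,
                 learning_rate=0.5, T=0.1, learning_period=100.0):
    """Gradient-based critic weight adaptation over integral intervals of length T."""
    W = np.ones(phi(x_samples[0]).size)      # initial weights set to 1, as in the experiment
    n_steps = int(learning_period / T)       # one update per interval [kT, (k+1)T]
    for k in range(min(n_steps, len(x_samples))):
        grad = phi(x_samples[k])             # critic activation vector at the sampled state
        e = bellman_residual(W, x_samples[k])  # integral Bellman/HJI residual on the interval
        # normalized gradient step; the squared denominator bounds the step size
        W = W - learning_rate * e * grad / (1.0 + grad @ grad) ** 2
    return W
```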

     7. During the experiment, an external disturbance signal with an attenuation function is selected (Line 261). By the minimax optimization principle, the disturbance policy acts as one decision-maker and maximizes the value, while the control policy acts as the other decision-maker and minimizes it. The best experimental result is obtained by adjusting the disturbance attenuation level associated with the external disturbance; the standard attenuation requirement is recalled below.
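For context, the disturbance attenuation level γ plays the usual L2-gain role from H∞ control, written here in generic notation rather than the paper's exact equations.

```latex
% Closed-loop L2-gain (disturbance attenuation) requirement:
\int_{0}^{\infty} \big( Q(x) + M(u) \big)\, dt
  \;\le\; \gamma^{2} \int_{0}^{\infty} \| w(t) \|^{2}\, dt
% A smaller admissible \gamma corresponds to stronger attenuation of w.
```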

     8. To control the experimental variables and obtain the optimal approximate weights, the initial values of the weights are all set to 1 in the experiment.

     9. It has been modified in the paper (Line 264).

    10. A grid is used to summarize the algorithms proposed in the paper (Algorithm 1, Algorithm 2).

    11. and 12. The details of the simulation are given in the paper: the sampling interval T = 0.1 (Line 260) and the sampling time t = 100 s / t = 80 s (in the figures). An introduction to the relationship between the theoretical considerations and practical problems has been added to the article (Line 47, Line 65).

Reviewer 4 Report

The abstract should be written in a way that introduces readers to the subject, including readers who are not familiar with adaptive dynamic programming but could use the technique in their future work. Therefore, I suggest that the abstract be rewritten. For example, the abstract begins with the H∞ problem without defining it.

Figure 1 is a descriptive figure that introduces the subject to the readers, so I suggest placing it at the beginning of Section 2 (problem description).

In Section 5, the authors should state how they obtained the simulation results. Also, the result figures should be more descriptive, including definitions of the variables.

The authors state the limitations of the paper. However, a discussion of the potential uses of the algorithm (including its limitations) and a comparison with other existing algorithms, placed before the conclusions section, would be beneficial.

 

Author Response

Dear reviewer,

      Thank you for your careful review of our manuscript and the effort you put into it. We have revised the manuscript accordingly. Our point-by-point responses are detailed below:

     1. We agree with your comments and have revised the abstract. First, the main methods used in this paper are introduced and the research problems are described. The H∞ control problem is converted into a two-player zero-sum differential game problem. The optimal solution of the zero-sum game is obtained by the minimax principle; that is, the Nash equilibrium point is equivalent to the solution of the HJI equation, as recalled below.
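The Nash equilibrium referred to here is the usual saddle-point condition of the zero-sum game, stated below in generic notation (J denotes the game cost from the initial state x_0; this is the standard definition, not a new result of the paper).

```latex
% Saddle-point (Nash equilibrium) condition of the two-player zero-sum game:
J(x_{0}; u^{*}, w) \;\le\; J(x_{0}; u^{*}, w^{*}) \;\le\; J(x_{0}; u, w^{*})
\quad \text{for all admissible } u \text{ and } w,
% and the game value V^{*}(x_{0}) = J(x_{0}; u^{*}, w^{*}) satisfies the HJI equation.
```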

     2.  Figure 1 shows the main flow chart of the event-triggered adaptive dynamic programming (ADP) algorithm. Implementing the algorithm requires the problem description and the problem-solving methods to be established first. Therefore, placed after the introduction of the event-triggered ADP algorithm, Figure 1 provides a more intuitive summary of the algorithm for the readers.

     3. The simulation results and related parameter definitions are described in more detail (Line 257, Line 265, Line 293).

     4. This is a very good point. Algorithm 2 is the main method used in this paper. We propose event-triggered ADP to reduce the communication load and computational cost (Line 169). In the simulation experiments, the number of controller updates and the sampling intervals under the two strategies (time-triggered control and event-triggered control) are compared and analyzed. The modified content is shown in Line 240; a rough sketch of such an event-triggered update loop is given below.
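As a hypothetical illustration of the event-triggered strategy being compared here (the function names, the triggering rule, and the Euler integration are assumptions for the sketch, not the paper's implementation):

```python
import numpy as np

def simulate_event_triggered(x0, step_dynamics, control_law, triggering_threshold,
                             dt=0.001, t_final=100.0):
    """Count controller updates when the control is only recomputed at triggering events."""
    x = np.asarray(x0, dtype=float)
    x_sampled = x.copy()               # state held by the controller since the last event
    u = control_law(x_sampled)
    n_updates = 1
    for _ in range(int(t_final / dt)):
        gap = x_sampled - x            # event-generator error between sampled and current state
        if np.linalg.norm(gap) > triggering_threshold(x):
            x_sampled = x.copy()       # event: re-sample the state and update the control law
            u = control_law(x_sampled)
            n_updates += 1
        x = step_dynamics(x, u, dt)    # control is held constant between events
    return n_updates
```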
