Article
Peer-Review Record

Adaptive Control of Quadrotors in Uncertain Environments

Eng 2024, 5(2), 544-561; https://doi.org/10.3390/eng5020030
by Daniel Leitão 1,†, Rita Cunha 2,† and João M. Lemos 3,*,†
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 7 March 2024 / Revised: 22 March 2024 / Accepted: 26 March 2024 / Published: 28 March 2024

Round 1

Reviewer 1 Report (Previous Reviewer 1)

Comments and Suggestions for Authors

This version is well-revised and it can be accepted now.

 

Comments on the Quality of English Language

Some minor typos still remain in the revision; please correct them.

Author Response

Reviewer's comment: Some minor typos still remain in the revision; please correct them.

Authors’ answer: The manuscript has been checked with spell-checking software. In addition, the text has been carefully reviewed to improve the language.

Reviewer 2 Report (Previous Reviewer 2)

Comments and Suggestions for Authors

Thanks to the authors for addressing concerns from the reviews.

We want to point out that not all issues have been resolved.

The issue of real-time operation and the clarification of the experimental validation still have not been addressed properly. I don't view the paper as publishable unless these issues are made clear.

REAL-TIME

RL typically performs updates of the control after multiple sampling periods---this is stated in the paper on line 223:

 

line 223: "where the estimated Q-function gains W_j+1 are updated at every time step k, and the policy h(x_k) is updated after a fixed number of timesteps."

The authors note: A task (say “Estimation”) is said to be done in real time if it can be accomplished in a time smaller than or equal to the sampling interval.

RL updates are NOT made faster than the sampling rate.

You note: The sampling period selected throughout all experiences is h=0.01.

What is the relation between h and the time step? This is never described in the article.

 

Overloading of symbol h; it is also used for control mapping:

u_k=h(x_k) (eq. 16) consisting of a map h(.)

The claim that " learning amounts to least squares estimation of a linear regressor, that can be done in real time" may be true, but acquiring the data on which to learn takes multiple timesteps, as is stated in the article.

 

EXPERIMENTS

I find the experiments hard to view as being accurate.

(a) Due to the issue of "real-time" learning not being possible except after multiple sampling intervals, it cannot be that RL-based adaptation occurs as stated.

(b) RL is known for slow convergence, yet the authors state (line 312): "Despite some oscillations, the convergence is quick. This happens due to the selection of high variance for the dither noise."

What do the time numbers on Fig. 4 mean? Is it millisec? How does this relate to sampling rate?

Please clarify how dither noise affects learning rate in a precise way.

 

Author Response

Reviewer's comment

The issue of real-time operation and the clarification of the experimental validation still have not been addressed properly. I don't view the paper as publishable unless these issues are made clear.

REAL-TIME

RL typically performs updates of the control after multiple sampling periods---this is stated in the paper on line 223:

line 223: "where the estimated Q-function gains W_j+1 are updated at every time step k, and the policy h(x_k) is updated after a fixed number of timesteps."

The authors note: A task (say “Estimation”) is said to be done in real time if it can be accomplished in a time smaller than or equal to the sampling interval.

RL updates are NOT made faster than the sampling rate.

Authors’ answer

In adaptive control, the time scale of the action update, i.e., of the law that generates the value of the manipulated variable, must be slower than that of learning (in this case, learning the parameters that define the control action), in order to decouple the dynamics of learning from the plant dynamics. To enhance this feature, although the RLS parameters of the Q-function approximation are updated at every sampling instant (as is the control action), the controller gains, which define the control action, are recomputed only at a lower rate (every M steps). This explanation has been included in the text.
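For illustration only, the two-time-scale structure described in this answer could look like the following minimal sketch (a toy scalar plant and hypothetical tuning values, not the authors' code): the Q-function parameters are re-estimated by RLS at every sampling instant, while the controller gain is recomputed only every M steps.

```python
import numpy as np

# Toy sketch of the two-time-scale update (hypothetical values, not the authors' code):
# Q-function parameters are re-estimated by RLS at every sampling instant,
# while the controller gain is recomputed only every M steps.
np.random.seed(0)
h, M, n_steps = 0.01, 50, 2000             # sampling period, policy-update interval, horizon
a, b, gamma = 0.98, 0.05, 0.9              # toy scalar plant x_{k+1} = a x_k + b u_k, discount
theta = np.zeros(3)                        # parameters of a quadratic Q-function model
P = 1e3 * np.eye(3)                        # RLS covariance matrix
K = 0.0                                    # controller gain, u_k = -K x_k
x = 1.0
for k in range(n_steps):
    u = -K * x + 0.1 * np.random.randn()               # control action plus dither, every step
    x_next = a * x + b * u                             # plant evolves over one sampling period h
    # Regressor and temporal-difference target for Q(x, u) = theta . [x^2, x*u, u^2]
    phi = np.array([x * x, x * u, u * u])
    u_next = -K * x_next
    phi_next = np.array([x_next * x_next, x_next * u_next, u_next * u_next])
    y = x * x + u * u + gamma * (phi_next @ theta)
    # RLS update of the Q-function parameters: done at every sampling instant
    g = P @ phi / (1.0 + phi @ P @ phi)
    theta = theta + g * (y - phi @ theta)
    P = P - np.outer(g, phi @ P)
    # Policy improvement: the gain is recomputed only every M steps
    if (k + 1) % M == 0 and theta[2] > 1e-3:
        K = float(np.clip(0.5 * theta[1] / theta[2], 0.0, 2.0))   # clipped only to keep the toy loop tame
    x = x_next
print("learned gain K:", K)
```

In this sketch both the parameter estimate and the control action change at the sampling rate (period h), whereas the gain K changes only every M steps, i.e., every M·h seconds, which is the separation of time scales described in the answer above.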

Reviewer's comment

You note: The sampling period selected throughout all experiences is h=0.01.

What is the relation between h and the time step? This is never described in the article.

Authors’ answer

Each time step of Algorithm 1 coincides with one sampling period: the algorithm is executed once per sampling interval, starting at the beginning of each sampling time. This explanation has been included in the text.

Reviewer's comment

Overloading of symbol h; it is also used for control mapping:

u_k=h(x_k) (eq. 16) consisting of a map h(.)

Authors’ answer

This issue has been corrected. The control mapping is now denoted by \Pi.

Reviewer's comment

The claim that " learning amounts to least squares estimation of a linear regressor, that can be done in real time" may be true, but acquiring the data on which to learn takes multiple timesteps, as is stated in the article.

Authors’ answer

Since a recursive version of least squares is used, the estimates are updated at each sampling time. This explanation has been added to the text.
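For reference, one standard form of the recursive least-squares update (a generic recursion, not necessarily the exact one used in the manuscript) processes a single regressor/observation pair per sampling instant, so no batch of data has to be accumulated before updating the estimate:

```latex
\begin{aligned}
g_k &= \frac{P_{k-1}\,\varphi_k}{\lambda + \varphi_k^{\top} P_{k-1}\,\varphi_k},\\
\hat{\theta}_k &= \hat{\theta}_{k-1} + g_k\bigl(y_k - \varphi_k^{\top}\hat{\theta}_{k-1}\bigr),\\
P_k &= \tfrac{1}{\lambda}\bigl(P_{k-1} - g_k\,\varphi_k^{\top} P_{k-1}\bigr),
\end{aligned}
```

where φ_k is the regressor, y_k the scalar target, and λ ∈ (0, 1] a forgetting factor; each update has fixed cost, which is what makes per-sample (real-time) estimation possible.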

Reviewer's comment

EXPERIMENTS

I find the experiments hard to view as being accurate.

(a) Due to the issue of "real-time" learning not being possible except after multiple sampling intervals, it cannot be that RL-based adaptation occurs as stated.

Authors’ answer

As already mentioned above, although the RLS parameters of the Q-function approximation are updated at every sampling instant (as is the control action), the controller gains, which define the control action, are recomputed only at a lower rate (every M steps). This explanation has been included in the text.

Reviewer's comment

(b) RL is known for slow convergence, yet the authors state (line 312): "Despite some oscillations, the convergence is quick. This happens due to the selection of high variance for the dither noise."

Authors' answer

Usually, control action decisions based on RL are obtained by training neural networks, which requires very large amounts of plant input/output data and therefore takes a long time to converge. However, in the approach followed in this article, the approximation of the Q-function does not rely on neural network training but instead on recursive least squares, which has a fast convergence rate. Again, this feature is made possible by the class of control problems (linear-quadratic) considered. This explanation has been included in the text.

Reviewer's comment

What do the time numbers on Fig. 4 mean? Is it millisec? How does this relate to sampling rate?

Authors' answer

The symbol t denotes discrete time. Hence, the scale is the number of samples; the continuous time elapsed since the beginning of the simulation is obtained by multiplying by the sampling period h. This explanation has been added to the text.
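As a small worked example (assuming the sampling period h = 0.01 quoted earlier in this record), an axis value of t = 1000 samples corresponds to

```latex
t_{\mathrm{cont}} = t \cdot h = 1000 \times 0.01\,\mathrm{s} = 10\,\mathrm{s}
```

of simulated continuous time.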

Reviewer's comment

Please clarify how dither noise affects learning rate in a precise way.

Authors' answer

The selection of the dither noise power to be injected must solve a dual problem. Indeed, the solution of the control problem requires a dither noise power as small as possible (ideally, zero), while the solution of the estimation problem requires a high value of the dither noise variance. The exact solution of this problem, i.e., finding the dither noise power that gives the best compromise, can be obtained by multi-objective optimization, but this is computationally very heavy. Good approximations, such as the one proposed in [1], are available for predictive adaptive controllers. A possibility is then to adapt this approach to RL adaptive control, but a much more complicated algorithm would be expected. Although promising as future work, such a research track is outside the scope of the present work. Instead, in this article, the approach followed was to adjust the dither noise power by trial and error in order to obtain the best results. This explanation has been added at the end of the Conclusions section.
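Purely as an illustration of the trial-and-error adjustment mentioned above (toy plant and hypothetical candidate values, not the procedure or numbers from the paper), one could sweep the dither standard deviation and compare the resulting control cost against a measure of excitation:

```python
import numpy as np

def run_episode(dither_std, n_steps=2000, seed=0):
    """Toy closed-loop run returning (mean control cost, mean excitation)
    for one candidate dither level; plant and gain are placeholders."""
    rng = np.random.default_rng(seed)
    a, b, K = 0.98, 0.05, 0.5                 # toy plant and a fixed stabilizing gain
    x, cost, excitation = 1.0, 0.0, 0.0
    for _ in range(n_steps):
        u = -K * x + dither_std * rng.standard_normal()   # control + exploration (dither) noise
        cost += x * x + u * u                             # control objective: favours small dither
        excitation += u * u                               # informativeness proxy: favours large dither
        x = a * x + b * u
    return cost / n_steps, excitation / n_steps

# Trial-and-error sweep over candidate dither levels
for s in (0.01, 0.05, 0.1, 0.5):
    c, e = run_episode(s)
    print(f"dither std {s:4.2f}: mean cost {c:.3f}, mean excitation {e:.4f}")
```

Picking the dither level is then a compromise between the two reported quantities, which is exactly the dual (control versus estimation) trade-off described in the answer above.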

 [1] da Silva, R.N.; Filatov, N.; Lemos, J.; Unbehauen, H. A dual approach to start-up of an adaptive predictive controller. IEEE Transactions on Control Systems Technology 2005, 13, 877–883.

Round 2

Reviewer 2 Report (Previous Reviewer 2)

Comments and Suggestions for Authors

The authors have responded to my issues, so I am happy to recommend the paper for publication.

I have no substantive technical comments to make on this version.

 

Comments on the Quality of English Language

One minor issue: the sentence on line 398 needs to be revised, as it is ungrammatical.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper addresses the tracking control problem where uncertainties are simultaneously considered. The contributions and motivations of this paper should be highlighted and strengthened. The concerns are as follows:

  1. In Section 1.2, please give more description of the outer-loop controller, for example, its structure.
  2. As stated in Section 2, the dynamics of the quadrotor are not used in the control design, but only in simulation. Please compact this section and make it clear. For example, the parameter values of the dynamics can be given in the simulation results rather than in the body of the manuscript.
  3. An assumption should be included that constraints are imposed on the attitude angles, since when the pitch angle is 90 degrees there would be a singularity problem in the control design.
  4. According to the topic of this paper, the authors need to supplement some recent papers related to the technical issues of the paper for completeness: Modeling and control of a tilting quadcopter; Saturation-Tolerant Prescribed Control for Nonlinear Systems With Unknown Control Directions and External Disturbances.
  5. Please describe the selection of the controller parameters.
  6. In Figures 4 and 5, legends should be added.

Reviewer 2 Report

Comments and Suggestions for Authors

Goals of the article are confusing. The authors state "model free approach yields an algorithm that can run in real time. The algorithm is first introduced and applied to a baseline quadrotor controller to improve its performance"

What does real-time performance have to do with performance improvements? The notion of real-time is never defined.

What does it mean to learn in real time? Reinforcement learning is typically not associated with real-time operation due to its sample complexity.

In section 6 the authors claim: "The development of a reinforcement learning (RL) based adaptive controller for a quadrotor, that includes an adapted version of the Q-learning policy iteration algorithm for linear-quadratic problems explored in [6], was well succeeded, thus completing the main objective of this work."

Here, the contributions stated as being achieved are different from those earlier in the paper. This leads to some fundamental questions about what the article is really about, which remains unclear after reading the article multiple times. The authors need to clarify the paper with a significant revision.

Further, the experiments don't show very much. As noted later in the review, this section is also confusing, with no goals stated, and nothing being clarified in terms of (1) precise technical claims and (2) empirical validation of those claims.

The related work is inadequate. For example, there are many papers on quadcopter control via RL not cited:

Koch, W., Mancuso, R., West, R., & Bestavros, A. (2019). Reinforcement learning for UAV attitude control. ACM Transactions on Cyber-Physical Systems, 3(2), 1-21.

Deshpande, A. M., Minai, A. A., & Kumar, M. (2021). Robust deep reinforcement learning for quadcopter control. IFAC-PapersOnLine, 54(20), 90-95.

Deshpande, A. M., Kumar, R., Minai, A. A., & Kumar, M. (2020, October). Developmental reinforcement learning of control policy of a quadcopter UAV with thrust vectoring rotors. In Dynamic Systems and Control Conference (Vol. 84287, p. V002T36A011). American Society of Mechanical Engineers.

Also, work on real-time RL:

Koh, S., Zhou, B., Fang, H., Yang, P., Yang, Z., Yang, Q., ... & Ji, Z. (2020). Real-time deep reinforcement learning based vehicle navigation. Applied Soft Computing, 96, 106694.

Ramstedt, S., & Pal, C. (2019). Real-time reinforcement learning. Advances in neural information processing systems, 32.

 

Further notes:

======================

A cascade control structure is not used just in backstepping but in many other methods, e.g., PID. PID control may be the most common.

Sec 5:

the structuring of this section is odd. What are the goals of the experiments? It is so unstructured that it feels as if the experiments are chosen randomly.

Exp 1.: why choose these parameters? Just to show how good your solution is?

Very odd to just select some "random" parameters.

Why use a trajectory in the form of a lemniscate?

What are the red and blue curves in Fig. 4?

What is the purpose of the final paragraph: "It is preferable to have a single learning agent acting at a given time."

This is just confusing and disconnected from the article.

 

Reviewer 3 Report

Comments and Suggestions for Authors

The paper seems like a simple implementation of an RL method for quadrotors, which is not consistent with the title "adaptive control". In addition, I think the technical contribution is minor (or questionable), as the technical novelty is quite superficial.

The paper can be shortened by reducing the many well-known contents on RL introduction, quadrotor modelling and controller design. 

The LQR control seems not so closely related to the main results, which can be reduced and even removed. 

For adaptive control, one often designs adaptive controllers or estimators to deal with unknown parameters or uncertain system dynamics. The paper does not contribute any new idea to this direction. 

Based on the above evaluations, I think the paper should be rejected. 
