Peer-Review Record

Robust Attitude Control of an Agile Aircraft Using Improved Q-Learning

Actuators 2022, 11(12), 374; https://doi.org/10.3390/act11120374
by Mohsen Zahmatkesh 1, Seyyed Ali Emami 1, Afshin Banazadeh 1 and Paolo Castaldi 2,*
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 1 November 2022 / Revised: 5 December 2022 / Accepted: 6 December 2022 / Published: 12 December 2022
(This article belongs to the Section Aircraft Actuators)

Round 1

Reviewer 1 Report

The paper, which I find particularly interesting and useful for those approaching optimal control techniques, fits the scope of Actuators MDPI Journal.

It would be interesting to test the use of such techniques by considering the change in the position of the aircraft's center of mass and the natural reduction in total mass during flight. Nevertheless, I suggest accepting the paper in its present form.

Author Response

Reviewer 1:

The paper, which I find particularly interesting and useful for those approaching optimal control techniques, fits the scope of Actuators MDPI Journal.

The authors would like to thank the reviewer for his/her review and valuable suggestions.

It would be interesting to test the use of such techniques by considering the change in the position of the aircraft's center of mass and the natural reduction in total mass during flight. Nevertheless, I suggest accepting the paper in its present form.

Thank you for your insightful comment, which helps to demonstrate the robustness of our method more directly. In addition to atmospheric disturbances, sensor measurement noise, and actuator faults, we have added another benchmark according to your comment: model parameter uncertainties. In this simulation, we decreased and increased all of the aircraft coefficients by 10% and simulated again with the same optimal Q-table, without any new training. The results were satisfactory and are reported in Table 7 and Figure 10.
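
A minimal sketch of this robustness check, assuming a simple dictionary of longitudinal coefficients (the names and values below are placeholders, not the paper's model data), could look as follows; the trained Q-table is left untouched and only the plant parameters are scaled by ±10%.

```python
# Hypothetical subset of longitudinal coefficients; placeholder names and values,
# not the aircraft data used in the paper.
nominal_coeffs = {"C_L_alpha": 5.5, "C_m_alpha": -1.2, "C_m_q": -15.0, "C_m_deltaE": -1.1}

def perturb(coeffs, factor):
    """Scale every model coefficient by a common factor (0.9 for -10%, 1.1 for +10%)."""
    return {name: value * factor for name, value in coeffs.items()}

# The previously trained Q-table is reused unchanged; only the plant model changes.
for factor in (0.9, 1.1):
    perturbed_model = perturb(nominal_coeffs, factor)
    print(f"{factor - 1:+.0%} model:", perturbed_model)
```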

Reviewer 2 Report

The article proposes a pitch-angle controller design for a specific type of aircraft based on Q-learning, augmented with a procedure to generalize the learned Q-table to a continuous action space.

The authors point out (lines 69-89) the following four contributions of the paper:

1. choice of a truss-braced wing aircraft for the attitude control problem

The model is supposed to be described in Chapter 2, Modelling and Simulation, and I believe this part needs major revision.

a) please introduce reference frames, show how the body axes are directed, and describe Euler angle sequence - it is not very easy to understand even the reward shaping part of the paper without a clear picture of what goes where

b) please add the designations used in Eq. (3) and explain how it is represented in the body frame

c) please, add the assumption of never encountering gimbal locks in Eq. 5

d) Eq 7, please provide the reference to the Dryden model

e) please specify what you intend to measure with the sensor (line 107). It might be a good idea to write out the complete observation model as well

f) Eq. 8: it is difficult to see the reasoning, owing to the fact that the designation \delta_E has not been introduced

and so on. All in all, please be careful to introduce all designations whenever you use them for the first time and formulate the control problem (with control goals, input constraints and all)

2) It will be demonstrated that the Q-learning performance in such a control problem depends strictly on reward function and problem definition.

This statement is inaccurate. Any reinforcement learning problem greatly depends on the proper choice of the reward function. And any non-trivial control problem starts from properly defining the problem. What the authors are showing is merely a working control scheme, but I cannot agree that this general claim is justified.

3) The performance of Q-learning will be evaluated in both MDP and POMDP problem frameworks.

4) the control method is examined in different flight conditions

I agree that the approach is well described by the authors. The algorithms they use seem to be clear. However, when it comes to evaluating the performance of the algorithms, I would expect to see the design of numerical experiments, which seems to be totally absent. What are the model equations, and how is it that the pitch angle becomes decoupled from the dynamics of all the other state variables? What are the initial conditions and set-points for the control problem? Are they chosen in accordance with the actual control problems that the chosen aircraft encounters? Please be very clear about the control problems you are solving, because the paper only shows the solutions and talks of their quality. However, it is hard for the reader to follow the arguments without a clear understanding of the problem statements.

All in all, after the methodological corrections are duly made, I believe the paper deserves to be published.

Author Response

Reviewer 2:

The article proposes a pitch-angle controller design for a specific type of aircraft based on Q-learning, augmented with a procedure to generalize the learned Q-table to a continuous action space.

We would like to thank the reviewer for his/her review and valuable comments.

The authors point out (lines 69-89) the following four contributions of the paper:

  1. choice of a truss-braced wing aircraft for the attitude control problem

The model is supposed to be described in Chapter 2, Modelling and Simulation, and I believe this part needs major revision.

a) please introduce reference frames, show how the body axes are directed, and describe Euler angle sequence - it is not very easy to understand even the reward shaping part of the paper without a clear picture of what goes where

Thank you for your insightful comments, which help make our study more suitable for aerospace applications. We have added a figure illustrating the frames of reference used in this study (page 4, Figure 2). We have also defined them in lines 95, 96, 100, and 101.

b) please add the designations used in Eq. (3) and explain how it is represented in the body frame

Thanks again. In this regard, we have added Eq. 4 on page 4 and explained in detail the transformation from the stability frame to the body frame. We have also added a complete explanation of the transformation from the body frame to the inertial frame on page 5, Eqs. 7 and 8.

c) please, add the assumption of never encountering gimbal locks in Eq. 5

Thank you. After Eq. 5 on page 4, we have explained why we chose the Euler angle differential equations and stated the corresponding assumptions in degrees. All of the amendments are highlighted.

d) Eq 7, please provide the reference to the Dryden model

In this regard, we added “Flying qualities of piloted airplanes” and “Wind shear terms in the equation of aircraft motions” references.

e) please specify what you intend to measure with the sensor (line 107). It might be a good idea to write out the complete observation model as well

In this case, we have amended the sentence on page 6 to read: “In addition, the sensor noise is defined as $\pm10\%$ of sensor measurement of pitch angle.”
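
Purely as an illustration of that definition (the uniform distribution below is an assumption made for this sketch, not necessarily the noise model used in the paper), the measurement could be corrupted as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_pitch_measurement(theta_true):
    """Pitch-angle reading corrupted by up to +/-10% multiplicative noise.

    The uniform distribution is assumed for this sketch; the paper only
    states the +/-10% bound, not the distribution.
    """
    return theta_true * (1.0 + rng.uniform(-0.1, 0.1))

print(noisy_pitch_measurement(1.0))  # e.g. a measured 1-degree pitch angle
```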

f) Eq. 8: it is difficult to see the reasoning, owing to the fact that the designation \delta_E has not been introduced

We have added a nomenclature table on the first page of the paper. We have also attempted to define all of the variables properly, according to your helpful comment.

and so on. All in all, please be careful to introduce all designations whenever you use them for the first time and formulate the control problem (with control goals, input constraints and all)

 2) It will be demonstrated that the Q-learning performance in such a control problem depends strictly on reward function and problem definition.

This statement is inaccurate. Any reinforcement learning problem greatly depends on the proper choice of the reward function. And any non-trivial control problem starts from properly defining the problem. What the authors are showing is merely a working control scheme, but I cannot agree that this general claim is justified

We completely agree with your comment. In this case, the abstract and introduction have been amended to clarify our goals and findings. Accordingly, we have edited the abstract with “It will be proved that by defining comprehensive reward function based on dynamic behavior considerations”, which means that by considering both the pitch angle and the pitch rate, this problem can be solved. In total, we examined various reward functions with improper coefficients, or without considering the pitch rate, all of which were unsuccessful; however, these are not important for readers and not necessary to mention. All of the other amendments related to this comment are highlighted in the introduction section.
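
To make the idea concrete, a minimal sketch of such a reward is shown below, penalising both the pitch-angle tracking error and the pitch rate; the weights and the exact functional form are illustrative assumptions, not the coefficients used in the paper.

```python
def shaped_reward(theta, theta_dot, theta_ref, w_theta=1.0, w_q=0.1):
    """Illustrative reward penalising pitch-angle error and pitch rate.

    w_theta and w_q are placeholder weights; the paper tunes its own
    coefficients and reports that omitting the pitch-rate term fails.
    """
    return -(w_theta * abs(theta_ref - theta) + w_q * abs(theta_dot))

# Example: 1-degree reference, current pitch 0.5 deg, pitch rate 2 deg/s
print(shaped_reward(theta=0.5, theta_dot=2.0, theta_ref=1.0))
```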

3) The performance of Q-learning will be evaluated in both MDP and POMDP problem frameworks.

4) the control method is examined in different flight conditions

I agree that the approach is well described by the authors. The algorithms they use seem to be clear. However, when it comes to evaluating the performance of the algorithms, I would expect to see the design of numerical experiments, which seems to be totally absent. What are the model equations, and how is it that the pitch angle becomes decoupled from the dynamics of all the other state variables? What are the initial conditions and set-points for the control problem? Are they chosen in accordance with the actual control problems that the chosen aircraft encounters? Please be very clear about the control problems you are solving, because the paper only shows the solutions and talks of their quality. However, it is hard for the reader to follow the arguments without a clear understanding of the problem statements. All in all, after the methodological corrections are duly made, I believe the paper deserves to be published.

Thank you very much for your time. In this regard, the numerical experiments are gathered in Tables 6 and 7. We agree that the decoupling assumption is not realistic, but unfortunately we did not have any coupling coefficients, and we have stated this assumption in the text according to your comment. However, the poor longitudinal stability characteristic of this aircraft and of the Boeing N+3 airplanes motivated us to pursue this study. The initial conditions are listed in Table 5 according to your advice. The first simulations involve tracking a constant pitch angle of 1 degree, and the POMDP and MDP training are performed with this goal. The trained Q-table is then used for variable pitch-angle tracking between -4 and +4 degrees, to show its robustness and the fact that no gain scheduling is needed, since aircraft take-offs and landings are usually performed within this bound. These explanations have been added on page 13, lines 200 to 203.

Reviewer 3 Report

In this paper, Q-learning is used for attitude control of a novel regional truss-braced wing aircraft, and a Fuzzy Action Assignment (FAA) method is adopted to generate continuous control commands using the trained Q-table.


In general, this article is interesting, but some issues need further clarification.


1. Deep reinforcement learning is very suitable for solving this kind of continuous control problem. Why do the authors choose discrete reinforcement learning (Q-learning) and the FAA method? Since the maneuvering range of the aircraft is large, how do you solve the dimension problem of the Q-table?

2. Wind interference is considered, but it is not included in the 6-DOF modeling.

3. The authors have done different comparative simulations, such as the cases of sensor noise and actuator failure. Did the authors train in these different situations, or are these factors considered in the same training environment?

Author Response

Reviewer 3:

In this paper, Q-learning is used for attitude control of a novel regional truss-braced wing aircraft, and a Fuzzy Action Assignment (FAA) method is adopted to generate continuous control commands using the trained Q-table. In general, this article is interesting, but some issues need further clarification.

  1. Deep reinforcement learning is very suitable for solving this kind of continuous control problem. Why do the authors choose discrete reinforcement learning (Q-learning) and the FAA method? Since the maneuvering range of the aircraft is large, how do you solve the dimension problem of the Q-table?

We would like to thank the reviewer for his/her in-depth review and inspiring comments. As you note, there are many studies that utilize deep RL. However, the main challenges of that approach are the theoretical complexity of the neural network architectures and the need for powerful processors such as GPUs. The dimension of the Q-table is only 47 × 7 × 21 in the MDP and 47 × 21 in the POMDP problem definitions. This research was able to solve the analogous problem by generating a robust optimal Q-table and computing continuous actions in the execution phase using just two equations, Eqs. 22 and 23. The FAA is a novel general connector, meaning that any continuous or discrete optimal Q-table can be given to the FAA as input, which then generates continuous actions as output.
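
As an illustration of the execution-phase idea only (not the published FAA of Eqs. 22 and 23), the sketch below blends the discrete actions of one Q-table row into a single continuous command using Q-value-based weights; the action grid and the softmax weighting are assumptions made for this example.

```python
import numpy as np

def continuous_action(q_row, action_grid):
    """Blend one state's Q-values over a discrete action grid into a continuous command.

    A softmax over the Q-values is used here as the blending weight; the paper's
    FAA uses its own fuzzy membership functions (Eqs. 22 and 23) instead.
    """
    weights = np.exp(q_row - q_row.max())
    weights /= weights.sum()
    return float(weights @ action_grid)

# Assumed grid of 21 discrete elevator commands in degrees; the paper's grid may differ.
actions = np.linspace(-30.0, 30.0, 21)
q_row = -np.abs(actions - 4.3)              # toy Q-values peaking near 4.3 degrees
print(continuous_action(q_row, actions))    # continuous command between the nearest grid points
```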

  2. Wind interference is considered, but it is not included in the 6-DOF modeling.

Thank you very much. We have added the equations of wind on page 6 according to your comment.

  3. The authors have done different comparative simulations, such as the cases of sensor noise and actuator failure. Did the authors train in these different situations, or are these factors considered in the same training environment?

Thanks a lot. In this regard, the Q-table was trained just twice, for a desired pitch angle of 1 degree, according to Algorithm 1 on page 9: once for the MDP and once for the POMDP model definition. All of the simulations are then performed using these trained tables. Even the FAA was not involved in the learning process, which means the learned Q-table has proved its robustness.
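
For reference, a minimal sketch of the tabular update applied during such a training run is given below; the table size follows the 47 × 21 POMDP dimensions quoted above, while the learning rate, discount factor, exploration rate, indices, and reward value are placeholders rather than the paper's settings.

```python
import numpy as np

n_states, n_actions = 47, 21             # POMDP table size quoted above
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # placeholder hyper-parameters
rng = np.random.default_rng(0)

def q_update(s, a, r, s_next):
    """Standard tabular Q-learning update used during the single training run."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

# One illustrative epsilon-greedy step; the transition and reward are placeholders.
s = 10
a = int(Q[s].argmax()) if rng.random() > epsilon else int(rng.integers(n_actions))
q_update(s, a, r=-0.2, s_next=11)
print(Q[s, a])
```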

Round 2

Reviewer 2 Report

I believe the revision duly answers all the comments I had on the original version. The paper is basically ready to be published. I would recommend going over it, as in the haste of working towards the deadline the authors made a few typos and grammatical mistakes (below, I give a couple of examples I spotted). Otherwise, all seems to be well.

- stability frame superscript (under eq(2)) ']^E' is used twice for both stability and inertial frames

- eq(7) - \theta should be changed to \varphi in one of the transformations

- In the phrase "This process continuous until reaching to final states" right before equation (16) 'continuous' should probably be 'continues'

- There is an unresolved figure reference in the first paragraph of Section 3.4

Author Response

We would like to thank the reviewer again for his/her thorough review and useful comments. We have corrected all the grammatical mistakes and typos mentioned by the respected reviewer in the revised manuscript. 

Reviewer 3 Report

  • All my issues have been solved.

Author Response

Thank you very much for your time and useful comments. 
