Quadratic Tracking Control of Linear Stochastic Systems with Unknown Dynamics Using Average Off-Policy Q-Learning Method
Abstract
1. Introduction
- References [25,28] studied the LQT problem for linear systems but did not consider stochastic disturbances. Since stochastic disturbances can introduce errors into the state and output information, the standard Q-learning algorithm cannot be applied directly. We propose the average off-policy Q-learning (AOPQ) algorithm to overcome this problem.
- Reference [29] investigated the set-point tracking problem for linear systems with disturbances when the system model is unknown. However, the tracking signal there is limited to constant values, and the proposed algorithm assumes that an initial stabilizing control is available. The tracking signal studied in this paper is not restricted in this way, and we provide a data-driven method for constructing an initial stabilizing control.
- Reference [30] applies off-policy Q-learning to the LQR problem for unknown discrete-time systems and provides a data-based method for designing an initial stabilizing controller, obtained by a pole-placement strategy applied to a system coefficient matrix constructed from data. However, that work does not consider the case in which the system output must track an external reference. In contrast, this paper solves the LQT problem for linear discrete-time systems with external stochastic disturbances using an average off-policy Q-learning algorithm; a sketch of the data-driven initialization idea is given after this list.
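As a rough illustration of the data-driven initialization mentioned above, the sketch below fits the system matrices (A, B) from input/state data by least squares and then applies pole placement to the identified model to obtain an initial stabilizing gain. All matrices, noise levels, and pole locations are placeholder assumptions for illustration, not values taken from the paper.

```python
import numpy as np
from scipy.signal import place_poles

# Placeholder system used only for illustration (not the paper's example).
rng = np.random.default_rng(0)
n, m, N = 2, 1, 200                          # state dim, input dim, number of samples
A_true = np.array([[0.9, 0.2], [0.0, 0.8]])
B_true = np.array([[0.0], [1.0]])

# Collect data under a persistently exciting input with a small disturbance.
X = np.zeros((n, N + 1))
U = rng.normal(size=(m, N))
for k in range(N):
    w = 0.01 * rng.normal(size=n)
    X[:, k + 1] = A_true @ X[:, k] + B_true @ U[:, k] + w

# Least-squares identification: X_next ≈ [A B] [X; U].
Z = np.vstack([X[:, :N], U])
AB_hat = X[:, 1:] @ np.linalg.pinv(Z)
A_hat, B_hat = AB_hat[:, :n], AB_hat[:, n:]

# Pole placement on the identified model yields an initial stabilizing gain.
K0 = place_poles(A_hat, B_hat, [0.5, 0.6]).gain_matrix
print("closed-loop poles:", np.linalg.eigvals(A_hat - B_hat @ K0))
```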
2. LQT Problem with Stochastic Disturbance
Problem Description
Algorithm 1: PI Algorithm with System Model [2]
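Although the steps of Algorithm 1 are not reproduced here, a model-based policy iteration of this kind has the familiar Hewer-type structure [2]: evaluate the current gain by solving a discrete Lyapunov equation, then improve it greedily. The following is a minimal sketch of that structure for a plain LQR cost with placeholder matrices; the paper's version operates on the augmented tracking system, which is omitted here.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def hewer_policy_iteration(A, B, Q, R, K0, iters=20):
    """Model-based PI for the discrete-time LQR (Hewer-type iteration [2]).

    Starting from a stabilizing gain K0, alternately evaluate the policy by
    solving a Lyapunov equation and improve it with a greedy gain update.
    This is a generic sketch, not a transcription of Algorithm 1 in the paper.
    """
    K = K0
    for _ in range(iters):
        Acl = A - B @ K
        # Policy evaluation: P = Acl' P Acl + Q + K' R K
        P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
        # Policy improvement: K = (R + B' P B)^{-1} B' P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Example usage with placeholder matrices (not from the paper).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
K0 = np.zeros((1, 2))            # A itself is stable here, so K0 = 0 is stabilizing
K_opt, P_opt = hewer_policy_iteration(A, B, Q, R, K0)
print("converged gain:", K_opt)
```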
3. Solving Stochastic LQT Problems with Unknown System
3.1. Model-Free Average Off-Policy Q-Learning Algorithm
3.2. Data-Driven Average Off-Policy Q-Learning Algorithm
Algorithm 2: Average Off-Policy Q-Learning Algorithm
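For orientation, the sketch below shows one generic evaluation/improvement step of an average-cost, off-policy Q-learning scheme for linear-quadratic control: the Q-function is quadratic in z = [x; u], and the unknowns vec(H) and the average cost λ are obtained by least squares from the off-policy Bellman relation, after which the gain is updated greedily. This is an assumption-laden illustration of the general technique (in the spirit of average-cost RL for LQ problems), not a transcription of the paper's Algorithm 2; all system matrices and tuning values are placeholders.

```python
import numpy as np

def q_features(z):
    """Quadratic features: z' H z = q_features(z) @ vec(H)."""
    return np.kron(z, z)

def aopq_step(data, K, n, m, Qw, Rw):
    """One least-squares evaluation + greedy improvement step.

    Solves, in the least-squares sense, the average-cost off-policy relation
        z_k' H z_k + lam = c_k + z_next' H z_next,  with u_next = -K x_next,
    for vec(H) and lam, then returns the improved gain.
    """
    Phi, c = [], []
    for (x, u, x_next) in data:
        z = np.concatenate([x, u])
        z_next = np.concatenate([x_next, -K @ x_next])   # target-policy action
        Phi.append(np.concatenate([q_features(z) - q_features(z_next), [1.0]]))
        c.append(x @ Qw @ x + u @ Rw @ u)
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(c), rcond=None)
    H = theta[:-1].reshape(n + m, n + m)
    H = 0.5 * (H + H.T)                                   # enforce symmetry
    K_new = np.linalg.solve(H[n:, n:], H[n:, :n])         # greedy update, u = -K_new x
    return K_new, H, theta[-1]

# Hypothetical usage on simulated data (placeholder system, not the paper's).
rng = np.random.default_rng(1)
A = np.array([[0.9, 0.2], [0.0, 0.8]]); B = np.array([[0.0], [1.0]])
Qw, Rw, n, m = np.eye(2), np.eye(1), 2, 1
K, x, data = np.zeros((m, n)), np.zeros(n), []
for k in range(400):
    u = -K @ x + 0.5 * rng.normal(size=m)                 # exploratory behavior policy
    x_next = A @ x + B @ u + 0.05 * rng.normal(size=n)    # process noise
    data.append((x, u, x_next))
    x = x_next
for _ in range(10):                                       # off-policy PI on the same batch
    K, H, avg_cost = aopq_step(data, K, n, m, Qw, Rw)
print("learned gain:", K, "\naverage-cost estimate:", avg_cost)
```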
3.3. Convergence Analysis of Algorithm 2
4. Simulation Experiment
4.1. Example 1
4.2. Example 2
4.3. Comparison Simulation Experiment
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Rizvi, S.A.A.; Lin, Z. Output feedback Q-learning control for the discrete-time linear quadratic regulator problem. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 1523–1536.
2. Hewer, G. An iterative technique for the computation of the steady state gains for the discrete optimal regulator. IEEE Trans. Autom. Control 1971, 16, 382–384.
3. Li, X.; Xue, L.; Sun, C. Linear quadratic tracking control of unknown discrete-time systems using value iteration algorithm. Neurocomputing 2018, 314, 86–93.
4. Jiang, Y.; Jiang, Z.P. Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics. Automatica 2012, 48, 2699–2704.
5. Modares, H.; Lewis, F.L.; Jiang, Z.P. Optimal output-feedback control of unknown continuous-time linear systems using off-policy reinforcement learning. IEEE Trans. Cybern. 2016, 46, 2401–2410.
6. Luo, B.; Wu, H.N.; Huang, T.; Liu, D. Data-based approximate policy iteration for affine nonlinear continuous-time optimal control design. Automatica 2014, 50, 3281–3290.
7. Lee, J.Y.; Park, J.B.; Choi, Y.H. On integral generalized policy iteration for continuous-time linear quadratic regulations. Automatica 2014, 50, 475–489.
8. Vrabie, D.; Pastravanu, O.; Abu-Khalaf, M.; Lewis, F. Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica 2009, 45, 477–484.
9. Sutton, R.; Barto, A. Reinforcement Learning: An Introduction; The MIT Press: London, UK, 1998.
10. Song, R.; Lewis, F.L.; Wei, Q.; Zhang, H. Off-policy actor-critic structure for optimal control of unknown systems with disturbances. IEEE Trans. Cybern. 2016, 46, 1041–1050.
11. Lewis, F.L.; Liu, D. Robust adaptive dynamic programming. In Reinforcement Learning and Approximate Dynamic Programming for Feedback Control; John Wiley & Sons: Hoboken, NJ, USA, 2013; pp. 281–302.
12. Wonham, W.M. Optimal stationary control of a linear system with state-dependent noise. SIAM J. Control 1967, 5, 486–500.
13. Jiang, Y.; Jiang, Z.P. Approximate dynamic programming for optimal stationary control with control-dependent noise. IEEE Trans. Neural Netw. 2011, 22, 2392–2398.
14. Bian, T.; Jiang, Y.; Jiang, Z.P. Adaptive dynamic programming for stochastic systems with state and control dependent noise. IEEE Trans. Autom. Control 2016, 61, 4170–4175.
15. Pang, B.; Jiang, Z.P. Reinforcement learning for adaptive optimal stationary control of linear stochastic systems. IEEE Trans. Autom. Control 2023, 68, 2383–2390.
16. Tsitsiklis, J.N.; Van Roy, B. Average cost temporal-difference learning. Automatica 1999, 35, 1799–1808.
17. Adib Yaghmaie, F.; Gunnarsson, S.; Lewis, F.L. Output regulation of unknown linear systems using average cost reinforcement learning. Automatica 2019, 110, 108549.
18. Yaghmaie, F.A.; Gustafsson, F.; Ljung, L. Linear quadratic control using model-free reinforcement learning. IEEE Trans. Autom. Control 2023, 68, 737–752.
19. Yaghmaie, F.A.; Gustafsson, F. Using reinforcement learning for model-free linear quadratic control with process and measurement noises. In Proceedings of the 2019 IEEE 58th Conference on Decision and Control (CDC), Nice, France, 11–13 December 2019.
20. Rami, M.A.; Chen, X.; Zhou, X.Y. Discrete-time indefinite LQ control with state and control dependent noises. J. Glob. Optim. 2002, 23, 245–265.
21. Ni, Y.H.; Elliott, R.; Li, X. Discrete-time mean-field stochastic linear-quadratic optimal control problems, II: Infinite horizon case. Automatica 2015, 57, 65–77.
22. Chen, S.; Yong, J. Stochastic linear quadratic optimal control problems. Appl. Math. Optim. 2001, 43, 21–45.
23. Rami, M.; Zhou, X.Y. Linear matrix inequalities, Riccati equations, and indefinite stochastic linear quadratic controls. IEEE Trans. Autom. Control 2000, 45, 1131–1143.
24. Liu, X.; Li, Y.; Zhang, W. Stochastic linear quadratic optimal control with constraint for discrete-time systems. Appl. Math. Comput. 2014, 228, 264–270.
25. Kiumarsi, B.; Lewis, F.L.; Modares, H.; Karimpour, A.; Naghibi-Sistani, M.B. Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics. Automatica 2014, 50, 1167–1175.
26. Sharma, S.K.; Jha, S.K.; Dhawan, A.; Tiwari, M. Q-learning based adaptive optimal control for linear quadratic tracking problem. Int. J. Control Autom. Syst. 2023, 21, 2718–2725.
27. Liu, X.; Zhang, L.; Peng, Y. Off-policy Q-learning-based tracking control for stochastic linear discrete-time systems. In Proceedings of the 2022 4th International Conference on Control and Robotics (ICCR 2022), Guangzhou, China, 2–4 December 2022; pp. 252–256.
28. Modares, H.; Lewis, F.L. Linear quadratic tracking control of partially-unknown continuous-time systems using reinforcement learning. IEEE Trans. Autom. Control 2014, 59, 3051–3056.
29. Zhao, J.; Yang, C.; Gao, W.; Zhou, L. Reinforcement learning and optimal setpoint tracking control of linear systems with external disturbances. IEEE Trans. Ind. Inform. 2022, 18, 7770–7779.
30. Lopez, V.G.; Alsalti, M.; Müller, M.A. Efficient off-policy Q-learning for data-based discrete-time LQR problems. IEEE Trans. Autom. Control 2023, 68, 2922–2933.
31. Zhang, W.; Chen, B.S. On stabilizability and exact observability of stochastic systems with their applications. Automatica 2004, 40, 87–94.
32. Thompson, M.; Freedman, H.I. Deterministic mathematical models in population ecology. Am. Math. Mon. 1982, 89, 798.
33. Koning, W.L.D. Optimal estimation of linear discrete-time systems with stochastic parameters. Automatica 1984, 20, 113–115.
34. Gao, J. Machine learning applications for data center optimization. Google White Paper 2014, 21, 1–13.
35. Yu, H.; Bertsekas, D.P. Convergence results for some temporal difference methods based on least squares. IEEE Trans. Autom. Control 2009, 54, 1515–1531.
36. Lamperti, J. Stochastic Processes: A Survey of the Mathematical Theory. J. Am. Stat. Assoc. 1979, 74, 970–974.
37. Al-Tamimi, A.; Lewis, F.L.; Abu-Khalaf, M. Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control. Automatica 2007, 43, 473–481.
38. Willems, J.C.; Rapisarda, P.; Markovsky, I.; De Moor, B.L. A note on persistency of excitation. Syst. Control Lett. 2005, 54, 325–329.
39. Luenberger, D. Canonical forms for linear multivariable systems. IEEE Trans. Autom. Control 1967, 12, 290–293.
40. Jiang, Y.; Fan, J.; Chai, T.; Lewis, F.L.; Li, J. Tracking control for linear discrete-time networked control systems with unknown dynamics and dropout. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 4607–4620.
41. Prashanth, L.A.; Korda, N.; Munos, R. Fast LSTD using stochastic approximation: Finite time analysis and application to traffic control. In Proceedings of the Machine Learning and Knowledge Discovery in Databases: European Conference (ECML PKDD 2014), Nancy, France, 15–19 September 2014; Springer: Berlin/Heidelberg, Germany, 2014.
| 50 ≤ k ≤ 200 | IAE | MSE | Iteration Time |
|---|---|---|---|
| Algorithm 2 | 0.63 | 0.74 | 30 |
| Compared approach | 0.81 | 0.78 | 20 |
| | The Number of Parameters | Complexity |
|---|---|---|
| Algorithm 2 | | |
| Compared approach | | |