On Model Identification Based Optimal Control and Its Applications to Multi-Agent Learning and Control
Abstract
1. Introduction
2. Adaptive Dynamic Programming-Based Optimal Control Method
2.1. Basic Structures of ADP
2.2. Developments of ADP-Based Optimal Control
2.3. ADP-Based Approximate Solution to HJB Equations
3. Model Identification
3.1. Parametric Model Identification Method
3.2. Non-Parametric Model Identification Method
4. Model Identification-Based Optimal Control for SAS
5. Model Identification-Based Optimal Control for MASs
6. Conclusions and Future Work
- At present, model identification-based ADP methods focus mainly on the design of a single controller and much less on the design of multiple controllers. Using the model identification-based ADP method to realize distributed coordinated control of MASs would be a valuable direction.
- Most existing model identification-based ADP methods require the PE condition to hold. However, the PE condition is difficult to verify in practical applications. How to design a novel identification-based ADP method whose excitation condition is easier to check, while keeping the resulting burden low, remains an open question [82].
- For more complex MASs, such as power grids and transportation systems, whose accurate models cannot be obtained, the model identification-based ADP method may be used to solve large-scale optimization problems of practical importance.
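To make the PE issue raised in the second point concrete, the following is a minimal sketch, not the method of [82] or of any cited work. It uses the standard discrete-time notion that a regressor is persistently exciting if the sum of its outer products over every window of some fixed length stays bounded below by a positive multiple of the identity. The function names, the two-dimensional regressor, the window length, and both test signals are assumptions chosen for illustration. The check can only be run on recorded data, so a positive level is consistent with PE but does not prove it, which is one reason the condition is hard to verify in practice.

```python
import math


def min_eig_sym2(a, b, c):
    """Smallest eigenvalue of the symmetric 2x2 matrix [[a, b], [b, c]]."""
    return 0.5 * (a + c) - math.sqrt((0.5 * (a - c)) ** 2 + b * b)


def excitation_level(regressor, window):
    """Smallest eigenvalue of sum(phi * phi^T) over every sliding window.

    `regressor` is a list of 2-element samples. A value bounded away from
    zero on the recorded data is consistent with PE; it is not a proof.
    """
    levels = []
    for start in range(len(regressor) - window + 1):
        a = b = c = 0.0
        for p0, p1 in regressor[start:start + window]:
            a += p0 * p0
            b += p0 * p1
            c += p1 * p1
        levels.append(min_eig_sym2(a, b, c))
    return min(levels)


# Hypothetical test signals (assumptions, not taken from the paper).
# `rich` spans both directions of the plane; `poor` stays on one line,
# so its windowed Gram matrix is rank deficient.
rich = [(math.sin(0.5 * k), math.cos(0.5 * k)) for k in range(200)]
poor = [(math.sin(0.5 * k), 2.0 * math.sin(0.5 * k)) for k in range(200)]

rich_level = excitation_level(rich, window=20)
poor_level = excitation_level(poor, window=20)
print(f"rich: {rich_level:.3f}, poor: {poor_level:.3e}")
```

In this sketch the first signal yields a level well above zero, while the second yields a level that is zero up to rounding error, so parameters along the unexcited direction could not be identified from it.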
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Hu, J.; Liu, Z.; Wang, J.; Wang, L.; Hu, X. Estimation, intervention and interaction of multi-agent systems. Acta Autom. Sin. 2013, 39, 1796–1804.
2. Ji, Y.; Wang, G.; Li, Q.; Wang, C. Event-triggered optimal consensus of heterogeneous nonlinear multi-agent systems. Mathematics 2022, 10, 4622.
3. Hu, J. Second-order event-triggered multi-agent consensus control. In Proceedings of the 31st Chinese Control Conference, Hefei, China, 25–27 July 2012; pp. 6339–6344.
4. Hu, J.; Feng, G. Quantized tracking control for a multi-agent system with high-order leader dynamics. Asian J. Control 2011, 13, 988–997.
5. Wang, Q.; Hu, J.; Wu, Y.; Zhao, Y. Output synchronization of wide-area heterogeneous multi-agent systems over intermittent clustered networks. Inf. Sci. 2023, 619, 263–275.
6. Chen, B.; Hu, J.; Zhao, Y.; Ghosh, B.K. Finite-time velocity-free rendezvous control of multiple AUV systems with intermittent communication. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 6618–6629.
7. Peng, Y.; Zhao, Y.; Hu, J. On the role of community structure in evolution of opinion formation: A new bounded confidence opinion dynamics. Inf. Sci. 2023, 621, 672–690.
8. Murray, J.J.; Cox, C.J.; Lendaris, G.G.; Saeks, R. Adaptive dynamic programming. IEEE Trans. Syst. Man Cybern. Syst. 2002, 32, 140–153.
9. Wang, F.Y.; Zhang, H.; Liu, D. Adaptive dynamic programming: An introduction. IEEE Comput. Intell. Mag. 2009, 4, 39–47.
10. Wu, Y.; Liang, Q.; Hu, J. Optimal output regulation for general linear systems via adaptive dynamic programming. IEEE Trans. Cybern. 2022, 52, 11916–11926.
11. Werbos, P. Approximate Dynamic Programming for Realtime Control and Neural Modelling; White, D.A., Sofge, D.A., Eds.; Van Nostrand: New York, NY, USA, 1992.
12. Bertsekas, D.P. Dynamic Programming and Optimal Control; Athena Scientific: Belmont, MA, USA, 1995.
13. Prokhorov, D.V.; Wunsch, D.C. Adaptive critic designs. IEEE Trans. Neural Netw. 1997, 8, 997–1007.
14. Bellman, R. Dynamic programming. Science 1966, 153, 34–37.
15. Werbos, P. Advanced forecasting methods for global crisis warning and models of intelligence. Gen. Syst. Yearb. 1977, 22, 25–38.
16. Zhang, H.-G.; Zhang, X.; Luo, Y.-H.; Yang, J. An overview of research on adaptive dynamic programming. Acta Autom. Sin. 2013, 39, 303–311.
17. Lewis, F.L.; Vrabie, D. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst. Mag. 2009, 9, 32–50.
18. Abu-Khalaf, M.; Lewis, F.L. Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica 2005, 41, 779–791.
19. Vrabie, D.; Lewis, F.L. Neural network approach to continuous time direct adaptive optimal control for partially unknown nonlinear systems. Neural Netw. 2009, 22, 237–246.
20. Modares, H.; Lewis, F.L. Optimal tracking control of nonlinear partially unknown constrained input systems using integral reinforcement learning. Automatica 2014, 50, 1780–1792.
21. Vamvoudakis, K.G.; Lewis, F.L. Online actor-critic algorithm to solve the continuous time infinite horizon optimal control problem. Automatica 2010, 46, 878–888.
22. Zhang, H.; Wei, Q.; Luo, Y. A novel infinite time optimal tracking control scheme for a class of discrete time nonlinear systems via the greedy HDP iteration algorithm. IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 2008, 38, 937–942.
23. Al-Tamimi, A.; Lewis, F.L.; Abu-Khalaf, M. Discrete time nonlinear HJB solution using approximate dynamic programming: Convergence proof. IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 2008, 38, 943–949.
24. Liu, D.; Wang, D.; Zhao, D.; Wei, Q.; Jin, N. Neural network based optimal control for a class of unknown discrete time nonlinear systems using globalized dual heuristic programming. IEEE Trans. Autom. Sci. Eng. 2012, 9, 628–634.
25. Liu, D.; Wei, Q. Policy iteration adaptive dynamic programming algorithm for discrete time nonlinear systems. IEEE Trans. Neural Netw. Learn. Syst. 2013, 25, 621–634.
26. Kiumarsi, B.; Vamvoudakis, K.G.; Modares, H.; Lewis, F.L. Optimal and autonomous control using reinforcement learning: A survey. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 2042–2062.
27. Hou, Z.S.; Wang, Z. From model based control to data driven control: Survey, classification and perspective. Inf. Sci. 2013, 235, 3–35.
28. Peng, Z.; Luo, R.; Hu, J.; Shi, K.; Nguang, S.K.; Ghosh, B.K. Optimal tracking control of nonlinear multiagent systems using internal reinforce Q-learning. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 4043–4055.
29. Peng, Z.; Zhao, Y.; Hu, J.; Ghosh, B.K. Data-driven optimal tracking control of discrete-time multi-agent systems with two-stage policy iteration algorithm. Inf. Sci. 2019, 481, 189–202.
30. Peng, Z.; Zhao, Y.; Hu, J.; Luo, R.; Ghosh, B.K.; Nguang, S.K. Input-output data-based output antisynchronization control of multi-agent systems using reinforcement learning approach. IEEE Trans. Ind. Inform. 2021, 17, 7359–7367.
31. Modares, H.; Lewis, F.L.; Naghibi-Sistani, M.B. Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2013, 24, 1513–1525.
32. Ding, F.; Wang, F.F. Recursive least squares identification algorithms for linear-in-parameter systems with missing data. Control Decis. 2016, 31, 2261–2266.
33. Ding, F.; Wang, F.F.; Xu, L.; Wu, M.H. Decomposition based least squares iterative identification algorithm for multivariate pseudo-linear ARMA systems using the data filtering. J. Franklin Inst. 2017, 354, 1321–1339.
34. Elisei-Iliescu, C.; Stanciu, C.; Paleologu, C.; Benesty, J.; Anghel, C.; Ciochina, S. Efficient recursive least-squares algorithms for the identification of bilinear forms. Digit. Signal Process 2018, 83, 280–296.
35. Huang, W.; Ding, F.; Hayat, T.; Alsaedi, A. Coupled stochastic gradient identification algorithms for multivariate output-error systems using the auxiliary model. Int. J. Control Autom. 2017, 15, 1622–1631.
36. Ding, F.; Xu, L.; Meng, D.; Jin, X.-B.; Alsaedi, A.; Hayat, T. Gradient estimation algorithms for the parameter identification of bilinear systems using the auxiliary model. J. Comput. Appl. Math. 2020, 369, 112575.
37. Åström, K.J.; Wittenmark, B. Adaptive Control; Courier Corporation: Mineola, NY, USA, 2013.
38. Hu, J.; Hu, X. Optimal target trajectory estimation and filtering using networked sensors. In Proceedings of the 27th Chinese Control Conference, Kunming, China, 16–18 July 2008; pp. 540–545.
39. Lion, P.M. Rapid identification of linear and nonlinear systems. AIAA J. 1967, 5, 1835–1842.
40. Kreisselmeier, G. Adaptive observers with exponential rate of convergence. IEEE Trans. Autom. Control 1977, 22, 2–8.
41. Duarte, M.A.; Narendra, K.S. Combined direct and indirect approach to adaptive control. IEEE Trans. Autom. Control 1989, 34, 1071–1075.
42. Slotine, J.E.; Li, W. Composite adaptive control of robot manipulators. Automatica 1989, 25, 509–519.
43. Panteley, E.; Ortega, R.; Moya, P. Overcoming the detectability obstacle in certainty equivalence adaptive control. Automatica 2002, 38, 1125–1132.
44. Lavretsky, E. Combined composite model reference adaptive control. IEEE Trans. Autom. Control 2009, 54, 2692–2697.
45. Chowdhary, G.; Yucelen, T.; Muhlegg, M.; Johnson, E. Concurrent learning adaptive control of linear systems with exponentially convergent bounds. Int. J. Adapt. Control Signal Process 2013, 27, 280–301.
46. Cho, N.; Shin, H.; Kim, Y.; Tsourdos, A. Composite MRAC with parameter convergence under finite excitation. IEEE Trans. Autom. Control 2018, 63, 811–818.
47. Roy, S.; Bhasin, S.; Kar, I. A UGES switched MRAC architecture using initial excitation. In Proceedings of the 2017 20th IFAC World Congress, Toulouse, France, 9–14 July 2017; pp. 7044–7051.
48. Krause, J.; Khargonekar, P. Parameter information content of measurable signals in direct adaptive control. IEEE Trans. Autom. Control 1987, 32, 802–810.
49. Ortega, R. An on-line least-squares parameter estimator with finite convergence time. IEEE Inst. Electr. Electron. Eng. 1988, 76, 847–848.
50. Roy, S.; Bhasin, S.; Kar, I. Combined MRAC for unknown MIMO LTI systems with parameter convergence. IEEE Trans. Autom. Control 2018, 63, 283–290.
51. Adetola, V.; Guay, M. Finite-time parameter estimation in adaptive control of nonlinear systems. IEEE Trans. Autom. Control 2008, 53, 807–811.
52. Aranovskiy, S.; Bobtsov, A.; Ortega, R.; Pyrkin, A. Performance enhancement of parameter estimator via dynamic regressor extension and mixing. IEEE Trans. Autom. Control 2017, 62, 3546–3550.
53. Panuska, V.; Rogers, A.E.; Steiglitz, K. On the maximum likelihood estimation of rational pulse transfer-function parameters. IEEE Trans. Autom. Control 1968, 13, 304–305.
54. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Series B Stat. Methodol. 1977, 39, 1–22.
55. Sammaknejad, N.; Zhao, Y.; Huang, B. A review of the expectation maximization algorithm in data-driven process identification. J. Process Control 2019, 73, 123–136.
56. Yang, X.; Liu, X.; Han, B. LPV model identification with an unknown scheduling variable in the presence of missing observations—A robust global approach. IET Control Theory Appl. 2018, 12, 1465–1473.
57. Wang, D.; Zhang, S.; Gan, M.; Qiu, J. A novel EM identification method for Hammerstein systems with missing output data. IEEE Trans. Ind. Inform. 2019, 16, 2500–2508.
58. Coban, R. A context layered locally recurrent neural network for dynamic system identification. Eng. Appl. Artif. Intell. 2013, 26, 241–250.
59. Nguyen, S.N.; Ho-Huu, V.A.; Ho, P.H. A neural differential evolution identification approach to nonlinear systems and modelling of shape memory alloy actuator. Asian J. Control 2018, 20, 57–70.
60. Aguilar, C.J.Z.; Gómez-Aguilar, J.F.; Alvarado-Martínez, V.M.; Romero-Ugalde, H.M. Fractional order neural networks for system identification. Chaos Solitons Fractals 2020, 130, 109444.
61. Li, H.; Zhang, L. A bilevel learning model and algorithm for self-organizing feed-forward neural networks for pattern classification. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4901–4915.
62. Singh, U.P.; Jain, S.; Tiwari, A.; Singh, R.K. Gradient evolution-based counter propagation network for approximation of noncanonical system. Soft Comput. 2019, 23, 4955–4967.
63. Qiao, J.F.; Han, H.G. Identification and modeling of nonlinear dynamical systems using a novel self-organizing RBF-based approach. Automatica 2012, 48, 1729–1734.
64. Slimani, A.; Errachdi, A.; Benrejeb, M. Genetic algorithm for RBF multi-model optimization for nonlinear system identification. In Proceedings of the IEEE International Conference on Control, Automation and Diagnosis, Grenoble, France, 2–4 July 2019; pp. 2–4.
65. Errachdi, A.; Benrejeb, M. Online identification using radial basis function neural network coupled with KPCA. Int. J. Gen. Syst. 2017, 46, 52–65.
66. Han, H.G.; Lu, W.; Hou, Y.; Qiao, J.-F. An adaptive-PSO-based self-organizing RBF neural network. IEEE Trans. Neural Netw. Learn. Syst. 2016, 29, 104–117.
67. Qiao, J.; Li, F.; Yang, C.; Li, W.; Gu, K. A self-organizing RBF neural network based on distance concentration immune algorithm. IEEE/CAA J. Autom. Sin. 2019, 7, 276–291.
68. Bhasin, S.; Kamalapurkar, R.; Johnson, M.; Vamvoudakis, K.G.; Lewis, F.L.; Dixon, W.E. A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica 2013, 49, 82–92.
69. Modares, H.; Lewis, F.L.; Naghibi-Sistani, M.-B. Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems. Automatica 2014, 50, 193–202.
70. Modares, H.; Lewis, F.L.; Jiang, Z.P. H∞ Tracking control of completely unknown continuous-time systems via off-policy reinforcement learning. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 2550–2562.
71. Zhao, D.; Zhang, Q.; Wang, D.; Zhu, W. Experience replay for optimal control of nonzero-sum game systems with unknown dynamics. IEEE Trans. Cybern. 2015, 46, 854–865.
72. Yang, X.; He, H. Adaptive critic designs for event-triggered robust control of nonlinear systems with unknown dynamics. IEEE Trans. Cybern. 2018, 49, 2255–2267.
73. Mu, C.; Zhang, Y.; Sun, C. Data-Based feedback relearning control for uncertain nonlinear systems with actuator faults. IEEE Trans. Cybern. 2022, 1–14.
74. Lv, Y.; Na, J.; Yang, Q.; Wu, X.; Guo, Y. Online adaptive optimal control for continuous-time nonlinear systems with completely unknown dynamics. Int. J. Control Autom. 2016, 89, 99–112.
75. Lv, Y.; Na, J.; Ren, X. Online H∞ control for completely unknown nonlinear systems via an identifier–critic-based ADP structure. Int. J. Control Autom. 2019, 92, 100–111.
76. Lv, Y.; Ren, X.; Na, J. Online Nash-optimization tracking control of multi-motor driven load system with simplified RL scheme. ISA Trans. 2020, 98, 251–262.
77. Na, J.; Lv, Y.; Zhang, K.; Zhao, J. Adaptive identifier-critic-based optimal tracking control for nonlinear systems with experimental validation. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 459–472.
78. Tatari, F.; Naghibi-Sistani, M.B.; Vamvoudakis, K.G. Distributed optimal synchronization control of linear networked systems under unknown dynamics. In Proceedings of the 2017 American Control Conference (ACC), Seattle, WA, USA, 24–26 May 2017; pp. 668–673.
79. Tatari, F.; Vamvoudakis, K.G.; Mazouchi, M. Optimal distributed learning for disturbance rejection in networked non-linear games under unknown dynamics. IET Control Theory Appl. 2018, 13, 2838–2848.
80. Shi, J.; Yue, D.; Xie, X. Optimal leader-follower consensus for constrained-input multiagent systems with completely unknown dynamics. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 1182–1191.
81. Tan, W.; Peng, Z.; Ji, H.; Luo, R.; Kuang, Y.; Hu, J. Event-triggered model-free optimal consensus for unknown multi-agent systems with input constraints. In Proceedings of the 2022 Chinese Control Conference (CCC), Hefei, China, 25–27 July 2022; pp. 4729–4734.
82. Luo, R.; Peng, Z.; Hu, J.; Ghosh, B.K. Adaptive optimal control of completely unknown systems with relaxed PE conditions. In Proceedings of the IEEE 11th Data Driven Control and Learning Systems Conference, Chengdu, China, 3–5 August 2022; pp. 836–841.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Luo, R.; Peng, Z.; Hu, J. On Model Identification Based Optimal Control and It’s Applications to Multi-Agent Learning and Control. Mathematics 2023, 11, 906. https://doi.org/10.3390/math11040906