Research on the Deep Deterministic Policy Algorithm Based on the First-Order Inverted Pendulum
Abstract
1. Introduction
2. Construction of a Control System Based on DDPG
2.1. Markov Decision Process
2.2. Actor–Critic Algorithm
3. Improved DDPG Based on Local Optimization and Q-Value Overestimation
3.1. Recursive Small Experience Pool DDPG Algorithm
3.2. Optimization Design of Critic Network Structure
4. Experimental Results and Analysis
4.1. Simulation Environment
4.2. Results and Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Parameter | Value
---|---
Pendulum mass (kg) | 0.2
Cart mass (kg) | 0.46
Length from centroid to hinge point (m) | 0.25
Friction coefficient | 0.08
Gravitational acceleration (m/s²) | 9.81
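For orientation, these constants map directly onto the classic cart–pole equations of motion. The sketch below is a minimal Python simulation of the first-order inverted pendulum using those textbook equations (Barto et al., 1983); the choice of Euler integration, the time step `dt`, and the treatment of the table's friction coefficient as Coulomb friction on the cart (normal force approximated as (M + m)g) are assumptions for illustration, not the authors' exact plant model.

```python
import math

# Physical constants from the table above (SI units).
M = 0.46    # cart mass (kg)
m = 0.20    # pendulum mass (kg)
l = 0.25    # distance from hinge to pendulum centroid (m)
mu = 0.08   # friction coefficient (assumed: Coulomb friction on the cart)
g = 9.81    # gravitational acceleration (m/s^2)

def step(state, force, dt=0.02):
    """Advance the cart-pole one explicit-Euler step.

    state = (x, x_dot, theta, theta_dot); theta = 0 is upright.
    Classic cart-pole dynamics; the paper's plant may differ in detail.
    """
    x, x_dot, theta, theta_dot = state
    sin_t, cos_t = math.sin(theta), math.cos(theta)

    # Coulomb friction force on the cart, normal force approximated as (M+m)g.
    friction = mu * (M + m) * g * math.copysign(1.0, x_dot) if x_dot else 0.0

    # Effective acceleration term shared by both equations of motion.
    temp = (force + m * l * theta_dot**2 * sin_t - friction) / (M + m)

    theta_acc = (g * sin_t - cos_t * temp) / (
        l * (4.0 / 3.0 - m * cos_t**2 / (M + m)))
    x_acc = temp - m * l * theta_acc * cos_t / (M + m)

    return (x + dt * x_dot,
            x_dot + dt * x_acc,
            theta + dt * theta_dot,
            theta_dot + dt * theta_acc)

if __name__ == "__main__":
    s = (0.0, 0.0, 0.05, 0.0)   # small initial tilt, no control input
    for _ in range(100):
        s = step(s, force=0.0)
    print(s)   # uncontrolled, the pendulum falls away from upright
```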
Hyperparameter | Value
---|---
Critic network learning rate | 0.001
Actor network learning rate | 0.0005
Critic network structure | (5 × 128 × 200 + 1 × 200) × 1
Actor network structure | 5 × 128 × 200 × 1
Discount factor | 0.995
Target update frequency | 10
Update interval | 100
Soft update parameter (τ) | 0.001
Activation function | ReLU
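Read as layer widths, the two structure entries describe a 5-input actor (5 → 128 → 200 → 1) and a critic that injects the 1-dimensional action at its second hidden layer (state path 5 → 128 → 200 plus an action path 1 → 200, merged before the scalar Q output). The PyTorch sketch below is one plausible realization of those shapes under the table's hyperparameters; the summation join of the two 200-unit paths, the tanh output bounding, and the Adam optimizer are assumptions, not the authors' published code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    """Actor: 5 -> 128 -> 200 -> 1, matching '5 x 128 x 200 x 1'."""
    def __init__(self, state_dim=5, action_dim=1):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, 128)
        self.fc2 = nn.Linear(128, 200)
        self.out = nn.Linear(200, action_dim)

    def forward(self, state):
        h = F.relu(self.fc1(state))
        h = F.relu(self.fc2(h))
        return torch.tanh(self.out(h))  # assumed: bounded action output

class Critic(nn.Module):
    """Critic: state path 5 -> 128 -> 200, action path 1 -> 200,
    summed, then -> 1, matching '(5 x 128 x 200 + 1 x 200) x 1'."""
    def __init__(self, state_dim=5, action_dim=1):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, 128)
        self.fc2 = nn.Linear(128, 200)
        self.fa = nn.Linear(action_dim, 200)
        self.out = nn.Linear(200, 1)

    def forward(self, state, action):
        h = F.relu(self.fc1(state))
        h = F.relu(self.fc2(h) + self.fa(action))  # action joins here (assumed)
        return self.out(h)

actor, critic = Actor(), Critic()
actor_target, critic_target = Actor(), Critic()
actor_target.load_state_dict(actor.state_dict())
critic_target.load_state_dict(critic.state_dict())

actor_opt = torch.optim.Adam(actor.parameters(), lr=5e-4)    # table value
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)  # table value
GAMMA = 0.995  # discount factor from the table (used in the TD target)
TAU = 0.001    # soft update parameter from the table

def soft_update(target, source, tau=TAU):
    """Polyak averaging: target <- tau*source + (1-tau)*target."""
    with torch.no_grad():
        for t, s in zip(target.parameters(), source.parameters()):
            t.mul_(1.0 - tau).add_(tau * s)
```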
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Hu, H.; Chen, Y.; Wang, T.; Feng, F.; Chen, W. Research on the Deep Deterministic Policy Algorithm Based on the First-Order Inverted Pendulum. Appl. Sci. 2023, 13, 7594. https://doi.org/10.3390/app13137594