A Disturbance Rejection Control Method Based on Deep Reinforcement Learning for a Biped Robot
Abstract
1. Introduction
2. Related Work
2.1. Calculation of the ZMP Position
2.2. Biped Robot Controlled by DRL
3. Method
Figure: manipulation of the ZMP in the cart–table model. (a) The cart's acceleration is increased when the target ZMP is set behind the current ZMP. (b) The cart's acceleration is decreased when the target ZMP is set in front of the current ZMP.
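The caption follows from the standard ZMP equation of the cart–table model, p = x − (z_c/g)·ẍ: raising the cart's acceleration moves the ZMP backward, lowering it moves the ZMP forward. A minimal sketch of this relation; the table height z_c = 0.8 m and the function name are illustrative assumptions, not values from the paper:

```python
# Cart-table ZMP relation: p = x - (z_c / g) * x_ddot,
# where x is the cart position, z_c the table height, g gravity,
# and x_ddot the cart acceleration. Values here are illustrative.

G = 9.81  # gravitational acceleration [m/s^2]

def zmp(x: float, x_ddot: float, z_c: float = 0.8) -> float:
    """ZMP position of the cart-table model along the motion axis."""
    return x - (z_c / G) * x_ddot

# Increasing the acceleration shifts the ZMP backward (smaller p),
# which is why the controller raises x_ddot when the target ZMP lies
# behind the current one, and lowers it in the opposite case.
print(zmp(0.0, x_ddot=1.0))   # -> -0.0815... (ZMP moves behind the cart)
print(zmp(0.0, x_ddot=-1.0))  # ->  0.0815... (ZMP moves in front of the cart)
```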
3.1. Model Generation
3.2. Policy Training
4. Experiments and Results
4.1. Agent Training for the Cart–Table Model
Algorithm 1. Cart–table model with DRL algorithms.
Build a cart–table model in V-REP
Choose a DRL algorithm
Randomly initialize the weights and biases of the algorithm's deep neural network
for ep = 1, EPISODE do
    Initialize the simulation environment: set the cart to the zero position x = 0 (the midpoint of the tabletop) and set its velocity and acceleration to ẋ = 0 and ẍ = 0
    Set the table tilt angle to a random value in the range −0.5° to 0.5° (task 1) or to a fixed value of 1° (task 2)
    Obtain the initial state s₁ from the observations (x, ẋ, ẍ)
    for t = 1, STEP do
        Select action aₜ according to the current policy
        Execute aₜ to obtain the reward rₜ and observe the new state sₜ₊₁
        Add rₜ to the cumulative episode reward
        Compute the loss from the network output
        Update the network parameters using the loss and its gradient
    end for
end for
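Algorithm 1 leaves the agent side abstract. For concreteness, here is a minimal sketch of the loop instantiated with DDPG, using the hyperparameters listed in the DDPG table below (learning rate 1 × 10⁻³, discount 0.9, soft replacement 0.01, batch size 8, 80 episodes, buffer of 5000, 400/200 hidden units). The toy cart–table dynamics stand in for the V-REP simulation; PyTorch, the reward shaping, and all function names are illustrative assumptions, not the paper's code:

```python
# Minimal DDPG sketch of Algorithm 1 (PyTorch), with a toy cart-table
# simulation standing in for V-REP. Dynamics and reward are assumptions.
import random
import numpy as np
import torch
import torch.nn as nn

G, Z_C, DT = 9.81, 0.8, 0.02              # gravity, table height, time step

def step_cart(x, xd, xdd, a, tilt):
    """Toy stand-in for the V-REP cart-table: action a nudges acceleration."""
    xdd = xdd + a + G * np.sin(tilt) * DT  # tilt acts as a constant disturbance
    xd, x = xd + xdd * DT, x + xd * DT
    zmp = x - (Z_C / G) * xdd              # cart-table ZMP
    return x, xd, xdd, -abs(zmp) - 0.1 * abs(x)  # keep ZMP and cart centered

def mlp(n_in, n_hidden, n_out, out_act):
    return nn.Sequential(nn.Linear(n_in, n_hidden), nn.ReLU(),
                         nn.Linear(n_hidden, n_out), out_act)

actor = mlp(3, 400, 1, nn.Tanh())          # state (x, xd, xdd) -> action
critic = mlp(4, 200, 1, nn.Identity())     # (state, action) -> Q-value
actor_t, critic_t = mlp(3, 400, 1, nn.Tanh()), mlp(4, 200, 1, nn.Identity())
actor_t.load_state_dict(actor.state_dict())
critic_t.load_state_dict(critic.state_dict())
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)
buffer, GAMMA, TAU, BATCH = [], 0.9, 0.01, 8

for ep in range(80):                       # EPISODE
    x, xd, xdd = 0.0, 0.0, 0.0             # reset cart to the table midpoint
    tilt = np.radians(random.uniform(-0.5, 0.5))  # task 1: random tilt
    ep_reward = 0.0
    for t in range(200):                   # STEP
        s = torch.tensor([x, xd, xdd], dtype=torch.float32)
        a = actor(s).item() + np.random.normal(0, 0.1)  # exploration noise
        x, xd, xdd, r = step_cart(x, xd, xdd, a, tilt)
        buffer.append((s.tolist(), a, r, [x, xd, xdd]))
        buffer = buffer[-5000:]            # experience buffer of 5000
        ep_reward += r
        if len(buffer) >= BATCH:           # one gradient step per env step
            batch = random.sample(buffer, BATCH)
            bs, ba, br, bs2 = (torch.tensor(v, dtype=torch.float32)
                               for v in zip(*batch))
            with torch.no_grad():          # target networks for stability
                q_next = critic_t(torch.cat([bs2, actor_t(bs2)], dim=1))
                target = br.unsqueeze(1) + GAMMA * q_next
            q = critic(torch.cat([bs, ba.unsqueeze(1)], dim=1))
            loss_c = ((q - target) ** 2).mean()
            opt_c.zero_grad(); loss_c.backward(); opt_c.step()
            loss_a = -critic(torch.cat([bs, actor(bs)], dim=1)).mean()
            opt_a.zero_grad(); loss_a.backward(); opt_a.step()
            for net, net_t in ((actor, actor_t), (critic, critic_t)):
                for p, p_t in zip(net.parameters(), net_t.parameters()):
                    p_t.data.mul_(1 - TAU).add_(TAU * p.data)  # soft update
    print(f"episode {ep}: reward {ep_reward:.2f}")
```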
4.2. Agent Training for the Balance Control of a Biped Robot
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Table: hyperparameter settings of the two algorithms.

(a) DDPG setup hyperparameters

| Hyperparameter | Value |
| --- | --- |
| Actor/critic learning rate | 1 × 10⁻³ |
| Reward discount factor | 0.9 |
| Soft replacement rate | 0.01 |
| Batch size | 8 |
| Running episodes | 80 |
| Experience buffer size | 5000 |
| Neurons in the actor network | 400 |
| Neurons in the critic network | 200 |

(b) Model-based MPC setup hyperparameters

| Hyperparameter | Value |
| --- | --- |
| Learning rate | 1 × 10⁻² |
| Batch size | 16 |
| Running episodes | 100 |
| Buffer size | 10,000 |
| Candidate action sequences | 200 |
| Actions per candidate sequence | 20 |
| Neurons in the input layer | 800 |
| Neurons in the hidden layer | 400 |
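The model-based MPC settings in (b), 200 candidate sequences of 20 actions each, correspond to a random-shooting MPC scheme: at every control step the learned dynamics model rolls out randomly sampled action sequences, and only the first action of the highest-return sequence is executed. A minimal sketch under that reading; the toy dynamics model, reward, and dimensions are illustrative assumptions:

```python
# Minimal random-shooting MPC sketch matching the table in (b):
# 200 candidate action sequences, 20 actions per sequence.
import numpy as np

rng = np.random.default_rng(0)
N_CANDIDATES, HORIZON = 200, 20
DT, Z_C, G = 0.02, 0.8, 9.81

def learned_dynamics(state, action):
    """Placeholder for the trained dynamics network f(s, a) -> s'
    (the 800/400-unit model in the table); here a toy linear model."""
    x, xd, xdd = state
    xdd = xdd + action
    return np.array([x + DT * xd, xd + DT * xdd, xdd])

def reward_fn(state):
    x, _, xdd = state
    zmp = x - (Z_C / G) * xdd          # cart-table ZMP
    return -abs(zmp) - 0.1 * abs(x)    # keep ZMP and cart centered

def mpc_action(state):
    """Score random action sequences under the model; return the first
    action of the best sequence (receding-horizon control)."""
    seqs = rng.uniform(-1.0, 1.0, size=(N_CANDIDATES, HORIZON))
    returns = np.zeros(N_CANDIDATES)
    for i in range(N_CANDIDATES):
        s = state.copy()
        for a in seqs[i]:              # imagined rollout in the model
            s = learned_dynamics(s, a)
            returns[i] += reward_fn(s)
    return seqs[np.argmax(returns), 0] # execute only the first action

state = np.zeros(3)
for t in range(5):                     # stand-in for the simulator loop
    a = mpc_action(state)
    state = learned_dynamics(state, a)
    print(f"t={t} a={a:+.3f} state={state}")
```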
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).