Efficient Robot Manipulation via Reinforcement Learning with Dynamic Movement Primitives-Based Policy
Abstract
1. Introduction
- Developing a novel RL approach tailored to robot control tasks by integrating a DMP-based policy into DDPG’s off-policy actor–critic framework. This enhances the learning capability and sample efficiency compared with the related work on NDPs [30], which inherit the on-policy characteristics of PPO. A minimal illustrative sketch of this combination follows the list below.
- Designing an adaptive inverse controller that learns the hidden relationship between DMP outputs and the corresponding robot control actions. This overcomes the limitation of NDPs [30], which rely on human prior knowledge to design inverse controllers, and thereby significantly expands the applicability of DMPs in the RL domain.
- Evaluating the proposed method on various simulated robot arm scenarios, where it achieved superior control performance compared with NDPs and the original DDPG, highlighting the potential of RL with a DMP-based policy in complex robot control applications.
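To make the combination concrete, the sketch below shows the building blocks such an agent needs: an actor that maps states to DMP parameters, a standard discrete DMP (Ijspeert-style) that integrates those parameters into a short trajectory, and a learned inverse controller that converts the desired DMP accelerations into low-level robot actions. This is a minimal sketch under those assumptions, not the authors' implementation; all class names, network sizes, and the particular DMP parameterization (basis weights plus a goal per degree of freedom) are illustrative.

```python
# Minimal, illustrative sketch of DDPG-DMP building blocks (not the paper's code).
import numpy as np
import torch
import torch.nn as nn


class DMP:
    """One-DoF discrete DMP rolled out for n_steps."""

    def __init__(self, n_basis=10, alpha_y=25.0, beta_y=6.25, alpha_x=1.0, dt=0.01):
        self.alpha_y, self.beta_y, self.alpha_x, self.dt = alpha_y, beta_y, alpha_x, dt
        self.c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))  # basis centres in phase space
        self.h = n_basis ** 1.5 / self.c                             # basis widths

    def rollout(self, y0, g, w, n_steps):
        y, dy, x = y0, 0.0, 1.0
        ys, ddys = [], []
        for _ in range(n_steps):
            psi = np.exp(-self.h * (x - self.c) ** 2)
            f = (psi @ w) / (psi.sum() + 1e-10) * x * (g - y0)       # forcing term
            ddy = self.alpha_y * (self.beta_y * (g - y) - dy) + f    # transformation system
            dy += ddy * self.dt
            y += dy * self.dt
            x += -self.alpha_x * x * self.dt                         # canonical system
            ys.append(y)
            ddys.append(ddy)
        return np.array(ys), np.array(ddys)


class DMPActor(nn.Module):
    """Deterministic actor: state -> DMP basis weights and goal for each DoF."""

    def __init__(self, state_dim, n_dof, n_basis=10):
        super().__init__()
        self.n_dof, self.n_basis = n_dof, n_basis
        self.net = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                                 nn.Linear(256, n_dof * (n_basis + 1)))

    def forward(self, state):
        out = self.net(state).view(-1, self.n_dof, self.n_basis + 1)
        return out[..., :-1], out[..., -1]  # (weights, goal) per DoF


class InverseController(nn.Module):
    """Adaptive inverse controller: (state, desired DMP accelerations) -> robot action."""

    def __init__(self, state_dim, n_dof, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + n_dof, 256), nn.ReLU(),
                                 nn.Linear(256, action_dim))

    def forward(self, state, ddy_desired):
        return torch.tanh(self.net(torch.cat([state, ddy_desired], dim=-1)))
```

Under these assumptions the critic would be trained from replay-buffer samples exactly as in vanilla DDPG, with the DMP parameters playing the role of the action, while the inverse controller is fitted from observed state transitions so that no hand-designed inverse dynamics model is required. The sketch deliberately stops there; the actual update rules are those given in Section 3.3.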
2. Preliminaries
2.1. Markov Decision Process
2.2. Deep Deterministic Policy Gradient
2.3. Dynamic Movement Primitives
3. Approach
3.1. Neural Dynamic Policies
3.2. DDPG with DMP-Based Policy and Adaptive Inverse Controller
3.3. Update of DDPG-DMP
Algorithm 1: Learning and Update Processes of the Proposed DDPG-DMP
4. Experimental Results
4.1. Simulation Settings
4.2. OpenAI Robot Arm Scenarios
4.3. Panda Robot Arm Scenarios
4.4. Case Study
5. Conclusions and Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
The following abbreviations are used in this manuscript:

| Abbreviation | Definition |
|---|---|
| DDPG | Deep Deterministic Policy Gradient |
| DQN | Deep Q Network |
| DMP | Dynamic Movement Primitive |
| MDP | Markov Decision Process |
| NDP | Neural Dynamic Policy |
| PI2 | Policy Improvement with Path Integrals |
| PPO | Proximal Policy Optimization |
| RL | Reinforcement Learning |
| SAC | Soft Actor Critic |
| TD3 | Twin Delayed Deep Deterministic Policy Gradient |
References
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
- Wang, X.; Wang, S.; Liang, X.; Zhao, D.; Huang, J.; Xu, X.; Dai, B.; Miao, Q. Deep Reinforcement Learning: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 5064–5078.
- Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al. Mastering the game of Go without human knowledge. Nature 2017, 550, 354–359.
- Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484–489.
- Ye, D.; Liu, Z.; Sun, M.; Shi, B.; Zhao, P.; Wu, H.; Yu, H.; Yang, S.; Wu, X.; Guo, Q.; et al. Mastering complex control in MOBA games with deep reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 6672–6679.
- Masmitja, I.; Martin, M.; O’Reilly, T.; Kieft, B.; Palomeras, N.; Navarro, J.; Katija, K. Dynamic robotic tracking of underwater targets using reinforcement learning. Sci. Robot. 2023, 8, eade7811.
- Cui, Y.; Peng, L.; Li, H. Filtered Probabilistic Model Predictive Control-based Reinforcement Learning for Unmanned Surface Vehicles. IEEE Trans. Ind. Inform. 2022, 18, 6950–6961.
- Kaufmann, E.; Bauersfeld, L.; Loquercio, A.; Müller, M.; Koltun, V.; Scaramuzza, D. Champion-level drone racing using deep reinforcement learning. Nature 2023, 620, 982–987.
- Zhu, L.; Cui, Y.; Takami, G.; Kanokogi, H.; Matsubara, T. Scalable reinforcement learning for plant-wide control of vinyl acetate monomer process. Control Eng. Pract. 2020, 97, 104331.
- Liu, B.; Akcakaya, M.; McDermott, T.E. Automated control of transactive HVACs in energy distribution systems. IEEE Trans. Smart Grid 2020, 12, 2462–2471.
- Yuan, Z.; Zhang, Z.; Li, X.; Cui, Y.; Li, M.; Ban, X. Controlling Partially Observed Industrial System Based on Offline Reinforcement Learning—A Case Study of Paste Thickener. IEEE Trans. Ind. Inform. 2024, 1–11.
- Tsurumine, Y.; Cui, Y.; Uchibe, E.; Matsubara, T. Deep reinforcement learning with smooth policy update: Application to robotic cloth manipulation. Robot. Auton. Syst. 2019, 112, 72–83.
- Won, D.O.; Müller, K.R.; Lee, S.W. An adaptive deep reinforcement learning framework enables curling robots with human-like performance in real-world conditions. Sci. Robot. 2020, 5, eabb9764.
- Haarnoja, T.; Moran, B.; Lever, G.; Huang, S.H.; Tirumala, D.; Humplik, J.; Wulfmeier, M.; Tunyasuvunakool, S.; Siegel, N.Y.; Hafner, R.; et al. Learning agile soccer skills for a bipedal robot with deep reinforcement learning. Sci. Robot. 2024, 9, eadi8022.
- Kober, J.; Bagnell, J.A.; Peters, J. Reinforcement learning in robotics: A survey. Int. J. Robot. Res. 2013, 32, 1238–1274.
- Tutsoy, O.; Asadi, D.; Ahmadi, K.; Nabavi-Chashmi, S.Y.; Iqbal, J. Minimum Distance and Minimum Time Optimal Path Planning With Bioinspired Machine Learning Algorithms for Faulty Unmanned Air Vehicles. IEEE Trans. Intell. Transp. Syst. 2024, 25, 9069–9077.
- Vieillard, N.; Kozuno, T.; Scherrer, B.; Pietquin, O.; Munos, R.; Geist, M. Leverage the average: An analysis of KL regularization in reinforcement learning. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 6–12 December 2020.
- Shang, Z.; Li, R.; Zheng, C.; Li, H.; Cui, Y. Relative Entropy Regularized Sample-Efficient Reinforcement Learning with Continuous Actions. IEEE Trans. Neural Netw. Learn. Syst. 2023, 1–11.
- Janner, M.; Fu, J.; Zhang, M.; Levine, S. When to trust your model: Model-based policy optimization. Adv. Neural Inf. Process. Syst. 2019, 32.
- Hansen, N.; Wang, X.; Su, H. Temporal Difference Learning for Model Predictive Control. In Proceedings of the International Conference on Machine Learning (ICML), Baltimore, MD, USA, 17–23 July 2022.
- Schaal, S. Dynamic movement primitives-a framework for motor control in humans and humanoid robotics. In Adaptive Motion of Animals and Machines; Springer: Berlin/Heidelberg, Germany, 2006; pp. 261–280.
- Saveriano, M.; Abu-Dakka, F.J.; Kramberger, A.; Peternel, L. Dynamic movement primitives in robotics: A tutorial survey. Int. J. Robot. Res. 2023, 42, 1133–1184.
- Nemec, B.; Ude, A. Speed profile optimization through directed explorative learning. In Proceedings of the 2014 IEEE-RAS International Conference on Humanoid Robots, Atlanta, GA, USA, 18–20 November 2014; pp. 547–553.
- Vuga, R.; Nemec, B.; Ude, A. Enhanced policy adaptation through directed explorative learning. Int. J. Humanoid Robot. 2015, 12, 1550028.
- Colomé, A.; Torras, C. Dimensionality reduction for dynamic movement primitives and application to bimanual manipulation of clothes. IEEE Trans. Robot. 2018, 34, 602–615.
- Cui, Y.; Poon, J.; Miro, J.V.; Yamazaki, K.; Sugimoto, K.; Matsubara, T. Environment-adaptive interaction primitives through visual context for human–robot motor skill learning. Auton. Robot. 2019, 43, 1225–1240.
- Lai, Y.; Paul, G.; Cui, Y.; Matsubara, T. User intent estimation during robot learning using physical human robot interaction primitives. Auton. Robot. 2022, 46, 421–436.
- Theodorou, E.; Buchli, J.; Schaal, S. A generalized path integral control approach to reinforcement learning. J. Mach. Learn. Res. 2010, 11, 3137–3181.
- Hazara, M.; Kyrki, V. Reinforcement learning for improving imitated in-contact skills. In Proceedings of the 2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids), Cancun, Mexico, 15–17 November 2016; pp. 194–201.
- Bahl, S.; Mukadam, M.; Gupta, A.; Pathak, D. Neural dynamic policies for end-to-end sensorimotor learning. Adv. Neural Inf. Process. Syst. 2020, 33, 5058–5069.
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; pp. 1861–1870.
- Fujimoto, S.; Hoof, H.; Meger, D. Addressing function approximation error in actor-critic methods. In Proceedings of the International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; pp. 1587–1596.
- Kamthe, S.; Deisenroth, M. Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Playa Blanca, Spain, 9 April 2018; pp. 1701–1710.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Gallouédec, Q.; Cazin, N.; Dellandréa, E.; Chen, L. Panda-gym: Open-source goal-conditioned environments for robotic learning. In Proceedings of the 4th Robot Learning Workshop: Self-Supervised and Lifelong Learning at NeurIPS 2021, Virtual, 14 December 2021.
- Fujimoto, S.; Chang, W.D.; Smith, E.; Gu, S.S.; Precup, D.; Meger, D. For SALE: State-action representation learning for deep reinforcement learning. Adv. Neural Inf. Process. Syst. 2024, 36.
- Bhatt, A.; Palenicek, D.; Belousov, B.; Argus, M.; Amiranashvili, A.; Brox, T.; Peters, J. CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 7–11 May 2024.
- Huang, W.; Cui, Y.; Li, H.; Wu, X. Practical Probabilistic Model-Based Reinforcement Learning by Integrating Dropout Uncertainty and Trajectory Sampling. IEEE Trans. Neural Netw. Learn. Syst. 2024, 1–15.
| Hyperparameter | Reacher-v1 | Pusher-v0 | Thrower-v0 | PandaReach-v1 | PandaReachJoints-v1 |
|---|---|---|---|---|---|
| State Dimension | 11 | 23 | 23 | 20 | 20 |
| Action Dimension | 2 | 7 | 7 | 3 | 7 |
| Critic Learning Rate | | | | | |
| Actor Learning Rate | | | | | |
| Target Update Rate (τ) | | | | | |
| Networks Structure | | | | | |
| Batch Size (J) | 256 | 256 | 256 | 256 | 256 |
| Discount Factor (γ) | 0.99 | 0.99 | 0.99 | 0.95 | 0.95 |
| Steps per Update (K) | 5 | 5 | 5 | 5 | 5 |
| Memory Buffer Size | | | | | |
| Warmup Steps | 2000 | 2000 | | | |
| Control Modes | Torque | Torque | Torque | End-Effector Position | Joint Angle |
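For readers who want to reproduce a comparable setup, the settings above can be collected into per-environment configuration objects. The sketch below is illustrative only: it uses just the values that survive in the table, assigns the two visible warmup values of 2000 to the first two environments as the table layout suggests, and leaves the learning rates, target update rate, network structure, and buffer size as explicit placeholders because those entries are not recoverable here.

```python
# Illustrative per-environment training configuration (not the authors' exact settings).
from dataclasses import dataclass
from typing import Optional


@dataclass
class EnvConfig:
    state_dim: int
    action_dim: int
    batch_size: int = 256              # J
    discount: float = 0.99             # gamma
    steps_per_update: int = 5          # K
    control_mode: str = "torque"
    critic_lr: Optional[float] = None  # not listed above; placeholder
    actor_lr: Optional[float] = None   # not listed above; placeholder
    tau: Optional[float] = None        # target update rate; placeholder
    buffer_size: Optional[int] = None  # placeholder
    warmup_steps: Optional[int] = None


CONFIGS = {
    "Reacher-v1": EnvConfig(state_dim=11, action_dim=2, warmup_steps=2000),
    "Pusher-v0": EnvConfig(state_dim=23, action_dim=7, warmup_steps=2000),
    "Thrower-v0": EnvConfig(state_dim=23, action_dim=7),
    "PandaReach-v1": EnvConfig(state_dim=20, action_dim=3, discount=0.95,
                               control_mode="end-effector position"),
    "PandaReachJoints-v1": EnvConfig(state_dim=20, action_dim=7, discount=0.95,
                                     control_mode="joint angle"),
}
```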
| | DDPG-DMP | DDPG | TD3 | SAC | NDP |
|---|---|---|---|---|---|
| Reacher-v1 | | | | | |
| Pusher-v0 | | | | | |
| Thrower-v0 | | | | | |
| | DDPG-DMP | DDPG | TD3 | SAC | NDP |
|---|---|---|---|---|---|
| PandaReach-v2 | | | | | |
| PandaReachJoints-v2 | | | | | |