Overcoming Challenges of Applying Reinforcement Learning for Intelligent Vehicle Control
Abstract
1. Introduction
- We investigate the impact of environmental complexity on the learning process in RL tasks involving path-planning scenarios;
- We discuss a method for transferring RL policies from the simulation domain to the real-world domain, supported by empirical evidence and a working algorithm for the method;
- We show how curriculum learning (CL) can be applied within the context of intelligent vehicle control in tasks involving multiple agents.
2. Background
2.1. Q-Learning
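For reference, the following is a minimal sketch of the tabular Q-learning update reviewed in this section; the grid size, action set, and hyperparameter values below are placeholders for illustration, not the settings used in the paper.

```python
import numpy as np

N_STATES, N_ACTIONS = 100, 4             # placeholder: small grid world with 4 moves
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # placeholder learning rate, discount, exploration rate

Q = np.zeros((N_STATES, N_ACTIONS))

def epsilon_greedy(s):
    """Explore with probability EPSILON, otherwise exploit the current Q-table."""
    if np.random.rand() < EPSILON:
        return np.random.randint(N_ACTIONS)
    return int(np.argmax(Q[s]))

def q_update(s, a, r, s_next):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    Q[s, a] += ALPHA * (r + GAMMA * np.max(Q[s_next]) - Q[s, a])
```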
2.2. Multi-Agent Reinforcement Learning and Curriculum Learning
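As a hedged illustration of how a curriculum can be scheduled over tasks of increasing difficulty, consider the sketch below; the `make_env` factory, the `train_stage` routine, the difficulty labels, and the stage length are hypothetical stand-ins, not the paper's implementation.

```python
import numpy as np

def train_stage(env, q_table, episodes):
    """Placeholder for a single-task tabular Q-learning loop run for `episodes`."""
    ...  # standard Q-learning episodes against `env`, updating `q_table` in place
    return q_table

def curriculum_train(make_env, n_states, n_actions,
                     levels=("easy", "moderate", "hard"), episodes=500):
    """Train on progressively harder environments, warm-starting each stage."""
    q = np.zeros((n_states, n_actions))    # the first stage learns from scratch
    for level in levels:
        env = make_env(obstacles=level)    # hypothetical environment factory
        q = train_stage(env, q, episodes)  # later stages reuse earlier knowledge
    return q
```

The design point is simply that the Q-table learned on the easy task initialises the harder one, so exploration on the hard task starts from a sensible policy rather than from zeros.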
3. Methodology and Experimental Setup
Algorithm 1. Procedure used by the Arduino robots to execute the transferred policies
1. Set static map M
2. Input array D of measured distances front, right and left, D = [d_front, d_right, d_left]
3. For each distance d in D do
4.   Map d to the corresponding value d_M in M
5.   Add d_M to a new array D_M of mapped values
6. End For
7. Get state s corresponding to D_M
8. Get Q-values for state s, Q(s, ·)
9. Output action a = argmax_a Q(s, a)
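A minimal Python sketch of Algorithm 1 follows, as a stand-in for the robots' Arduino firmware; the cell size, the number of distance bins, the action set, and the flat state encoding are illustrative assumptions, not values from the paper.

```python
import numpy as np

CELL_SIZE = 0.05   # metres represented by one cell of the static map M (assumed)
N_BINS = 20        # discretisation bins per sensing direction (assumed)
ACTIONS = ("forward", "left", "right")   # illustrative action set

def choose_action(q_table, d_front, d_right, d_left):
    """Steps 2-9: map the raw ultrasonic distances into M, build the state, act greedily."""
    mapped = [min(int(d / CELL_SIZE), N_BINS - 1)              # steps 3-6: map each distance
              for d in (d_front, d_right, d_left)]
    s = (mapped[0] * N_BINS + mapped[1]) * N_BINS + mapped[2]  # step 7: flat state index
    return ACTIONS[int(np.argmax(q_table[s]))]                 # steps 8-9: greedy action
```

A call such as `choose_action(q_table, 0.30, 0.12, 0.45)` would discretise the three ultrasonic readings and return the greedy action under the transferred Q-table (here of shape `(N_BINS**3, len(ACTIONS))`).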
4. Results and Discussion
4.1. Impact of Environmental Complexity in the Learning Process
4.2. Curriculum Learning for MARL Driving Decision-Making Scenarios
4.3. Sim-to-Real: Transfer of Reinforcement Learning Policies from Simulation to Reality
4.3.1. Simulation Domain
4.3.2. Real-World Domain
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Table 1. Environments used in the experiments: grid-world sizes and obstacle difficulty levels.

Environment | Grid World Size | Obstacle Difficulty Levels
---|---|---
Environment 1 | 100 × 100 | easy, moderate, hard
Environment 2 | 200 × 120 | easy, moderate, hard
Table 2. Times recorded for each grid-world size under the easy, moderate, and hard obstacle settings.

Grid World Size | Easy | Moderate | Hard
---|---|---|---
100 × 100 | 18.23 s | 18.41 s | 19.52 s
200 × 120 | 25.11 s | 25.41 s | 27.44 s