FGRL: Federated Growing Reinforcement Learning for Resilient Mapless Navigation in Unfamiliar Environments
Abstract
Featured Application
1. Introduction
- We propose a federated growing reinforcement learning (FGRL) approach that allows multiple agents to be trained simultaneously so as to improve learning efficiency.
- The proposed FGRL approach provides a sensor-level navigation and collision avoidance mechanism that does not require a precise global environmental map.
- We introduce a knowledge aggregation strategy that generates a shared model based on the performance of each agent’s local model (see the aggregation sketch after this list).
- We carry out extensive experimental studies in Gazebo, and the results show that traditional RL-based mapless navigation algorithms cannot cope with unfamiliar obstacles, whereas our proposed approach enables UGVs to fuse prior knowledge and transfer it to new, unfamiliar obstacles in navigation tasks.
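As a rough illustration of the aggregation strategy mentioned above, the following Python sketch shows one way a performance-weighted model aggregation could be realized. The function name `aggregate_models`, the use of a scalar per-agent score, and the min-shift normalization are assumptions made for illustration only; the paper's exact growth-value-based weighting is defined in Section 3.2.

```python
from typing import Dict, List
import numpy as np

def aggregate_models(local_weights: List[Dict[str, np.ndarray]],
                     scores: List[float]) -> Dict[str, np.ndarray]:
    """Build a shared (global) model as a performance-weighted average of the
    agents' local model parameters. `scores` are per-agent performance
    measures (e.g., recent episode returns): better-performing agents
    contribute more to the shared model."""
    weights = np.asarray(scores, dtype=np.float64)
    weights = weights - weights.min()      # shift so the worst agent has zero weight
    if weights.sum() == 0:                 # all agents equal -> plain averaging
        weights = np.ones_like(weights)
    weights /= weights.sum()

    shared = {}
    for name in local_weights[0]:          # aggregate each parameter tensor by name
        shared[name] = sum(w * lw[name] for w, lw in zip(weights, local_weights))
    return shared
```

For example, `aggregate_models([w1, w2], [0.9, 0.4])` would bias the shared model toward the first agent's parameters.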
2. Related Work
2.1. RL-Based Mapless Navigation
2.2. Federated Learning
3. FGRL for Mapless Navigation
3.1. RL-Based Mapless Navigation
3.1.1. General Framework
3.1.2. Observation Space
3.1.3. Action Space
3.1.4. Reward Function
3.2. Federated Growing Reinforcement Learning (FGRL)
Algorithm 1: Federated growing reinforcement learning algorithm. (K agents are indexed by k, M global aggregation rounds are indexed by m, and E is the number of local episodes.)
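Since the algorithm body is not reproduced here, the sketch below only mirrors the loop structure stated in the caption: K agents each run E local episodes per round, and the M global rounds alternate local training with aggregation and broadcast. The agent methods `train_episode`, `evaluate`, `get_weights`, and `set_weights` are hypothetical placeholders, and `aggregate_models` refers to the earlier sketch.

```python
def fgrl_training(agents, M: int, E: int):
    """Hedged sketch of the federated loop in Algorithm 1: K agents (the list
    `agents`), M global aggregation rounds, E local episodes per round."""
    for m in range(M):
        local_weights, scores = [], []
        for agent in agents:                  # K agents indexed by k
            for _ in range(E):                # E local training episodes
                agent.train_episode()
            local_weights.append(agent.get_weights())
            scores.append(agent.evaluate())   # per-agent performance for weighting
        shared = aggregate_models(local_weights, scores)  # see earlier sketch
        for agent in agents:                  # broadcast the shared model
            agent.set_weights(shared)
```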
4. Experiments and Discussion
4.1. Experiment Setup
4.2. Training Performance
4.3. Evaluation in Familiar Environments
4.4. Evaluation in Unfamiliar Environments
4.4.1. Unfamiliar Plain Environment
4.4.2. Unfamiliar Factory Environment
4.5. Evaluation in Real-World Environments
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Mabkhot, M.M.; Al-Ahmari, A.M.; Salah, B.; Alkhalefah, H. Requirements of the smart factory system: A survey and perspective. Machines 2018, 6, 23. [Google Scholar] [CrossRef]
- Xue, H.; Hein, B.; Bakr, M.; Schildbach, G.; Abel, B.; Rueckert, E. Using deep reinforcement learning with automatic curriculum learning for mapless navigation in intralogistics. Appl. Sci. 2022, 12, 3153. [Google Scholar] [CrossRef]
- Kriegel, J.; Rissbacher, C.; Reckwitz, L.; Tuttle-Weidinger, L. The requirements and applications of autonomous mobile robotics (AMR) in hospitals from the perspective of nursing officers. Int. J. Healthc. Manag. 2022, 15, 204–210. [Google Scholar] [CrossRef]
- Zhao, Y.L.; Hong, Y.T.; Huang, H.P. Comprehensive Performance Evaluation between Visual SLAM and LiDAR SLAM for Mobile Robots: Theories and Experiments. Appl. Sci. 2024, 14, 3945. [Google Scholar] [CrossRef]
- Blochliger, F.; Fehr, M.; Dymczyk, M.; Schneider, T.; Siegwart, R. Topomap: Topological mapping and navigation based on visual slam maps. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; IEEE: New York, NY, USA, 2018; pp. 3818–3825. [Google Scholar]
- Wang, J.; Chi, W.; Li, C.; Wang, C.; Meng, M.Q.H. Neural RRT*: Learning-based optimal path planning. IEEE Trans. Autom. Sci. Eng. 2020, 17, 1748–1758. [Google Scholar] [CrossRef]
- Su, Y.; Wang, T.; Shao, S.; Yao, C.; Wang, Z. GR-LOAM: LiDAR-based sensor fusion SLAM for ground robots on complex terrain. Robot. Auton. Syst. 2021, 140, 103759. [Google Scholar] [CrossRef]
- Schrittwieser, J.; Antonoglou, I.; Hubert, T.; Simonyan, K.; Sifre, L.; Schmitt, S.; Guez, A.; Lockhart, E.; Hassabis, D.; Graepel, T.; et al. Mastering atari, go, chess and shogi by planning with a learned model. Nature 2020, 588, 604–609. [Google Scholar] [CrossRef]
- Perolat, J.; De Vylder, B.; Hennes, D.; Tarassov, E.; Strub, F.; de Boer, V.; Muller, P.; Connor, J.T.; Burch, N.; Anthony, T.; et al. Mastering the game of Stratego with model-free multiagent reinforcement learning. Science 2022, 378, 990–996. [Google Scholar] [CrossRef]
- Kalashnikov, D.; Irpan, A.; Pastor, P.; Ibarz, J.; Herzog, A.; Jang, E.; Quillen, D.; Holly, E.; Kalakrishnan, M.; Vanhoucke, V.; et al. Scalable deep reinforcement learning for vision-based robotic manipulation. In Proceedings of the Conference on Robot Learning, PMLR, Zürich, Switzerland, 29–31 October 2018; pp. 651–673. [Google Scholar]
- Kilinc, O.; Montana, G. Reinforcement learning for robotic manipulation using simulated locomotion demonstrations. Mach. Learn. 2022, 111, 465–486. [Google Scholar] [CrossRef]
- Pintos Gómez de las Heras, B.; Martínez-Tomás, R.; Cuadra Troncoso, J.M. Self-Learning Robot Autonomous Navigation with Deep Reinforcement Learning Techniques. Appl. Sci. 2023, 14, 366. [Google Scholar] [CrossRef]
- Patel, U.; Kumar, N.K.S.; Sathyamoorthy, A.J.; Manocha, D. DWA-RL: Dynamically feasible deep reinforcement learning policy for robot navigation among mobile obstacles. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; IEEE: New York, NY, USA, 2021; pp. 6057–6063. [Google Scholar]
- Chen, T.; Zhang, K.; Giannakis, G.B.; Başar, T. Communication-efficient policy gradient methods for distributed reinforcement learning. IEEE Trans. Control Netw. Syst. 2021, 9, 917–929. [Google Scholar] [CrossRef]
- Ma, C.; Zhang, J.; Liu, J.; Ji, L.; Gao, F. A parallel multi-module deep reinforcement learning algorithm for stock trading. Neurocomputing 2021, 449, 290–302. [Google Scholar] [CrossRef]
- Liu, B.; Wang, L.; Liu, M. Lifelong federated reinforcement learning: A learning architecture for navigation in cloud robotic systems. IEEE Robot. Autom. Lett. 2019, 4, 4555–4562. [Google Scholar] [CrossRef]
- McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the Artificial Intelligence and Statistics, PMLR, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282. [Google Scholar]
- Wenzel, P.; Schön, T.; Leal-Taixé, L.; Cremers, D. Vision-based mobile robotics obstacle avoidance with deep reinforcement learning. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; IEEE: New York, NY, USA, 2021; pp. 14360–14366. [Google Scholar]
- Han, Y.; Zhan, I.H.; Zhao, W.; Pan, J.; Zhang, Z.; Wang, Y.; Liu, Y.J. Deep Reinforcement Learning for Robot Collision Avoidance With Self-State-Attention and Sensor Fusion. IEEE Robot. Autom. Lett. 2022, 7, 6886–6893. [Google Scholar] [CrossRef]
- Jang, Y.; Baek, J.; Han, S. Hindsight Intermediate Targets for Mapless Navigation with Deep Reinforcement Learning. IEEE Trans. Ind. Electron. 2021, 69, 11816–11825. [Google Scholar] [CrossRef]
- Tai, L.; Paolo, G.; Liu, M. Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; IEEE: New York, NY, USA, 2017; pp. 31–36. [Google Scholar]
- Marchesini, E.; Farinelli, A. Discrete deep reinforcement learning for mapless navigation. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; IEEE: New York, NY, USA, 2020; pp. 10688–10694. [Google Scholar]
- Long, P.; Fan, T.; Liao, X.; Liu, W.; Zhang, H.; Pan, J. Towards optimally decentralized multi-robot collision avoidance via deep reinforcement learning. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; IEEE: New York, NY, USA, 2018; pp. 6252–6259. [Google Scholar]
- Hadidi, R.; Cao, J.; Woodward, M.; Ryoo, M.S.; Kim, H. Distributed perception by collaborative robots. IEEE Robot. Autom. Lett. 2018, 3, 3709–3716. [Google Scholar] [CrossRef]
- Clemente, A.V.; Castejón, H.N.; Chandra, A. Efficient parallel methods for deep reinforcement learning. arXiv 2017, arXiv:1705.04862. [Google Scholar]
- Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous methods for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning, PMLR, New York City, NY, USA, 19–24 June 2016; pp. 1928–1937. [Google Scholar]
- Xu, M.; Shen, Y.; Zhang, S.; Lu, Y.; Zhao, D.; Tenenbaum, J.; Gan, C. Prompting decision transformer for few-shot policy generalization. In Proceedings of the International Conference on Machine Learning (ICML), PMLR, Baltimore, MD, USA, 17–23 July 2022; pp. 24631–24645. [Google Scholar]
- Fan, T.; Long, P.; Liu, W.; Pan, J.; Yang, R.; Manocha, D. Learning resilient behaviors for navigation under uncertainty. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; IEEE: New York, NY, USA, 2020; pp. 5299–5305. [Google Scholar]
- Imteaj, A.; Amini, M.H. Fedar: Activity and resource-aware federated learning model for distributed mobile robots. In Proceedings of the IEEE International Conference on Machine Learning and Applications (ICMLA), Miami, FL, USA, 14–17 December 2020; IEEE: New York, NY, USA, 2020; pp. 1153–1160. [Google Scholar]
- Tursunboev, J.; Kang, Y.S.; Huh, S.B.; Lim, D.W.; Kang, J.M.; Jung, H. Hierarchical Federated Learning for Edge-Aided Unmanned Aerial Vehicle Networks. Appl. Sci. 2022, 12, 670. [Google Scholar] [CrossRef]
- Zhou, X.; Liang, W.; She, J.; Yan, Z.; Wang, K.I.-K. Two-layer federated learning with heterogeneous model aggregation for 6G supported internet of vehicles. IEEE Trans. Veh. Technol. 2021, 70, 5308–5317. [Google Scholar] [CrossRef]
- Mohri, M.; Sivek, G.; Suresh, A.T. Agnostic federated learning. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 4615–4625. [Google Scholar]
- Wang, Y.; Kantarci, B. Reputation-enabled federated learning model aggregation in mobile platforms. In Proceedings of the IEEE International Conference on Communications, Montreal, QC, Canada, 14–23 June 2021; IEEE: New York, NY, USA, 2021; pp. 1–6. [Google Scholar]
- Majcherczyk, N.; Srishankar, N.; Pinciroli, C. Flow-fl: Data-driven federated learning for spatio-temporal predictions in multi-robot systems. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; IEEE: New York, NY, USA, 2021; pp. 8836–8842. [Google Scholar]
- Wang, H.; Kaplan, Z.; Niu, D.; Li, B. Optimizing federated learning on non-iid data with reinforcement learning. In Proceedings of the IEEE Conference on Computer Communications, Toronto, ON, Canada, 6–9 July 2020; IEEE: New York, NY, USA, 2020; pp. 1698–1707. [Google Scholar]
- Zhang, P.; Wang, C.; Jiang, C.; Han, Z. Deep reinforcement learning assisted federated learning algorithm for data management of IIoT. IEEE Trans. Ind. Inform. 2021, 17, 8475–8484. [Google Scholar] [CrossRef]
- Yu, S.; Chen, X.; Zhou, Z.; Gong, X.; Wu, D. When deep reinforcement learning meets federated learning: Intelligent multitimescale resource management for multiaccess edge computing in 5G ultradense network. IEEE Internet Things J. 2020, 8, 2238–2251. [Google Scholar] [CrossRef]
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef] [PubMed]
- Wang, Z.; Schaul, T.; Hessel, M.; Hasselt, H.; Lanctot, M.; Freitas, N. Dueling network architectures for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning, PMLR, New York City, NY, USA, 19–24 June 2016; pp. 1995–2003. [Google Scholar]
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971. [Google Scholar]
- Fujimoto, S.; Hoof, H.; Meger, D. Addressing function approximation error in actor-critic methods. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 1587–1596. [Google Scholar]
| Symbol | Definition | Value |
|---|---|---|
| – | learning rate | 0.0004 |
| – | discount factor | 0.99 |
| – | adjustment coefficient of the collision | −15 |
| – | adjustment coefficient of the distance reward | 500 |
| E | number of the local episodes | 10 |
| M | number of the global update periods | 40 |
| – | adjustment coefficient of the growth value | 0.5 |
| – | threshold of the growth value | 0.1 |
| – | adjustment coefficient of the soft update | 0.9 |
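For reference, the same hyperparameters can be grouped into a single configuration mapping. The dictionary keys below are descriptive stand-ins chosen to match the definitions in the table, not the paper's symbols.

```python
# Training hyperparameters from the table above; key names are illustrative.
HYPERPARAMS = {
    "learning_rate": 4e-4,           # optimizer learning rate
    "discount_factor": 0.99,         # RL discount factor
    "collision_coeff": -15,          # adjustment coefficient of the collision
    "distance_reward_coeff": 500,    # adjustment coefficient of the distance reward
    "local_episodes": 10,            # E
    "global_rounds": 40,             # M
    "growth_value_coeff": 0.5,       # adjustment coefficient of the growth value
    "growth_value_threshold": 0.1,   # threshold of the growth value
    "soft_update_coeff": 0.9,        # adjustment coefficient of the soft update
}
```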
| Algorithm | Metric | env1 | env2 | env3 | env4 |
|---|---|---|---|---|---|
| TD3 | Success rate | 95% | 96% | 95% | 94% |
| TD3 | Average successful steps | 444 | 461 | 459 | 482 |
| FL-TD3 | Success rate | 100% | 100% | 99% | 99% |
| FL-TD3 | Average successful steps | 440 | 444 | 439 | 447 |
| FGRL | Success rate | 100% | 100% | 100% | 100% |
| FGRL | Average successful steps | 432 | 438 | 430 | 440 |
| Trained Model | Success Rate | Crash Rate | Timeout Rate | Average Successful Steps |
|---|---|---|---|---|
| TD3-env1 | 0% | 84% | 16% | – |
| TD3-env2 | 76% | 8% | 16% | 523 |
| TD3-env3 | 89% | 3% | 8% | 513 |
| TD3-env4 | 92% | 2% | 6% | 505 |
| FL-TD3 | 96% | 0% | 4% | 404 |
| FGRL | 100% | 0% | 0% | 349 |
| Trained Model | Success Rate | Crash Rate | Timeout Rate | Average Successful Steps |
|---|---|---|---|---|
| TD3-env1 | 0% | 0% | 100% | – |
| TD3-env2 | 0% | 0% | 100% | – |
| TD3-env3 | 1% | 39% | 60% | 1449 |
| TD3-env4 | 14% | 37% | 49% | 1238 |
| FL-TD3 | 31% | 69% | 0% | 1158 |
| FGRL | 79% | 21% | 0% | 924 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tian, S.; Wei, C.; Li, Y.; Ji, Z. FGRL: Federated Growing Reinforcement Learning for Resilient Mapless Navigation in Unfamiliar Environments. Appl. Sci. 2024, 14, 11336. https://doi.org/10.3390/app142311336