Mobile Robot Navigation Based on Noisy N-Step Dueling Double Deep Q-Network and Prioritized Experience Replay
Abstract
1. Introduction
2. Background
2.1. Problem Statement
2.2. Reinforcement Learning Basics
2.3. One-Step and N-Step TD Methods
2.4. Deep Q-Network
3. PER-n2D3QN Method
3.1. Double Deep Q-Network
3.2. DDQN with Dueling Network Structure
3.3. Prioritized Experience Replay
3.4. Target “Soft” Update
3.5. Exploration Policy
4. PER-n2D3QN for Mobile Robot Navigation
4.1. Action Space
4.2. State Space
4.3. Target-Oriented Reward Function
Algorithm 1 Prioritized Experience Replay Noisy n-step Dueling DDQN algorithm
5. Numerical Experiments and Results
5.1. Experimental Settings
5.2. Quantitative Analysis of Simulation Environment
5.3. Results and Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
| Parameter | Value | Remark |
|---|---|---|
| γ | 0.99 | Discount factor |
| M | 200,000 | Size of experience replay memory |
| D | 200,000 | Size of SumTree in PER memory |
| BATCH_SIZE | 64 | Size of sampled batch |
| α | 0.6 | Extent of prioritization |
| β | 0.4 | Extent of bias correction in prioritized sampling |
| τ | 0.005 | Update magnitude of target network ("soft" update) |
|  | 0.001 | Learning rate |
| n | 5 | Number of steps in the n-step TD method |
|  | 0.13 m | Threshold of collision |
|  | 0.2 m | Threshold of reaching the goal |
|  | 0.15 m/s | Linear velocity of the mobile robot |
|  | 0.75 rad/s | Rotational speed of the mobile robot |
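For reference, the hyperparameters above can be gathered into a single configuration object. The sketch below is illustrative only: the field names (e.g., `gamma`, `per_alpha`, `collision_dist`) and the `soft_update` helper are ours and are not taken from the authors' code; the helper implements the standard Polyak ("soft") update form, which is assumed here to be how the τ = 0.005 update magnitude is applied.

```python
from dataclasses import dataclass

import torch


@dataclass
class NavTrainConfig:
    """Hyperparameters from the table above; field names are illustrative, not the authors'."""
    gamma: float = 0.99            # discount factor
    memory_size: int = 200_000     # size of the experience replay memory / SumTree
    batch_size: int = 64           # size of the sampled mini-batch
    per_alpha: float = 0.6         # extent of prioritization
    per_beta: float = 0.4          # extent of bias correction in prioritized sampling
    tau: float = 0.005             # update magnitude of the target network
    lr: float = 0.001              # learning rate
    n_step: int = 5                # n in the n-step TD method
    collision_dist: float = 0.13   # collision threshold (m)
    goal_dist: float = 0.2         # goal-reaching threshold (m)
    linear_vel: float = 0.15       # linear velocity of the robot (m/s)
    angular_vel: float = 0.75      # rotational speed of the robot (rad/s)


def soft_update(target_net: torch.nn.Module, online_net: torch.nn.Module, tau: float) -> None:
    """Standard Polyak ("soft") target update: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    with torch.no_grad():
        for t_param, o_param in zip(target_net.parameters(), online_net.parameters()):
            t_param.mul_(1.0 - tau).add_(tau * o_param)


config = NavTrainConfig()
```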
| Algorithm | Indicator | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trial 5 | Trial 6 | Trial 7 | Trial 8 | Trial 9 | Trial 10 | Average | SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DQN | SR | 58.41% | 67.05% | 62.97% | 69.72% | 61.12% | 53.54% | 55.75% | 67.55% | 70.60% | 60.74% | 62.74% | 5.88% |
|  | AS | 678.66 | 1040.13 | 771.10 | 1223.60 | 709.68 | 475.56 | 509.12 | 1014.16 | 1055.19 | 804.62 | 828.18 | 248.05 |
| DDQN | SR | 63.69% | 69.07% | 65.22% | 63.95% | 68.93% | 63.75% | 65.25% | 61.03% | 53.73% | 64.59% | 63.92% | 4.32% |
|  | AS | 1002.67 | 1290.22 | 1072.25 | 1080.94 | 1340.69 | 1006.20 | 1097.82 | 909.00 | 619.17 | 1076.14 | 1049.51 | 198.96 |
| PER-n2D3QN | SR | 98.95% | 99.05% | 99.02% | 99.12% | 98.26% | 99.35% | 98.34% | 98.98% | 99.06% | 98.99% | 98.91% | 3.41% |
|  | AS | 3394.45 | 3720.51 | 3503.49 | 3466.41 | 2981.46 | 3449.78 | 3036.66 | 3327.22 | 3415.28 | 3553.57 | 3384.88 | 224.69 |
| Algorithm | Indicator | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trial 5 | Trial 6 | Trial 7 | Trial 8 | Trial 9 | Trial 10 | Average | SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DQN | SR | 1.46% | 5.79% | 4.28% | 3.73% | 1.05% | 2.93% | 0.74% | 0.14% | 1.73% | 7.88% | 2.97% | 2.47% |
|  | AS | 126.14 | −468.26 | −39.58 | −378.76 | −295.53 | −354.15 | −434.54 | −357.35 | −511.33 | −382.24 | −309.56 | 199.66 |
| DDQN | SR | 34.72% | 9.05% | 4.16% | 9.04% | 11.53% | 25.60% | 7.15% | 10.95% | 6.77% | 31.85% | 15.08% | 11.21% |
|  | AS | 335.99 | −336.34 | −439.81 | −243.87 | −339.58 | −134.19 | −346.52 | −308.70 | −374.07 | 97.01 | −209.01 | 244.97 |
| PER-n2D3QN | SR | 98.39% | 98.78% | 97.45% | 98.51% | 98.21% | 98.31% | 97.96% | 98.41% | 98.48% | 98.69% | 98.32% | 3.83% |
|  | AS | 2846.56 | 3060.75 | 2347.53 | 3108.40 | 2751.96 | 2946.01 | 2700.06 | 2861.52 | 2831.30 | 3168.20 | 2862.23 | 236.93 |
| Algorithm | Indicator | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trial 5 | Trial 6 | Trial 7 | Trial 8 | Trial 9 | Trial 10 | Average | SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DQN | SR | 2.52% | 1.18% | 2.11% | 2.98% | 0.22% | 1.53% | 0.56% | 1.24% | 2.07% | 1.49% | 1.59% | 0.85% |
|  | AS | −307.24 | −492.05 | −428.13 | −311.97 | −370.97 | 62.36 | −516.04 | −315.77 | −314.34 | −248.65 | −324.28 | 161.14 |
| DDQN | SR | 2.64% | 3.83% | 15.92% | 10.92% | 8.21% | 13.86% | 6.47% | 6.46% | 13.96% | 7.20% | 8.95% | 4.52% |
|  | AS | −441.57 | −317.00 | −268.187 | −164.32 | −376.43 | −182.39 | −260.03 | −205.50 | −294.43 | −146.51 | −265.63 | 95.30 |
| PER-n2D3QN | SR | 91.27% | 91.67% | 94.20% | 93.50% | 92.13% | 91.76% | 92.49% | 93.36% | 92.89% | 93.07% | 92.63% | 0.93% |
|  | AS | 2434.01 | 2489.42 | 2769.54 | 2695.33 | 2491.18 | 2571.50 | 2637.07 | 2541.79 | 2618.10 | 2464.87 | 2571.28 | 108.20 |
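As a small reproducibility note, the Average and SD columns in the tables above can be recomputed from the per-trial values, assuming SD denotes the sample (n − 1) standard deviation; for example, the DQN success-rate row of the first table gives back the reported 62.74% and 5.88% up to rounding. A minimal sketch (variable names are ours):

```python
from statistics import mean, stdev

# Per-trial success rates (%) for DQN, taken from the first results table above.
dqn_sr = [58.41, 67.05, 62.97, 69.72, 61.12, 53.54, 55.75, 67.55, 70.60, 60.74]

# statistics.stdev computes the sample (n - 1) standard deviation.
print(f"Average = {mean(dqn_sr):.2f}%, SD = {stdev(dqn_sr):.2f}%")
# approx. 62.74% / 5.88%, matching the Average and SD columns up to rounding
```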