Navigation and Obstacle Avoidance for USV in Autonomous Buoy Inspection: A Deep Reinforcement Learning Approach
Abstract
1. Introduction
- An angular deviation weighting mechanism is introduced, enabling a circular navigation algorithm for USVs that demonstrates robust adaptability across different circumnavigation radii, even when trained on a specific radius.
- A novel generalized radar image encoding technique is developed, facilitating an end-to-end USV collision avoidance solution that relies solely on radar sensors for obstacle scenario recognition.
- Inspired by the concept of hierarchical reinforcement learning [21], a decoupled architecture is proposed, separating navigation and obstacle avoidance for USVs. The effectiveness of this architecture is validated in autonomous multi-buoy inspection tasks.
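The extract does not reproduce the decision logic that binds the two modules together. A minimal sketch, assuming a simple range-based switch between an independently trained navigation policy and an avoidance policy (the threshold and all function and variable names are illustrative, not taken from the paper):

```python
# Minimal sketch of the decoupled architecture: two independently trained policies
# and a simple range-based switch. The paper's decision unit works on an encoded
# radar image; the threshold and all names below are illustrative assumptions.
def select_action(nav_policy, avoid_policy, nav_state, radar_ranges, danger_range=10.0):
    """Route control to the avoidance policy only when an obstacle is close."""
    if min(radar_ranges) < danger_range:   # an obstacle lies inside the safety envelope
        return avoid_policy(radar_ranges)  # COLREGs-aware evasive action
    return nav_policy(nav_state)           # goal-directed or circular navigation
```

Decoupling in this way lets each policy be trained and evaluated in isolation before the two are composed for the multi-buoy inspection task.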
2. Background
2.1. Kinematic Model of USV
2.2. Reinforcement Learning
2.3. Proximal Policy Optimization
3. Methodology
3.1. Problem Formulation
- Goal-directed navigation: Guides the USV from a start point to a goal point.
- Circular navigation: Enables a smooth transition onto the target orbit radius and maintains stable circular motion around the target buoy.
- Obstacle avoidance: Detects and evades moving obstacles during navigation, complying with COLREGs.
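The COLREGs handling itself is not detailed in this extract. For context, a minimal sketch of the conventional encounter classification by the target's relative bearing, which COLREGs-compliant avoidance modules commonly start from, is shown below; the sector thresholds follow the usual convention in the collision-avoidance literature, not values from the paper.

```python
# Conventional COLREGs encounter classification from the target's relative bearing
# (degrees, clockwise from own bow). The sector thresholds are the commonly used
# convention in the collision-avoidance literature, not values from the paper.
def classify_encounter(relative_bearing_deg: float) -> str:
    b = relative_bearing_deg % 360.0
    if b <= 5.0 or b >= 355.0:
        return "head-on"               # both vessels alter course to starboard
    if 5.0 < b < 112.5:
        return "crossing-give-way"     # target on own starboard side: give way
    if 247.5 < b < 355.0:
        return "crossing-stand-on"     # target on own port side: stand on
    return "overtaking"                # target abaft the beam: it is overtaking own ship
```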
3.2. Design of Navigation Modules
3.2.1. Environments
- If the USV's distance from the buoy exceeds the trajectory deviation tolerance limit, the accumulated circumnavigation angle is reset to zero.
- If the USV remains within the tolerance range, the accumulated circumnavigation angle is continuously recorded and incremented; a minimal sketch of this bookkeeping follows.
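A minimal sketch of this bookkeeping, assuming the angle is accumulated from successive bearings of the USV as seen from the buoy (all names are illustrative, not from the paper):

```python
import math

# Accumulate the circumnavigation angle around the buoy while the USV stays
# within the trajectory-deviation tolerance band, and reset it otherwise.
# All names are illustrative; the paper's implementation is not shown in the extract.
def update_accumulated_angle(acc_angle, prev_bearing, usv_xy, buoy_xy,
                             orbit_radius, tolerance):
    dx, dy = usv_xy[0] - buoy_xy[0], usv_xy[1] - buoy_xy[1]
    distance = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx)                  # bearing of the USV seen from the buoy

    if abs(distance - orbit_radius) > tolerance:  # outside the tolerance band: reset
        return 0.0, bearing

    # Wrap the bearing increment to [-pi, pi) before accumulating it.
    delta = (bearing - prev_bearing + math.pi) % (2.0 * math.pi) - math.pi
    return acc_angle + delta, bearing
```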
3.2.2. Action Space
3.2.3. State Spaces
3.2.4. Reward Functions
- When the USV is far outside the orbit, the desired angular deviation should approach 0, steering the USV directly toward the buoy.
- When the USV is far inside the orbit, the desired angular deviation should approach its maximum, steering the USV directly away from the buoy.
- Near the orbital boundary, the desired deviation should transition smoothly toward one of the two tangential directions, so that the USV settles into stable circular motion; a sketch of one weighting function with this behaviour follows.
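The paper's weighting function itself is not reproduced in the extract. One minimal sketch that satisfies the three limiting behaviours above blends the desired deviation with a smooth function of the radial error; the tanh form and the gain k are assumptions for illustration only:

```python
import math

# Desired angular deviation between the USV's course and the direction to the buoy,
# blended smoothly by the radial error d - R. The tanh blend and the gain k are
# illustrative assumptions; the paper's angular deviation weighting may differ.
def desired_angular_deviation(distance, orbit_radius, k=1.0, direction=1):
    radial_error = distance - orbit_radius         # > 0 outside the orbit, < 0 inside
    w = 0.5 * (1.0 + math.tanh(k * radial_error))  # ~1 far outside, ~0 far inside
    # w = 1 -> deviation 0 (head toward the buoy); w = 0 -> deviation pi (head away);
    # w = 0.5 on the orbit -> deviation pi/2, i.e. tangential, circular motion.
    return direction * (1.0 - w) * math.pi         # direction = +1/-1 sets the orbit sense
```

Any monotone, saturating blend would serve the same purpose; what matters for the reward shaping is the smooth transition near the orbital boundary.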
3.2.5. Deep Neural Networks
3.3. Design of Obstacle Avoidance Module
3.3.1. Environment
3.3.2. Action Space
3.3.3. State Space
3.3.4. Reward Function
3.3.5. Deep Neural Network
3.4. Design of Decision and Encoding Unit
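The encoding itself is not included in the extract. A minimal sketch of one generic way to rasterise per-beam radar ranges into a fixed-size, ego-centric occupancy image, the kind of representation a generalised radar image encoding could feed to a convolutional policy network (grid size, maximum range, and the binary encoding are assumptions):

```python
import numpy as np

# Rasterise per-beam radar range returns into a fixed-size ego-centric occupancy
# image. Grid size, maximum range, and the binary encoding are illustrative
# assumptions; the paper's generalized radar image encoding is not reproduced here.
def radar_to_image(ranges, max_range=10.0, grid_size=64):
    img = np.zeros((grid_size, grid_size), dtype=np.float32)
    n_beams = len(ranges)
    cell = 2.0 * max_range / grid_size             # metres per pixel
    for i, r in enumerate(ranges):
        if r >= max_range:                         # no return on this beam
            continue
        theta = 2.0 * np.pi * i / n_beams          # beam angle in the body frame
        x, y = r * np.cos(theta), r * np.sin(theta)
        col = int((x + max_range) / cell)
        row = int((y + max_range) / cell)
        if 0 <= row < grid_size and 0 <= col < grid_size:
            img[row, col] = 1.0                    # mark the detected obstacle cell
    return img
```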
4. Training and Validation
4.1. Training Results
4.2. Validation Results
4.2.1. Validations for Goal-Directed Navigation
4.2.2. Validations for Circular Navigation
4.2.3. Validations for Obstacle Avoidance
4.2.4. Validation for Multi-Buoy Inspection
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Xu, G.; Shi, Y.; Sun, X.; Shen, W. Internet of Things in Marine Environment Monitoring: A Review. Sensors 2019, 19, 1711. [Google Scholar] [CrossRef] [PubMed]
- Otero, P.; Hernández-Romero, Á.; Luque-Nieto, M.; Ariza, A. Underwater Positioning System Based on Drifting Buoys and Acoustic Modems. J. Mar. Sci. Eng. 2023, 11, 682. [Google Scholar] [CrossRef]
- Wang, J.; Wang, Z.; Wang, Y.; Liu, S.; Li, Y. Current situation and trend of marine data buoy and monitoring network technology of China. Acta Oceanol. Sin. 2016, 35, 1–10. [Google Scholar] [CrossRef]
- Lu, Z.; Li, W.; Zhang, X.; Wang, J.; Zhuang, Z.; Liu, C. Design and Testing of an Autonomous Navigation Unmanned Surface Vehicle for Buoy Inspection. J. Mar. Sci. Eng. 2024, 12, 819. [Google Scholar] [CrossRef]
- Fossen, T.I.; Breivik, M.; Skjetne, R. Line-of-sight path following of underactuated marine craft. IFAC Proc. Vol. 2003, 36, 211–216. [Google Scholar] [CrossRef]
- Moe, S.; Pettersen, K.Y.; Fossen, T.I.; Gravdahl, J.T. Line-of-sight curved path following for underactuated USVs and AUVs in the horizontal plane under the influence of ocean currents. In Proceedings of the 2016 24th Mediterranean Conference on Control and Automation (MED), Athens, Greece, 21–24 June 2016. [Google Scholar]
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; Adaptive Computation and Machine Learning Series; The MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
- Deraj, R.; Kumar, R.S.; Alam, S.; Somayajula, A. Deep reinforcement learning based controller for ship navigation. Ocean Eng. 2023, 273, 113937. [Google Scholar] [CrossRef]
- Sivaraj, S.; Rajendran, S.; Prasad, L.P. Data driven control based on Deep Q-Network algorithm for heading control and path following of a ship in calm water and waves. Ocean Eng. 2022, 259, 111802. [Google Scholar] [CrossRef]
- Zhao, Y.; Qi, X.; Ma, Y.; Li, Z.; Malekian, R.; Sotelo, M.A. Path Following Optimization for an Underactuated USV Using Smoothly-Convergent Deep Reinforcement Learning. IEEE Trans. Intell. Transp. Syst. 2020, 22, 6208–6220. [Google Scholar] [CrossRef]
- Singh, Y.; Sharma, S.; Sutton, R.; Hatton, D.; Khan, A. A constrained A* approach towards optimal path planning for an unmanned surface vehicle in a maritime environment containing dynamic obstacles and ocean currents. Ocean Eng. 2018, 169, 187–201. [Google Scholar] [CrossRef]
- Hashali, S.D.; Yang, S.; Xiang, X. Route Planning Algorithms for Unmanned Surface Vehicles (USVs): A Comprehensive Analysis. J. Mar. Sci. Eng. 2024, 12, 382. [Google Scholar] [CrossRef]
- Yan, X.; Jiang, D.; Miao, R.; Li, Y. Formation Control and Obstacle Avoidance Algorithm of a Multi-USV System Based on Virtual Structure and Artificial Potential Field. J. Mar. Sci. Eng. 2021, 9, 161. [Google Scholar] [CrossRef]
- Zhang, W.; Wei, S.; Teng, Y.; Zhang, J.; Wang, X.; Yan, Z. Dynamic Obstacle Avoidance for Unmanned Underwater Vehicles Based on an Improved Velocity Obstacle Method. Sensors 2017, 17, 2742. [Google Scholar] [CrossRef]
- Jo, H.-J.; Kim, S.-R.; Kim, J.-H.; Park, J.-Y. Comparison of Velocity Obstacle and Artificial Potential Field Methods for Collision Avoidance in Swarm Operation of Unmanned Surface Vehicles. J. Mar. Sci. Eng. 2022, 10, 2036. [Google Scholar] [CrossRef]
- Yuan, X.; Tong, C.; He, G.; Wang, H. Unmanned Vessel Collision Avoidance Algorithm by Dynamic Window Approach Based on COLREGs Considering the Effects of the Wind and Wave. J. Mar. Sci. Eng. 2023, 11, 1831. [Google Scholar] [CrossRef]
- Xia, J.; Zhu, X.; Liu, Z.; Luo, Y.; Wu, Z.; Wu, Q. Research on Collision Avoidance Algorithm of Unmanned Surface Vehicle Based on Deep Reinforcement Learning. IEEE Sens. J. 2022, 23, 11262–11273. [Google Scholar] [CrossRef]
- Xu, X.; Cai, P.; Ahmed, Z.; Yellapu, V.S.; Zhang, W. Path planning and dynamic collision avoidance algorithm under COLREGs via deep reinforcement learning. Neurocomputing 2022, 468, 181–197. [Google Scholar] [CrossRef]
- Xu, X.; Lu, Y.; Liu, X.; Zhang, W. Intelligent collision avoidance algorithms for USVs via deep reinforcement learning under COLREGs. Ocean Eng. 2020, 217, 107704. [Google Scholar] [CrossRef]
- Zhang, S.; Li, Y.; Dong, Q. Autonomous navigation of UAV in multi-obstacle environments based on a Deep Reinforcement Learning approach. Appl. Soft Comput. 2022, 115, 108194. [Google Scholar] [CrossRef]
- Hutsebaut-Buysse, M.; Mets, K.; Latré, S. Hierarchical Reinforcement Learning: A Survey and Open Research Challenges. Mach. Learn. Knowl. Extr. 2022, 4, 172–221. [Google Scholar] [CrossRef]
- McCue, L. Handbook of Marine Craft Hydrodynamics and Motion Control. IEEE Control Syst. Mag. 2016, 36, 78–79. [Google Scholar] [CrossRef]
- Sutton, R.S.; McAllester, D.; Singh, S.; Mansour, Y. Policy Gradient Methods for Reinforcement Learning with Function Approximation. In Advances in Neural Information Processing Systems; Solla, S., Leen, T., Müller, K., Eds.; MIT Press: Cambridge, MA, USA, 1999; Available online: https://proceedings.neurips.cc/paper_files/paper/1999/file/464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf (accessed on 27 March 2025).
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar] [CrossRef]
- Schulman, J.; Moritz, P.; Levine, S.; Jordan, M.; Abbeel, P. High-Dimensional Continuous Control Using Generalized Advantage Estimation. arXiv 2018, arXiv:1506.02438. [Google Scholar] [CrossRef]
- Van Moffaert, K.; Drugan, M.M.; Nowe, A. Scalarized multi-objective reinforcement learning: Novel design techniques. In Proceedings of the 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), Singapore, 16–19 April 2013; pp. 191–199. [Google Scholar] [CrossRef]
- Ecoffet, A.; Huizinga, J.; Lehman, J.; Stanley, K.O.; Clune, J. First return, then explore. Nature 2021, 590, 580–586. [Google Scholar] [CrossRef] [PubMed]
- Han, R.; Chen, S.; Wang, S.; Zhang, Z.; Gao, R.; Hao, Q.; Pan, J. Reinforcement Learned Distributed Multi-Robot Navigation With Reciprocal Velocity Obstacle Shaped Rewards. IEEE Robot. Autom. Lett. 2022, 7, 5896–5903. [Google Scholar] [CrossRef]
- Riedmiller, M.; Hafner, R.; Lampe, T.; Neunert, M.; Degrave, J.; Van de Wiele, T.; Mnih, V.; Heess, N.; Springenberg, J.T. Learning by Playing-Solving Sparse Reward Tasks from Scratch. arXiv 2018, arXiv:1802.10567. [Google Scholar] [CrossRef]
- Everitt, T.; Krakovna, V.; Orseau, L.; Hutter, M.; Legg, S. Reinforcement Learning with a Corrupted Reward Channel. arXiv 2017, arXiv:1705.08417. [Google Scholar] [CrossRef]
- Packer, C.; Gao, K.; Kos, J.; Krähenbühl, P.; Koltun, V.; Song, D. Assessing Generalization in Deep Reinforcement Learning. arXiv 2019, arXiv:1810.12282. [Google Scholar] [CrossRef]
- Pateriya, N.; Jain, P.; Niveditha, K.P.; Tiwari, V.; Vishwakarma, S. Deep Residual Networks for Image Recognition. Int. J. Innov. Res. Comput. Commun. Eng. 2023, 11, 10742–10747. [Google Scholar] [CrossRef]
Parameter | Value |
---|---|
– | 0.5 m |
– | 5.0 m |
– | 0.5 m |
– | 0.01 rad |
– | 0.8 m |
– | 360 |
– | 10.0 m |
– | 0.75 m |
– | 1.0 m |
– | 0.8 m |
Hyperparameter | Value * |
---|---|
Training episodes | 2000/8000 |
Experience buffer size | 2048/1024 |
Max timesteps | 500/100 |
Update epochs | 10/10 |
Learning rate | 2 × 10⁻⁴/2 × 10⁻⁴ |
Discount factor γ | 0.99/0.99 |
GAE parameter λ | 0.95/0.95 |
Clipping ratio ε | 0.2/0.1 |
Parameter | Value |
---|---|
Lookahead distance | 10 m |
Number of waypoints | 24 |
Path following tolerance | 1.0 m |
– | 3.0 |
– | 1.8 |
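The lookahead distance and path-following tolerance above are consistent with a lookahead-based line-of-sight (LOS) guidance baseline in the spirit of [5,6]; a minimal sketch of the standard LOS heading law under that assumption (not the paper's exact implementation):

```python
import math

# Standard lookahead-based line-of-sight (LOS) guidance in the spirit of [5,6]:
# steer toward a point a fixed lookahead distance ahead of the USV's projection
# onto the current path segment. Illustrative only; not the paper's implementation.
def los_desired_heading(usv_xy, wp_prev, wp_next, lookahead=10.0):
    path_angle = math.atan2(wp_next[1] - wp_prev[1], wp_next[0] - wp_prev[0])
    dx, dy = usv_xy[0] - wp_prev[0], usv_xy[1] - wp_prev[1]
    # Signed cross-track error: lateral distance of the USV from the path.
    cross_track = -dx * math.sin(path_angle) + dy * math.cos(path_angle)
    # LOS law: desired heading = path angle + arctan(-cross_track / lookahead).
    return path_angle + math.atan2(-cross_track, lookahead)
```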