Collision Avoidance of Multi-UUV Systems Based on Deep Reinforcement Learning in Complex Marine Environments
Abstract
1. Introduction
- To the best of our knowledge, this is the first study to propose a multi-agent deep reinforcement learning framework for multi-UUV systems that learns collision avoidance policies from onboard sensor information in marine environments with complex non-convex obstacles, while satisfying cooperative constraints.
- A novel multi-agent dynamic encoder is proposed, based on an efficient self-attention mechanism, to effectively handle observations from an arbitrary number of neighboring agents without requiring additional training. It also significantly reduces computational complexity compared to traditional attention mechanisms.
- The policy trained in simulation successfully transfers to the real-world environment without requiring additional training. Experimental results demonstrate that our method significantly outperforms typical collision avoidance methods, exhibiting strong generalizability and robustness.
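To make the second contribution concrete, the sketch below shows one way a dynamic encoder can pool observations from an arbitrary number of neighbors with linear-complexity additive attention (a learned per-neighbor score followed by a softmax-weighted sum), rather than the O(N²) pairwise scores of dot-product attention. This is an illustrative minimal variant, not the paper's exact architecture; all layer sizes and names are assumptions.

```python
import torch
import torch.nn as nn


class AdditiveAttentionEncoder(nn.Module):
    """Illustrative dynamic encoder: maps a variable number of neighbor
    observations (B, N, obs_dim) to a fixed-size embedding (B, embed_dim).
    Complexity is linear in N, unlike O(N^2) dot-product self-attention."""

    def __init__(self, obs_dim: int, embed_dim: int = 64):
        super().__init__()
        self.embed = nn.Linear(obs_dim, embed_dim)  # per-neighbor embedding
        self.score = nn.Linear(embed_dim, 1)        # learned attention score

    def forward(self, neighbors: torch.Tensor) -> torch.Tensor:
        h = torch.tanh(self.embed(neighbors))        # (B, N, D)
        w = torch.softmax(self.score(h), dim=1)      # (B, N, 1) weights over N
        return (w * h).sum(dim=1)                    # (B, D), independent of N
```

Because the output size does not depend on N, the same trained weights handle any number of neighboring agents at test time without retraining, which is the property the contribution highlights.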
2. Problem Formulation
3. Multi-Agent Dynamic Encoder
4. Reinforcement Learning for Collision Avoidance in Multi-UUV Systems
4.1. State Representation
4.2. Reward Design
4.3. Action Design
4.4. Curriculum Learning
5. Results
5.1. Training Configuration and Computational Complexity
5.2. Performance Metrics
- Success Rate: The fraction of test cases in which the UUV reaches the goal without collision within the given time steps.
- Collision Rate: The fraction of test cases that end in a collision.
- Timeout Rate: The fraction of test cases in which the UUV neither collides nor reaches the goal within the specified time limit.
- Extra Distance: The difference between the UUV's average traveled trajectory length and the lower bound on its travel distance (i.e., the average distance covered when following the shortest path to the goal).
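The four metrics above can be computed directly from per-episode outcomes. The helper below is a minimal sketch; the input format (outcome labels and the two distance lists for successful runs) is an assumption, not the paper's evaluation code.

```python
def evaluate_episodes(outcomes, path_lengths, shortest_lengths):
    """Compute success/collision/timeout rates and mean extra distance.

    outcomes: per-episode label in {"success", "collision", "timeout"}
    path_lengths / shortest_lengths: traveled vs. shortest-path distances
    (illustrative inputs for the successful episodes).
    """
    n = len(outcomes)
    success_rate = outcomes.count("success") / n
    collision_rate = outcomes.count("collision") / n
    timeout_rate = outcomes.count("timeout") / n
    # Extra distance: traveled length minus the shortest-path lower bound.
    extras = [p - s for p, s in zip(path_lengths, shortest_lengths)]
    extra_distance = sum(extras) / len(extras) if extras else 0.0
    return success_rate, collision_rate, timeout_rate, extra_distance
```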
5.3. Simulation Experiments
5.3.1. Multi-UUV Collision Avoidance Experiment in Unknown Complex Obstacle Environments
5.3.2. Collision Avoidance Within Multi-UUV Systems
5.3.3. Communication Distance Constraint Experiments
5.4. Lake Experiments
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Wynn, R.B.; Huvenne, V.A.; Le Bas, T.P.; Murton, B.J.; Connelly, D.P.; Bett, B.J.; Ruhl, H.A.; Morris, K.J.; Peakall, J.; Parsons, D.R.; et al. Autonomous Underwater Vehicles (AUVs): Their past, present and future contributions to the advancement of marine geoscience. Mar. Geol. 2014, 352, 451–468. [Google Scholar] [CrossRef]
- Liu, Y.; Li, J.; Guo, W.; Ngo, H.H.; Hu, J.; Gao, M.T. Use of magnetic powder to effectively improve the performance of sequencing batch reactors (SBRs) in municipal wastewater treatment. Bioresour. Technol. 2018, 248, 135–139. [Google Scholar] [CrossRef] [PubMed]
- Cheng, C.; Sha, Q.; He, B.; Li, G. Path planning and obstacle avoidance for AUV: A review. Ocean Eng. 2021, 235, 109355. [Google Scholar] [CrossRef]
- Yan, Z.; Zhao, L.; Wang, Y.; Zhang, M.; Yang, H.; Zhang, C. Path Planning of AUV for Obstacle Avoidance with Improved Artificial Potential Field. In Proceedings of the IECON 2023-49th Annual Conference of the IEEE Industrial Electronics Society, Singapore, 16–19 October 2023; pp. 1–5. [Google Scholar]
- Shokouhi, S.; Mu, B.; Thein, M.-W. Optimized Path Planning and Control for Autonomous Surface Vehicles using B-Splines and Nonlinear Model Predictive Control. In Proceedings of the OCEANS 2023—MTS/IEEE U.S. Gulf Coast, Biloxi, MS, USA, 25–28 September 2023; pp. 1–9. [Google Scholar]
- Chen, T.; Zhang, Z.; Fang, Z.; Jiang, D.; Li, G. Imitation learning from imperfect demonstrations for AUV path tracking and obstacle avoidance. Ocean Eng. 2024, 298, 117287. [Google Scholar] [CrossRef]
- Fan, X.; Guo, Y.; Liu, H.; Wei, B.; Lyu, W. Improved artificial potential field method applied for AUV path planning. Math. Probl. Eng. 2020, 2020, 6523158. [Google Scholar] [CrossRef]
- Taheri, E.; Ferdowsi, M.H.; Danesh, M. Closed-loop randomized kinodynamic path planning for an autonomous underwater vehicle. Appl. Ocean Res. 2019, 83, 48–64. [Google Scholar] [CrossRef]
- Alonso-Mora, J.; Breitenmoser, A.; Rufli, M.; Beardsley, P.; Siegwart, R. Optimal reciprocal collision avoidance for multiple non-holonomic robots. In Proceedings of the Distributed Autonomous Robotic Systems: The 10th International Symposium; Springer: Berlin/Heidelberg, Germany, 2013; pp. 203–216. [Google Scholar]
- Carlucho, I.; De Paula, M.; Wang, S.; Menna, B.V.; Petillot, Y.R.; Acosta, G.G. AUV position tracking control using end-to-end deep reinforcement learning. In Proceedings of the OCEANS 2018 MTS/IEEE Charleston, Charleston, SC, USA, 22–25 October 2018; pp. 1–8. [Google Scholar]
- Yang, J.; Ni, J.; Xi, M.; Wen, J.; Li, Y. Intelligent path planning of underwater robot based on reinforcement learning. IEEE Trans. Autom. Sci. Eng. 2022, 20, 1983–1996. [Google Scholar] [CrossRef]
- Saravanan, M.; Kumar, P.S.; Dey, K.; Gaddamidi, S.; Kumar, A.R. Exploring spiking neural networks in single and multi-agent rl methods. In Proceedings of the 2021 International Conference on Rebooting Computing (ICRC), Los Alamitos, CA, USA, 30 November–2 December 2021; pp. 88–98. [Google Scholar]
- Huang, S.; Zhang, H.; Huang, Z. CoDe: A Cooperative and Decentralized Collision Avoidance Algorithm for Small-Scale UAV Swarms Considering Energy Efficiency. In Proceedings of the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Abu Dhabi, United Arab Emirates, 14–18 October 2024; pp. 13152–13159. [Google Scholar]
- Liu, H.; Shen, Y.; Zhou, C.; Zou, Y.; Gao, Z.; Wang, Q. TD3 based collision free motion planning for robot navigation. In Proceedings of the 2024 6th International Conference on Communications, Information System and Computer Engineering (CISCE), Guangzhou, China, 10–12 May 2024; pp. 247–250. [Google Scholar]
- Everett, M.; Chen, Y.F.; How, J.P. Motion planning among dynamic, decision-making agents with deep reinforcement learning. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 3052–3059. [Google Scholar]
- Long, P.; Fan, T.; Liao, X.; Liu, W.; Zhang, H.; Pan, J. Towards optimally decentralized multi-robot collision avoidance via deep reinforcement learning. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 6252–6259. [Google Scholar]
- Du, Y.; Zhang, J.; Xu, J.; Cheng, X.; Cui, S. Global map assisted multi-agent collision avoidance via deep reinforcement learning around complex obstacles. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; pp. 298–305. [Google Scholar]
- Chen, C.; Liu, Y.; Kreiss, S.; Alahi, A. Crowd-robot interaction: Crowd-aware robot navigation with attention-based deep reinforcement learning. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 6015–6022. [Google Scholar]
- Wu, J.; Wang, Y.; Asama, H.; An, Q.; Yamashita, A. Risk-Sensitive Mobile Robot Navigation in Crowded Environment via Offline Reinforcement Learning. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; pp. 7456–7462. [Google Scholar]
- Zhang, C.; Cheng, P.; Du, B.; Dong, B.; Zhang, W. AUV path tracking with real-time obstacle avoidance via reinforcement learning under adaptive constraints. Ocean Eng. 2022, 256, 111453. [Google Scholar] [CrossRef]
- Xu, J.; Huang, F.; Wu, D.; Cui, Y.; Yan, Z.; Du, X. A learning method for AUV collision avoidance through deep reinforcement learning. Ocean Eng. 2022, 260, 112038. [Google Scholar] [CrossRef]
- Li, X.; Yu, S. Obstacle avoidance path planning for AUVs in a three-dimensional unknown environment based on the C-APF-TD3 algorithm. Ocean Eng. 2025, 315, 119886. [Google Scholar] [CrossRef]
- Wang, P.; Liu, R.; Tian, X.; Zhang, X.; Qiao, L.; Wang, Y. Obstacle avoidance for environmentally-driven USVs based on deep reinforcement learning in large-scale uncertain environments. Ocean Eng. 2023, 270, 113670. [Google Scholar] [CrossRef]
- Hadi, B.; Khosravi, A.; Sarhadi, P. Adaptive formation motion planning and control of autonomous underwater vehicles using deep reinforcement learning. IEEE J. Ocean. Eng. 2023, 49, 311–328. [Google Scholar] [CrossRef]
- Hadi, B.; Khosravi, A.; Sarhadi, P. Hybrid Motion Planning and Formation Control of Multi-AUV Systems Based on DRL. In Proceedings of the 2024 American Control Conference (ACC), Toronto, ON, Canada, 10–12 July 2024; pp. 2368–2373. [Google Scholar]
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar] [CrossRef]
- Zhang, C.; Yip, K.W.; Yang, B.; Zhang, Z.; Yuan, M.; Yan, R.; Tang, H. CASRL: Collision Avoidance with Spiking Reinforcement Learning Among Dynamic, Decision-Making Agents. In Proceedings of the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Abu Dhabi, United Arab Emirates, 14–18 October 2024; pp. 8031–8038. [Google Scholar]
- Shaker, A.; Maaz, M.; Rasheed, H.; Khan, S.; Yang, M.H.; Khan, F.S. Swiftformer: Efficient additive attention for transformer-based real-time mobile vision applications. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 17425–17436. [Google Scholar]
- Manhães, M.M.M.; Scherer, S.A.; Voss, M.; Douat, L.R.; Rauschenbach, T. UUV simulator: A gazebo-based package for underwater intervention and multi-robot simulation. In Proceedings of the Oceans 2016 MTS/IEEE Monterey, Monterey, CA, USA, 19–23 September 2016; pp. 1–8. [Google Scholar]
- Zhang, Y.; Wang, Q.; Shen, Y.; Dai, N.; He, B. Multi-AUV cooperative control and autonomous obstacle avoidance study. Ocean Eng. 2024, 304, 117634. [Google Scholar] [CrossRef]
- Guo, K.; Wang, D.; Fan, T.; Pan, J. VR-ORCA: Variable responsibility optimal reciprocal collision avoidance. IEEE Robot. Autom. Lett. 2021, 6, 4520–4527. [Google Scholar] [CrossRef]
| Hyperparameter | Value |
| --- | --- |
| Optimizer | Adam |
| Learning Rate | |
| Batch Size | 4096 |
| Discount Factor (γ) | 0.99 |
| Clip Parameter | 0.2 |
| Entropy Coefficient | 0.005 |
| Learning Epochs | 5 |
| Mini Batches | 4 |
| Value Loss Coefficient | 1 |
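The clip parameter and entropy coefficient in the table correspond to the standard PPO-clip surrogate objective (Schulman et al., 2017). The sketch below shows that loss term alone, with the table's clip value of 0.2; the tensor inputs are illustrative and the full training loop (value loss, entropy bonus, epochs, mini-batches) is omitted.

```python
import torch


def ppo_clipped_loss(log_probs, old_log_probs, advantages, clip_param=0.2):
    """PPO-clip policy surrogate: limit the probability ratio to
    [1 - clip_param, 1 + clip_param] and take the pessimistic minimum."""
    ratio = torch.exp(log_probs - old_log_probs)      # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_param, 1 + clip_param) * advantages
    # Negated because optimizers minimize; PPO maximizes the surrogate.
    return -torch.min(unclipped, clipped).mean()
```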
| Metric | Method | Case 0 UUV0 | Case 0 UUV1 | Case 0 UUV2 | Case 1 UUV0 | Case 1 UUV1 | Case 1 UUV2 | Case 2 UUV0 | Case 2 UUV1 | Case 2 UUV2 | Case 3 UUV0 | Case 3 UUV1 | Case 3 UUV2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Success rate ↑ | VR-ORCA | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.470 | 0.434 | 0.679 | 1.000 | 1.000 | 1.000 |
| | IAPF | 0.710 | 0.000 | 0.830 | 0.510 | 0.438 | 0.630 | 0.590 | 0.770 | 0.700 | 0.400 | 0.350 | 0.210 |
| | Ours | 0.730 | 0.810 | 0.842 | 0.690 | 0.865 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Crash rate ↓ | VR-ORCA | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 | 0.520 | 0.400 | 0.321 | 0.000 | 0.000 | 0.000 |
| | IAPF | 0.230 | 0.000 | 0.023 | 0.450 | 0.562 | 0.340 | 0.400 | 0.190 | 0.300 | 0.580 | 0.640 | 0.710 |
| | Ours | 0.310 | 0.190 | 0.158 | 0.310 | 0.135 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| Timeout rate ↓ | VR-ORCA | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.010 | 0.160 | 0.000 | 0.000 | 0.000 | 0.000 |
| | IAPF | 0.060 | 1.000 | 0.147 | 0.040 | 0.000 | 0.030 | 0.010 | 0.050 | 0.000 | 0.020 | 0.010 | 0.070 |
| | Ours | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| Extra distance ↓ | VR-ORCA | 12.527 | – | – | – | – | 23.706 | 18.130 | 97.315 | 31.023 | 19.437 | 7.193 | 8.264 |
| | IAPF | 30.252 | – | 137.398 | 78.921 | 36.047 | 90.304 | 69.637 | 10.610 | 70.056 | 39.457 | 44.721 | 22.633 |
| | Ours | 11.846 | 59.326 | 97.914 | 43.363 | 1.552 | 41.622 | 12.145 | 1.270 | 36.135 | 16.551 | 11.139 | 10.044 |
| Case | Success Rate ↑ | Crash Rate ↓ | Extra Distance ↓ |
| --- | --- | --- | --- |
| a | 1.000 | 0.000 | 2.563 |
| b | 0.963 | 0.037 | 2.736 |
| c | 0.951 | 0.048 | 2.421 |
| d | 0.978 | 0.022 | 2.439 |
| e | 0.904 | 0.096 | 3.033 |
| f | 0.894 | 0.106 | 3.353 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Cao, F.; Xu, H.; Ru, J.; Li, Z.; Zhang, H.; Liu, H. Collision Avoidance of Multi-UUV Systems Based on Deep Reinforcement Learning in Complex Marine Environments. J. Mar. Sci. Eng. 2025, 13, 1615. https://doi.org/10.3390/jmse13091615