BIT*+TD3 Hybrid Algorithm for Energy-Efficient Path Planning of Unmanned Surface Vehicles in Complex Inland Waterways
Abstract
1. Introduction
1.1. The Current State of Research on the BIT* Algorithm
1.2. The Current State of TD3 Deep Reinforcement Learning
2. BIT* Algorithm Procedure
2.1. Algorithm Description
2.2. Algorithm Steps
- i. Sampling: The algorithm first samples a fixed number of new states X_new from the informed set. The informed set is defined by the heuristic estimate f̂(x) = ĝ(x) + ĥ(x) and the current best solution cost c_best, and it contains the regions of the state space that could still yield a better path. By sampling from the informed set, the algorithm explores promising regions efficiently and avoids wasting computation on areas that cannot produce an optimal solution.
- ii. Pruning: When both the edge queue Q_E and the vertex queue Q_V are empty, the current tree has no better edges or vertices to expand for the time being, and the algorithm enters the pruning phase. Pruning removes vertices and edges that cannot contribute a better solution to the current or future paths, thereby simplifying the tree and reducing the cost of subsequent computation. The removed elements are temporarily stored in the reuse set X_reuse so that they can be reused in later iterations as needed. Specifically, the pruning process traverses all vertices and edges in the current tree and judges, from each element's estimated cost-to-come from the start ĝ, estimated cost-to-go to the goal ĥ, and the current best solution cost c_best, whether it can still improve the path. If a vertex or edge cannot provide a better solution, it is removed from the tree and added to X_reuse.
- iii. Update: After pruning, the algorithm merges the reuse set X_reuse with the newly sampled points X_new to form the new sample set X_samples. The unconnected state set X_unconn is updated to include all newly sampled points. The vertex priority queue Q_V is then reinitialized to contain all vertices in the current tree, so that the best vertex can be selected for expansion in the subsequent steps.
- iv. Vertex Expansion and Edge Selection: The algorithm decides whether to perform vertex expansion or edge selection by comparing the head elements of Q_V and Q_E. When the head of Q_V is superior to the head of Q_E, vertex expansion is prioritized; otherwise, edge processing is prioritized. For vertex expansion, the algorithm pops the highest-priority vertex v from the vertex priority queue Q_V, usually the vertex with the lowest current path cost or the smallest heuristic value, and inserts its candidate outgoing edges into the edge priority queue Q_E to prepare for further exploration. For edge selection, the algorithm pops the highest-priority edge (v, x) from Q_E; this edge, connecting vertex v to state x, is the candidate currently most likely to improve the path quality.
- v. Edge Processing: The selected edge (v, x) is first screened with the heuristic estimate. If ĝ(v) + ĉ(v, x) + ĥ(x) < c_best, the edge could improve on the current solution and is worth evaluating further. The algorithm then computes the true edge cost c(v, x) and checks whether adding the edge to the tree reduces the known path cost of the target state x, that is, whether g_T(v) + c(v, x) < g_T(x). If so, the tree structure is updated: if x is already in the tree, the edge to its original parent is removed first, and then the edge (v, x) is added; if x is newly connected to the tree, it is removed from the unconnected state set X_unconn, added to the vertex set V, and marked as an unexpanded vertex. If x is a goal state, the current best solution cost c_best is also updated.
- vi. Empty Queues: After completing the vertex expansion and edge processing of the current iteration, the algorithm empties the vertex queue Q_V and the edge queue Q_E to prepare for the next iteration. This ensures that each iteration is based on the latest tree structure and avoids interference from the results of the previous iteration.
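The per-iteration logic above hinges on two heuristic tests: whether a state lies in the informed set (steps i and ii) and whether a candidate edge can still beat the current best solution (step v). The following is a minimal sketch of those two tests, assuming 2-D states and Euclidean distance as the admissible heuristic; the function names are illustrative, not taken from the paper's implementation.

```python
import math

def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def g_hat(x, start):
    """Admissible estimate of the cost-to-come from the start."""
    return dist(start, x)

def h_hat(x, goal):
    """Admissible estimate of the cost-to-go to the goal."""
    return dist(x, goal)

def in_informed_set(x, start, goal, c_best):
    """Steps i/ii: a state can improve the solution only if the
    straight-line path through it beats the current best cost c_best."""
    return g_hat(x, start) + h_hat(x, goal) < c_best

def edge_can_improve(v, x, start, goal, c_best):
    """Step v screening: optimistic total cost of a path through edge
    (v, x), using the straight-line length as the edge estimate."""
    return g_hat(v, start) + dist(v, x) + h_hat(x, goal) < c_best
```

Because both tests use lower bounds on the true costs, they never discard a state or edge that could still improve the solution; they only filter out provably unhelpful work, which is what makes the pruning in step ii safe.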
Algorithm 1: Pseudo-code of the BIT* algorithm.
3. BIT*+TD3 Deep Reinforcement Learning Algorithm: A Solution Model for Path Planning of Unmanned Surface Vehicles in Hybrid Intelligent Systems
3.1. Path Planning Problem and Environmental Setup for Unmanned Surface Vehicles in Hybrid Intelligent Systems Based on the BIT*+TD3 Deep Reinforcement Learning Algorithm
3.2. A Performance-Enhanced Variant of the BIT* Algorithm
3.2.1. Underwater Energy Consumption Equation
3.2.2. Thread-Parallel Algorithm and Its Optimization
3.3. Improved BIT* + Energy Consumption Equation + TD3 Hybrid Path Planning Strategy Model
3.3.1. Initialization Phase
3.3.2. Ellipsoidal Informed Sampling and Goal Node Configuration
3.3.3. Further Reading
Algorithm 2: Xavier Initialization
3.3.4. Additional Hyperparameter Configuration
3.3.5. Path Optimization and the Energy Consumption Equation
3.3.6. Path Reward Mechanism Design
3.3.7. Adaptive Parameter Optimization Based on Multiple Seeds (Multi-Seed)
3.3.8. Parameter Configuration for the TD3 Deep Reinforcement Learning Strategy Model
4. Experimental Validation
4.1. Improved BIT* Algorithm with Adaptive Parameter Optimization for Optimal Path Planning
4.1.1. Performance of the BIT* Algorithm
4.1.2. Improved BIT* Algorithm
4.1.3. Analysis of Visualized Results for the Improved Algorithm
4.2. TD3 Algorithm Optimization
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
| Algorithm | Optimal Path Length | Number of Turns | Turning Cost |
|---|---|---|---|
| A* | 600 | 447 | 8586.00 |
| RRT* | 507.17 | 47 | 473.62 |
| Original BIT* | 460.23 | 26 | 75.43 |
| Enhanced BIT* | 430.75 | 24 | 40.95 |
| Parameter Name | Description | Value |
|---|---|---|
| Start Point | Initial position | (0, 0) |
| Goal Point | Target position | (299, 299) |
| Number of Samples | Number of new sample points per iteration | 100 |
| Gaussian Std | Parameter controlling the Gaussian sampling distribution | 10 |
| Initial Radius | Initial search radius | 150 |
| Turning Weight | Weighting coefficient for turning cost | 0.5 |
| Tangent Point Distance | Distance to tangent point for arc path calculation | 3 |
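The tables above report a turning cost weighted by the "Turning Weight" coefficient (0.5). The paper's exact formula is not reproduced here, but a plausible reading is a weighted sum of heading changes along the polyline path, which the sketch below implements; both the function name and the formula are illustrative assumptions.

```python
import math

def turning_cost(path, weight=0.5):
    """Illustrative turning cost: weighted sum of absolute heading
    changes (in radians) along a polyline path. The exact cost model
    used in the paper may differ."""
    total = 0.0
    for (x0, y0), (x1, y1), (x2, y2) in zip(path, path[1:], path[2:]):
        a1 = math.atan2(y1 - y0, x1 - x0)  # heading of incoming segment
        a2 = math.atan2(y2 - y1, x2 - x1)  # heading of outgoing segment
        # Wrap the heading change into [-pi, pi] before accumulating.
        d = (a2 - a1 + math.pi) % (2 * math.pi) - math.pi
        total += weight * abs(d)
    return total
```

Under a model of this shape, a straight path costs zero and each turn contributes in proportion to how sharp it is, which is consistent with the ordering in the comparison table: fewer, gentler turns give a lower turning cost.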
| | Map Size (Grid Cells) | Reached Destination? | Obstacle Avoidance Successful? | CPU Usage (Clock Speed) | GPU Utilization (%) |
|---|---|---|---|---|---|
| Before Optimization | 2 × 10⁴ | Yes | Yes | 4.25 GHz | 0 |
| After Optimization | 2 × 10⁴ | Yes | Yes | 3.59 GHz | 56 |
| Hyperparameter Name | Value | Description |
|---|---|---|
| Maximum Steps per Episode | 1000 | The maximum number of steps allowed within each episode. |
| Episodes | 200 | The total number of episodes for training. |
| Discount Factor | 0.95 | The discount factor used for calculating future rewards. |
| Soft Update Coefficient | 0.005 | The parameter controlling the soft update of the target networks. |
| Policy Noise | 0.1 | The standard deviation of the noise added to the target actions. |
| Noise Clip | 0.3 | The range to which the noise is clipped. |
| Policy Update Delay | 4 | The frequency at which the policy network is updated relative to the value network. |
| Actor Network Learning Rate | 0.0002 | The learning rate for the Actor network. |
| Critic Network Learning Rate | 0.0002 | The learning rate for the Critic networks. |
| Replay Buffer Size | 2 × 10⁶ | The size of the experience replay buffer. |
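The hyperparameters above can be collected into a single configuration object, as sketched below with the table's values; the `TD3Config` name and field names are illustrative, not from the paper's code. The soft-update coefficient τ = 0.005 enters TD3 through Polyak averaging of the target-network weights, shown alongside.

```python
from dataclasses import dataclass

@dataclass
class TD3Config:
    # Values taken from the hyperparameter table above.
    max_steps_per_episode: int = 1000
    episodes: int = 200
    gamma: float = 0.95        # discount factor for future rewards
    tau: float = 0.005         # soft-update coefficient for target networks
    policy_noise: float = 0.1  # std of noise added to target actions
    noise_clip: float = 0.3    # clipping range for that noise
    policy_delay: int = 4      # actor updated once per 4 critic updates
    actor_lr: float = 2e-4
    critic_lr: float = 2e-4
    buffer_size: int = 2_000_000

def soft_update(target, online, tau):
    """Polyak averaging used by TD3: target <- tau*online + (1-tau)*target,
    applied element-wise to the network parameters."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]
```

With τ this small, the target networks trail the online networks slowly, which (together with the policy update delay of 4) is the standard TD3 recipe for stabilizing the value estimates.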
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Xie, Y.; Ma, Y.; Cheng, Y.; Li, Z.; Liu, X. BIT*+TD3 Hybrid Algorithm for Energy-Efficient Path Planning of Unmanned Surface Vehicles in Complex Inland Waterways. Appl. Sci. 2025, 15, 3446. https://doi.org/10.3390/app15073446