Article

BIT*+TD3 Hybrid Algorithm for Energy-Efficient Path Planning of Unmanned Surface Vehicles in Complex Inland Waterways

1 Information College, Shanghai Maritime University, Shanghai 200135, China
2 Merchant Marine College, Shanghai Maritime University, Shanghai 200135, China
3 College of Logistics Engineering, Shanghai Maritime University, Shanghai 200135, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(7), 3446; https://doi.org/10.3390/app15073446
Submission received: 28 January 2025 / Revised: 15 March 2025 / Accepted: 19 March 2025 / Published: 21 March 2025
(This article belongs to the Special Issue Advances in Applied Marine Sciences and Engineering—2nd Edition)

Abstract
This research proposes a hybrid path planning framework for intelligent inland waterway Unmanned Surface Vehicles (USVs), which integrates the enhanced BIT* (Batch Informed Trees) algorithm with the TD3 (Twin Delayed Deep Deterministic Policy Gradient) deep reinforcement learning method. To address the limitations of traditional path planning algorithms in dynamic environments, the proposed BIT*+TD3 model leverages the BIT* algorithm to generate initial paths in static environments through elliptical informed sampling and heuristic search. Simultaneously, it utilizes the TD3 algorithm to dynamically optimize these paths through twin Critic networks and delayed policy updates. This research designs a novel reward mechanism aimed at minimizing turning angles, smoothing speed transitions, and shortening path lengths. Furthermore, it incorporates a hydrodynamics-based energy consumption model and multi-threaded parallel computation to enhance computational efficiency. Experimental validation demonstrates that, compared to traditional methods, this model exhibits significant improvements in obstacle avoidance success rate, safe distance maintenance, convergence speed, and smoothness. By bridging sampling-based planning methods with deep reinforcement learning methods, this research advances autonomous navigation technology and provides a scalable and energy-efficient solution for maritime applications.

1. Introduction

With the rapid advancement of autonomous driving technology and artificial intelligence, the application of intelligent unmanned ship technology in the shipping industry is becoming increasingly crucial. The autonomous navigation capability of intelligent ships bears not only on shipping efficiency but also on maritime traffic safety and environmental protection. Driven by the national strategy of building a "Smart Yangtze River", the intelligentization of inland waterway vessels has become key to achieving the sustainable development of the shipping industry. As a core component of intelligent ship autonomous navigation systems, path planning directly impacts the operational efficiency and safety of vessels. Common solutions for path planning include the classic A* algorithm and its various improved forms. For example, Zhang Y et al. [1] addressed the multi-agent path planning problem by incorporating novel optimization algorithms that combine chaotic initialization, reverse search, and differential evolution. The approach of Liu Y et al. [2], while performing well in static environments, shows obvious limitations in path optimization and obstacle avoidance in dynamic environments. By combining the improved BIT* algorithm with the TD3 deep reinforcement learning method, this paper not only addresses the shortcomings of traditional algorithms in dynamic environments but also introduces hydrodynamic models and multi-threaded parallel computing, further optimizing path smoothness and computational efficiency. Additionally, the reward mechanism designed in this paper comprehensively considers path length, turning cost, and energy consumption, providing new technical approaches for the development of intelligent shipping technology. Compared with the BIT* algorithm proposed by Gammell et al. [3], this paper significantly improves the real-time performance and robustness of path planning by introducing TD3's dynamic optimization capabilities. Compared with the reinforcement learning-based research of Li et al. [4] and Zhao et al. [5], this paper's hybrid algorithm framework demonstrates superior performance in complex water environments, providing a more efficient solution for the autonomous navigation of Unmanned Surface Vehicles.
Addressing the shortcomings of existing technologies, this research proposes a deep reinforcement learning-based path planning algorithm for intelligent inland waterway vessels. By integrating an improved BIT* algorithm with a customized reward mechanism, this research develops the BIT*+TD3 hybrid algorithm to adapt to the dynamic environments of inland waterways and improve the efficiency and accuracy of path planning. Extensive simulations demonstrate that the proposed algorithm achieves excellent performance in obstacle avoidance success rate, convergence speed, stability, and robustness, especially when compared with traditional path planning methods. The main contribution of this research lies in proposing an innovative path planning method that effectively overcomes the limitations of traditional algorithms in dynamic environments, providing both a new perspective on inland waterway vessel path planning and strong technical support for its development. Furthermore, this research showcases the potential of deep reinforcement learning to enhance the autonomous navigation capabilities of unmanned ships, laying a foundation for the future development of intelligent ship technology.
Compared with existing research, the uniqueness of this study lies in the combination of the enhanced BIT* algorithm and the TD3 deep reinforcement learning algorithm, forming a hybrid path planning strategy. This strategy exploits the BIT* algorithm's ability to quickly generate initial paths in static environments while achieving efficient and reliable planning in complex, dynamic inland river environments through the online learning and dynamic optimization capabilities of TD3. In addition, this research innovatively integrates a hydrodynamic model into the path evaluation process, accounting for the energy consumption of the Unmanned Surface Vehicle and further improving the practicality and economy of path planning. This research perspective, which jointly considers static environments, dynamic environments, and energy efficiency, is relatively rare in the existing literature and provides new research ideas and solutions for the field of intelligent ship path planning.

1.1. The Current State of Research on the BIT* Algorithm

Path planning, as one of the core problems in robotics and automation, has consistently garnered widespread attention from researchers. Among the numerous path planning algorithms, sampling-based algorithms are highly favored for their effectiveness in addressing motion planning problems in high-dimensional spaces and complex environments. To further enhance the performance of sampling algorithms, particularly in ensuring path quality and planning efficiency, researchers continue to explore new methods and strategies for faster and higher-quality path planning. Gammell et al. [3] proposed the BIT* (Batch Informed Trees) algorithm, which combines the advantages of RRT* and FMT* and optimizes the search process through heuristic methods. The algorithm gradually constructs an implicit random geometric graph (RGG) in batches and uses dynamic programming to search for the optimal path in the graph. BIT* has shown significant computational advantages in high-dimensional spaces and can quickly find feasible solutions in complex environments and gradually optimize them toward the optimal solution. Choudhury et al. [6] proposed the RABIT* (Regionally Accelerated Batch Informed Trees) algorithm, which further extends BIT*. RABIT* accelerates the search process by introducing a local optimizer (such as CHOMP), especially in environments with narrow passages; it can significantly improve search efficiency while maintaining asymptotic optimality, especially in high-dimensional spaces. Zhang et al. [7] proposed the FIT* (Flexible Informed Trees) algorithm, which further optimizes BIT* through an adaptive batch size strategy. The FIT* algorithm dynamically adjusts the batch size according to the dimension of the configuration space and the hypervolume of the hyperellipsoid, thereby increasing the sampling density in the initial path discovery phase and reducing it in the optimization phase, improving the overall efficiency of the algorithm. Experiments show that FIT* outperforms existing algorithms in spaces from two to eight dimensions. Gammell et al. [8] analyzed the theoretical basis of the BIT* algorithm in another study, proving its probabilistic completeness and asymptotic optimality. By viewing the BIT* algorithm as a search over a series of implicit random geometric graphs, the authors demonstrate the efficiency of the algorithm in high-dimensional spaces and discuss its potential future applications. Cao [9] proposed the et-BIT* (error-tolerant batch informed tree) algorithm, which is optimized for path planning in dynamic environments. et-BIT* increases the diversity of paths by generating error-tolerant points around the robot's end effector, enabling rapid avoidance of sudden obstacles in dynamic environments. Experiments show that et-BIT* exhibits good real-time performance and robustness in dynamic path planning tasks in six-dimensional space. Zheng and Tsiotras [10] proposed the IBBT (Informed Batch Belief Trees) algorithm, which extends BIT* to path planning problems in uncertain environments. IBBT constructs a nominal trajectory graph and searches in the belief space to find the optimal path considering motion and control uncertainties. Experiments show that IBBT can find non-trivial path planning solutions in complex environments and is computationally more efficient than existing similar methods.

1.2. The Current State of TD3 Deep Reinforcement Learning

The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, integrating reinforcement learning and deep learning, identifies optimal policies through deterministic policy gradients and double Q-learning. It comprises six networks: one main Actor network, one target Actor network, two main Critic networks, and two target Critic networks. The Actor network receives states and outputs actions, while the Critic networks receive states and actions and output Q-values. The target networks, sharing the same architecture as the main networks, are used to generate target Q-values, with their parameters updated via a soft update. This decoupled design enhances training stability. Model parameter initialization includes the dimensions of the action and state spaces, learning rates, the discount factor, and other parameters that calibrate the learning process. Model training encompasses parameter updates of target networks, sample batch retrieval from the experience replay buffer, gradient updates, and loss calculation. Interaction data are stored in the experience replay buffer via an experience replay mechanism, enabling offline learning. TD3 effectively addresses issues such as Q-value overestimation and training instability through mechanisms like target policy smoothing, delayed policy updates, and double Q-learning, and is applicable to Unmanned Surface Vehicle path planning for continuous control tasks in complex environments.
Peng Li et al. [4] focused on mobile robot path planning, introducing prioritized experience replay and a dynamic delayed update strategy. This approach accelerates learning by prioritizing experience samples with high TD-errors and dynamically adjusts the update frequency of the Actor network based on the Critic network's update status, thereby improving the algorithm's success rate and training speed. Zhao et al. [5] applied TD3 to drone path planning, integrating deep reinforcement learning with monocular camera data to achieve autonomous path planning without relying on GPS signals or target depth information. Luo et al. [11] proposed the I-TD3 algorithm, incorporating Prioritized Experience Replay (PER) and an averaged Q-value update strategy. PER enhances the learning efficiency of critical samples through a TD-error-based priority sampling mechanism, while the averaged Q-value update strategy balances Q-value overestimation and underestimation by using the mean of Q1 and Q2 in target value updates, ultimately improving the algorithm's stability. Liu et al. [12] devised a TD3 algorithm integrated with a safety verification mechanism for real-time obstacle avoidance in dynamic environments for robotic arms. This algorithm mitigates collision risks and planning delays by utilizing incremental path replanning, a deterministic policy, and a simplified state transition function. Daeyoel Kang et al. [13] introduced APPRL, an adaptive path planning reinforcement learning algorithm based on the TD3 framework, which combines TD3 with the Artificial Potential Field (APF) algorithm. APPRL leverages vector fields generated by APF, combined with actions from the Actor network, as inputs to the Critic network, selecting the action with the higher Q-value as the final output. Fan et al. [14] developed TD3-IMP for UUV path tracking control, featuring a dual experience replay mechanism based on TD error and average round reward, along with a dynamic policy smoothing method and a dynamic reward function. Zhou et al. [15] proposed a TD3 algorithm based on Hybrid Prioritized Experience Replay (HPER) for USV trajectory tracking in complex marine environments, optimizing policy exploration with a composite reward function and state transition function. Wu et al. [16] developed TD3-PI, a TD3-based adaptive PI controller, dynamically adjusting PI controller parameters via DRL to address USV motion control challenges under wind and wave disturbances. Jiang et al. [17] introduced an improved TD3 algorithm incorporating a random walk strategy. This method employs a pre-exploration strategy to encourage the agent to actively explore the environment during the initial training phase, storing high-quality samples in the experience pool to accelerate convergence and prevent "cowardly behavior" caused by insufficient initial positive samples. Tranos et al. [18] proposed a reinforcement learning framework based on TD3 and SAC for autonomous USV berthing in port environments, utilizing a Gaussian Mixture Model to simulate real meteorological data and a "safe corridor" strategy to assess moving obstacle risks. Wang et al. [19] developed TCDPG-IC, an improved algorithm based on TD3's core principles, for USV path tracking control. This algorithm reduces overestimation bias with twin Critic networks and introduces an integral compensator to counteract external disturbance effects. Xi et al. [20] proposed an information-aided reinforcement learning framework, with TD3 as a benchmark, for AUV path planning in dynamic ocean currents. This framework employs information compression to reduce feature redundancy and a confidence evaluator to dynamically adjust the exploration rate. Gu et al. [21] introduced a multi-USV formation control and obstacle avoidance method based on an improved MADDPG algorithm, incorporating a virtual leader and refining the MADDPG algorithm's network structure and reward function. Xu et al. [22] proposed the HTD3 algorithm, which decomposes multi-objective path planning into multiple single-objective subtasks solved in conjunction with TD3's twin Q-networks and delayed update mechanism.
These advancements primarily focus on experience sampling optimization, network architecture enhancement, and environment interaction strategy adjustment. Leveraging flexible structural adaptations and incorporating domain expertise, TD3 is progressively evolving into a pivotal solution for complex path planning tasks. Its core principle, characterized by the “double critic mechanism to mitigate overestimation and delayed policy updates to promote policy stability”, has established a crucial basis for future research.
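As an illustration of the two core mechanisms just described, the TD3 target-value computation (clipped-noise target policy smoothing plus the double-Critic minimum) and the soft update of target parameters can be sketched in a few lines of NumPy. The linear actor and Critics and all hyperparameter values below are illustrative stand-ins, not the networks or settings used in this research:

```python
import numpy as np

rng = np.random.default_rng(0)

def td3_target(reward, next_state, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """TD3 target value: r + gamma * min(Q1', Q2') with a smoothed target action."""
    # Target policy smoothing: clipped Gaussian noise on the target action.
    noise = np.clip(rng.normal(0.0, noise_std), -noise_clip, noise_clip)
    next_action = np.clip(target_actor(next_state) + noise, -act_limit, act_limit)
    # Double Q-learning: take the minimum of the two target Critics.
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))
    return reward + gamma * q_min

def soft_update(target_params, main_params, tau=0.005):
    """Soft (Polyak) update of target-network parameters."""
    return [(1 - tau) * t + tau * m for t, m in zip(target_params, main_params)]

# Toy linear stand-ins for the target Actor and the two target Critics.
actor = lambda s: 0.5 * s
q1 = lambda s, a: s + a
q2 = lambda s, a: s + 0.9 * a

y = td3_target(reward=1.0, next_state=0.4, target_actor=actor,
               target_q1=q1, target_q2=q2)
new_params = soft_update([0.0], [1.0])
```

Taking the minimum of the two Critics counteracts the overestimation bias of single-Critic methods, while the small tau keeps target networks slowly tracking the main networks, which is the stability property the surveyed works build upon.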

2. BIT* Algorithm Procedure

2.1. Algorithm Description

Batch Informed Trees (BIT*) is a sampling-based, asymptotically optimal path planning algorithm that integrates the principles of informed search, akin to algorithms like A*, with the incremental sampling strategies characteristic of algorithms like RRT*. It seeks a low-cost path from the starting state to the goal state by iteratively constructing and optimizing an explicit search tree. The core of this algorithm lies in utilizing batch sampling strategies, heuristic functions, and priority queues to effectively guide the search process, simultaneously enhancing search efficiency while ensuring the quality of the resulting paths.

2.2. Algorithm Steps

Step 1: The initialization phase of the BIT* algorithm mainly involves constructing the initial explicit search tree T = (V, E), defining the state sets, and initializing relevant variables. Initially, the vertex set V of the tree T contains only the start state x_start, and the edge set E is empty. The unconnected state set X_unconn contains all goal states, indicating that these states are not yet connected to the tree. To guide the search process, the algorithm introduces two priority queues: a vertex priority queue Q_V for sorting vertices to be expanded and an edge priority queue Q_E for sorting edges to be processed, both ordered by a heuristic function. In addition, the algorithm maintains the following variables: the set of solution vertices found so far, V_soln (initially empty); the set of unexpanded vertices, V_unexpand (initially containing the start vertex); the set of newly sampled points, X_new (initialized to the unconnected state set X_unconn); and the cost of the current best solution, C_i (initialized to infinity, indicating that no feasible solution has been found yet). In subsequent iterations, C_i is updated to the minimum cost over the goal vertices in the solutions found so far.
Step 2: The core of the BIT* algorithm is an iterative main loop that continues until a predefined termination condition is met, such as reaching the maximum runtime or finding a path that meets specific quality requirements. In each iteration of the main loop, the algorithm performs the following steps to progressively build and optimize the path:
i.
Sampling: The algorithm first samples a batch of m new states X_sampling from the informed set. The informed set is defined by the heuristic function and the current best solution cost C_i, and it contains the regions of the state space that are likely to yield better paths. By sampling from the informed set, the algorithm can explore promising regions more efficiently and avoid wasting computational resources on areas that are unlikely to produce an optimal solution.
ii.
Pruning: When both the edge queue Q_E and the vertex queue Q_V are empty, the current tree has no better edges or vertices to expand for the time being, and the algorithm enters the pruning phase. The pruning operation removes vertices and edges that cannot contribute to a better solution for the current or future paths, thereby optimizing the tree's structure and reducing the burden of subsequent calculations. The removed elements are temporarily stored in the reuse set X_reuse so that they can be reused as needed in subsequent iterations. Specifically, the pruning process traverses all vertices and edges in the current tree and judges whether they have the potential to improve the path based on their cost from the start state, their estimated cost to the goal state, and the current best solution cost C_i. If a vertex or edge is deemed unable to provide a better solution, it is removed from the tree and added to X_reuse.
iii.
Update: After completing the pruning operation, the algorithm merges the reuse set X_reuse with the new sample points X_sampling to form a new set of sample points X_new. The unconnected state set X_unconn is updated to include all newly sampled points. Then, the vertex priority queue Q_V is reinitialized to contain all vertices in the current tree, ready for selecting the best vertex to expand in the subsequent steps.
iv.
Vertex Expansion and Edge Selection: The algorithm decides whether to perform vertex expansion or edge selection based on the head elements of Q_V and Q_E. When the head element of Q_V is superior to that of Q_E, vertex expansion is prioritized; otherwise, edge processing is prioritized. For vertex expansion, the algorithm selects the vertex v_min with the highest priority from the vertex priority queue Q_V; this is usually the vertex with the lowest current path cost or the smallest heuristic value. The selected vertex v_min is added to the edge priority queue Q_E to prepare for further exploration of its connected edges. For edge selection, the algorithm selects the edge (v_min, x_min) with the highest priority from the edge priority queue Q_E. This edge, connecting vertex v_min to state x_min, is the candidate currently most likely to improve the path quality.
v.
Edge Processing: The selected edge (v_min, x_min) is first evaluated with the heuristic function. If g_T(v_min) + c^(v_min, x_min) + h(x_min) < C_i, the edge may improve on the current solution and is worth processing further. The algorithm then computes the true cost c(v_min, x_min) of the edge and checks whether adding it to the tree can reduce the known path cost of the target state x_min, that is, whether g_T(v_min) + c(v_min, x_min) < g_T(x_min). If so, the tree structure is updated: if x_min is already in the tree, the edge to its original parent is first removed and the edge (v_min, x_min) is added; if x_min is a vertex newly added to the tree, it is removed from the unconnected state set X_unconn, added to the vertex set V, and marked as unexpanded. If x_min is a goal state, the cost of the current best solution C_i is also updated.
vi.
Empty Queues: After completing the vertex expansion and edge processing of the current iteration, the algorithm empties the vertex queue Q_V and the edge queue Q_E to prepare for the next iteration. This step ensures that each iteration is based on the latest tree structure and avoids interference from the results of the previous iteration.
Step 3: After each iteration, the algorithm checks whether the predefined termination conditions are met. These conditions may include reaching the maximum runtime, finding a path that meets specific quality requirements, or having sufficiently explored the search space. When any termination condition is met, the algorithm stops iterating and returns the currently constructed tree T as the final path planning result. The pseudo-code of the algorithm is shown in Algorithm 1.
Algorithm 1: Pseudo-code of the BIT* algorithm.
 1  SettingUpPlanning()
 2  repeat
 3      if (Q_V = ∅) and (Q_E = ∅)
 4          Prune(g_T(x_goal))
 5          X_samples ← Sample(m, g_T(x_goal))
 6          V_old ← V;  Q_V ← V
 7          r ← radius(|V| + |X_samples|)
 8      while bestVertexValue(Q_V) ≤ bestEdgeValue(Q_E) do
 9          expandVertex(bestInVertexValue(Q_V))
10      (v_m, x_m) ← bestInEdgeValue(Q_E)
11      Q_E ← Q_E \ {(v_m, x_m)}
12      if CheckValues(v_m, x_m)
13          if x_m ∈ V
14              E ← E ∪ {(v_m, x_m)}   (rewiring: the old edge into x_m is replaced)
15          else
16              X_samples ← X_samples \ {x_m}
17              V ← V ∪ {x_m};  Q_V ← Q_V ∪ {x_m}
18              E ← E ∪ {(v_m, x_m)}
19          Q_E ← Q_E \ {(v, x_m) ∈ Q_E : g_T(v) + c(v, x_m) ≥ g_T(x_m)}
20      else
21          Q_E ← ∅;  Q_V ← ∅
22  until STOP
23  return T = (V, E)
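The Sample step of Algorithm 1 draws states from the informed set, i.e., the prolate ellipsoid whose foci are the start and goal states and whose transverse diameter equals the current best solution cost. A minimal 2-D NumPy sketch of such direct informed sampling follows; the function name and parameters are illustrative, not taken from this paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_informed(x_start, x_goal, c_best, n_samples=1):
    """Uniformly sample the 2-D ellipse with foci x_start and x_goal and
    transverse diameter c_best (the current best solution cost)."""
    x_start, x_goal = np.asarray(x_start, float), np.asarray(x_goal, float)
    c_min = np.linalg.norm(x_goal - x_start)        # distance between the foci
    centre = (x_start + x_goal) / 2.0
    # Semi-axes of the informed ellipse.
    r1 = c_best / 2.0
    r2 = np.sqrt(c_best**2 - c_min**2) / 2.0
    # Rotation aligning the major axis with the start-goal direction.
    theta = np.arctan2(*(x_goal - x_start)[::-1])
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    samples = []
    for _ in range(n_samples):
        # Uniform point in the unit disc, then stretch, rotate, translate.
        u = rng.normal(size=2)
        u = u / np.linalg.norm(u) * np.sqrt(rng.uniform())
        samples.append(centre + rot @ (np.array([r1, r2]) * u))
    return np.array(samples)

pts = sample_informed((0.0, 0.0), (10.0, 0.0), c_best=12.0, n_samples=500)
```

Every sampled point p satisfies ||p − x_start|| + ||p − x_goal|| ≤ c_best, so no computational effort is spent on states that cannot improve the current solution.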
In the BIT* algorithm, the following two key formulas play a crucial role in guiding the search process and ensuring the performance of the algorithm:
Connection Radius: In order to maintain good connectivity of the tree while ensuring search efficiency, the BIT* algorithm uses a dynamically adjusted connection radius r_BIT*. This radius determines the maximum distance at which a new sample point can attempt to connect with existing vertices in the tree. The connection radius is computed as
r_BIT* = min{ 1, γ_n ( ln(|V| + |X_unconn|) / (|V| + |X_unconn| + m) )^(1/n) },
where n represents the dimensionality of the state space, γ_n denotes the volume of an n-dimensional unit hypersphere, m signifies the cardinality of the sample batch, |V| corresponds to the number of vertices currently in the tree, and |X_unconn| indicates the number of unconnected states. This formulation draws upon the theoretical underpinnings of RRT* and is tailored to accommodate the batch sampling strategy inherent to BIT*. The connection radius r_BIT* exhibits a monotonic decrease as the tree expands, a property that underpins the algorithm's asymptotic optimality. Intuitively, a larger connection radius is adopted during the initial stages of the search, facilitating rapid exploration of the state space. As the search progresses, the radius progressively diminishes, enabling more nuanced path refinement. Furthermore, by constraining new samples to connect solely with existing vertices within the r_BIT* radius, the algorithm not only ensures the effective integration of new samples into the tree structure but also curtails superfluous edge evaluations, thereby enhancing computational efficiency.
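This shrinking-radius behavior can be sketched as follows. The sketch uses a simplified RRT*-style form r = min(1, γ_n (ln q / q)^(1/n)) with q = |V| + |X_unconn|; the constant γ_n = 2 and the omission of the batch term are placeholder simplifications, not the exact expression above:

```python
import math

def connection_radius(n_dim, num_vertices, num_unconn, gamma_n=2.0):
    """Simplified shrinking connection radius:
    r = min(1, gamma_n * (ln q / q)^(1/n)), with q = |V| + |X_unconn|."""
    q = num_vertices + num_unconn
    if q < 2:
        return 1.0
    return min(1.0, gamma_n * (math.log(q) / q) ** (1.0 / n_dim))

# The radius decreases monotonically as the tree grows (ln q / q falls for q > e).
radii = [connection_radius(2, v, 100) for v in (10, 100, 1000, 10000)]
```

A wide radius early on connects sparse samples quickly, while the later, smaller radius restricts edge evaluations to local refinements.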
Heuristic Function Estimate: BIT* incorporates the heuristic search concept from the A* algorithm, employing a heuristic function f(x) to evaluate the priority of each state x. This function considers both the actual accumulated cost g(x) from the start to state x and the estimated cost h(x) from state x to the goal. The heuristic function f(x) is defined as follows:
f(x) = g(x) + h(x),
where g(x) represents the actual cost of reaching state x from the start state x_start along the path in the current tree, and h(x) denotes the estimated cost from state x to the goal state set X_goal, typically the Euclidean distance or another metric satisfying the admissibility and consistency requirements. h(x) must be admissible, meaning the estimated cost cannot exceed the actual cost. By comparing the f values of nodes (or candidate edges), the algorithm can prioritize paths that are most likely to improve the current solution during vertex expansion and edge selection. This heuristic-based guidance strategy allows BIT* to significantly accelerate the search while maintaining asymptotic optimality, especially in high-dimensional or complex environments.
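In implementation terms, the f = g + h ordering of the vertex and edge queues is naturally realized with a binary heap. The following minimal sketch uses a Euclidean h and hypothetical queue entries; it is an illustration of the ordering, not this paper's data structures:

```python
import heapq
import math

def h(x, goal):
    """Admissible heuristic: Euclidean distance to the goal."""
    return math.dist(x, goal)

goal = (10.0, 10.0)

# Queue entries: (f = g + h, state), where g is the cost-to-come in the tree.
queue = []
for state, g in [((0.0, 0.0), 0.0), ((4.0, 3.0), 5.0), ((9.0, 9.0), 14.0)]:
    heapq.heappush(queue, (g + h(state, goal), state))

best_f, best_state = heapq.heappop(queue)   # the lowest-f entry is expanded first
```

Here the start state wins despite its large h, because its g is zero; entries whose f exceeds the current best cost C_i would simply never be popped before termination.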

3. BIT*+TD3 Deep Reinforcement Learning Algorithm: A Solution Model for Path Planning of Unmanned Surface Vehicles in Hybrid Intelligent Systems

The BIT* algorithm efficiently generates initial paths from a starting point to a goal point in static environments by dynamically adjusting its sampling region and heuristic search strategies. This initial path serves as the input for TD3, providing it with a reference trajectory and state information, thereby expediting the learning process. The TD3 algorithm, grounded in the Actor–Critic framework, dynamically adjusts and optimizes the initial path by refining its policy and value networks. Specifically, TD3 employs a comprehensive reward function designed to minimize turning angles within the path, smooth speed transitions, and shorten the overall path length, ultimately generating a trajectory that is smoother, shorter, and more conducive to USV execution.
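A composite step reward in the spirit described here, penalizing turning angle, abrupt speed changes, and step length, can be sketched as follows. The weights, terminal bonuses, and signature are illustrative placeholders, not the tuned reward used in this research:

```python
import math

def step_reward(prev_heading, heading, prev_speed, speed, step_len,
                reached_goal=False, collided=False,
                w_turn=1.0, w_speed=0.5, w_len=0.1):
    """Composite reward favoring small turns, smooth speed, and short steps."""
    if collided:
        return -100.0          # terminal penalty for hitting an obstacle
    if reached_goal:
        return 100.0           # terminal bonus for reaching the goal
    # Wrap the heading change into [-pi, pi] before penalizing its magnitude.
    turn = abs(math.remainder(heading - prev_heading, 2 * math.pi))
    return -(w_turn * turn
             + w_speed * abs(speed - prev_speed)
             + w_len * step_len)

r = step_reward(prev_heading=0.0, heading=math.pi / 6,
                prev_speed=1.0, speed=1.2, step_len=0.5)
```

Because every non-terminal reward is negative, the agent is pushed toward short, smooth trajectories, while the large terminal values dominate the return and keep goal-reaching and collision avoidance the primary objectives.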

3.1. Path Planning Problem and Environmental Setup for Unmanned Surface Vehicles in Hybrid Intelligent Systems Based on the BIT*+TD3 Deep Reinforcement Learning Algorithm

This research introduces a hybrid strategy integrating TD3 and BIT* to plan the shortest path for Unmanned Surface Vehicles (USVs) while ensuring the most time-efficient navigation. Through an end-to-end approach, reward design, and deep learning, the model can learn environmental features, output navigation strategies, avoid obstacles, and achieve efficient path planning. The BIT* algorithm serves as pre-processing and an initialization step for TD3, generating initial paths in static environments to provide TD3 with a reference trajectory and state information, thus expediting its learning process. TD3 then dynamically adjusts and optimizes the path provided by BIT*.
In the environment setup phase, this research utilizes a grid-based method to construct a map model. The grid method divides the entire map environment into grid cells, with the dimensions of the Unmanned Surface Vehicle (USV) serving as the unit. Free space is recorded as 0, and obstacles are marked as 1.
Figure 1 is a rasterized training map. As depicted in Figure 1, the map dimensions are 300 × 300 pixels. White areas denote navigable regions devoid of obstacles, while black areas represent obstacle regions: black areas are assigned a value of 1, representing obstacles, and white areas a value of 0, representing navigable space. The rasterized map used in this study (as shown in Figure 1) was generated by a random algorithm. Specifically, we first created a blank 300 × 300 pixel image and then used Python 3.9's PIL library and random library to randomly place 3000 black squares of 2 × 2 pixels on the image as obstacles. To prevent the obstacles from overlapping, we used a set called occupied to record the areas already occupied by obstacles. The upper-left corner coordinates of each obstacle are randomly generated within the image range; if the generated area overlaps with existing obstacles, it is regenerated until an unoccupied area is found. This method of randomly generating obstacles simulates the randomness and uncertainty of obstacle distributions. To verify the performance of the algorithm on different maps, we used the above method to generate multiple random maps with different obstacle distributions and conducted tests. The experimental results show that the proposed BIT*+TD3 hybrid algorithm exhibits good path planning performance on these random maps, and indicators such as success rate, path length, and computation time are consistent across maps. This shows that the algorithm is robust to the randomness of the map and can adapt to environments with different obstacle distributions. Nevertheless, it should be noted that the density and size of obstacles may affect performance: with extremely high obstacle density or excessively large obstacles, the algorithm may take longer to find a feasible path or even fail to find one. Therefore, the results of this study are mainly applicable to cases where the obstacle density is moderate and the obstacles are small.
We define the following two-dimensional vector mathematical model:
$$f(x, y) = \begin{cases} 0, & \text{if no obstacle exists} \\ 1, & \text{if an obstacle exists,} \end{cases}$$
where $x$ represents the abscissa corresponding to the center of the grid cell, and $y$ represents the ordinate corresponding to the center of the grid cell. We take the length of the USV as the width $d$ of the grid cell and define the training area for the USV as the following set:
$$\begin{cases} \text{USV feasible region: } \mathcal{S}_{open} = \{(x, y) \mid f(x, y) = 0\} \\ \text{USV forbidden area: } \mathcal{S}_{close} = \{(x, y) \mid f(x, y) = 1\}, \end{cases}$$
where $\mathcal{S}_{open}$ is the set of navigable areas and $\mathcal{S}_{close}$ is the set of forbidden zones. By mathematically modeling the training map using Equation (4), the real map can be transformed into a computer-readable two-dimensional vector function, thus achieving the purpose of simulation.
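The map-generation procedure described above can be sketched as follows. For brevity, this illustrative version writes directly into a Python list-of-lists grid rather than a PIL image, and the map size, obstacle count, and obstacle size are parameters instead of the fixed values used in the paper:

```python
import random

def generate_grid_map(size=300, n_obstacles=3000, obstacle_size=2, seed=None):
    """Generate a size x size grid map: 0 = free space, 1 = obstacle.

    Mirrors the paper's procedure: randomly place non-overlapping obstacle
    squares, tracking already-used cells in an `occupied` set and
    regenerating any candidate that overlaps an existing obstacle.
    """
    rng = random.Random(seed)
    grid = [[0] * size for _ in range(size)]
    occupied = set()
    placed, attempts = 0, 0
    while placed < n_obstacles and attempts < n_obstacles * 50:
        attempts += 1
        # Random upper-left corner, kept fully inside the map bounds.
        x = rng.randint(0, size - obstacle_size)
        y = rng.randint(0, size - obstacle_size)
        cells = {(x + i, y + j) for i in range(obstacle_size)
                                for j in range(obstacle_size)}
        if cells & occupied:          # overlap -> regenerate
            continue
        occupied |= cells
        for (i, j) in cells:
            grid[j][i] = 1            # row index = y, column index = x
        placed += 1
    return grid

def f(grid, x, y):
    """Two-dimensional vector model: f(x, y) in {0, 1} per Equation (4)."""
    return grid[y][x]
```

The attempt cap is an assumption of this sketch, added so the loop cannot stall when the map is nearly full.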

3.2. A Performance-Enhanced Variant of the BIT* Algorithm

3.2.1. Underwater Energy Consumption Equation

This research innovatively introduces the BIT* algorithm to generate initial paths and integrates hydrodynamic calculations to model the energy equation, thereby optimizing path planning for underwater engineering vehicles and addressing the issue of insufficient endurance. We have designed a weighting function that considers both distance and energy consumption to evaluate path selection. We utilize the BIT* algorithm to compute initial paths, and subsequently, we encode the BIT* path information into the state space of TD3. TD3, in turn, based on the initial path and perceived environmental information, outputs action commands and further incorporates a reward mechanism to guide TD3 to navigate along the BIT* path while simultaneously avoiding obstacles.
Within the TD3 algorithm, we have streamlined underwater linear and steering kinematics to enhance computational speed, reduce computational time, and resolve the issue of vessel endurance. We have incorporated considerations of the BIT* path length and the number of turns, subsequently transforming these factors into an evaluation function,
$$f(x, y) = \omega_1 x + \omega_2 y,$$
where $x$ represents the distance of underwater linear motion, $y$ denotes the number of turns, $\delta$ is a steering-angle correction parameter, $\omega_1$ is the weight assigned to linear motion, and $\omega_2$ is the weight assigned to turning maneuvers. Equation (5) is grounded in fluid-dynamics principles and offers a more accurate estimation of the USV’s energy consumption during motion. Specifically, it decomposes the total energy consumption of the USV into two parts: linear-motion energy and turning energy. The linear-motion term is proportional to the distance traveled $x$, with the coefficient $\omega_1$ reflecting the energy consumed per unit distance; the turning term depends on the number of turns $y$ and the angle of each turn (corrected by the parameter $\delta$), with the coefficient $\omega_2$ reflecting the energy consumed per turning action. This decomposition enables Equation (5) to reflect the energy differences between candidate paths more accurately: for example, a longer path with fewer turns may be more energy-efficient than a shorter path with more turns. By adjusting the weighting coefficients $\omega_1$ and $\omega_2$, the relative importance of linear-motion and turning energy can be tuned to the specific USV model and mission requirements.
We further integrate the results of hydrodynamic numerical computations to refine the energy equation. These computations offer estimations for crucial parameters such as the weight associated with linear motion distance, turning angles, and their respective weights. Drawing upon these parameters, we have established an energy equation that accounts for BIT* path characteristics through regression fitting methodologies. Specifically, we utilize the total length of the BIT* path as a metric for linear motion distance and employ indicators like cumulative heading angle change or path curvature to quantify the turning cost.
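As a minimal sketch, the evaluation function of Equation (5) can be implemented directly. The default weight values `w1`, `w2`, and the correction parameter `delta` below are illustrative placeholders, not the regression-fitted hydrodynamic coefficients used in this research:

```python
def energy_cost(distance, turns, w1=1.0, w2=5.0, delta=1.0):
    """Evaluation function f(x, y) = w1*x + w2*y from Equation (5).

    distance: straight-line travel distance x
    turns:    number of turning maneuvers y, optionally scaled by the
              steering-angle correction parameter delta
    w1, w2:   per-unit energy weights for linear motion and turning
              (placeholder values; the paper fits these from
              hydrodynamic computations)
    """
    return w1 * distance + w2 * delta * turns
```

With these placeholder weights, a 120-unit straight path (cost 120) beats a 100-unit path with ten turns (cost 150), illustrating the trade-off the text describes.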

3.2.2. Thread-Parallel Algorithm and Its Optimization

Employing key point selection for sub-map generation, multi-threaded parallel computation can minimize path planning time and memory footprint. Therefore, precise and rapid generation of key point pairs is necessary. The calculation formula is as follows:
$$x_i = x_1 + \frac{i \cdot \Delta x}{n}, \qquad y_i = y_1 + \frac{i \cdot \Delta y}{n},$$
where x 1 , y 1 represents the coordinates of the starting point, Δ x and Δ y denote the changes in the horizontal and vertical distances from the starting point to the endpoint, respectively; i represents the segment number; and n represents the total number of segments. Utilizing Equation (6), the key points within the map can be precisely computed, and these key points serve as the starting and ending points for the sub-maps. Subsequently, leveraging these key points, sub-maps are generated using the coordinates of the bottom-left and top-right corners of the global map.
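Equation (6) translates directly into code. This small helper is a sketch that returns the n+1 key points (including both endpoints) that would seed the sub-maps:

```python
def key_points(start, goal, n):
    """Split the start->goal line into n segments per Equation (6).

    Returns the n+1 key points, which serve as start/end points of the
    sub-maps in the multi-threaded decomposition.
    """
    x1, y1 = start
    dx = goal[0] - x1   # horizontal distance change (delta x)
    dy = goal[1] - y1   # vertical distance change (delta y)
    return [(x1 + i * dx / n, y1 + i * dy / n) for i in range(n + 1)]
```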
To address the issue of traditional path planning algorithms failing in high-density obstacle environments, this research proposes a parallel optimization framework based on dynamic sub-map expansion. In the rasterized-map processing stage, we establish a sub-map feasibility judgment model. Let the sub-map area be $S = \{c_{ij} \mid i \in [x_{min}, x_{max}],\ j \in [y_{min}, y_{max}]\}$, where $c_{ij}$ is the grid value at coordinates $(i, j)$ (0 indicates free space, 1 indicates an obstacle). The path feasibility criterion requires that at least one obstacle-free path exists within the sub-map, i.e., that some path exists whose traversed grid values have product zero. This mathematical model can formally verify sub-map connectivity. When the criterion is not satisfied, a dynamic expansion mechanism is triggered: the search boundary is expanded layer by layer around the current sub-map, with an exponentially increasing radius $R_k = 2^{k-1}\delta$ (where $\delta$ is the initial resolution and $k \in \mathbb{N}^+$), until the feasibility criterion is met or the system’s maximum expansion threshold is reached.
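A possible realization of the feasibility judgment and dynamic expansion mechanism is sketched below, using a 4-connected breadth-first search as the connectivity test; the concrete search routine and the expansion cap `max_k` are assumptions of this sketch, not details given in the text:

```python
from collections import deque

def path_exists(grid, start, goal, x_min, x_max, y_min, y_max):
    """Connectivity test: 4-connected BFS restricted to the sub-map window.
    Succeeds exactly when an obstacle-free path exists (all traversed
    grid values are 0, so their product is 0)."""
    if grid[start[1]][start[0]] or grid[goal[1]][goal[0]]:
        return False
    seen, q = {start}, deque([start])
    while q:
        x, y = q.popleft()
        if (x, y) == goal:
            return True
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (x_min <= nx <= x_max and y_min <= ny <= y_max
                    and grid[ny][nx] == 0 and (nx, ny) not in seen):
                seen.add((nx, ny))
                q.append((nx, ny))
    return False

def feasible_submap(grid, start, goal, delta=2, max_k=8):
    """Grow the sub-map window by R_k = 2**(k-1) * delta until the
    feasibility criterion holds or the expansion cap is reached.
    Assumes a square grid; returns (x_min, x_max, y_min, y_max) or None."""
    n = len(grid)
    x_lo, x_hi = sorted((start[0], goal[0]))
    y_lo, y_hi = sorted((start[1], goal[1]))
    for k in range(1, max_k + 1):
        r = 2 ** (k - 1) * delta
        bounds = (max(0, x_lo - r), min(n - 1, x_hi + r),
                  max(0, y_lo - r), min(n - 1, y_hi + r))
        if path_exists(grid, start, goal, *bounds):
            return bounds
    return None
```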
We offload the path planning task to the GPU to leverage its parallel processing capabilities. By optimizing the algorithm with CUDA, we divide the map into sub-regions for parallel processing, effectively reducing complexity and enabling rapid path planning. Subsequently, we summarize the results to derive a global optimal solution. The path planning process is illustrated in Figure 2.

3.3. Improved BIT* + Energy Consumption Equation + TD3 Hybrid Path Planning Strategy Model

This research proposes a hybrid path planning strategy suitable for complex environments, integrating an enhanced BIT* algorithm and an energy consumption equation. By leveraging mathematical modeling, ellipsoidal sampling, and a sequential node optimization mechanism, this approach reduces computational complexity and optimizes path planning in dynamic environments. The strategy prioritizes nodes in the early segment of the path, thereby improving the navigation efficiency and accuracy of Unmanned Surface Vehicles (USVs). Concurrently, a path pruning mechanism eliminates invalid paths, ensuring the practical application of the navigation scheme.

3.3.1. Initialization Phase

In this research, the initialization phase of the BIT* algorithm involves setting the starting node, goal node, map information, and sampling parameters. By initializing the ellipsoidal sampling region and the node set, the algorithm is enabled to efficiently identify feasible paths within the search space.

3.3.2. Ellipsoidal Informed Sampling and Goal Node Configuration

In the BIT* algorithm, the ellipsoidal informed sampling region is dynamically adjusted based on the positions of the start and goal nodes. The semi-major axis of the ellipsoid is determined by the current best path cost $c_i$, while the semi-minor axis is calculated from the minimum cost $c_{min}$ between the start and goal nodes. This dynamic adjustment mechanism ensures that the sampling region remains focused on areas likely to contain the optimal path, thereby enhancing search efficiency.
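The sampling region described here follows the standard informed-sampling construction: draw a uniform point in a unit disk, stretch it by the semi-axes derived from the current best cost and the start–goal distance, then rotate and translate it onto the start–goal axis. A minimal 2-D sketch under those standard assumptions:

```python
import math, random

def informed_sample(start, goal, c_best, rng=random):
    """Uniform sample inside the 2-D informed ellipse whose foci are the
    start and goal nodes: semi-major axis a = c_best / 2, semi-minor axis
    b = sqrt(c_best**2 - c_min**2) / 2, where c_min = ||goal - start||."""
    cx = (start[0] + goal[0]) / 2.0
    cy = (start[1] + goal[1]) / 2.0
    c_min = math.hypot(goal[0] - start[0], goal[1] - start[1])
    a = c_best / 2.0
    b = math.sqrt(max(c_best ** 2 - c_min ** 2, 0.0)) / 2.0
    theta = math.atan2(goal[1] - start[1], goal[0] - start[0])
    # Uniform point in the unit disk, then stretch, rotate, and translate.
    r = math.sqrt(rng.random())
    phi = 2.0 * math.pi * rng.random()
    ux, uy = r * math.cos(phi) * a, r * math.sin(phi) * b
    x = cx + ux * math.cos(theta) - uy * math.sin(theta)
    y = cy + ux * math.sin(theta) + uy * math.cos(theta)
    return x, y
```

Every returned point satisfies the defining property of the ellipse: its path cost through start and goal never exceeds the current best cost, so sampling effort is never wasted on provably worse regions.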

3.3.3. Network Parameter Initialization

The BIT* algorithm, through the integration of ellipsoidal sampling and heuristic search, is capable of rapidly identifying optimal paths in complex environments. The dynamic adjustment mechanism of the ellipsoidal sampling region, analogous to the weight adjustment strategies in Xavier and He initialization methods, effectively mitigates the risk of converging to local optima and accelerates the algorithm’s convergence speed.
Thus, whether employing the Xavier initialization method or the He initialization method, both assign specific initial values to the weights prior to neural network training, thereby effectively mitigating the risk of vanishing gradients or convergence to local optima during training. This research selects the Xavier method for parameter initialization. Algorithm 2 below shows the Xavier initialization pseudo-code.
Algorithm 2: Xavier Initialization
def xavier_init(layer, mode="uniform"):
    if layer is a fully connected layer:
        n_in = layer.input_dim    # number of input neurons
        n_out = layer.output_dim  # number of output neurons
    elif layer is a convolutional layer:
        n_in = layer.input_channels * layer.kernel_width * layer.kernel_height
        n_out = layer.output_channels * layer.kernel_width * layer.kernel_height
    combined_dim = n_in + n_out
    if mode == "uniform":
        limit = sqrt(6.0 / combined_dim)
        for weight in layer.weights:
            weight = uniform_sample(-limit, limit)
    elif mode == "normal":
        std = sqrt(2.0 / combined_dim)
        for weight in layer.weights:
            weight = normal_sample(mean=0, std=std)
    if layer has bias:
        for bias in layer.bias:
            bias = 0
    return layer

3.3.4. Additional Hyperparameter Configuration

In addition to the ellipsoidal informed sampling and node expansion mechanisms, the BIT* algorithm requires the configuration of other hyperparameters, such as the number of samples, the connection radius, and the termination time. The settings of these parameters directly impact the search efficiency and the quality of the generated path. In this study, the initial number of samples is set within the range of 50 to 200. Through empirical testing, the number of samples per iteration is determined dynamically based on the density of obstacles in the map: when the obstacle density is high, the number of samples is appropriately increased to improve search reliability; however, an excessively large sample size incurs prohibitive computational overhead and may even prevent convergence. The initial connection radius is set to 0.75 × r, where r is the radius of the map’s incircle. This configuration ensures a sufficiently large search range while avoiding unnecessary computation from an excessively large radius. The termination time is set to 120 s, which allows the algorithm sufficient time to identify the optimal path while still satisfying real-time requirements.
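These settings can be collected into a small configuration helper. The linear mapping from obstacle density to sample count below is an illustrative assumption, since the text specifies only the 50–200 range and the qualitative rule that denser maps receive more samples:

```python
def bit_star_config(map_incircle_radius, obstacle_density):
    """Sketch of the hyperparameter choices described in the text.

    obstacle_density is assumed to be a fraction in [0, 1]; the linear
    interpolation from 50 to 200 samples is a placeholder for the
    paper's empirical rule.
    """
    density = min(max(obstacle_density, 0.0), 1.0)
    return {
        "n_samples": 50 + int(150 * density),            # 50 .. 200
        "connection_radius": 0.75 * map_incircle_radius,  # 0.75 * r
        "termination_time_s": 120,                        # real-time budget
    }
```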

3.3.5. Path Optimization and the Energy Consumption Equation

In the initial path planning of the BIT* algorithm, the algorithm preemptively optimizes the path’s turning cost by leveraging the energy consumption equation. Through the introduction of the turning weight T θ , the algorithm is enabled to account for the turning energy consumption of the Unmanned Surface Vehicle (USV) in path planning, thereby generating smoother and more energy-efficient paths. The calculation formula for the turning cost is as follows:
$$C_{turning} = T_\theta \times Z_{angle},$$
where $C_{turning}$ represents the turning cost, $T_\theta$ is the turning weight, and $Z_{angle}$ denotes the turning angle between three consecutive nodes, calculated using the vector dot product formula. A smaller turning angle corresponds to a smoother path, while a larger turning angle indicates a sharper turn, resulting in higher energy consumption. By adjusting the turning weight $T_\theta$, the algorithm can control the influence of the turning cost on path planning, prioritizing paths with smaller turning angles and consequently reducing the energy expenditure associated with the USV’s turning maneuvers.
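A minimal sketch of this turning-cost computation, using the vector dot product over consecutive node triples as described; the default weight for $T_\theta$ is a placeholder:

```python
import math

def turning_angle(p_prev, p, p_next):
    """Angle between segments (p_prev -> p) and (p -> p_next),
    computed via the vector dot product as in the text."""
    v1 = (p[0] - p_prev[0], p[1] - p_prev[1])
    v2 = (p_next[0] - p[0], p_next[1] - p[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 0.0  # degenerate segment: no turn
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.acos(max(-1.0, min(1.0, dot / (n1 * n2))))

def turning_cost(path, t_theta=1.0):
    """C_turning = T_theta * Z_angle accumulated over all node triples."""
    return t_theta * sum(turning_angle(a, b, c)
                         for a, b, c in zip(path, path[1:], path[2:]))
```

A collinear path scores zero, while each right-angle turn contributes pi/2 scaled by the turning weight.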

3.3.6. Path Reward Mechanism Design

The path reward mechanism of the BIT* algorithm comprehensively considers path length, turning cost, and energy consumption to ensure that the Unmanned Surface Vehicle (USV) avoids collisions while minimizing energy consumption as much as possible. The calculation formula for the reward value is as follows:
$$R = -\left(P_{length} + C_{turning} + C_{energy}\right),$$
where $P_{length}$ represents the path length, $C_{turning}$ denotes the turning cost, and $C_{energy}$ represents the energy consumption. A shorter path length, lower turning cost, and lower energy consumption result in a higher reward value. The path length is calculated by summing the Euclidean distances between consecutive nodes, the turning cost by accumulating the weighted sum of all turning angles in the path, and the energy consumption from a combination of path length and turning cost, with weighting coefficients used to balance their respective impacts on energy expenditure. This reward mechanism not only guides the USV towards selecting the shortest path but also discourages excessively tortuous trajectories, thus enabling efficient and energy-conscious path planning in complex environments. For instance, between two paths, the longer one might be favored if its fewer turns and smaller turning angles yield a higher reward. Through this mechanism, the BIT* algorithm balances path length, turning cost, and energy consumption, generating paths suitable for USV navigation tasks in complex environments.
For safety considerations, Unmanned Surface Vehicles (USVs) must avoid collisions and successfully complete navigation tasks. Consequently, we employ the following safety reward to prevent collisions:
$$R_{crack} = -10 \cdot Black.$$
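Putting the reward terms together, a hedged sketch of the reward computation might look as follows. The sign convention (costs entering negatively so that lower costs yield a higher reward) and the fixed −10 collision penalty follow the surrounding text, while the function and parameter names are illustrative:

```python
import math

def path_length(path):
    """Sum of Euclidean distances between consecutive nodes."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

def path_reward(path, turning_cost, energy_cost, hit_obstacle=False):
    """Reward combining path length, turning cost, and energy consumption:
    lower costs give a higher (less negative) reward. Entering an obstacle
    (black) cell adds the fixed safety penalty R_crack = -10."""
    reward = -(path_length(path) + turning_cost + energy_cost)
    if hit_obstacle:
        reward -= 10.0
    return reward
```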

3.3.7. Adaptive Parameter Optimization Based on Multiple Seeds (Multi-Seed)

This study proposes a multi-seed adaptive parameter optimization method to enhance the BIT* algorithm’s performance. The method iteratively runs the algorithm multiple times with different random seeds, dynamically adjusting key parameters, such as the number of samples $n_{samples}$ and the connection radius $r_{BIT}$, during each run. The optimal parameter combination is determined by comparing the convergence performance across these runs. The process initializes baseline values for $n_{samples}$ and $r_{BIT}$ and sets a floating range. For each run, new parameter values are randomly generated within the defined range. The BIT* algorithm is then executed, and a log file records performance metrics, including path length, turning cost, energy consumption, and the final path cost.
After each run, the best-performing set of parameters, based on convergence metrics, becomes the new baseline. The floating range is then reduced to refine the search in subsequent iterations. This iterative process continues until the floating range reaches a predefined threshold or a maximum number of iterations is completed.
The final optimal parameter combination is then used to execute the BIT* algorithm, generating the final optimized path. This multi-seed adaptive optimization approach allows the algorithm to automatically identify optimal parameters, improving path planning performance by generating shorter paths with lower turning costs and minimal energy consumption, ultimately enhancing the algorithm’s adaptability and robustness.
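The multi-seed procedure above can be sketched as a generic loop. The `evaluate` callback stands in for a full BIT* run returning the final path cost, and the shrink factor and iteration counts below are illustrative assumptions:

```python
import random

def multi_seed_optimize(evaluate, base, spread, n_seeds=5, rounds=4, shrink=0.5):
    """Multi-seed adaptive parameter search.

    In each round, perturb the current baseline within +/- spread using
    several random seeds, keep the best-scoring parameter set as the new
    baseline, then shrink the floating range to refine the search.

    evaluate(params) -> cost to minimize (e.g. final BIT* path cost).
    base: dict of baseline parameter values; spread: dict of float ranges.
    """
    best_params, best_cost = dict(base), evaluate(base)
    for _ in range(rounds):
        for seed in range(n_seeds):
            rng = random.Random(seed)
            trial = {k: v + rng.uniform(-spread[k], spread[k])
                     for k, v in best_params.items()}
            cost = evaluate(trial)
            if cost < best_cost:
                best_params, best_cost = trial, cost
        # Reduce the floating range for the next, finer iteration.
        spread = {k: v * shrink for k, v in spread.items()}
    return best_params, best_cost
```

In the paper's setting the parameters would be $n_{samples}$ and $r_{BIT}$ and the cost would come from the logged convergence metrics.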

3.3.8. Parameter Configuration for the TD3 Deep Reinforcement Learning Strategy Model

To expedite the implementation of the decision-making model, we opted for the PyTorch 2.5.0 deep learning framework and the Python programming language to train the TD3 model. The network architecture of the TD3 model examined in this research primarily comprises one Actor network and two Critic networks (twin Q-networks). The detailed structure can be referenced in the definitions of the Actor and Critic classes within the code. Figure 3 illustrates the detailed workflow of the TD3 algorithm. As shown in the figure, the TD3 algorithm adopts an Actor–Critic architecture, where the Actor network is responsible for outputting actions based on the current state and the Critic network is responsible for evaluating the value of actions. The TD3 algorithm also introduces target networks and an experience replay mechanism to improve the stability and efficiency of training. The parameters of the target networks are copied from the main networks through a soft update method, and the experience replay mechanism stores the historical data of the agent’s interaction with the environment for subsequent training. The TD3 algorithm updates the parameters of the Critic network by minimizing the temporal difference (TD) error and updates the parameters of the Actor network through the policy gradient method.
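Two of the mechanisms named above, soft target-network updates and delayed policy updates, are simple enough to sketch without the full network code. The plain-Python version below operates on flat parameter lists rather than torch tensors; the soft-update coefficient 0.005 matches the configuration reported later, while the policy delay of 2 is the value conventionally used in TD3 and is an assumption here:

```python
def soft_update(target, main, tau=0.005):
    """Polyak soft update: target <- tau * main + (1 - tau) * target.
    Returns a new parameter list; in PyTorch this would be applied
    element-wise to the target network's tensors."""
    return [tau * m + (1.0 - tau) * t for m, t in zip(main, target)]

def td3_update_schedule(step, policy_delay=2):
    """TD3 updates both Critics every step, but delays the Actor and
    target-network updates to every `policy_delay`-th step."""
    delayed = (step % policy_delay == 0)
    return {"update_critics": True,
            "update_actor": delayed,
            "update_targets": delayed}
```

The slow drift of the target parameters toward the main parameters is what stabilizes the temporal-difference targets during training.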
In this research, the core objective of the TD3 (Twin Delayed Deep Deterministic Policy Gradient) algorithm is to generate a path that is smoother, shorter, and more conducive to Unmanned Surface Vehicle (USV) execution by incorporating turning angle cost, speed decay mechanisms, and path length rewards. Specifically, given the initial path $P = \{p_1, p_2, \ldots, p_n\}$ produced by BIT*, two points $p_i$ and $p_j$ ($i < j$) are selected along the trajectory. To mitigate the potential for abrupt changes in heading at the junctures between the optimized segment and the original path, their immediate predecessor $p_{i-1}$ and successor $p_{j+1}$ are also considered. The optimization procedure is delineated as follows:
1. Turning Angle Cost: The turning angle cost penalizes sharp turns in the trajectory, promoting smoothness. For each point $p_k$ ($i \le k \le j$) on the path segment, the turning angle $\theta_k$ is computed from the vectors $v_{prev} = p_k - p_{k-1}$ and $v_{next} = p_{k+1} - p_k$:
$$\theta_k = \arccos\left(\frac{v_{prev} \cdot v_{next}}{|v_{prev}|\,|v_{next}|}\right).$$
The turning angle cost, denoted $C_{angle}$, is defined as the summation of all turning angles along the segment:
$$C_{angle} = \sum_{k=i}^{j} \theta_k.$$
2. Velocity Decay Mechanism: To mitigate the impact of turning angles on velocity, a velocity decay mechanism is introduced. For each point $p_k$, its velocity $v_k$ is adjusted based on the turning angle $\theta_k$:
$$v_k = V \cos\theta_k,$$
where $V$ represents the nominal velocity of the USV. The velocity decay mechanism ensures that the USV automatically decelerates during turns, thereby enhancing the smoothness and safety of the trajectory.
3. Path Length Reward: The path length reward incentivizes the generation of shorter paths. For the optimized path segment $P' = \{p_i, p_{i+1}, \ldots, p_j\}$, its path length $L_{P'}$ is computed using the Euclidean distance:
$$L_{P'} = \sum_{k=i}^{j-1} |p_{k+1} - p_k|.$$
The path length reward, denoted $R_{length}$, is defined as the negative of the path length:
$$R_{length} = -L_{P'}.$$
4. Total Reward Function: The objective of the optimization process is to maximize the total reward function $R_{total}$, which is composed of the turning angle cost (entering as a penalty), the velocity decay mechanism (through the decayed velocities $v_k$), and the path length reward:
$$R_{total} = -\alpha\, C_{angle} + \beta \sum_{k=i}^{j} v_k + \gamma\, R_{length},$$
where $\alpha$, $\beta$, and $\gamma$ represent the weighting coefficients for each component, used to balance the influence of turning angle, velocity, and path length.
5. Optimization Process: Employing the TD3 algorithm, we train a policy network (Actor) to generate optimized path points that maximize the total reward function R total . Specifically, the input to the policy network is the current path point along with its preceding and succeeding points, and the output is the position of the next path point. The Critic network is employed to evaluate the reward value of the generated path points and update the policy network parameters via gradient descent.
6. Smooth Transition: The optimized path segment $P'$ is smoothly connected to the original path at the connection points $p_{i-1}$ and $p_{j+1}$, ensuring that the turning angles $\theta_{i-1}$ and $\theta_j$ do not exceed a predefined threshold, thereby preventing abrupt changes in the trajectory.
Through the aforementioned method, we can achieve local partial optimization on the initial path generated by BIT*, generating a path that is smoother, shorter, and more conducive to Unmanned Surface Vehicle (USV) execution.
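The reward terms of steps 1–4 combine into a single segment score. The sketch below assumes unit weighting coefficients by default and treats the turning-angle term as a penalty, consistent with maximizing the total reward:

```python
import math

def segment_reward(points, v_nominal=1.0, alpha=1.0, beta=1.0, gamma=1.0):
    """R_total for one path segment: penalize summed turning angles
    (C_angle), reward the decayed velocities v_k = V*cos(theta_k), and
    reward shorter length via R_length = -L. Default weights are
    illustrative placeholders."""
    c_angle, v_sum = 0.0, 0.0
    for a, b, c in zip(points, points[1:], points[2:]):
        v1 = (b[0] - a[0], b[1] - a[1])
        v2 = (c[0] - b[0], c[1] - b[1])
        norms = math.hypot(*v1) * math.hypot(*v2)
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        theta = math.acos(max(-1.0, min(1.0, dot / norms))) if norms else 0.0
        c_angle += theta                      # step 1: turning angle cost
        v_sum += v_nominal * math.cos(theta)  # step 2: velocity decay
    length = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    return -alpha * c_angle + beta * v_sum + gamma * (-length)  # steps 3-4
```

As expected under this scoring, a straight segment outscores a zigzag of the same length, which is exactly the behavior the TD3 optimizer is rewarded for producing.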

4. Experimental Validation

4.1. Improved BIT* Algorithm with Adaptive Parameter Optimization for Optimal Path Planning

4.1.1. Performance of the BIT* Algorithm

This section will elaborate on the simulation experiment design for the Batch Informed Trees (BIT*) algorithm and provide an in-depth analysis of the experimental results. The core of our research lies in how to optimize the performance of the BIT* algorithm in Unmanned Surface Vehicle (USV) path planning to meet the demands of practical applications. Despite BIT*’s theoretical asymptotic optimality and demonstrated superior performance in certain scenarios, its performance in real-world applications still necessitates experimental validation and analysis. Firstly, BIT* algorithm performance is influenced by various factors, including heuristic function selection and sampling strategies. The interaction and impact of these factors require in-depth experimental investigation. Secondly, different application scenarios have varying requirements for path planning algorithms; for example, efficiency and robustness are crucial in complex environments. Simulation experiments can simulate diverse scenarios in a controlled environment, evaluating the BIT* algorithm’s performance under different conditions to guide parameter selection and performance optimization. Furthermore, comparative studies of the BIT* algorithm with other algorithms can objectively assess its strengths and weaknesses, offering references for algorithm selection and improvement. Therefore, simulation experiments are an important and necessary means to validate BIT* algorithm performance, analyze characteristics, and conduct comparative research.
The BIT* algorithm possesses these theoretical advantages, and the original algorithm has already exhibited good performance in some benchmark tests. However, in practical applications, especially under certain specific environmental or task requirements, some issues may still arise. In complex environments with dense obstacles or narrow passages, the sampling strategy of the BIT* algorithm may require further optimization to enhance its efficiency and reliability in finding feasible paths.
Analysis of the results obtained from the original BIT* algorithm, as depicted in Figure 4, reveals that the planned paths exhibit large angles between connected segments at several nodes, resulting in a lack of smoothness. Furthermore, issues such as complex node updates and prolonged search times are observed, leading to an increased number of algorithm iterations and consequently impacting the overall efficiency. To address these shortcomings, this study proposes a more efficient iterative logic for the algorithm to accelerate its execution, enhance its optimization efficiency, and further improve the optimality of the planned path length. To this end, we also incorporate a precise fluid dynamics model to constrain the robot’s motion. The application of this model contributes to a reduction in the robot’s turning time during path planning, thereby lowering energy consumption. This is of significant importance for improving the robot’s operational efficiency and extending its working duration.

4.1.2. Improved BIT* Algorithm

This experiment aims to comprehensively evaluate the performance of the implemented BIT* algorithm. The experiments will be developed and tested using the Python programming language within the PyCharm IDE environment and simulated on a single-channel grayscale image of 300 × 300 pixels. The BIT* algorithm is implemented based on the original study with several optimizations, primarily including employing the Euclidean distance as the heuristic function, incorporating a combination of uniform and Gaussian sampling strategies, and utilizing the KD-Tree data structure for efficient nearest neighbor search. To enhance the algorithm’s efficiency, the edge rewiring process of the original BIT* algorithm is optimized to be performed only when it can significantly improve the path quality.
The experiments will utilize libraries such as OpenCV, matplotlib, and numpy for image processing, graphical plotting, and numerical computations. Key algorithmic parameters, including the initial sampling radius, connection radius, and the parameter controlling the probability of Gaussian sampling, will be detailed in the subsequent experimental parameter settings section. The experiments will primarily assess the effectiveness, efficiency, path quality, and robustness of the BIT* algorithm, and performance metrics such as success rate, computation time, path length, and the number of expanded nodes will be recorded and analyzed.
As depicted in Figure 5, Figure 6, Figure 7 and Figure 8, after the simulation of the enhanced BIT* algorithm, the smoothness of its path is significantly improved compared to the original BIT* algorithm. To comprehensively evaluate the performance of the enhanced BIT* algorithm and validate its superiority in path planning, this research conducted comparative experiments with A*, RRT*, and the original BIT* algorithm. The comparative data in Table 1 clearly demonstrates that, in terms of optimal path length, the enhanced BIT* algorithm achieved the best result, at only 430.75, representing a reduction of approximately 6.1% in path length compared to the original BIT* algorithm’s 460.23. More notably, compared to the RRT* algorithm (507.17) and the A* algorithm (600), the enhanced BIT* algorithm reduced the path length by approximately 12.1% and 25.7%, respectively, indicating a significant advantage in exploring superior paths. Regarding path smoothness, we employed two metrics, the number of turns and turning cost, for quantitative evaluation. The number of turns directly reflects the tortuosity of the path, while the turning cost, under hydrodynamic model constraints, represents the cost incurred by the vessel for turning operations during path tracking. This metric comprehensively considers both turning angle and energy consumption. The data in Table 1 shows that the enhanced BIT* algorithm has only 24 turns, significantly lower than the original BIT* algorithm’s 26 turns, the RRT* algorithm’s 47 turns, and the A* algorithm’s 447 turns. In terms of turning cost, the enhanced BIT* algorithm exhibits an even more overwhelming advantage, at only 40.95, far lower than the original BIT* algorithm’s 75.43, the RRT* algorithm’s 473.62, and the A* algorithm’s 8586.00. This series of data powerfully demonstrates the superior performance of the enhanced BIT* algorithm in generating smoother paths. 
By reducing unnecessary turns, it not only decreases the energy consumption of the Unmanned Surface Vehicle (USV) but also enhances the stability and execution efficiency of its motion (Table 2).
Combining with the iterative results in Figure 5, it can be observed that the enhanced BIT* algorithm obtains the optimal path after only 19 iterations. Furthermore, at the 15th iteration, the difference in optimal path length compared to the previous iteration is less than 0.092, indicating a very fast convergence speed of the algorithm. In addition, the effectiveness of the energy equation in suppressing turning is also effectively validated, with a significant reduction in turning cost. This strategy effectively reduces the turning time and energy consumption of the Unmanned Surface Vehicle (USV), not only enabling it to successfully avoid obstacles but also achieving smoother and more efficient path planning. This fully demonstrates that by incorporating the energy equation, the USV can utilize energy more efficiently, reduce unnecessary energy consumption, and maintain path continuity and stability. To further enhance path planning efficiency, this research also employed multi-threading technology to accelerate the computation process. Multi-threading technology, by processing multiple tasks in parallel, completes the path planning process in a shorter time, greatly improving computational speed. This method not only enhances the response speed of the USV but also provides strong support for real-time path planning and dynamic environment adaptation. Comprehensive experimental results demonstrate that the enhanced adaptive parameter optimization BIT* algorithm outperforms the original BIT* algorithm, RRT* algorithm, and A* algorithm in terms of optimal path length, path smoothness, and computational efficiency. This validates the effectiveness and superiority of the improvement strategies presented in this paper, providing a more reliable solution for efficient path planning of USVs in complex environments.
The results in Figure 9 indicate that the path planning time decreases with the increase in the number of threads across different map sizes. This suggests that increasing the thread count can enhance computational throughput, and particularly for larger maps, more threads may be required to support more efficient path planning. However, while multi-threading technology is more efficient than the traditional BIT* algorithm, the communication and synchronization costs between threads also increase with the thread count. These additional overheads can negatively impact overall efficiency because thread management, data synchronization, and communication become more complex and time-consuming as the number of threads increases. Furthermore, hardware limitations are also a significant factor affecting multi-threading efficiency. For example, if the processor’s core count is limited or the memory bandwidth is insufficient to support a large number of threads working in parallel, the advantages of multi-threading may not be fully realized. Therefore, when designing multi-threaded path planning algorithms, it is necessary to comprehensively consider the number of threads, hardware resources, and thread management overhead to achieve the optimal performance balance.
Generally, multi-threading technology provides an effective means to improve the efficiency of path planning. However, in practical applications, it is necessary to carefully weigh the choice of thread count, hardware resources, and thread management costs to ensure that the algorithm achieves optimal performance in real-world environments. Increasing the thread count can enhance computational throughput, but excessive multi-threading may increase communication costs and reduce efficiency. Therefore, it is necessary to strike a balance between efficiency and thread count.
In this research, after careful consideration and experimentation, we selected four threads as the optimal number for optimizing the path algorithm. This selection is based on the balance between algorithm performance and resource utilization efficiency. To investigate the algorithm’s effectiveness, we compared single-threaded and multi-threaded processing for different map sizes and recorded their processing times.
Figure 10 clearly illustrates that multi-threading methods are more efficient than single-threading methods when processing large-scale map data. As map size increases, the processing time for both single-threaded and multi-threaded methods increases, but the time required for multi-threading methods remains significantly lower than that for single-threading methods. The faster processing speed achieved by multi-threading is primarily attributed to several factors: 1. Parallel Processing Capability: Multi-threading enables the simultaneous processing of multiple grid cells, and this parallel processing mechanism substantially reduces the overall computation time. 2. Reduced I/O Operation Time: Parallelization reduces the number of disk I/O operations because multiple threads can share data, thereby minimizing the need for disk read and write operations. 3. Full Utilization of Multi-core CPUs: Multi-threading fully leverages the computational power of modern multi-core CPUs by distributing tasks across different processor cores, thus achieving faster processing speeds.
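A minimal sketch of the parallel decomposition discussed above: the map is split into horizontal bands, each band is processed in its own thread, and the partial results are merged. The per-band workload `process_cell` is a hypothetical stand-in for the actual sub-map planning step:

```python
from concurrent.futures import ThreadPoolExecutor

def process_cell(cell):
    """Hypothetical per-sub-map workload; here it just reports how many
    grid cells the band covers."""
    x0, y0, x1, y1 = cell
    return (x1 - x0) * (y1 - y0)

def plan_in_parallel(size, n_threads=4, splits=4):
    """Split a size x size map into `splits` horizontal bands, process
    each band in its own thread, then merge the partial results."""
    band = size // splits
    cells = [(0, i * band, size, (i + 1) * band) for i in range(splits)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(process_cell, cells))
    return sum(results)  # merge step: aggregate the sub-map results
```

With four threads on a four-core CPU, each band maps naturally onto one core, which matches the thread count selected in this research.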

4.1.3. Analysis of Visualized Results for the Improved Algorithm

We utilized a 200,000 × 200,000 map with 100% CPU utilization, offloading the third and fourth thread tasks to CUDA. Consequently, both the CPU and GPU employed a multi-threaded A* algorithm for parallel path searching. The CPU and GPU utilization data before and after optimization in this experiment are presented in Table 3.
As can be observed from Table 3, offloading a portion of the planning workload to the GPU effectively alleviates the burden on the CPU, which is otherwise constrained by the limitations of the computer hardware. In conclusion, these results demonstrate that the optimized algorithm exhibits an enhanced capacity for handling large-scale map planning tasks while maintaining a high level of goal attainment performance.

4.2. TD3 Algorithm Optimization

To validate the potential of deep reinforcement learning algorithms in enhancing path quality, this research further explored the application of the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm for BIT* path optimization. This experiment aimed to evaluate the effectiveness of TD3 as a post-processing optimizer for paths generated by the BIT* algorithm. We designed and implemented a TD3-based agent to iteratively optimize the initial paths produced by BIT*. To ensure the validity of the experiment and the reliability of the results, meticulous hyperparameter tuning was conducted, and a configuration balancing algorithm performance, stability, and computational efficiency was finally determined, as shown in Table 4, which details the specific hyperparameter values used for training the TD3 agent. Key hyperparameters include the discount factor (0.95), which determines the importance of future rewards; the soft update coefficient (0.005), which controls the rate at which the target networks are updated; and the learning rates for both the Actor and Critic networks (0.0002).
These values were chosen based on a combination of prior work in the field of deep reinforcement learning and empirical tuning to achieve a balance between exploration and exploitation, ensuring stable and efficient learning. The focus of the experiment is to observe and analyze the effect of the TD3 algorithm on improving path quality at different training stages and to evaluate the learning performance and convergence of the TD3 algorithm through the variation curve of the reward function.
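The two TD3 mechanisms governed by these hyperparameters, the soft (Polyak) target-network update with coefficient 0.005 and the delayed policy update every 4 critic steps, can be sketched in isolation. This is a toy illustration with flat vectors standing in for network parameters, not the agent used in the experiments:

```python
import numpy as np

TAU = 0.005          # soft update coefficient (Table 4)
POLICY_DELAY = 4     # policy updated every 4 critic updates (Table 4)

def soft_update(target, source, tau=TAU):
    """Polyak averaging: target <- tau * source + (1 - tau) * target."""
    return tau * source + (1.0 - tau) * target

# Toy "network parameters" as flat vectors, to show the update schedule only.
critic, critic_target = np.ones(3), np.zeros(3)
actor, actor_target = np.ones(3), np.zeros(3)

policy_updates = 0
for step in range(1, 9):
    critic_target = soft_update(critic_target, critic)  # critics update every step
    if step % POLICY_DELAY == 0:                        # delayed policy update
        actor_target = soft_update(actor_target, actor)
        policy_updates += 1

print(policy_updates)  # 2 policy updates over 8 critic updates
```

The small tau makes the targets drift slowly toward the online networks, which is what stabilizes the twin-Critic learning that the delayed Actor update then exploits.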
During the training process, we recorded the average reward for each episode and plotted the reward function convergence curves as shown in Figure 11, Figure 12, Figure 13 and Figure 14. As can be seen from the figures, the average reward exhibits a clear upward trend with the increase in training episodes and gradually converges to a relatively high level after approximately 600 episodes. This indicates that the TD3 agent effectively learns strategies for optimizing paths through interaction with the environment. Notably, the reward increases relatively rapidly in the early stages of training, suggesting that the agent can quickly grasp basic path optimization skills. In the later stages of training, the fluctuations in reward gradually decrease and eventually stabilize around a relatively optimal value, indicating that the agent has learned a relatively stable strategy capable of generating high-quality paths. It is worth noting that some minor fluctuations exist in the reward curve, which may be attributed to the presence of exploration noise in the TD3 algorithm and the inherent stochasticity of the environment. Overall, the convergence curve of the reward function clearly demonstrates the effectiveness of the TD3 algorithm in the path optimization task, validating its feasibility and superiority when combined with the BIT* algorithm.
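One common way to read such a curve, sketched below on synthetic data rather than the experimental rewards, is to smooth the per-episode rewards with a moving average and check that the smoothed value rises quickly and then plateaus near an asymptote, as described above:

```python
import numpy as np

def moving_average(rewards, window=50):
    """Smooth the per-episode reward curve to make the convergence trend visible."""
    rewards = np.asarray(rewards, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(rewards, kernel, mode="valid")

# Synthetic curve resembling the reported behaviour: rapid early gains,
# then stabilization with small exploration-noise fluctuations.
rng = np.random.default_rng(0)
episodes = np.arange(1000)
rewards = 100 * (1 - np.exp(-episodes / 150)) + rng.normal(0, 3, size=1000)

smoothed = moving_average(rewards)
print(smoothed[0] < smoothed[-1])  # True: the smoothed reward trends upward
```

The residual jitter in the raw curve, attributed in the text to exploration noise and environmental stochasticity, largely disappears under this averaging.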

5. Conclusions

By integrating the enhanced BIT* algorithm with the TD3 deep reinforcement learning algorithm, this research constructed a path planning model tailored for intelligent inland waterway vessels. The model first leverages the BIT* algorithm to generate initial paths in static environments and subsequently employs the TD3 algorithm for dynamic optimization. This dynamic optimization incorporates twin Critic networks and delayed policy update mechanisms to enhance learning efficiency and stability in continuous control tasks. Furthermore, the model introduces a deep reinforcement learning-based reward mechanism designed to minimize path turning angles, smooth speed variations, and shorten path lengths, thereby resulting in paths that better align with actual navigation requirements.
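A minimal sketch of such a reward, assuming a hypothetical waypoint/speed representation and illustrative weights (the paper's exact formulation may differ), could look like:

```python
import math

def path_reward(points, speeds, w_turn=1.0, w_speed=0.5, w_len=0.1):
    """Hypothetical reward: penalize sharp turns, abrupt speed changes,
    and path length. `points` is a list of (x, y) waypoints; `speeds`
    is the speed on each segment. Weights are illustrative only."""
    turn_pen = 0.0
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        a1 = math.atan2(y1 - y0, x1 - x0)
        a2 = math.atan2(y2 - y1, x2 - x1)
        d = abs(a2 - a1)
        turn_pen += min(d, 2 * math.pi - d)  # unsigned heading change
    speed_pen = sum(abs(b - a) for a, b in zip(speeds, speeds[1:]))
    length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    return -(w_turn * turn_pen + w_speed * speed_pen + w_len * length)

straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
zigzag = [(0, 0), (1, 1), (2, 0), (3, 1)]
const_speed = [1.0, 1.0, 1.0]
print(path_reward(straight, const_speed) > path_reward(zigzag, const_speed))  # True
```

Under any such weighting, the straight constant-speed path dominates the zigzag one, which is exactly the behaviour the TD3 agent is rewarded for learning.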
In the experimental design, the paper adopted a grid-based method to construct the map model and innovatively incorporated hydrodynamic calculations to optimize the energy consumption equation for the USV. A weighting function was designed to evaluate path selection, considering both distance and energy consumption. In addition, multi-threaded parallel computation and a multi-seed adaptive parameter optimization method were employed to enhance algorithm performance.
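The distance-and-energy weighting can be illustrated with a hypothetical normalized cost function; the weights and reference scales below are illustrative assumptions, not the values used in the experiments:

```python
def path_cost(distance, energy, w_dist=0.6, w_energy=0.4,
              dist_ref=1000.0, energy_ref=500.0):
    """Hypothetical weighted cost: normalize each term by a reference
    scale so the weights compare like with like, then combine linearly."""
    return w_dist * (distance / dist_ref) + w_energy * (energy / energy_ref)

# A shorter but energy-hungry candidate vs. a longer, calmer one:
cost_a = path_cost(distance=900.0, energy=600.0)   # short, high energy
cost_b = path_cost(distance=1100.0, energy=350.0)  # long, low energy
print(cost_a > cost_b)  # True: the weighting prefers the energy-efficient path
```

With these example weights the longer, low-energy route wins, showing how the weighting function trades distance against hydrodynamic energy consumption.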
In conclusion, the hybrid algorithm model proposed in this research theoretically demonstrates the potential for performing autonomous vessel navigation tasks and exhibits significant advantages in practical applications. In the complex scenarios of inland waterway navigation, the model can handle crowded, narrow waterways with dynamic obstacles (such as other vessels and buoys), reliably plan safer and more feasible routes, reduce collision risk, and improve navigation efficiency, which is crucial for the automation and intelligentization of inland waterway shipping. By optimizing path length and the number of turns, it significantly reduces the energy consumption of the USV, lowering operating costs while meeting environmental protection requirements and promoting the development of green shipping. Moreover, the online learning capability conferred by the TD3 algorithm enables the USV to adjust its navigation plan in real time according to environmental changes such as weather and currents, as well as unexpected events (such as temporary traffic control), ensuring safe and punctual navigation. While ensuring safety, the model can also plan shorter routes to reduce navigation time and improve transportation efficiency, bringing significant economic benefits to commercial shipping. A high degree of autonomy likewise reduces the demand for manual operation, mitigating risks introduced by human factors and increasing the safety of shipping. These advantages collectively indicate that the model has broad application prospects in the field of intelligent inland waterway navigation.
The research findings are consistent with the evidence and arguments presented. Experimental verification shows that the proposed BIT*+TD3 hybrid algorithm significantly outperforms traditional path planning algorithms (such as A*, RRT*, and the original BIT*) in planning success rate, path quality, environmental adaptability, and computational efficiency, demonstrating its effectiveness and superiority. These findings support the hypothesis stated in the introduction, namely that combining the BIT* and TD3 algorithms can effectively solve the path planning problem of Unmanned Surface Vehicles in complex inland waterway environments, and they provide a practical, feasible solution to that problem. Specifically, the BIT* algorithm quickly generates initial paths in static environments, the TD3 algorithm optimizes those paths and adapts them to dynamic changes, and the energy consumption equation further improves the practicality and economy of the planned routes. Together, the three components address the central question raised in the introduction: how to plan safe, efficient, and energy-saving navigation paths for Unmanned Surface Vehicles in complex, dynamic inland waterway environments.
Although the proposed BIT*+TD3 hybrid algorithm has achieved remarkable results, it is important to weigh its advantages and disadvantages against other published procedures. On the one hand, the hybrid approach offers a higher planning success rate, particularly in complex and dynamic environments, outperforming traditional sampling-based or search-based methods. The TD3 component contributes superior path quality, generating smoother, shorter, and more energy-efficient trajectories, while its online learning capability provides strong environmental adaptability. Computational efficiency is further improved by multi-threading and parameter optimization strategies. On the other hand, the hybrid algorithm is more complex to implement and tune than simpler methods. As a deep reinforcement learning technique, TD3 inherently requires long training times to reach optimal performance and places substantial demands on computing resources, and its performance is limited by the fidelity of the simulation environment. Future research can address these shortcomings, for example by exploring more lightweight deep reinforcement learning algorithms or by using transfer learning to shorten training time.
As illustrated in Figure 15, the combination of BIT* and TD3 has potential applications beyond inland waterway vessels, spanning domains such as mobile robotics, autonomous vehicles, industrial automation, and virtual environments. This breadth underscores the versatility and adaptability of the proposed hybrid approach. This study provides new insights and technical support for the development of intelligent shipping technology and identifies future research directions, including further algorithm optimization for more complex environmental conditions and validation of the algorithm's stability and reliability in real-world applications. Future work could also explore implementing BIT*+TD3 in the aforementioned application areas, tailoring the approach to the specific challenges and requirements of each domain.

Author Contributions

Conceptualization, Y.X.; Methodology, Y.X.; Formal analysis, Y.M. and Y.C.; Investigation, Y.C.; Resources, Y.M. and Z.L.; Visualization, Z.L. and X.L.; Supervision, X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study is funded by the Shanghai Maritime University Research Fund “Development and Application of New Technologies for Intelligent Shipping and Safety Management of Shipping Companies” (H20230487).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Gridded training map.
Figure 2. Multi-threaded path planning flowchart.
Figure 3. Flowchart of the TD3 algorithm.
Figure 4. Results of the traditional BIT* algorithm.
Figure 5. Iterative results of the enhanced BIT*.
Figure 6. Path planning results for A*.
Figure 7. Detailed view of enhanced BIT* results.
Figure 8. Detailed path view of A*.
Figure 9. Impact of different numbers of threads on planning efficiency.
Figure 10. Path planning time comparison: single-thread vs. four-thread.
Figure 11. Reward function variation over 200 episodes.
Figure 12. Reward function variation over 500 episodes.
Figure 13. Reward function variation over 1000 episodes.
Figure 14. Reward function variation over 2000 episodes.
Figure 15. Potential application areas of the BIT*+TD3 hybrid algorithm.
Table 1. Comparative results of the enhanced BIT* algorithm and other path planning algorithms.

| Algorithm | Optimal Path Length | Number of Turns | Turning Cost |
|---|---|---|---|
| A* | 600.44 | 78 | 586.00 |
| RRT* | 507.17 | 47 | 473.62 |
| Original BIT* | 460.23 | 26 | 75.43 |
| Enhanced BIT* | 430.75 | 24 | 40.95 |
Table 2. Parameter settings for the improved BIT* algorithm.

| Parameter Name | Description | Value |
|---|---|---|
| Start Point | Initial position | (0, 0) |
| Goal Point | Target position | (299, 299) |
| Number of Samples | Number of new sample points per iteration | 100 |
| Gaussian Std | Parameter controlling the Gaussian sampling distribution | 10 |
| Initial Radius | Initial search radius | 150 |
| Turning Weight | Weighting coefficient for turning cost | 0.5 |
| Tangent Point Distance | Distance to tangent point for arc path calculation | 3 |
Table 3. CPU and GPU utilization data.

| | Map Size (Grid Cells) | Reached Destination? | Obstacle Avoidance Successful? | CPU Usage (Clock Speed) | GPU Utilization (%) |
|---|---|---|---|---|---|
| Before Optimization | 2 × 10^4 | Yes | Yes | 4.25 GHz | 0 |
| After Optimization | 2 × 10^4 | Yes | Yes | 3.59 GHz | 56 |
Table 4. Hyperparameter settings for training the TD3 decision-making model.

| Hyperparameter Name | Value | Description |
|---|---|---|
| Maximum Steps per Episode | 1000 | The maximum number of steps allowed within each episode. |
| Episodes | 200 | The total number of episodes for training. |
| Discount Factor | 0.95 | The discount factor used for calculating future rewards. |
| Soft Update Coefficient | 0.005 | The parameter controlling the soft update of the target networks. |
| Policy Noise | 0.1 | The standard deviation of the noise added to the target actions. |
| Noise Clip | 0.3 | The range to which the noise is clipped. |
| Policy Update Delay | 4 | The frequency at which the policy network is updated relative to the value network. |
| Actor Network Learning Rate | 0.0002 | The learning rate for the Actor network. |
| Critic Network Learning Rate | 0.0002 | The learning rate for the Critic networks. |
| Replay Buffer Size | 2 × 10^6 | The size of the experience replay buffer. |

Xie, Y.; Ma, Y.; Cheng, Y.; Li, Z.; Liu, X. BIT*+TD3 Hybrid Algorithm for Energy-Efficient Path Planning of Unmanned Surface Vehicles in Complex Inland Waterways. Appl. Sci. 2025, 15, 3446. https://doi.org/10.3390/app15073446