Article

Path Planning of Mobile Robot Based on A Star Algorithm Combining DQN and DWA in Complex Environment

School of Information and Control Engineering, Liaoning Petrochemical University, Fushun 113001, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(8), 4367; https://doi.org/10.3390/app15084367
Submission received: 27 February 2025 / Revised: 22 March 2025 / Accepted: 11 April 2025 / Published: 15 April 2025

Abstract

The path planning algorithm must not only ensure that the mobile robot can avoid obstacles and reach the target point at a safe speed but also that it can quickly adapt to a complex, changing environment. In this paper, the existing path planning algorithms for mobile robots are analyzed, and a fusion path planning algorithm is then studied. The main work is summarized as follows. The A* algorithm is used to complete global path planning and path smoothing, the basic principle of the dynamic window approach (DWA) is studied, and the dynamic constraints of mobile robots are discussed. The shortcoming of the DWA, namely that it lacks the ability to self-learn and self-adapt in dynamic, unknown environments, is analyzed through simulation experiments. In addition, by studying the basic principles of deep reinforcement learning, the essence and characteristics of the DWA and DQN algorithms are analyzed, which provides the idea for a fusion path planning algorithm based on DQN and DWA. Finally, to cope with complex and changeable environments and improve the real-time obstacle avoidance ability of mobile robots, a fusion path planning algorithm based on DQN and DWA is proposed. First, the dynamic window approach is used to constrain the motion of the mobile robot directly in velocity space. Then, a deep Q network is designed and trained to approximate the state-action value function of the mobile robot; the robot interacts dynamically with the environment to adjust its moving trajectory in real time and finally finds the optimal path. The simulation results show that the fusion path planning algorithm proposed in this paper gives the mobile robot strong generalization ability and robustness in complex, variable, and dynamic unknown environments. Compared with the existing DWA and DQN algorithms, the proposed fusion path planning algorithm achieves better path planning performance with fewer training iterations, shorter computation time, and faster convergence speed.

1. Introduction

Mobile robots have been employed extensively worldwide in recent years because of their independent and adaptable qualities, which are widely used in industrial production, life services, medical and health services, and national defense-related science and technology. Path planning has emerged as the main issue that requires immediate attention due to the growing popularity of mobile robots [1,2,3].
Depending on the working circumstances, there are two forms of path planning: global static path planning and local dynamic path planning. Global static path planning can only resolve the route planning problem of mobile robots in static surroundings. The Dijkstra algorithm, the A* algorithm, the rapidly exploring random tree (RRT) algorithm, and others are examples of common techniques [4,5]. Dijkstra's algorithm and the A* algorithm are both widely used search methods in the field of path planning [6,7]. The traditional A* algorithm has been frequently studied because of its defects, such as node duplication and a large number of inflection points [8,9,10]. A Manhattan-Euclidean hybrid distance improves the heuristic function, reduces the search time, and improves the search efficiency of the basic A* algorithm [11]. To improve search efficiency, the bidirectional algorithm searches in both the forward and backward directions at the same time [12]. The dynamic window approach is a local dynamic path planning method. Although the resulting path is quite smooth, the method is susceptible to local optima and may fail to reach the target position along the globally optimal path [13,14,15]. Although the genetic algorithm is reliable and easy to integrate with other algorithms, its search time is long and its convergence speed is slow [16,17,18]. Mohamed Elhoseny and his colleagues proposed a path planning method based on the Bezier curve and a Modified Genetic Algorithm (MGA) to solve the problems of traditional path planning algorithms in dynamic environments. This method combines the smoothness of the Bezier curve with the global search ability of the genetic algorithm and can plan the optimal path for a mobile robot in a dynamic environment [19].
A hybrid approach that combines the artificial potential field technique with the A* algorithm to plan routes in a dynamic environment was presented in the literature [20,21,22,23]. However, overall efficiency was reduced by the artificial potential field method’s inability to more effectively design the locally optimum route [24,25].
Reinforcement learning [26,27] was proposed by the American scholars Sutton and Barto in 1998. The advantage of a reinforcement learning algorithm in path planning is that it can learn the optimal strategy without knowing the details of the environment, and it can also adapt to changes in a dynamic environment. However, when the environment model is too large, the algorithm converges slowly and suffers from the curse of dimensionality, and the lack of an environment model can lead to non-optimal solutions, because the method must calculate the maximum cumulative reward value.
Many scholars at home and abroad have improved the reinforcement learning algorithm and successfully applied it to the path planning of mobile robots. Ding et al. [28] proposed an asynchronous reinforcement learning algorithm based on parallel particle swarm optimization, applying the particle swarm optimization algorithm to the asynchronous reinforcement learning algorithm to find the optimal solution of the algorithm strategy. Li et al. [29] proposed a whale optimization algorithm, which can effectively initialize the Q-table and dynamically adjust greedy strategy values through nonlinear functions, thus greatly improving the convergence speed and route planning accuracy of the algorithm.
Deep reinforcement learning [30] is a method that combines deep learning and reinforcement learning to solve complex decision problems, so it can be used to solve the path planning problem of mobile robots in unknown complex environments. Deep learning (DL) [31,32,33] and reinforcement learning (RL) are both important branches of machine learning. Deep learning realizes efficient feature extraction and pattern recognition by constructing deep neural networks. Reinforcement learning is a method that learns how to make decisions by interacting with the environment. DeepMind [34,35], Google's artificial intelligence research team, combined a convolutional neural network with the Q-Learning algorithm in reinforcement learning and proposed the Deep Q-Network (DQN) algorithm. This algorithm successfully solves the dimensionality problem in Q-Learning by using a neural network to approximate the value function. Scholars have since improved this algorithm. Tai et al. [36] applied the DQN algorithm to the path planning of mobile robots, successfully realized path planning in complex environments, and improved the rapid adaptability of mobile robots in unknown environments. Lv et al. [37] used an experience value evaluation network in the early stage of DQN learning to evaluate the value of the actions taken in the current state, increasing the proportion of deep experience, acquiring environmental information faster, speeding up path planning, and improving path accuracy. Xing et al. [38] proposed an incremental map training method for the path planning task of deep reinforcement learning algorithms in complex unknown environments, with the aim of improving the convergence performance of agent path planning. Liu [39] proposed a hierarchical path planning framework integrating a greedy algorithm and an improved negative-feedback ant colony algorithm to solve the problems of path redundancy and obstacle avoidance in the scanning scenes of outdoor mobile robots, and divided regional priorities (greedy exploration in areas with high exploration value and multi-trajectory decision-making in general areas) by constructing topological graphs. A dual mechanism of guidance/warning pheromones dynamically balances path efficiency and obstacle avoidance, and the effectiveness of the algorithm was verified on point cloud images and raster simulations (the scanning range was increased by 4%). Jiang Sun et al. [40] proposed an improved IDDQN algorithm to solve the problems of slow convergence and low path planning accuracy of the double deep Q-network (DDQN) in multi-obstacle environments and optimized the efficiency of action evaluation through a second-order temporal-difference method. Outdoor mobile robot experiments verified its resistance to local optima in multi-obstacle environments (convergence speed increased by 33.22% in complex environments), providing a new path planning paradigm with both efficiency and robustness for high-density obstacle scenes.
Aiming at the problems of poor A* robustness, poor DWA adaptive ability, and local optima in complex environments, this paper combines the above algorithms with the DQN algorithm, which not only strengthens the robot's adaptability to complex environments and improves its real-time obstacle avoidance ability but also achieves a better optimization effect than the traditional algorithms.

2. Improved A* Algorithm

2.1. Raster Map Model

As shown in Figure 1, the grid approach, which separates the environment space into identical, contiguous grid cells with a determined granularity, is used in route planning to generate an environment model. The grid is divided into free and occupied cells based on the real environment data; the occupied grid is shown in black, while the free grid is displayed in white. The coordinate origin is placed at the lower left corner of the two-dimensional plane. In the Cartesian coordinate system, p(xi, yj), (i, j = 1, 2, 3, …, n) indicates each grid's precise location in the raster diagram. The X-axis represents the grid's horizontal coordinate axis, and its value increases from left to right; the Y-axis represents its vertical coordinate axis, and its value increases from bottom to top.
The core element of the grid approach is the grid granularity. If the grid cells are too small, the route search procedure becomes more challenging, requires a significant amount of time and computational power, and may not accomplish the desired result. If the grid cells are too large, the environmental model will deviate from the real environment, and the route search algorithm will not be able to avoid obstacles or even complete the required path planning. Thus, grid granularity is essential for environmental model development and route planning. To facilitate programming, we set the grid side length to 1.
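As a minimal illustration of the grid model described above, an occupancy grid with unit side length can be represented as a two-dimensional array. The map size and obstacle coordinates below are hypothetical examples, not the maps used in the paper's experiments.

import numpy as np

# Minimal occupancy-grid sketch: 0 = free grid, 1 = occupied grid.
def build_grid_map(width, height, obstacles):
    """Return a height x width array; the grid side length is taken as 1."""
    grid = np.zeros((height, width), dtype=np.int8)
    for (x, y) in obstacles:          # (x, y) with the origin at the lower-left corner
        grid[y, x] = 1                # row index = y (bottom to top), column index = x
    return grid

if __name__ == "__main__":
    obstacles = [(3, 3), (3, 4), (7, 2)]        # hypothetical occupied cells
    grid = build_grid_map(10, 10, obstacles)
    print(grid[::-1])                           # flip rows so the printout matches the map orientation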

2.2. Conventional Algorithm A*

The A* algorithm is a heuristic search algorithm with high search efficiency, fast planning speed, and the ability to avoid the premature convergence that can arise during the search, and it is widely used for solving the optimal path. The search principle of the A* algorithm is to start from the initial grid point, evaluate the subgrid points around it, and select the point with the lowest evaluation function as the next search node, that is, the current node. The subgrid points near the current node are then generated, the node with the lowest evaluation function is selected as the new current node, and the search continues in this way until the current node is the target location. The evaluation function affects the size of the search space, and the size of the search space and the number of visited nodes affect the speed of the A* algorithm. Therefore, the evaluation function of the A* algorithm directly affects the result of path planning and the search efficiency. f(n) is the evaluation function of the A* algorithm, given by the following formula:
$f(n) = g(n) + h(n)$ (1)
Here, the current node is denoted by n, f(n) is the evaluation function of node n, g(n) is the cost of the path from the start node to n, and h(n) is a heuristic function that estimates the cost of the cheapest path from n to the goal. The Manhattan distance (D1), the Chebyshev distance (D2), and the Euclidean distance (D3) are commonly used to compute the heuristic value, as shown in Figure 2.
Since the A* algorithm searches eight neighboring nodes in each expansion, we choose the Euclidean distance function—which indicates the direct distance between two coordinate nodes—as the heuristic function in this study. Compared to the other functions, the route determined using the Euclidean distance function is more accurate. Consequently, the h(n) cost function used in this study is the Euclidean distance, calculated as follows:
$h(n) = \sqrt{(x_m - x_n)^2 + (y_m - y_n)^2}$ (2)
where (x_n, y_n) indicates the coordinates of the current path node B, and (x_m, y_m) indicates the coordinates of the target node A.
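A minimal sketch of the evaluation function (1) with the Euclidean heuristic (2) is shown below; the node representation as coordinate tuples and the example values are simplifying assumptions, not the authors' implementation.

import math

def heuristic(n, goal):
    """Euclidean distance h(n) between the current node and the goal node."""
    (xn, yn), (xm, ym) = n, goal
    return math.hypot(xm - xn, ym - yn)

def evaluation(g_cost, n, goal):
    """f(n) = g(n) + h(n): path cost accumulated so far plus the Euclidean estimate to the goal."""
    return g_cost + heuristic(n, goal)

# Example: g(n) accumulated so far is 4.0, current node (2, 3), goal (5, 7).
print(evaluation(4.0, (2, 3), (5, 7)))   # 4.0 + 5.0 = 9.0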

2.3. Adaptive Step Size Adjustment Algorithm

One of the crucial factors influencing mobile robots in the A* algorithm is the step size. As a result, our work suggests an adaptive step size adjustment mechanism. When there are many obstacles in the surrounding region, lowering the step size will increase the number of nodes evaluated each time and make the search path safer and more thorough. When there are fewer obstacles in the immediate vicinity, increasing the step size will speed up and improve the effectiveness of the search. The step size is automatically adjusted according to the distribution of obstacles in order to enhance the robot’s adaptability. When step size-influencing aspects were considered, the distribution of obstacles in the surrounding environment was separated into two categories based on whether the obstacles were dynamic obstacles and the distribution of obstacles’ quantity and placement within a certain range.
Figure 3 shows the robot model with eight motion directions used in place of the mobile robot; the robot adopts an eight-neighborhood search. The figure shows that there are d dynamic obstacles in the moving direction. The dark yellow region contains x1 static obstacles, whereas the bright yellow region contains x2 static obstacles. The closer a static obstacle is to the mobile robot, the greater the threat it poses. The threat function f(x1, x2) is defined as:
$f(x_1, x_2) = \begin{cases} \dfrac{1}{k_1 x_1 + k_2 x_2 + c}, & d = 0 \\ 1, & d \neq 0 \end{cases}$ (3)
In this formula, c is a self-regulating constant, k1 and k2 are the danger factors of the static obstacles, and d is the number of dynamic obstacles in the direction of motion. Since the obstacles counted by x1 are more dangerous to the robot, the value of k1 in this paper is set to twice that of k2, with k1 ∈ (1, 2), k2 ∈ (0.5, 1), and c ∈ (0, 1). The adaptive step adjustment is then:
$l = \begin{cases} f(x_1, x_2) \cdot l_{max}, & d = 0 \\ f(x_1, x_2) \cdot l_{min}, & d \neq 0 \end{cases}$ (4)
where $l_{min} \le l \le l_{max}$, $l_{min}$ = 0.1 m, $l_{max}$ = 0.2 m, and l is the safe distance between the mobile robot and the obstacle.
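The threat function (3) and the adaptive step (4) can be sketched as follows. The parameter values k1, k2, and c below are illustrative choices inside the stated ranges, not the values used in the paper's experiments.

def threat(x1, x2, d, k1=1.5, k2=0.75, c=0.5):
    """Threat function f(x1, x2): x1/x2 are the static-obstacle counts in the near/far
    regions, and d is the number of dynamic obstacles in the motion direction."""
    if d == 0:
        return 1.0 / (k1 * x1 + k2 * x2 + c)
    return 1.0

def adaptive_step(x1, x2, d, l_min=0.1, l_max=0.2):
    """Adaptive step length l: scale l_max when no dynamic obstacle is present,
    otherwise scale l_min, then clamp to the interval [l_min, l_max]."""
    f = threat(x1, x2, d)
    l = f * l_max if d == 0 else f * l_min
    return min(max(l, l_min), l_max)

print(adaptive_step(x1=2, x2=1, d=0))   # many nearby obstacles -> shorter step
print(adaptive_step(x1=0, x2=0, d=0))   # open space -> longer step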
The route planning outcomes employing the adaptive adjustment step size method and the conventional A* algorithm are shown in Figure 4 and Figure 5, respectively. The variations in planning time, route node count, and planning time reduction rate are shown in Table 1, as well as the path node number reduction rate before and after the improvement of the method. The two figures demonstrate how the improved A* algorithm considerably lowers the number of search nodes and confirms the adaptive adjustment step size method’s optimization impact.
The performance metrics of the adaptive step size adjustment method and the conventional A* algorithm are contrasted in Table 1. By cutting the running time by 14.06% and the number of nodes by 34.78%, the improved approach increases operational efficiency.

2.4. Path Smoothing Optimization

The Bezier curve is mainly applied to smoothing two-dimensional line segments, as shown in Figure 6. The figure is composed of seven nodes P_i (i = 0, 1, …, 6) and the line segments connecting them; P0 is the starting point, P6 is the end point, and the remaining P_i are control points. Letting B(t) denote the coordinates at parameter t ∈ [0, 1], the Bezier curve formula is:
$B(t) = \sum_{i=0}^{6} \binom{6}{i} P_i (1-t)^{6-i} t^{i}$ (5)
Usually, the planned path has many inflection points, which makes it difficult for the robot to move and poses a huge challenge to the motor load. To satisfy the nonholonomic constraints of mobile robots, it is necessary to smooth the motion trajectory. Trajectory smoothing can reduce the frequency and amplitude of motor starts and stops, thus increasing the service life and safety of the robot. Therefore, this paper uses the Bezier curve to optimize the trajectory. The optimized path diagram is shown in Figure 7.
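The degree-6 Bezier curve in Formula (5) can be evaluated with a few lines of code. The control points below are placeholders standing in for the seven path nodes P0–P6, not points from the paper's planned paths.

import numpy as np
from math import comb

def bezier_point(control_points, t):
    """Evaluate B(t) = sum_i C(n, i) * P_i * (1 - t)^(n - i) * t^i at parameter t."""
    pts = np.asarray(control_points, dtype=float)
    n = len(pts) - 1                      # degree 6 for seven control points
    coeffs = [comb(n, i) * (1 - t) ** (n - i) * t ** i for i in range(n + 1)]
    return np.dot(coeffs, pts)            # weighted sum of the control points

# Hypothetical control points P0 ... P6 taken from a planned path.
P = [(0, 0), (1, 2), (2, 2), (3, 0), (4, 1), (5, 3), (6, 3)]
curve = [bezier_point(P, t) for t in np.linspace(0.0, 1.0, 50)]
print(curve[0], curve[-1])                # the curve starts at P0 and ends at P6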

3. Fusion DQN DWA Algorithm

To realize the efficient navigation of robots in dynamic environments, we combine the improved A* algorithm with the DWA algorithm to make full use of the advantages of both algorithms. The core of this fusion strategy is to combine the global path planning ability of A* algorithm with the local obstacle avoidance ability of DWA algorithm. At the same time, the DQN algorithm is integrated to solve the local optimal problem of DWA, the Q value table is updated immediately, and the local path planning is reperformed. Through this collaborative work mode, the robot can enhance the obstacle avoidance ability in complex dynamic environments while ensuring the efficiency of path planning.

3.1. Kinematic Model

To simulate its trajectory and avoid obstacles in real time, the robot's velocity must be sampled in velocity space. The linear velocity and the rotational velocity are often used to describe the robot's motion state. The evaluation function then chooses the best trajectory from among all candidate trajectories. Assuming that the robot's velocity in a unit time interval is (v_t, ω_t), the arc trajectory may be approximated as straight-line motion within each time interval Δt, and the kinematic model is as follows:
$\begin{cases} x_{t+1} = x_t + v_x \Delta t \cos\theta_t - v_y \Delta t \sin\theta_t \\ y_{t+1} = y_t + v_x \Delta t \sin\theta_t + v_y \Delta t \cos\theta_t \\ \theta_{t+1} = \theta_t + \omega_t \Delta t \end{cases}$ (6)
Define the pose of the robot in the environment as q = [x, y, θ, φ]^T, and take its linear and angular velocities as the input u = [v, ω]^T.
In Figure 8, v denotes the linear velocity, ω the angular velocity of the front wheel, u the input (v, ω), q the pose, φ the angle between the longitudinal axis of the body and the X-axis, θ the steering angle of the front wheel, and L the length of the body.
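A minimal sketch of the motion update in Formula (6) for one sampled velocity pair is given below; the time step and velocity values are illustrative, and a differential-drive robot would usually have v_y = 0.

import math

def motion_update(x, y, theta, v_x, v_y, omega, dt):
    """Advance the robot pose by one time step dt under the sampled velocities,
    treating the arc as a straight segment within the interval (Formula (6))."""
    x_new = x + v_x * dt * math.cos(theta) - v_y * dt * math.sin(theta)
    y_new = y + v_x * dt * math.sin(theta) + v_y * dt * math.cos(theta)
    theta_new = theta + omega * dt
    return x_new, y_new, theta_new

pose = (4.5, 1.5, 0.0)                       # hypothetical starting pose
print(motion_update(*pose, v_x=0.3, v_y=0.0, omega=0.2, dt=0.1))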
DWA can avoid both static and dynamic obstacles while planning a local route, but it is prone to local optima and may be unable to reach the desired point. When the starting point is set to (4.5, 1.5) and the endpoint is set to (30.5, 1.5), the DWA method readily deviates to the left, as shown in Figure 9.

3.2. DQN Algorithm

The DWA algorithm is used for local path planning, but it can fall into local optima in complex environments. DQN is a deep reinforcement learning algorithm that combines convolutional neural networks with the Q-learning algorithm. DQN uses convolutional neural networks to represent environmental information and then outputs discrete action-value functions. Because it uses neural networks to approximate the Q-value function, DQN can process many states and actions without explicitly storing the Q values corresponding to all possible state-action pairs. The agent chooses the corresponding action according to the Q-value function to obtain the maximum cumulative reward.
The loss function L(w) of the dual convolutional neural network constructed by DQN through Q-learning algorithm is calculated as follows:
$L(w) = \mathbb{E}_{\pi}\left[\left(r + \gamma \max_{a'} Q(s', a', w') - Q(s, a, w)\right)^2\right]$ (7)
Here, w is the weight of the online convolutional neural network and w' is the weight of the target network; $r + \gamma \max_{a'} Q(s', a', w')$ is the target Q value, and Q(s, a, w) is the predicted Q value.
Then calculate the gradient of the network weight as follows:
$\nabla_w L(w) = \mathbb{E}\left[\left(r + \gamma \max_{a'} Q(s', a', w') - Q(s, a, w)\right) \nabla_w Q(s, a, w)\right]$ (8)
where the initial weights w and w' of the two convolutional neural networks are the same.
The Q value function network weight w is updated as follows:
$w \leftarrow w - \alpha \nabla_w L(w)$ (9)
where α is the learning step size, i.e., the learning rate.
The agent explores the environment (for example, a grid defined by coordinate points) to generate experience data. The experience data are stored in the experience replay pool (a memory unit with a maximum capacity). Data are randomly sampled from the pool, the gradient is computed from the loss function, and the Q network parameters are updated. The target network parameters are synchronized every N iterations to keep the target Q value stable. The training flow chart of the DQN algorithm is shown in Figure 10.
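As an illustration of Formulas (7)–(9), the following minimal sketch performs one gradient step on the online Q network and notes where the target network would be synchronized. It is written with PyTorch for concreteness; the network sizes, discount factor, learning rate, and batch shapes are placeholder assumptions, not the configuration in Table 2.

import torch
import torch.nn as nn

# Online network Q(s, a, w) and target network Q(s', a', w'); sizes are illustrative.
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 5))
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 5))
target_net.load_state_dict(q_net.state_dict())            # start with identical weights
optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3)
gamma = 0.9

def dqn_update(state, action, reward, next_state, done):
    """One gradient step on L(w) = (r + gamma * max_a' Q(s', a', w') - Q(s, a, w))^2."""
    q_pred = q_net(state).gather(1, action.unsqueeze(1)).squeeze(1)   # predicted Q value
    with torch.no_grad():
        q_next = target_net(next_state).max(dim=1).values             # max_a' target Q
        q_target = reward + gamma * q_next * (1.0 - done)             # target Q value
    loss = nn.functional.mse_loss(q_pred, q_target)
    optimizer.zero_grad()
    loss.backward()                                                    # gradient of L(w)
    optimizer.step()                                                   # w <- w - alpha * grad
    return loss.item()

# Illustrative batch of 8 transitions with a 4-dimensional state and 5 discrete actions.
s = torch.randn(8, 4); a = torch.randint(0, 5, (8,))
r = torch.randn(8); s2 = torch.randn(8, 4); d = torch.zeros(8)
print(dqn_update(s, a, r, s2, d))
# Every N updates the target network is synchronized to keep the target Q value stable:
# target_net.load_state_dict(q_net.state_dict())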

3.3. Fusion Path Planning Algorithm Framework Based on DQN and DWA

Based on the traditional path planning algorithm and deep Q network, a fusion path planning algorithm based on deep reinforcement learning and dynamic window method is proposed. The fusion path planning algorithm has the following advantages:
(1)
With the “dynamic window” feature, the motion of the robot is constrained directly in velocity space and is controlled through the robot's linear speed, angular speed, and rotation direction, so that the robot maintains a safe speed and avoids collisions when it would otherwise move too fast, too slowly, or at too steep an angle.
(2)
The map environment, whose state-action space is too large to enumerate, is processed by the convolutional neural network model; the ε-greedy strategy is set at the same time, and the neural network is trained. The simulation results show that the fusion algorithm can effectively avoid the problems of dimensionality disaster and slow convergence, ensuring that the robot has strong real-time robustness in complex and changeable environments.
To enable the robot to complete the path planning task in complex and changeable environments and to give the robot's movement the characteristics described above, deep reinforcement learning and the dynamic window approach are integrated to carry out the path planning of the mobile robot.
Figure 11 shows a schematic diagram of the fusion path planning framework based on deep reinforcement learning and the dynamic window approach. First, the robot's moving environment is input in the form of RGB images, and a convolutional neural network is used to process the images to reduce computational complexity. Then, in the current state s, the robot uses the ε-greedy strategy to select an action by interacting with the environment (the dynamic window part). After that, the robot enters the next state s', and a reward r is generated in the interaction process. Finally, the robot moves by selecting the action corresponding to the maximum Q value. Training samples are obtained through the experience storage mechanism, and the network parameters are then updated using stochastic gradient descent and backpropagation. The main idea of the experience storage mechanism is that, during training, the movement information of the robot at each time point is stored in an experience tuple U = (s, s', r, a). These experience tuples are saved in the experience storage pool, and the weights of the DQN are updated by gradient descent to train on the data.
The specific calculation steps of the fusion path planning algorithm based on DQN and DWA are described as follows:
Step 1: Initialization: the initial state of the robot, including the position information of the starting point and the initial speed.
Step 2: During the interaction between the robot and the environment (that is, the dynamic window part), the speed corresponding to the action with the largest Q value ( v ,   ω ) is selected, and the corresponding reward value r is obtained, thus entering the next state   s .
Step 3: If the distance between the robot and the nearest obstacle in the next state s is kept within the safe range, that is, dist ( v , ω ) > t, then perform the next step; otherwise, return to the previous state and select the action again.
Step 4: Save the movement state information of the trained robot in the experience tuple U = (s, s', r, a); the stochastic gradient descent method is used to update the network parameter θ of the DQN.
Step 5: Determine whether the final state information of the robot has reached the target point of the path planning experiment. If the target point is reached, the wheel path planning experiment ends; if the target point is not reached, return to step 2 until the target point is reached.
The flow diagram of the fusion path planning algorithm is shown in Figure 12.
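The experience storage described in Steps 2–4 can be sketched as a bounded replay pool. This is a minimal sketch: the capacity and batch size are illustrative, and the state is simplified to grid coordinates rather than the RGB image input used by the framework.

import random
from collections import deque

class ReplayPool:
    """Bounded experience pool storing tuples U = (s, a, r, s', done)."""
    def __init__(self, capacity=10000):
        self.pool = deque(maxlen=capacity)      # the oldest experience is dropped first

    def store(self, state, action, reward, next_state, done):
        self.pool.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        """Random minibatch used for the stochastic-gradient update of the DQN."""
        return random.sample(list(self.pool), min(batch_size, len(self.pool)))

pool = ReplayPool()
pool.store(state=(0, 0), action=1, reward=-0.1, next_state=(0, 1), done=False)
print(pool.sample(1))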

3.4. ε-Greedy Policy Settings

In the process of training iteration using deep reinforcement learning, the action with the largest Q value in the current iteration is generally selected, but this will cause some actions that are good but not performed to be missed. Therefore, when selecting the optimal action, the ε -greedy policy is used to make the agent have a certain probability of not selecting the action with the largest Q value in the current iteration, and instead select other actions, so that all of the agent’s actions have a probability of selection. However, the selection is too random and has too many uncertain factors, so the traditional ε -greedy policy is not necessarily conducive to the agent selecting the optimal action with a high probability.
The ε-greedy policy balances exploitation and exploration (also called utilization and exploration) in the path planning process of mobile robots. Exploitation means that the robot chooses the action with the largest action-value function among the known state-action pairs, that is, the robot chooses from the known actions. Exploration means that the robot chooses actions outside the known state-action distribution. Exploitation can be understood as getting the most out of the present, while exploration is about making long-term plans, for better or worse. However, the robot can only perform one action in a given state, which means that it either looks only at the immediate interest or focuses on the future; the two cannot be achieved at the same time. The ε-greedy policy is a strategy designed to balance this conflict between exploitation and exploration.
In order to test the performance of the fusion path planning algorithm based on DQN and DWA for mobile robot path planning, we have designed a new ε -greedy policy to evaluate robot performance during 200 rounds of path planning training for randomly selected starting points and target points. Compared with the traditional ε -greedy policy, we optimize the action selection strategy by creating a unique function to better adapt to each parameter change. The ε -greedy policy can be expressed as Formula (10):
$\varepsilon = \varepsilon_{ini} \times 0.99^{E}$ (10)
where ε_ini represents the ε value specified in the original ε-greedy policy, and E represents the number of rounds the robot has completed in path planning, that is, the number of times the robot has successfully traveled from the starting point to the target point. This function decays exponentially: as the number of times the robot reaches the target point increases, the ε value continues to decrease. As training proceeds, the robot chooses the action corresponding to the maximum Q value with an increasing probability of 1 − ε. Finally, the probability of the robot choosing the optimal action late in training is greatly increased.
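The decay rule in Formula (10) and the corresponding action selection can be written as follows; ε_ini = 1 matches the setting described later in the simulation, while the action-value list is an illustrative placeholder.

import random

def epsilon(eps_ini, episode):
    """Formula (10): epsilon = eps_ini * 0.99 ** E, decaying with the episode count E."""
    return eps_ini * (0.99 ** episode)

def select_action(q_values, eps):
    """Epsilon-greedy: explore with probability eps, otherwise exploit the max-Q action."""
    if random.random() < eps:
        return random.randrange(len(q_values))                        # exploration
    return max(range(len(q_values)), key=lambda a: q_values[a])       # exploitation

eps = epsilon(eps_ini=1.0, episode=100)
print(eps, select_action([0.2, 0.8, 0.5], eps))   # 0.99**100 is roughly 0.366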

3.5. Hyperparameter Tuning

This section mainly introduces the selection of learning rate.
As shown in Figure 13, smaller learning rates (e.g., 0.0001) converge slowly but steadily. A large learning rate (e.g., 0.01) decreases rapidly in the early stage but eventually fluctuates. Medium learning rates (0.001–0.005) show the best convergence, with the best learning rate (0.005) reaching the minimum at about 30 epochs. Obvious training oscillations occurred when the learning rate was greater than 0.005, and convergence slowed significantly when the learning rate was less than 0.001. The early stopping strategy effectively avoided ineffective training with learning rates that were too small. Considering convergence and the loss value, the final learning rate is set to 0.01.

3.6. DWA+DQN Simulation Verification

Static and dynamic barriers are the two kinds of obstacles that robots may encounter during the route search. Simulations are carried out in both scenarios to confirm the efficacy of robot route planning in response to dynamic obstacles, and the outcomes are contrasted to confirm the algorithm's ability to avoid dynamic obstacles. Multiple hyperparameters need to be set for the DQN algorithm simulation, and their settings affect the algorithm's performance. In this paper, the initial value of the motion exploration rate ε is set to 1, multiplied by 0.99 after each training iteration, with a final value of 0.01, so that the mobile robot can fully explore the unknown environment in the early stage of training and quickly plan the path in the later stage. The hyperparameter settings for the DQN algorithm are shown in Table 2.
The DQN+DWA algorithm simulation diagram is shown in Figure 14. (30.5, 1.5) is the starting point, while (2.5, 29.5) is the ending point. When there are many static obstacles in the area, the robot dodges them all, arrives at its destination, and completes the static environment simulation. The robot alters its initial route when it encounters a dynamic obstacle situated along the intended path. As shown by the solid blue line in Figure 14, the robot successfully avoids the dynamic obstacles without running into the local optimum issue and arrives at the goal point.
The data in the real environment selected in this paper are obtained from the robot's odometer, and there are two ways to reduce the error: (1) self-calibration, in which the odometer error is calibrated with the inertial measurement unit and LiDAR; and (2) Kalman filtering, which is used to smooth and predict the odometer data to reduce the impact of errors on the final result. The prediction diagram of the Kalman filter is shown in Figure 15. The motion state information of the mobile inspection robot is shown in Figure 16 and Figure 17.
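A minimal one-dimensional Kalman-filter sketch of the odometry smoothing described above is given below; the noise covariances and readings are placeholder assumptions, and the paper's filter operates on the full odometry state rather than a single scalar.

def kalman_1d(measurements, q=1e-3, r=1e-2, x0=0.0, p0=1.0):
    """Smooth a 1-D odometry signal: predict with process noise q,
    then correct with measurement noise r (constant-state model)."""
    x, p, estimates = x0, p0, []
    for z in measurements:
        p = p + q                      # predict: uncertainty grows
        k = p / (p + r)                # Kalman gain
        x = x + k * (z - x)            # correct with the new odometry reading
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates

noisy_odometry = [0.10, 0.12, 0.09, 0.15, 0.11, 0.13]   # hypothetical readings (m)
print(kalman_1d(noisy_odometry))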
In a 200-round path planning experiment with random selection of starting point and target point, the paper evaluates the performance of the trained mobile robot using the ε -greedy strategy. The randomness of the experimental environment tests the generalization ability of the trained mobile robot to the complex and variable environment and verifies its robustness and real-time performance. In this section, we measure the performance of the fusion path planning algorithm through four aspects: training results, success rate, expected reward, and average Q max value change of robot path planning, all of which are conducted in real time.
(1)
Training results: In each round of path planning experiment, the total number of training times and training failures achieved by the mobile robot from the starting point to the target point.
(2)
Success rate: In each round of path planning experiment, the probability that the mobile robot always keeps a safe distance and finally reaches the target point.
(3)
Expected reward: The total discount reward received by the mobile robot during the training process.
(4)
Average Q max value: In each round of path planning experiment, the mobile robot will enter the next state from the action with the largest Q value output from the current state.
Figure 18 shows the training results of mobile robot path planning. To realize the collision-free path planning from the starting point to the target point, the robot needs to be trained through continuous interaction with the environment to transfer from the current state to the next state and then reach the target point.
In the process of training, the robot may be too close to the obstacle, then the robot may collide with the obstacle, and the robot needs to adjust the direction at this position to keep a safe distance. Therefore, we consider this state a failure state. As can be seen from Figure 18, in the early stage of training, the robot is unfamiliar with the training environment and is in the exploration stage, resulting in large fluctuations in the training of the robot. Therefore, the total number of training rounds and the number of training failures in each round of path planning of the robot change irregularly. With the increase of the number of training rounds, the robot has a full understanding of the environment, and the training can be maintained in a stable state, which indicates that the fusion algorithm can ensure that the robot has a strong generalization ability in the complex and changeable environment.
In addition, Figure 19 plots the variation curve of the success rate of path planning for each round of the mobile robot in the training process. As can be seen from Figure 19, the success rate of robot path planning using this fusion algorithm can be maintained at about 99.20%. At the same time, the change in success rate is closely related to the total number of training rounds and the number of training failures.
Figure 20 shows the change curve of the expected reward of the mobile robot during the training process. Because the robot knows nothing about the environment in the early stage, the probability of the robot’s choice of any action is too large, which leads to a large change in the expected reward obtained in each round of path planning experiment. With the continuous understanding and analysis of the surrounding environment, the probability of the robot choosing the optimal action increases greatly, and the expected reward obtained by each training gradually tends to the stable value. In addition, the change curve of the expected reward is close to the curve of success rate, which means that the change of expected reward is closely related to the results of each round of path planning experiment. In general, the expected reward of each round of the path planning experiment of the robot is a positive reward value, which means that the fusion algorithm can make the robot safely reach the target point while avoiding obstacles.
Figure 21 shows the variation curve of the average Q max value of the mobile robot during the training process. As can be seen from Figure 21, in the early stage of exploration the environment is unknown to the robot and the probability of selecting an action at random is relatively large, meaning that the action performed in the current state may not be the action with the largest Q value. Therefore, the average Q max value early in training is low. As the number of training rounds increases, the robot gains a full understanding of the environment and enters the stage of using prior knowledge. The robot can then choose the action with the largest Q value in the current state, so the average Q max value shows an upward trend. Finally, the average Q max value stabilizes at about 67, and the algorithm converges. Therefore, using the fusion path planning algorithm, the mobile robot not only learns smoothly during training but is also robust in complex and changeable environments. At the same time, combined with the changes in training times in Figure 18, when the algorithm reaches convergence the training times basically do not change, while the average Q max value still changes with small amplitude, which means that after convergence of the fusion algorithm the mobile robot retains self-learning and adaptive capabilities: the most appropriate actions in different path planning experiments are found by constantly adjusting the Q max values.
Figure 22 shows the comparison of training times for each round of path planning experiment of the mobile robot. As Figure 22 shows, in 200 rounds of path planning experiments, the training times of each round show a trend of irregular change, which means that the traditional DWA algorithm cannot adapt to the complex and changeable environment. Obviously, compared with the DWA algorithm and DQN algorithm, the fusion path planning algorithm based on DQN and DWA has fewer training times, the corresponding calculation time is greatly shortened, and the convergence is faster.
Table 3 shows a comparison table of path planning performance. All data are the mean values of multiple path planning experiments. According to the data in Table 3, compared with the success rates of 90.36% and 93.47% of DWA and DQN algorithms, the fusion path planning algorithm based on DQN and DWA can increase the success rate by about 9% and 6% in complex and changing environments. At the same time, the expected reward obtained by the fusion algorithm is also higher.

4. Hybrid Algorithm Based on DQN and DWA

4.1. Hybrid Algorithms

The pseudocode of the hybrid A* route planning algorithm based on DQN and DWA is given in Algorithm 1.
Algorithm 1. The Hybrid Algorithm(S, G)
1.  Initialize parameters
2.  Learning rate α = 0.01
3.  Exploration strategy initial ε = 1
4.  Training times = 1000
5.  def adaptive_A*(start, goal, map):
6.    if d == 0:
7.      f(x1, x2) = 1 / (k1·x1 + k2·x2 + c)
8.    else:
9.      f(x1, x2) = 1
10. def DWA(current_pose, velocity_window, map):
11. def Hybrid_A*(start, goal, map):
12.   for i in range(training_times):
13.     current_pose = start
14.     global_path = adaptive_A*(start, goal, map)
15.     while current_pose != goal:
16.       state = get_state(current_pose, global_path, map)
17.       if random.random() < ε:
18.         action = random.choice(available_actions)
19.       else:
20.         action = dqn.act(state)
21.       if action == 0:
22.         new_pose = follow_global_path(global_path, current_pose)
23.       elif action == 1:
24.         velocity_window = generate_velocity_window()
25.         new_velocity = DWA(current_pose, velocity_window, map)
26.         new_pose = update_pose(current_pose, new_velocity)
27.       reward = calculate_reward(new_pose, goal, map)
28.       next_state = get_state(new_pose, global_path, map)
29.       done = (new_pose == goal)
30.       dqn.remember(state, action, reward, next_state, done)
31.       memory.append((state, action, reward, next_state, done))
32.       if len(memory) > memory_size:
33.         memory.pop(0)
34.       if len(memory) >= batch_size:
35.         dqn.replay(batch_size)
36.       current_pose = new_pose
37.   return final_path
The structure of the whole algorithm is as follows. First, determine whether a dynamic obstacle is present in the state to which the global adaptive A* algorithm is applied. If not, the threat function of the global A* is calculated through line 7 of Algorithm 1, which corresponds to Formula (3) in this paper; if so, the threat function is taken as 1. Second, for local dynamic planning, the DQN and DWA algorithms are set up; if the planner falls into a local optimum, DQN is used to select the action for the next state, and if it moves normally, the iterative planning continues and finally returns the planned path.
The research indicates that the robot’s route planning should take both static and dynamic obstacle interference into account. The A* algorithm’s global route planning only considers the static barriers in the surroundings; it ignores the impact of dynamic obstacles, which might cause the robot to collide with them. However, the robot enters the local optimum and is unable to reach the goal location since the DWA algorithm’s local route planning only considers the barriers in the immediate area and ignores global path planning. The suggested fusion approach, which combines the enhanced A* algorithm, DWA algorithm, and DQN algorithm to overcome this issue, guarantees not only obstacle avoidance but also the smoothness and optimality of route planning.
The route planning system, which primarily consists of global path planning and local path planning, is designed by combining the enhanced A* algorithm and DWA algorithm, as shown in Figure 23. The global route of the planning goal is formed once global path planning is completed using the enhanced A* algorithm. The information gathered from the sensors in the immediate vicinity is then used to update the local map. The DWA and DQN algorithms are used to complete the local motion path planning, which enables the robot to avoid dynamic obstacles, reach the local goal point, continually update the route, and eventually reach the target position.
The temporary goal points for each stage of the optimized dynamic window approach are derived from the key points of the global route proposed by the enhanced A* algorithm. Together with the DWA and DQN algorithms, the enhanced A* algorithm can address each method's flaws in real time, finish global route planning, and successfully avoid dynamic obstacles. The local target point is generated by the local route planning algorithm in conjunction with the global path, as shown in Figure 24. The global target point is eventually reached after continually updating from the previous local target point to the next local target point. In addition to ensuring the global path's optimality, the fusion algorithm ensures efficient movement and obstacle avoidance in local planning.
To verify the feasibility and effectiveness of the proposed hybrid algorithm's route planning, the scenario in which the DWA algorithm falls into a local optimum is reproduced, and the corresponding simulation is run under identical simulation settings. (30.5, 1.5) remains the starting position, while (2.5, 29.5) is the ending position. The blue path in the figure shows the trajectory of global route planning using the conventional A* method, and the green route shows the search result of the hybrid algorithm, which integrates the DWA algorithm. Figure 25 displays the outcome.
In contrast to the traditional A* algorithm, Figure 25 shows how the fusion approach proposed in this study avoids long paths and unnecessary turning angles. Furthermore, as Table 4 illustrates, the enhanced hybrid approach increases planning efficiency by cutting the route length by 3.62% and the conventional A* algorithm by 50%.
The above describes the hybrid algorithm's trajectory. By overcoming the DWA algorithm's tendency to fall into local optima during local route planning and addressing the A* algorithm's inability to avoid dynamic obstacles, the combined algorithm also smooths the planned curve so that the tracked trajectory satisfies the motor and steering-angle smoothness requirements.

4.2. Improved Hybrid Algorithm Simulation Analysis

Aiming at the proposed algorithm, this paper makes a comparative test of the following five algorithms:
  • As shown in Figure 26, the path planning times of A*, QLN, DQN, DDPG, and Hybrid A* in the same environment are compared.
  • As shown in Figure 27, the numbers of path nodes of A*, QLN, DQN, DDPG, and Hybrid A* in the same environment are compared.
  • As shown in Figure 28, the success rates of A*, QLN, DQN, DDPG, and Hybrid A* in reaching the target point in the same environment are compared.
Five groups were set up in the comparison experiment, which were carried out independently in the same environment (Figure 26, Figure 27 and Figure 28). The five sets of experimental results show that, compared with the other four methods, the hybrid A* algorithm significantly reduces the path planning time and the number of nodes, and improves the precision. The performance comparison of the five algorithms includes path planning time, number of nodes, and accuracy, as shown in Table 5.
The values in Table 5 are the averages of the five groups of experimental results, and the optimization rate is that of the Hybrid A* algorithm compared with the other algorithms. From Table 5, in the same environment the planning time of the Hybrid A* algorithm is shortened by 26.28%, the number of nodes is reduced by 40%, and the accuracy is increased by 20% compared with the A* algorithm.
Further, this paper also plots the number of training steps to evaluate the training process of the above two algorithms; the results are shown in Figure 29.
In Figure 29, the horizontal coordinate represents the number of training iterations, and the vertical coordinate represents the number of training steps per iteration. As Figure 29 shows, in the early training period the DQN algorithm had a high number of training steps during the first 0–60 iterations, its convergence over this range was not obvious, and network model oscillations occurred near iterations 80, 100, and 120. If the environment model were more complex, the algorithm would fail to converge and could not complete the path planning. In contrast, the improved hybrid algorithm has a high number of training steps only in the first 0–45 iterations, and its convergence after iteration 45 is clear. The algorithm converges rapidly even after brief oscillations and then flattens out, with almost no network model oscillation.
In summary, compared with the DQN algorithm, the improved hybrid algorithm not only speeds up the algorithm convergence speed, but also improves the sample utilization rate, alleviates the phenomenon of network model oscillation, and has better algorithm performance.

5. Experimental Results and Analysis

5.1. Experimental System Platform

To further verify the effectiveness of the proposed fusion algorithm in the real environment, a practical scenario was built to realize the path planning task. ROS (Melodic) was used as the experimental platform. The operating system was 64-bit Ubuntu 18.04, and the RAM was 4 GB. Table 6 displays the hardware setup. The robot gathered information about the external world using a LiDAR sensor. The Gmapping and AMCL modules were then used to complete positioning and two-dimensional map building. Lastly, global and local route planning were implemented using the global and local planners in the move_base module, respectively. Figure 30 shows the overall process, and Figure 31a–c shows the Gmapping process.
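For reference, sending a navigation goal to move_base from a ROS Melodic node typically follows the pattern below. The frame name and goal coordinates are placeholders, and the global/local planner plugins used by the paper are configured separately in the move_base parameters rather than in this script.

#!/usr/bin/env python
import rospy
import actionlib
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal

def send_goal(x, y):
    """Send one target pose to the move_base action server and wait for the result."""
    client = actionlib.SimpleActionClient('move_base', MoveBaseAction)
    client.wait_for_server()

    goal = MoveBaseGoal()
    goal.target_pose.header.frame_id = 'map'        # goal expressed in the map frame
    goal.target_pose.header.stamp = rospy.Time.now()
    goal.target_pose.pose.position.x = x
    goal.target_pose.pose.position.y = y
    goal.target_pose.pose.orientation.w = 1.0       # face along the map x-axis

    client.send_goal(goal)
    client.wait_for_result()
    return client.get_state()

if __name__ == '__main__':
    rospy.init_node('send_nav_goal')
    print(send_goal(1.0, 0.5))                      # hypothetical target point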
Given that the real mobile robot measures 281 × 306 × 641 mm³, a test scene with 12 squares on the long side and 10 squares on the short side was constructed in the experiment, with a 25 cm square as the unit. The mapping of the environment was finished using the AMCL and Gmapping modules, as shown in Figure 32a. As seen in Figure 32b, the theoretically generated map is a 10 × 10 grid map, since the robot generates a 25 × 25 cm grid. Obstacles are represented in black, and the inflation area around them is represented by the blue region. The obstacle inflation radius was chosen as 25 cm, considering the real mobile robot's size.

5.2. Comparison of Experimental Results and Analysis

There is some discrepancy between the theoretical and real maps because the ground in the experimental setting is quite smooth, meaning there is less friction, and the mobile robot will slide a little as it advances. Three distinct goal sites and the same beginning point are selected for the experiment. The starting point of path planning is the upper left corner of the map, the target point G1 is the lower left corner, G2 is the lower right corner, and G3 is the upper right corner. The final path planned by the A* algorithm and the mixed A* algorithm is shown in Figure 33.
The route planning times of the A* algorithm and the Hybrid A* algorithm under the various target point conditions are shown in Table 7. As can be seen, the Hybrid A* algorithm plans a smoother route in a shorter time than the original A* algorithm, resulting in smoother robot movement and a lower demand on precise robot control.
The conventional A* algorithm's route has several turning segments with large turning angles, which increases path redundancy, significantly lowers motor life and operating efficiency, and hinders the mobile robot's movement. The robot avoids obstacles along the general direction given by both local and global route planning so as to complete navigation tasks as quickly as feasible. Using the fusion technique suggested in this research, the smoothness of the route is guaranteed, the turning angles and redundant points are efficiently reduced, and the path's length and smoothness are optimized.
Since the mobile robot may encounter dynamic obstacles while moving, the route planned by the fusion algorithm must account for them. Figure 34 displays the route planned by the mobile robot. The figure shows the mobile robot's path-planning procedure at the start, midpoint, and end of obstacle avoidance. The dynamic obstacle's path and the dynamic obstacle itself are represented in the figure by the yellow curve and the yellow square.
The mobile robot will go along the prior course until it comes across a dynamic obstacle, as shown in Figure 34a. To complete local route planning, the mobile robot employs radar positioning for emergency obstacle avoidance and steers clear of dynamic impediments when they are present on the road.
The mobile robot encounters intricate obstacles, uses its own sensors to determine its state, and completes autonomous navigation using the hybrid route planning algorithm and the AMCL positioning algorithm, as seen in Figure 34b. Before running into a dynamic obstacle, the mobile robot continues on its prior course. If the road contains dynamic obstacles, the mobile robot uses the radar to detect and avoid them. Figure 34c presents the experimental findings.
Table 8 displays the outcomes of the experiment. The results demonstrate that the suggested hybrid approach can effectively finish the route planning. Because the experimental site is quite smooth, the mobile robot's early motion and navigation are somewhat affected; however, this has no effect on the final route planning and obstacle avoidance procedures. As Table 8 shows, under the same environment, the average time consumption of the hybrid algorithm in this paper is reduced by 10.11%, and the number of path inflection points is reduced by 37.5%.
The safe arrival rate is 50% higher than that of the traditional algorithm. The results further support the fusion algorithm's superiority, since it includes a dynamic obstacle avoidance capability: it can avoid new dynamic obstacles on the route in a timely and reliable manner and has high applicability and security in real, complicated dynamic settings.
To verify the experimental results more accurately, this paper builds the simulation environment from the example of a real factory environment. The actual factory environment is shown in Figure 35, and the resulting factory simulation environment is shown in Figure 36b. The whole factory is surrounded by walls, and Figure 36a is a top view of the factory environment. The simulation environment is a complex scene with multiple components and devices, including rooms, oil storage tanks, vehicles, trees, grass, roads, and fences.
As Figure 37 shows, the fusion A* algorithm can plan a globally safe path, avoid obstacles along the path, and ensure that the inspection robot reaches the target point smoothly. Compared with the paths planned by the A*, DWA, and DQN algorithms, the path planned by the improved fusion algorithm is shorter and smoother, with a better obstacle avoidance effect, making it more suitable for robot motion.
To ensure the accuracy of the experiment, this paper randomly published 20 target points through the host computer and statistically analyzed the relevant data; the final experimental results are shown in Table 9. This repeated random testing effectively avoids deviation and error in the experiment, ensuring the reliability and effectiveness of the experimental results.

6. Conclusions

This paper introduces the basic principles of A* and DWA. The implementation of the algorithm is mainly divided into three steps: model building, velocity sampling, and trajectory evaluation. Through these three steps, the motion of the mobile robot is directly limited to the velocity space and controlled within a feasible dynamic range. The problems of A* and DWA are pointed out: their timeliness and robustness are not strong, and they easily fall into local optima. The DQN algorithm is also introduced in detail. The DQN algorithm is the fusion of the Q-Learning algorithm and a convolutional neural network; it contains a target network and an experience storage mechanism and can deal with a large-scale state-action space. This paper proposed a fusion path planning algorithm based on DQN, DWA, and A*. The fusion algorithm not only integrates the characteristic of the dynamic window approach of acting directly on the velocity space, it can also handle large-scale state-action spaces, so that the mobile robot can quickly adapt to the complex changing environment during training. The basic framework of the fusion algorithm is constructed, and the DWA algorithm and DQN algorithm are fused effectively. Then, by setting the reward function and the ε-greedy strategy, the action selection mechanism of the mobile robot is improved, so that the probability of the robot selecting the optimal action during training increases. The simulation results show that the fusion path planning algorithm has strong generalization ability and robustness and can quickly adapt to complex changing environments, meaning that its practical value is significantly improved. In addition, compared with the traditional algorithms, the fusion algorithm obtains better-quality path planning performance with fewer training iterations, shorter computation time, and faster convergence speed.
Although the fusion algorithm has achieved good performance in simulation experiments, the proposed method still needs to be further improved and optimized, given the limited research capacity and time of the authors. Future research can therefore be roughly divided into two directions. First, there are many algorithm variants in deep reinforcement learning, as well as further modifications of the network structure, that remain to be studied; the fusion of deep learning algorithms is an important research topic in mobile robot path planning. Second, the experimental verification is currently limited to a single robot, and group experiments on multiple agents have not been carried out. The future value of this algorithm needs to be evaluated further.

Author Contributions

Conceptualization, Y.Z. and Q.Z.; methodology, C.C.; software, Y.Z.; validation, Q.Z. and C.C.; formal analysis, Y.Z.; investigation, Y.Z.; resources, Q.Z.; data curation, Y.Z.; writing—original draft preparation, Y.Z.; writing—review and editing, Y.Z. and Q.Z.; visualization, C.C.; supervision, C.C.; project administration, Q.Z.; funding acquisition, Q.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Basic Scientific Research Project of Liaoning Provincial Department of Education, JYTMS20231434.

Data Availability Statement

The experimental data in this paper were calculated on the MATLAB R2020a simulation platform under different environmental conditions.

Acknowledgments

The authors thank the School of Artificial Intelligence of Liaoning Petrochemical University for providing the physical robot and the experimental site used in this work.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
DQN    Deep Q-Learning Network
DWA    Dynamic Window Approach
QLN    Q-Learning Network
DDPG   Deep Deterministic Policy Gradient

References

  1. Han, J.; Seo, Y. Mobile robot path planning with surrounding point set and path improvement. Appl. Soft Comput. 2017, 57, 35–47. [Google Scholar] [CrossRef]
  2. Wang, Y.; Liang, X.; Li, B.; Yu, X. Research and Implementation of Global Path Planning for Unmanned Surface Vehicle Based on Electronic Chart; Springer: Cham, Switzerland, 2017. [Google Scholar] [CrossRef]
  3. Wang, Z.; Gao, F.; Zhao, Y.; Yin, Y.; Wang, L. Improved A* algorithm and model predictive control-based path planning and tracking framework for hexapod robots. Ind. Robot. Int. J. Robot. Res. Appl. 2022, 50, 135–144. [Google Scholar] [CrossRef]
  4. Ni, Y.; Zhuo, Q.; Li, N.; Yu, K.; He, M.; Gao, X. Characteristics and Optimization Strategies of A* Algorithm and Ant Colony Optimization in Global Path Planning Algorithm. Int. J. Pattern Recognit. Artif. Intell. 2023, 37, 2351006. [Google Scholar] [CrossRef]
  5. Lai, X.; Wu, D.; Wu, D.; Li, J.H.; Yu, H. Enhanced DWA algorithm for local path planning of mobile robot. Ind. Robot. Int. J. Robot. Res. Appl. 2022, 50, 186–194. [Google Scholar] [CrossRef]
  6. Shi, X.; Liu, H.; Li, Y.; Zhu, B.; Liang, J. Location Planning of Field Ammunition Depot for Multi-stage Supply Based on Dijkstra Algorithm. J. Phys. Conf. Ser. 2021, 2068, 012015. [Google Scholar] [CrossRef]
  7. Kuffner, J.; LaValle, S. RRT-Connect: An Efficient Approach to Single-Query Path Planning. In Proceedings of the 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation, San Francisco, CA, USA, 24–28 April 2000; IEEE: New York, NY, USA, 2000. [Google Scholar] [CrossRef]
  8. Li, Y.; Qi, S.; Zhang, C. Mobile robot path planning based on improved A-star_DWA fusion algorithm. In Proceedings of the 2022 4th International Conference on Robotics, Intelligent Control and Artificial Intelligence, Dongguan, China, 16–18 December 2022. [Google Scholar] [CrossRef]
  9. Wang, S. Application of Computer Artificial Intelligence Technology in Digital Twin Intelligent Traffic Control Planning System. In Proceedings of the 2023 International Conference on Internet of Things, Robotics and Distributed Computing (ICIRDC), Rio De Janeiro, Brazil, 29–31 December 2023. [Google Scholar] [CrossRef]
  10. Li, C.; Huang, X.; Ding, J.; Song, K.; Lu, S. Global path planning based on a bidirectional alternating search A* algorithm for mobile robots. Comput. Ind. Eng. 2022, 168, 108123. [Google Scholar] [CrossRef]
  11. Wei, W.; Dong, P.; Zhang, F. The shortest path planning for mobile robots using improved A* algorithm. J. Comput. Appl. 2018, 38, 1523–1526. [Google Scholar]
  12. Zhang, Y.; Xia, Q.; Xie, P. Research and Implementation of Path Planning for Mobile Robot in Unknown Dynamic Environment. In Proceedings of the IEEE International Conference on Artificial Intelligence and Computer Applications, Dalian, China, 28–30 June 2021; IEEE: New York, NY, USA, 2021. [Google Scholar] [CrossRef]
  13. Zou, A.; Wang, L.; Li, W.; Cai, J.; Wang, H.; Tan, T. Mobile robot path planning using improved mayfly optimization algorithm and dynamic window approach. J. Supercomput. 2023, 79, 8340–8367. [Google Scholar] [CrossRef]
  14. Dobrevski, M.; Skocaj, D. Adaptive Dynamic Window Approach for Local Navigation. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; IEEE: New York, NY, USA, 2020. [Google Scholar] [CrossRef]
  15. Zhang, J.H.; Feng, Q.; Zhao, A.; He, W.; Hao, X. Local path planning of mobile robot based on self-adaptive dynamic window approach. J. Phys. Conf. Ser. 2021, 1905, 012019. [Google Scholar] [CrossRef]
  16. Cao, J. Robot Global Path Planning Based on an Improved Ant Colony Algorithm. J. Comput. Commun. 2016, 4, 11–19. [Google Scholar] [CrossRef]
  17. Yingqi, X.; Wei, S.; Wen, Z.; Jingqiao, L.; Qinhui, L.; Han, S. A real-time dynamic path planning method combining artificial potential field method and biased target RRT algorithm. J. Phys. Conf. Ser. 2021, 1905, 012015. [Google Scholar] [CrossRef]
  18. Cui, Y.; Ren, J.; Zhang, Y. Path Planning Algorithm for Unmanned Surface Vehicle Based on Optimized Ant Colony Algorithm. IEEJ Trans. Electr. Electron. Eng. 2022, 17, 1027–1037. [Google Scholar] [CrossRef]
  19. Elhoseny, M.; Tharwat, A.; Hassanien, A.E. Bezier Curve Based Path Planning in a Dynamic Field using Modified Genetic Algorithm. J. Comput. Sci. 2018, 25, 339–350. [Google Scholar] [CrossRef]
  20. Yao, Y.; Zhou, X.-S.; Zhang, K.-L.; Dong, D. Dynamic trajectory planning for unmanned aerial vehicle based on sparse A* search and improved artificial potential field. Control Theory Appl. 2010, 27, 953–959. [Google Scholar]
  21. Xing, S.; Chen, X.; He, W.; Cai, T. An autonomous obstacle avoidance method based on artificial potential field and improved A* algorithm for UAV. In Proceedings of the 2022 2nd International Conference on Computer, Communication, Control, Automation and Robotics, Shanghai, China, 29–30 March 2022. [Google Scholar]
  22. Chen, J.; Tan, C.; Mo, R.; Zhang, H.; Cai, G.; Li, H. Research on path planning of three-neighbor search A* algorithm combined with artificial potential field. Int. J. Adv. Robot. Syst. 2021, 18, 17298814211026449. [Google Scholar] [CrossRef]
  23. Jinghui, S.; Yi, Z.; Jun, L.U. Research on Obstacle Avoidance Path Planning of Manipulator based on Artificial Potential Field Method and A* Algorithm. J. Chengdu Univ. Inf. Technol. 2019, 34, 263–266. [Google Scholar] [CrossRef]
  24. Jin, Q.; Tang, C.; Cai, W. Research on Dynamic Path Planning Based on the Fusion Algorithm of Improved Ant Colony Optimization and Dynamic Window Method. IEEE Access 2021, 10, 28322–28332. [Google Scholar] [CrossRef]
  25. Liu, L.; Wang, X.; Yang, X.; Liu, H.; Li, J.; Wang, P. Path planning techniques for mobile robots: Review and prospect. Expert Syst. Appl. 2023, 227, 120254. [Google Scholar] [CrossRef]
  26. Jin, S.; Wang, X.; Meng, Q. Spatial memory-augmented visual navigation based on hierarchical deep reinforcement learning in unknown environments. Knowl.-Based Syst. 2024, 285, 111358. [Google Scholar] [CrossRef]
  27. Fang, J.; Zhang, W.; Ge, L. Path Planning of Mobile Robot Based on Obstacle Avoidance Switching Control. J. Liaoning Petrochemical Univ. 2017, 37, 65–69. [Google Scholar]
  28. Ding, S.; Du, W.; Zhao, X.; Wang, L.; Jia, W. A new asynchronous reinforcement learning algorithm based on improved parallel PSO. Appl. Intell. 2019, 49, 4211–4222. [Google Scholar] [CrossRef]
  29. Li, Y.; Wang, H.; Fan, J.; Geng, Y. A novel Q-learning algorithm based on improved whale optimization algorithm for path planning. PLoS ONE 2022, 17, e0279438. [Google Scholar] [CrossRef] [PubMed]
  30. Botvinick, M.; Wang, J.X.; Dabney, W.; Miller, K.J.; Kurth-Nelson, Z. Deep reinforcement learning and its neuroscientific implications. Neuron 2020, 107, 603–616. [Google Scholar] [CrossRef] [PubMed]
  31. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  32. Wu, S.; Zhong, S.; Liu, Y. Deep residual learning for image steganalysis. Multimed. Tools Appl. 2017, 77, 10437–10453. [Google Scholar] [CrossRef]
  33. Wang, Y.; He, H.; Tan, X. Truly proximal policy optimization. In Proceedings of the 35th Uncertainty in Artificial Intelligence Conference, PMLR, Tel Aviv, Israel, 22–25 July 2019; pp. 113–122. [Google Scholar]
  34. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing Atari with Deep Reinforcement Learning. arXiv 2013, arXiv:1312.5602. [Google Scholar]
  35. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef]
  36. Tai, L.; Liu, M. Towards cognitive exploration through deep reinforcement learning for mobile robots. arXiv 2016, arXiv:1610.01733. [Google Scholar]
  37. Lv, L.; Zhang, S.; Ding, D.; Wang, Y. Path planning via an improved DQN-based learning policy. IEEE Access 2019, 7, 67319–67330. [Google Scholar] [CrossRef]
  38. Xing, X.; Ding, H.; Liang, Z.; Li, B.; Yang, Z. Robot path planner based on deep reinforcement learning and the seeker optimization algorithm. Mechatronics 2022, 88, 102918. [Google Scholar] [CrossRef]
  39. Liu, S. Multi-Track Path Planning of Outdoor Scanning Robot in Unknown Scene. Innov. Sci. Technol. 2023, 2, 29–44. [Google Scholar] [CrossRef]
  40. Shuhai, J.; Shangjie, S.; Cun, L. Path Planning for Outdoor Mobile Robots Based on IDDQN. IEEE Access 2024, 12, 51012–51025. [Google Scholar] [CrossRef]
Figure 1. Raster map model.
Figure 2. Nodal distance graph.
Figure 3. Threat weight graph.
Figure 4. Traditional path planning.
Figure 5. Adaptive step size path planning.
Figure 6. Cubic Bezier curve.
Figure 7. Bezier curve optimization path graph.
Figure 8. Kinematic model of the mobile robot.
Figure 9. The DWA algorithm trapped in a local optimum.
Figure 10. Flowchart of the DQN algorithm.
Figure 11. Schematic diagram of the fusion path planning framework based on DQN and DWA.
Figure 12. Flowchart of the fusion path planning algorithm based on DQN and DWA.
Figure 13. Learning rate selection parameter graph.
Figure 14. DWA+DQN algorithm simulation diagram.
Figure 15. Kalman filtering diagram.
Figure 16. Mobile robot attitude angle diagram.
Figure 17. Schematic diagram of the linear and angular velocity of the mobile robot.
Figure 18. Training result of mobile robot path planning.
Figure 19. Change curve of the success rate of the mobile robot during training.
Figure 20. Change curve of the expected reward of the mobile robot during training.
Figure 21. Change curve of the average maximum Q value of the mobile robot during training.
Figure 22. Comparison of the training times of each round of the mobile robot path planning experiment.
Figure 23. Hybrid algorithm diagram.
Figure 24. Diagram for path planning.
Figure 25. Hybrid algorithm path planning simulation diagram.
Figure 26. Comparison of the path planning time of A*, QLN, DQN, DDPG, and Hybrid A*.
Figure 27. Comparison of the number of path planning nodes of A*, QLN, DQN, DDPG, and Hybrid A*.
Figure 28. Comparison of the number of successful arrivals at the target point for A*, QLN, DQN, DDPG, and Hybrid A*.
Figure 29. Algorithm training comparison graph.
Figure 30. Mini robot planning flow chart.
Figure 31. Robot mapping process: (a) initial stage of mapping; (b) middle stage of mapping; (c) middle-to-late stage of mapping.
Figure 32. Map of the test scene: (a) the actual setting; (b) the match between the actual environment and the theoretical environment.
Figure 33. Performance of the robot on the path planning task in a real environment with different algorithms: (a) A* algorithm to goal 1; (b) A* algorithm to goal 2; (c) A* algorithm to goal 3; (d) Hybrid A* algorithm to goal 1; (e) Hybrid A* algorithm to goal 2; (f) Hybrid A* algorithm to goal 3.
Figure 34. Hybrid algorithm dynamic environment path planning.
Figure 35. Realistic factory diagram.
Figure 36. Top view of the factory and the overall construction diagram.
Figure 37. Path comparison map.
Table 1. Performance metrics are compared before and after algorithm tuning.
Algorithm | Planning Time (s) | Number of Nodes | Time Reduction Rate | Node Reduction Rate
A* | 0.932 | 23 | – | –
Improved A* | 0.801 | 15 | 14.06% | 34.78%
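The reduction rates in Table 1 follow directly from the tabulated values; as a quick check,

\[
\frac{0.932 - 0.801}{0.932} \approx 14.06\%, \qquad \frac{23 - 15}{23} \approx 34.78\%.
\]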
Table 2. Hyperparameter setting of the DQN algorithm.
Parameter | Value
Learning rate α | 0.010
Exploration strategy ε, initial value | 1.00
Exploration strategy ε, final value | 0.01
Training times | 1000
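A common way to realize the ε schedule in Table 2 is a simple annealing rule; the linear decay shape below is an assumption for illustration, since only the initial value, final value, and number of training episodes are specified.

```python
def epsilon_schedule(episode, total_episodes=1000, eps_start=1.00, eps_end=0.01):
    """Anneal the exploration rate from eps_start to eps_end over training.
    The linear shape is assumed; the endpoint values follow Table 2."""
    frac = min(episode / total_episodes, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```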
Table 3. Path planning performance comparison table.
Algorithm | Success Rate | Expected Reward
DWA | 0.9036 | NA
DQN | 0.9347 | 210.289
DWA+DQN | 0.9936 | 248.125
Table 4. Comparison of algorithm performance indicators.
Algorithm | Transition Nodes | Smoothness | Avoids Dynamic Obstacles | Path Length | Variance | Number of Experiments
A* | 7 | × | × | 43.079 | 7.963 | 20
Improved A* | 5 | ✓ | × | 41.072 | 2.369 | 20
DWA | – | ✓ | ✓ | – | 9.006 | 20
Hybrid A* | 4 | ✓ | ✓ | 40.603 | 1.045 | 20
Table 5. Comparison of algorithm performance indicators.
Algorithm | Path Planning Time (s) | Optimization Rate | Path Planning Nodes | Optimization Rate | Successful Arrivals | Optimization Rate | Number of Experiments
A* | 0.9428 | 26.28% | 10 | 40% | 10 | 20% | 5
QLN | 0.8694 | 20.68% | 8 | 25% | 10 | 20% | 5
DQN | 0.8331 | 16.61% | 8 | 25% | 11 | 9.09% | 5
DDPG | 0.8163 | 14.95% | 8 | 25% | 10 | 20% | 5
Hybrid A* | 0.695 | – | 6 | – | 12 | – | 5
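The optimization-rate columns in Table 5 appear to express the improvement of Hybrid A* relative to each baseline; the sketch below reproduces the A* row under that assumption (time and nodes improve by decreasing, successful arrivals by increasing).

```python
def optimization_rates(baseline, hybrid):
    """Relative improvement of the hybrid planner over one baseline (assumed definition)."""
    return {
        "time": (baseline["time"] - hybrid["time"]) / baseline["time"],
        "nodes": (baseline["nodes"] - hybrid["nodes"]) / baseline["nodes"],
        "arrivals": (hybrid["arrivals"] - baseline["arrivals"]) / baseline["arrivals"],
    }

# A* row of Table 5: roughly 26.3%, 40.00%, 20.00%
rates = optimization_rates({"time": 0.9428, "nodes": 10, "arrivals": 10},
                           {"time": 0.695, "nodes": 6, "arrivals": 12})
print({k: f"{v:.2%}" for k, v in rates.items()})
```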
Table 6. Robot hardware parameters.
Hardware | Parameter
Lidar | Scan angle: 0–360°; scan range: 0.1–25 m; sampling frequency: 16,000
Camera | FPS: 15; detection range: 0.8–6.0 m
IMU | MPU9250
Controller | Control period: 0.01 s
Kalman filter | Gain: 0.9; process noise covariance: 0.01; observation noise covariance: 1
Master control | STM32
Table 7. Comparison of the path planning times of the two algorithms for different target points (mean ± standard deviation over ten runs).
Target | A* (ms) | Hybrid A* (ms)
Goal 1 | 332.9 ± 13.04 | 129.0 ± 10.75
Goal 2 | 336.9 ± 17.46 | 161.2 ± 13.28
Goal 3 | 465.2 ± 21.30 | 217.6 ± 18.59
Table 8. Traditional and hybrid algorithms are compared.
Metric (20 experiments) | A* | Hybrid A* | Variance (A*) | Variance (HA*)
Path planning time | 5.54 s | 4.98 s | 3.012 | 0.969
Path planning nodes | 8 | 5 | 2 | 1
Target reached without collision | 12 | 18 | 2 | 0.5
Table 9. Experimental data of global planning outside the factory.
Algorithm | Average Planning Time (s) | Average Path Length (m) | Average Time Spent (s)
A* | 3.98 | 109.73 | 859.73
DWA | 4.76 | 120.45 | 900.10
DQN | 2.73 | 100.03 | 864.03
Hybrid A* | 1.78 | 94.22 | 778.23
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
