Article

LSTM-DQN-APF Path Planning Algorithm Empowered by Twins in Complex Scenarios

School of Computer Science and Engineering, Xi’an Technological University, Xi’an 710021, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(8), 4565; https://doi.org/10.3390/app15084565
Submission received: 18 March 2025 / Revised: 14 April 2025 / Accepted: 16 April 2025 / Published: 21 April 2025

Abstract

In response to the problems of unreachable targets, local minima, and insufficient real-time performance in drone path planning in complex urban low-altitude scenarios, this paper proposes a digital-twin-based fusion algorithm integrating LSTM (long short-term memory), DQN (Deep Q-Network), and APF (artificial potential field). The algorithm relies on a twin system and integrates multi-sensor fusion technology and Kalman filtering to feed obstacle information and UAV trajectory predictions into the DQN, which outputs action decisions for intelligent obstacle avoidance. Additionally, to address the blind-search problem in trajectory planning, the algorithm introduces exploration and heuristic reward components and adds velocity and acceleration compensation terms to the attraction and repulsion functions, reducing the path deviation of UAVs during dynamic obstacle avoidance. Finally, to tackle the issues of insufficient training sample size and simulation accuracy, this paper leverages a digital twin platform, utilizing a dual feedback mechanism between virtual and physical environments to generate a large number of complex urban scenario samples. This approach effectively enhances the diversity and accuracy of training samples while significantly reducing the experimental costs of the algorithm. The results demonstrate that the LSTM-DQN-APF algorithm, combined with the digital twin platform, significantly alleviates the problems of unreachable goals and local optima and improves real-time performance for UAV operations in complex environments. Compared to traditional algorithms, it notably improves path planning speed and obstacle avoidance success rates. After thorough training, the proposed algorithm can be applied to real-world UAV systems, providing reliable technical support for applications such as smart city inspections and emergency rescue operations.

1. Introduction

The urban environment contains dense static and dynamic obstacles, which pose significant challenges to the real-time path planning of drones. Drones need to deal with complex building layouts, frequently changing obstacles, and the occlusion effects between obstacles. Therefore, achieving efficient trajectory planning in urban environments is particularly important. Currently, path planning algorithms are typically divided into three categories: traditional path optimization algorithms, heuristic path optimization algorithms, and swarm intelligence-based biomimetic path optimization algorithms. Traditional algorithms rely on accurate environmental information and commonly include Dijkstra’s algorithm, the artificial potential field (APF) [1] method, the A* algorithm [2], and the rapidly exploring random tree (RRT) [3,4]. These algorithms perform well in static and simple scenarios but face many issues in dynamic environments. For example, Dijkstra’s algorithm [5,6] determines the shortest path by traversing the global map, but its discretization process leads to a lack of smoothness in the path, and it performs poorly in dynamic environments; the APF [7] algorithm provides smooth paths, but it can easily get stuck in local minima or leave the target unreachable; the RRT algorithm may fail to find a feasible path in some complex scenarios. Heuristic path optimization algorithms, such as the A* algorithm [8] and the simulated annealing algorithm [9], improve search efficiency but still encounter problems such as local optima or unreachable goals. Furthermore, swarm intelligence-based biomimetic algorithms, such as genetic algorithms [10], ant colony algorithms [11], and particle swarm algorithms [12], perform well in path planning but also have limitations. Genetic algorithms are prone to premature convergence, and particle swarm algorithms often fall into local optima due to insufficient velocity adjustments. Therefore, existing algorithms struggle to address the challenges of path planning in dynamic and complex environments.
In view of this, researchers have gradually shifted their focus to machine learning, particularly reinforcement learning techniques. Reinforcement learning [13,14], through the interaction between the agent and the environment, can gradually improve path planning strategies. In situations where information is incomplete, reinforcement learning [15,16] can help drones plan the optimal path to avoid dangerous areas.
Currently, path planning methods based on reinforcement learning mostly rely on the Q-learning algorithm [17]. However, this algorithm is prone to getting stuck in local optima, has low learning and computational efficiency, and performs poorly in dynamic obstacle environments [18]. To address this issue, the DQN algorithm [19] introduces neural networks, which can handle high-dimensional state and action spaces. However, there are still issues with training efficiency in complex environments, especially when the environment exhibits temporal dependencies, where traditional algorithms struggle to effectively capture the temporal relationships. To solve these problems, digital twin technology can be incorporated into the path planning process to acquire real-time obstacle information through virtual–real data interaction. Additionally, integrating LSTM [20] into the reinforcement learning framework can leverage its ability to process sequential data, helping the agent better predict future state changes in multi-step decision making and adjust its trajectory in real time.
To address the issues of local minima, unreachable goals, and poor adaptability to dynamic environments in the aforementioned path planning methods, this paper proposes the LSTM-DQN-APF algorithm. This algorithm improves the issues of unreachable goals, local minima, and real-time performance of UAVs by introducing adaptive learning rates, adjusting exploration rates, and designing attraction and repulsion functions, among other methods. Additionally, this paper utilizes a digital twin platform [21] as a sample generator to compensate for the shortcomings of training in real environments, thereby improving the accuracy of path planning. The technical validation of UAVs is typically conducted through initial simulations, but pure simulation environments have low credibility [22,23]. By combining virtual and real-world validation [24,25,26], the digital twin system can comprehensively assess the UAV’s path planning efficiency and obstacle avoidance capability in complex urban environments, ensuring the successful completion of tasks. Furthermore, based on the aforementioned research, this paper verifies the LSTM-DQN-APF algorithm using the digital twin system. The innovations of this paper are as follows:
(1)
By combining DQN and APF, and through adjusting the exploration rate, designing reward functions, and optimizing strategies, the issues of target inaccessibility, local minima, and the difficulty of real-time updates in dynamic environments for UAVs are effectively addressed, improving the algorithm’s effectiveness.
(2)
To enhance the UAV’s ability to respond to sudden dynamic threats in complex urban environments, this paper proposes a twin-enabled LSTM-DQN-APF algorithm. The twin platform, driven by real-time data interaction and real-world data, enables the UAV to make controllable and predictable decisions in complex environments. Additionally, by incorporating an adaptive learning rate mechanism, the algorithm avoids premature convergence and local optimality issues, thereby improving prediction accuracy and trajectory planning robustness.
(3)
Given that conducting real-world tests in urban environments is costly and the data samples from real scenarios are limited, this paper integrates real-world data in real-time based on a twin platform, using it as a virtual sample generator to increase sample diversity. Building on this work, the proposed LSTM-DQN-APF algorithm was validated through virtual–real integration on the constructed twin platform. Experimental results show that the algorithm can effectively address issues such as local minima, unreachable targets, poor adaptability to dynamic environments, and the issue of insufficient real-time path planning accuracy.
The framework structure of this paper is as follows.
Section 1 introduces the problem of UAV path planning in complex urban environments and proposes the digital-twin-empowered LSTM-DQN-APF algorithm, which improves on issues such as local minima, unreachable goals, and real-time performance found in traditional algorithms. Section 2 analyzes the limitations of traditional APF and DQN and constructs the LSTM-DQN-APF fusion algorithm, which enhances planning performance through LSTM time-series prediction, DQN decision making, and APF potential field constraints. Section 3 presents an improved algorithm design, enhancing adaptability and decision efficiency in dynamic environments through multi-sensor fusion, dynamic potential field function optimization, and reward mechanism design. Section 4 focuses on the modeling of the twin system and algorithm validation, building a digital twin platform that integrates virtual and physical elements. Three-tier experiments are conducted to verify the algorithm, and results show significant improvements in path planning speed and obstacle avoidance success rates in complex scenarios. Section 5 summarizes the research findings, highlighting the use of virtual–physical integrated digital twin technology to enhance the realism and credibility of the simulation environment while reducing the difficulty and experimental costs associated with transferring from simulation to the real world.

2. Preliminary

In the dynamic and complex urban low-altitude environment, path planning algorithms need to balance real-time performance, safety, and global optimization. Traditional APF methods struggle to adapt to dynamic obstacles because of their tendency to become trapped in local minima and to leave the goal unreachable. While DQN can optimize global strategies, it lacks the ability to model temporal dynamics, leading to lag in trajectory prediction. To address this, this section presents the LSTM-DQN-APF fusion framework, which combines three modules, “temporal prediction, global decision making, and physical constraints”, to provide theoretical support for autonomous obstacle avoidance in urban low-altitude environments.

2.1. APF

The artificial potential field method, with its intuitive physical model and strong adaptability, has long been widely used in UAV path planning. This method constructs a composite potential field by superimposing the attractive field of the target point and the repulsive field of obstacles, guiding the UAV to move along the gradient direction of the potential field and thus achieving efficient obstacle avoidance in simple scenarios. However, as UAV applications extend into complex urban low-altitude environments, the limitations of traditional APF gradually become apparent. In dense building clusters and narrow alleys, the complicated layout of obstacles can easily prevent the UAV from reaching the target point, as shown in Figure 1a, or cause it to get stuck in local minima, as shown in Figure 1b.
Although the APF algorithm provides an intuitive physical driving framework for UAV obstacle avoidance, its static potential field model struggles to meet the real-time decision-making needs in dynamic obstacle environments. This is particularly evident in complex urban scenarios, where the local optimum traps of traditional potential field functions can lead to the global failure of path planning. To address the challenge of coupling dynamics and uncertainty, this paper introduces the Deep Q-Network algorithm from deep reinforcement learning. By constructing a state-action value function mapping, the DQN enables autonomous strategy optimization for UAVs in complex environments.

2.2. DQN

DQN is an algorithm that combines reinforcement learning and deep learning, and it can effectively handle large-scale state spaces. In traditional methods, the value functions of states and actions are typically stored in a table, but when the state space becomes too large, the table-based approach becomes ineffective. DQN addresses this by introducing deep neural networks to approximate the Q-function, enabling it to handle complex environments.

2.2.1. Q-Learning

The Q-Learning algorithm interacts with the environment and evaluates the feedback obtained by taking action a in the current state s. A Q-table stores Q(s, a), the evaluation function for each state–action pair (s, a), i.e., the Q-values corresponding to the different state–action combinations. The agent repeatedly executes actions in each state to reach a new state s′, updates the Q-table, and then selects the action that maximizes the Q-value in that table to learn the optimal policy. The Q-value is calculated as shown in Equation (1):
$$Q(s,a) = r + \gamma \max_{a'} Q(s',a') \quad (1)$$
where r is the immediate reward for taking action a in the current state s, and γ is the discount factor weighting the Q-value obtained by executing a possible future action a′ after moving to the next state s′. The Q-values are updated as shown in Equation (2):
$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right] \quad (2)$$
where α is the learning rate and Q(s, a) is the evaluation function for performing action a in state s.
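For illustration, a minimal tabular sketch of the update in Equation (2) is written below in Python; the 16-state, 4-action grid and the sampled transition are hypothetical placeholders rather than the UAV environment used later.

import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    # Equation (2): Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Hypothetical toy setup: 16 discrete states, 4 actions (e.g., up/down/left/right)
Q = np.zeros((16, 4))
Q = q_learning_update(Q, s=0, a=1, r=-1.0, s_next=4)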

2.2.2. The Working Principle of DQN

DQN extends Q-Learning by replacing the Q-table with a deep neural network that approximates the Q-function Q(s, a; θ). The network takes the current state as input and outputs the Q-value of each candidate action, which makes the method applicable to high-dimensional state and action spaces where a lookup table becomes impractical. To stabilize training, DQN stores past transitions (s, a, r, s′) in an experience replay buffer and samples minibatches from it to break temporal correlations, and it maintains a separate target network, periodically synchronized with the policy network, that is used to compute the bootstrapped target r + γ max_{a′} Q(s′, a′).
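As a concrete illustration of these two mechanisms, a minimal DQN training step is sketched below, assuming a PyTorch implementation; the network size, replay-buffer layout, and hyperparameters are illustrative and do not correspond to the configuration reported in Section 4.

import random
from collections import deque
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions))

    def forward(self, x):
        return self.net(x)

def dqn_train_step(policy_net, target_net, optimizer, replay, batch_size=64, gamma=0.99):
    # Sample a minibatch of transitions (s, a, r, s', done) from the replay buffer
    batch = random.sample(replay, batch_size)
    s = torch.tensor([b[0] for b in batch], dtype=torch.float32)
    a = torch.tensor([b[1] for b in batch], dtype=torch.int64)
    r = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    s_next = torch.tensor([b[3] for b in batch], dtype=torch.float32)
    done = torch.tensor([b[4] for b in batch], dtype=torch.float32)
    # Q(s,a) from the policy network
    q_sa = policy_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    # Bootstrapped target computed with the frozen target network
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * target_net(s_next).max(1).values
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

replay = deque(maxlen=10000)  # experience replay buffer; 10,000 matches the pool size used in Section 4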

2.3. LSTM-DQN-APF

In addressing the problem of UAV trajectory prediction and obstacle avoidance in complex environments, traditional DQN and APF methods often face challenges such as trajectory prediction delays, local minima traps, and insufficient safety in policy exploration due to a lack of temporal prediction capabilities and adaptability to dynamic environments. This paper introduces the LSTM prediction algorithm, which enables more accurate real-time trajectory predictions for UAVs, helping to avoid premature convergence or falling into local optima caused by environmental changes. The LSTM-DQN-APF integrated algorithm model is shown in Figure 2. Its core architecture collaborates through a “temporal prediction—global decision-making—physical constraints” mechanism, constructing a closed-loop obstacle avoidance system for adaptive complex urban scenarios. Firstly, the LSTM temporal prediction module, based on multi-source sensor data, uses the temporal modeling capabilities of bidirectional long short-term memory networks to precisely capture the motion patterns of dynamic obstacles and predict their future trajectories, effectively solving the prediction delay issue caused by environmental dynamics in traditional APF algorithms. Secondly, DQN, driven by dynamic reward functions and utilizing experience replay and target network separation mechanisms, generates global optimal path strategies, overcoming the local minima dilemma caused by potential field gradient conflicts in APF. At the same time, the physical potential field constraints of APF are introduced as safety boundaries to ensure the obstacle avoidance reliability of DQN during the policy exploration process. The collaborative interaction of these three components constructs a closed-loop system that combines environmental adaptability, decision-making intelligence, and safety constraints, significantly improving the obstacle avoidance performance of UAVs in complex urban environments.
Based on the above model, Figure 3 illustrates the data flow of LSTM-DQN-APF. The specific process is as follows:
For the UAV dynamic obstacle avoidance path planning algorithm based on LSTM-DQN-APF, we first need to define the inputs and outputs, as well as set the initial conditions. The following is a detailed description of Algorithm 1.
Algorithm 1: UAV Dynamic Obstacle Avoidance Path Planning Based on LSTM-DQN-APF
Input: Virtual sensor data, real sensor data (timestamp, raw coordinates, velocity, obstacle point cloud);
Output: UAV action decisions (heading angle, velocity), updated environmental state;
Initialize twin platform and physical UAV entity;
Initialize extended Kalman filter, LSTM network, DQN policy network, and APF potential field parameters;
while goal point is not reached do
  Obtain virtual and real sensor data, and perform multi-source data fusion using the extended Kalman filter;
  LSTM sequence prediction, input current pre-processed data and previous feedback state, output predicted trajectory;
  Based on the current obstacle distribution, perform APF potential field calculation;
  Input LSTM and APF outputs into DQN for decision making;
  Action execution and feedback
    Execute action, update UAV state (new coordinates, velocity);
    Collect environmental feedback (new obstacle distribution, reward signal);
    if dynamic obstacle avoidance is triggered then
      Update hidden state using LSTM;
      Backpropagate to adjust DQN network parameters;
    else
      Maintain current policy network;
    end if
end while
end
Through this algorithm, LSTM can capture temporal information, DQN can make dynamic decisions, and the APF potential field provides effective obstacle avoidance capabilities. The combination of the three achieves real-time obstacle avoidance and path planning for UAVs in dynamic environments.
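A compact Python sketch of the loop in Algorithm 1 is given below; the module objects (ekf, lstm, dqn, apf, uav, env) and their methods are hypothetical stand-ins for the components described above, so this should be read as a structural outline rather than the platform's actual implementation.

def plan_until_goal(uav, env, ekf, lstm, dqn, apf, goal, goal_tol=0.5):
    hidden = None  # LSTM hidden state carried across steps
    while uav.distance_to(goal) > goal_tol:
        # 1. Fuse virtual and real sensor data with the extended Kalman filter
        state = ekf.fuse(env.virtual_sensors(), env.real_sensors())
        # 2. LSTM predicts obstacle and UAV trajectories from the fused sequence
        predicted_traj, hidden = lstm.predict(state, hidden)
        # 3. APF evaluates attractive/repulsive forces for the current obstacles
        forces = apf.compute(state, goal)
        # 4. DQN maps (prediction, potential field) to an action (heading angle, velocity)
        action = dqn.select_action(predicted_traj, forces)
        # 5. Execute the action and collect environmental feedback
        new_state, reward, dynamic_avoidance_triggered = uav.execute(action)
        if dynamic_avoidance_triggered:
            # Backpropagate to adjust the DQN parameters, as in Algorithm 1
            dqn.update(state, action, reward, new_state)
        # otherwise the current policy network is kept unchanged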

3. Method

This section optimizes the performance of the LSTM-DQN-APF algorithm in three aspects: First, the extended Kalman filter (EKF) is used to fuse LiDAR and IMU data, combined with virtual and physical sensor data from a digital twin platform, enhancing dynamic environment perception and state estimation accuracy; second, the repulsive force function of the artificial potential field method is improved by incorporating velocity and acceleration compensation terms to address path oscillations and response lag in dynamic obstacle scenarios; finally, a multi-dimensional reward function is constructed, integrating goal reward, exploration reward, and heuristic reward. This function quantifies the deviation between the actual and the theoretical optimal paths to achieve dynamic evaluation and optimization of the DQN-APF collaborative decision making.

3.1. Extended Kalman Filter Data Fusion Algorithm

To address the issues of perception accuracy and data reliability that exist with a single sensor in dynamic environments and to enhance the decision-making accuracy of LSTM-DQN-APF, multi-sensor fusion is needed to achieve precise environmental perception and state estimation. This paper adopts LiDAR (light detection and ranging) and IMU for data fusion. LiDAR is typically used for environmental perception (such as mapping, detecting obstacles, etc.), while IMU provides motion state information (such as acceleration, angular velocity, etc.). Through the virtual–real data fusion mechanism, by combining virtual sensor data (from the twin platform’s simulated environment) with real sensor data (IMU and LiDAR measurements from the physical UAV) and using an extended Kalman filter for multi-source data fusion, the system’s ability to respond to environmental changes can be effectively enhanced, thus improving path planning accuracy. The Algorithm 2 steps are as follows:
Algorithm 2: Extended Kalman Filter Data Fusion Algorithm
Input: Virtual sensor data, real sensor data (timestamps, raw coordinates, velocity, obstacle point clouds);
Output: Final fused data;
Initialize state variables: Position information (x, y, z), Velocity (vx, vy, vz), Attitude angles and angular velocity (roll, pitch, yaw, p, q, r);
Initialize state covariance matrix P_{k|k};
Initialize process noise matrix Q;
Initialize observation noise matrix R;
Initialize observation information, H is the Jacobian matrix of the observation model, R is the observation noise matrix;
While UAV has not reached the target point:
  Predict the IMU state: x̂_{k+1|k} = F x̂_{k|k} + B u_k;
  Predict the IMU covariance: P_{k+1|k} = F P_{k|k} F^T + Q;
  Perform the LiDAR observation update and compute the Kalman gain: K_k = P_{k+1|k} H^T (H P_{k+1|k} H^T + R)^{-1};
  Update the state: x̂_{k+1|k+1} = x̂_{k+1|k} + K_k (z_k − h(x̂_{k+1|k}));
  Update the covariance: P_{k+1|k+1} = (I − K_k H) P_{k+1|k};
end while
end
Throughout the process, the IMU continuously provides motion state information, while the LiDAR provides distance information about the environment. The extended Kalman filter continuously fuses the IMU and LiDAR data through prediction and update steps to obtain more accurate state estimates. The result of this virtual–real fusion can further be used in the lower-level APF module, supporting the collaborative decision-making ability of the DQN-APF, thereby improving obstacle avoidance accuracy.
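The predict/update cycle of Algorithm 2 can be written compactly as follows, assuming NumPy and a linearized observation model h with Jacobian H; the simplified matrices here are placeholders for the full position/velocity/attitude state described above.

import numpy as np

def ekf_step(x, P, u, z, F, B, Q, R, h, H):
    # Prediction with the IMU-driven motion model
    x_pred = F @ x + B @ u
    P_pred = F @ P @ F.T + Q
    # LiDAR observation update
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)        # Kalman gain
    x_new = x_pred + K @ (z - h(x_pred))       # correct with the innovation
    P_new = (np.eye(len(x)) - K @ H) @ P_pred  # covariance update
    return x_new, P_new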

3.2. Improved Attraction and Repulsion Functions

Traditional artificial potential field methods drive the UAV's motion only through static attractive and repulsive fields, without considering the UAV's own motion state, which leads to path oscillations or delayed responses. In particular, when obstacles and the UAV are moving towards each other, traditional repulsive fields cannot predict or avoid collision risks in advance and thus fail to handle changes in a dynamic environment. Therefore, the improved method introduces velocity and acceleration compensation terms, combining the effects of attractive and repulsive forces. This optimization enhances the traditional artificial potential field method, and the improved functions support the collaborative decision making of DQN-APF, enabling better adaptation to dynamic environments.
(1)
Attractive potential field and attractive force function
In the dynamic environment, the attractive potential field should consider not only the relative position between the UAV and the target point but also their relative velocity, which yields the improved equation:
$$U_{att}(X_R, V_R) = U_{att}(X_R) + U_{att}(V_R)$$
where U_{att}(X_R) is the attractive potential generated by the position factor, expressing the tendency to move towards the target point, and the velocity term is:
$$U_{att}(V_R) = \frac{1}{2} K_v (V_R - V_C)^2$$
where K_v is the velocity gain parameter of the attractive potential field, V_R is the UAV velocity, and V_C is the velocity of the target point; the combined attractive force is then:
$$F_{att}(X_R, V_R) = F_{att}(X_R) + F_{att}(V_R), \quad F_{att}(X_R) = K_v\, d(X_R, X_G), \quad F_{att}(V_R) = K_v\, |V_R - V_G|$$
where the direction of F_{att}(X_R) is from the UAV towards the target point, and the direction of F_{att}(V_R) is that of the relative velocity between the UAV and the target point.
(2)
Repulsive potential field and repulsive function
In the dynamic environment, the repulsive effect of the obstacle on the UAV should not only consider the position factor, but also the velocity factor and acceleration factor, and the repulsive potential field function applicable to the dynamic environment is:
$$U_{rep}(X_R, V_R, A_R) = \begin{cases} U_{rep}(X_R) + U_{rep}(V_R) + U_{rep}(A_R), & d(X_R, X_O) \le d_n,\ V_{RO} \ge 0 \\ U_{rep}(X_R), & d(X_R, X_O) \le d_n,\ V_{RO} < 0 \\ 0, & d(X_R, X_O) > d_n \end{cases}$$
where U_{rep}(X_R, V_R, A_R) is the repulsive potential field used in the dynamic environment; U_{rep}(V_R) is the repulsive potential field generated by the velocity factor of the obstacle; U_{rep}(A_R) is the repulsive potential field generated by the acceleration factor of the obstacle; d_n is the critical range of the repulsive potential field; V_{RO} = (V_R − V_O) · e_{RO} is the component of the relative velocity between the UAV and the obstacle along the line connecting them; V_{RO} > 0 indicates the obstacle is approaching the UAV, and V_{RO} < 0 indicates the obstacle is moving away from the UAV. Taking the negative gradient of the repulsive potential field yields the repulsive force:
$$F_{rep}(X_R, V_R, A_R) = \begin{cases} F_{rep}(X_R) + F_{rep}(V_R) + F_{rep}(A_R), & d(X_R, X_O) \le d_n,\ V_{RO} \ge 0 \\ F_{rep}(X_R), & d(X_R, X_O) \le d_n,\ V_{RO} < 0 \\ 0, & d(X_R, X_O) > d_n \end{cases}$$
where F_{rep}(V_R) is the repulsive force associated with the velocity factor and F_{rep}(A_R) is the repulsive force associated with the acceleration factor; both are oriented from the obstacle towards the UAV, and their scalar expressions are:
$$F_{rep}(V_R) = \eta_v V_{RO}, \qquad F_{rep}(A_R) = \eta_A A_{RO}$$
where A_{RO} is the component of the relative acceleration between the UAV and the obstacle along the line connecting them, which is in the same direction as V_{RO}; η_v is the velocity gain parameter of the repulsive potential field; η_A is the acceleration gain parameter.
(3)
Combined potential field and combined force
The combined potential field applicable to the dynamic environment is the vector sum of the dynamic attractive potential field and the dynamic repulsive potential field, i.e.,
$$U(X_R) = U_{att}(X_R, V_R) + U_{rep}(X_R, V_R, A_R)$$
The combined force applicable to the dynamic environment is the vector sum of the attractive and repulsive forces:
$$F(X_R) = F_{att}(X_R, V_R) + F_{rep}(X_R, V_R, A_R)$$
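A small numerical sketch of the improved attraction and repulsion functions above is given below; the gain values, the influence range d_n, the way directions are handled, and the standard APF form used for the position-based repulsion term are illustrative assumptions rather than the exact implementation.

import numpy as np

def improved_apf_force(x_uav, v_uav, a_uav, x_goal, v_goal,
                       x_obs, v_obs, a_obs,
                       k_p=1.0, k_v=0.5, eta_p=2.0, eta_v=1.0, eta_a=0.5, d_n=5.0):
    # Attractive force: position term plus velocity compensation
    f_att = k_p * (x_goal - x_uav) + k_v * (v_goal - v_uav)

    d = np.linalg.norm(x_obs - x_uav)
    e_ro = (x_obs - x_uav) / max(d, 1e-6)   # unit vector from UAV to obstacle
    v_ro = np.dot(v_uav - v_obs, e_ro)      # relative closing speed along that line
    a_ro = np.dot(a_uav - a_obs, e_ro)      # relative closing acceleration

    f_rep = np.zeros(3)
    if d <= d_n:
        # Position-based repulsion, active inside the influence range (standard APF form, assumed)
        f_rep = eta_p * (1.0 / d - 1.0 / d_n) * (1.0 / d**2) * (-e_ro)
        if v_ro >= 0:                       # obstacle and UAV approaching each other
            f_rep += (eta_v * v_ro + eta_a * a_ro) * (-e_ro)
    return f_att + f_rep

# Example call with 3D vectors (all quantities assumed in SI units)
f = improved_apf_force(np.zeros(3), np.zeros(3), np.zeros(3),
                       np.array([20.0, 20.0, 5.0]), np.zeros(3),
                       np.array([3.0, 0.0, 0.0]), np.array([-1.0, 0.0, 0.0]), np.zeros(3))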

3.3. Reward Function Design

In the training process of DQN, the agent evaluates the quality of actions through the reward function. The improved reward function combines goal rewards, exploration rewards, and heuristic rewards, working in close synergy with the enhanced attraction–repulsion potential field functions to strengthen the decision-making ability of DQN-APF. This reward function not only encourages the agent to explore unvisited or rarely visited states but also guides the agent towards the target point through heuristic rewards, avoiding the risk of getting stuck near obstacles by blindly pursuing rewards. The exploration reward provides feedback by calculating the difference in visit counts, while the heuristic reward is based on the change in the actual path length. By calculating the deviation between the drone's actual flight path and the theoretical optimal path, as well as considering factors such as energy consumption during the flight, the agent can quantify the effectiveness of obstacle-avoidance actions and assign corresponding reward values. The reward function table is shown in Table 1.
$$R = C_1 r_1 + C_2 r_2 + C_3 r_3 + C_4 \left( \frac{1}{\max(N(s'), \varepsilon)} - \frac{1}{\max(N(s), \varepsilon)} \right) + C_5 \left( D(s, g) - D(s', g) \right) + C_6 F(s)$$
$$F(s) = k_{rep} \left( \frac{1}{d(x, x_t)} - \frac{1}{d_0} \right) \frac{d^{\alpha}(x, x_g)}{d^2(x, x_t)}$$
where C_1, C_2, C_3, C_4, C_5, and C_6 are weights; N(s) is the number of visits to state s and N(s′) is the number of visits to state s′; D(s, g) is the distance from the current state to the goal point, and D(s′, g) is the distance from the next state to the goal point. A positive reward is given when the next state is closer to the goal point, and no reward is given when it is farther away.
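A sketch of this composite reward is shown below; the weight values, the discrete components r1, r2, and r3 (goal reached, collision, per-step cost), and the clipping of the heuristic term are illustrative assumptions rather than the trained configuration.

def composite_reward(reached_goal, collided, n_visits_s, n_visits_s_next,
                     d_goal_s, d_goal_s_next, potential_term,
                     C=(100.0, -100.0, -1.0, 5.0, 10.0, 1.0), eps=1.0):
    r1 = 1.0 if reached_goal else 0.0   # goal reward component
    r2 = 1.0 if collided else 0.0       # collision component (negative weight C[1])
    r3 = 1.0                            # per-step cost component (negative weight C[2])
    # Exploration reward: prefer rarely visited next states
    explore = 1.0 / max(n_visits_s_next, eps) - 1.0 / max(n_visits_s, eps)
    # Heuristic reward: positive only when the next state is closer to the goal
    heuristic = max(d_goal_s - d_goal_s_next, 0.0)
    return (C[0] * r1 + C[1] * r2 + C[2] * r3
            + C[3] * explore + C[4] * heuristic + C[5] * potential_term)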

4. Twin System Modeling and LSTM-DQN-APF Algorithm Validation Analysis

To verify the performance of the LSTM-DQN-APF algorithm in complex urban scenarios, a “perception-simulation-decision-control” closed-loop virtual–physical twin platform was first constructed, integrating physical drones, digital city models, and a twin service layer. Next, real-time synchronization of virtual and physical data was achieved through the UDP/MAVLink [27] protocol and the ROS framework. Finally, a three-level progressive experiment validated that, under the empowerment of the twin platform, the “prediction-driven-decision optimization” architecture effectively enhanced the algorithm’s adaptability and robustness in complex environments.

4.1. System Setup

To verify the effectiveness of LSTM-DQN-APF in complex urban scenarios, a hybrid physical–virtual twin platform needs to be built. Through twin technology, a full-process closed loop of environmental perception, trajectory prediction, and autonomous decision making is enabled. The first step is to build an accurate and dynamically adjustable experimental digital twin, creating a hybrid physical–virtual testing environment. As shown in Figure 4a, this hybrid platform consists of the physical subsystem, virtual subsystem, data fusion, twin service layer, and the connections between them, forming a “perception-simulation-decision-control” closed-loop architecture. Figure 4b shows the sensor monitoring interface of the twin system. This system uses dynamic perception and information fusion to create a digital virtual image space of the experimental field environment, generating diversified samples that closely approximate the real physical world to verify the LSTM-DQN-APF algorithm. In the construction of the physical system, the physical UAV is responsible for real-time environmental perception and data collection. In the virtual system, the focus is on building the digital model of the urban experimental environment and the UAV being tested. Additionally, the twin service layer performs operational control based on task requirements and provides algorithmic support for specific tasks. Through the combination of physical and virtual systems, the experimental field achieves dynamic feedback and continuous optimization processes from environmental perception to multi-sensor information fusion and task decision making. Through a data-driven closed-loop system, the task strategy and behavior parameters are continuously optimized during the experimental process, thereby improving path planning accuracy.

4.2. Virtual and Real Communications

The method of achieving interactive communication between the virtual and physical parts of the constructed twin system is as follows: As shown in Figure 5, the flight control system of the physical UAV is connected to the onboard computer via a serial port. The flight control system communicates with the local ground station computer through the UDP protocol, allowing the ground station to obtain real-time flight data and send control commands for remote monitoring and operation. The onboard computer communicates with the flight control system using the MAVLink protocol, and after processing and analyzing the data using the modular ROS framework, the data are sent to the local computer. The local computer then transmits the processed data to the virtual platform via TCP/UDP data pass-through, controlling the virtual UAV to achieve virtual–physical synchronization.
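One possible realization of the TCP/UDP pass-through from the local computer to the virtual platform is sketched below; the endpoint address and the JSON message format are assumptions, and the MAVLink/ROS side of the pipeline is not reproduced here.

import json
import socket

# Hypothetical endpoint of the virtual platform's UDP listener
VIRTUAL_PLATFORM_ADDR = ("192.168.1.100", 14550)

def forward_state(sock, timestamp, position, velocity, yaw):
    # Serialize the processed UAV state and push it to the virtual UAV
    msg = json.dumps({"t": timestamp, "pos": position, "vel": velocity, "yaw": yaw})
    sock.sendto(msg.encode("utf-8"), VIRTUAL_PLATFORM_ADDR)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
forward_state(sock, 0.0, [0.0, 0.0, 4.0], [0.0, 0.0, 0.0], 0.0)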

4.3. Algorithm Validation Test and Analysis

4.3.1. Test Ideas

This paper is based on the technical framework of the digital twin platform “Twin Empowerment—Prediction-Driven—Decision Optimization” and designs a three-level progressive experimental validation of the performance of the LSTM-DQN-APF algorithm. Experiment 1 focuses on verifying the consistency of the virtual–physical mapping and the effectiveness of the interface communication. A twin city model is generated through virtual space simulation, and real-time perception data from the physical UAV are synchronized, thereby constructing an accurate data foundation for twin empowerment. Experiment 2 builds upon the previous research, performing a horizontal comparison of the trajectory prediction errors on the x, y, and z axes between the LSTM-DQN-APF model and the LSTM, BP, and CNN models. This is combined with an analysis of the axis-by-axis matching degree of the actual trajectory of the physical UAV to validate the proposed algorithm's advantage in trajectory prediction in multi-dimensional space. It also reveals how the twin data-driven LSTM module improves the accuracy of dynamic obstacle trajectory prediction, thus supporting the collaborative decision-making ability of DQN-APF. Experiment 3 further simulates scenes of static obstacles, dynamic obstacles with random motion, and wind field intensity gradient changes within the twin system. The algorithm's adaptability in urban environments is comprehensively evaluated in terms of average flight time, average path, task completion rate, and other dimensions. Through the three-level experimental loop, the feasibility and robustness of the “Prediction-Decision” link under twin platform empowerment in complex urban scenarios are comprehensively verified.

4.3.2. Experiment 1: Verification of Consistency Between Virtual and Real Environments and Verification of Interface Interoperability

To verify the consistency between the virtual and real domains established in the UAV digital twin verification platform and the availability of the communication interfaces between them, this paper conducts an experiment in which the virtual simulation mirrors reality. The designed scenario is a single UAV cruising along a quadrilateral, with waypoints (0,0,4), (0,5,4), (5,5,4), (5,0,4), and (0,0,0) in sequence. The experiment controls the flight of the virtual UAV in AirSim 1.5.0 [28] using the control functions of the physical UAV and sends waypoint information to the virtual UAV via a socket communication protocol. By comparing the flight trajectory data of the real and virtual UAVs, a consistency evaluation and analysis of the alignment between the virtual and real UAV flight conditions is conducted. As shown in Figure 6, (a) presents a comparison of the three-dimensional flight trajectories of both UAVs, and (b) shows the comparison of the two-dimensional trajectories.
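A hedged sketch of the virtual side of this experiment, using the AirSim Python client to fly the quadrilateral waypoints, is given below; AirSim uses a NED frame, so the 4 m altitude appears as z = -4, and the actual control functions and socket protocol of the twin platform are not reproduced here.

import airsim

# Waypoints of the quadrilateral cruise from Experiment 1
waypoints = [(0, 0, 4), (0, 5, 4), (5, 5, 4), (5, 0, 4), (0, 0, 0)]

client = airsim.MultirotorClient()
client.confirmConnection()
client.enableApiControl(True)
client.armDisarm(True)
client.takeoffAsync().join()

for x, y, z in waypoints:
    # NED convention: positive z points down, so fly at altitude z by commanding -z
    client.moveToPositionAsync(x, y, -z, velocity=1.0).join()

client.landAsync().join()
client.armDisarm(False)
client.enableApiControl(False)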
Using the actual UAV’s flight trajectory as the reference, the average errors of the virtual UAV’s position data in the X, Y, and Z directions were calculated to be 0.24 m, 0.27 m, and 0.23 m, respectively. The mean error is relatively small, and the trajectory closely aligns with the preset one, which meets the simulation requirements of the virtual UAV on the UAV real–virtual integrated verification platform. Based on the consistency of the virtual–real mapping and the effectiveness of the interface communication, subsequent tests were conducted.

4.3.3. Experiment 2: Verify the Accuracy of LSTM-DQN-APF Prediction

(1)
LSTM-DQN-APF
The configuration of parameters in the LSTM-DQN-APF network has a significant impact on both prediction accuracy and speed. To optimize these parameters, this paper employs an LSTM architecture with two hidden layers, each containing 50 neurons. Additionally, the DQN component includes three fully connected hidden layers, each containing 128 neurons, which enhances the decision-making and prediction capabilities. During the training process, the learning rate is dynamically adjusted to ensure effective convergence at different stages, and a batch size of 40 is used for each training iteration. The model selects five main features: latitude, longitude, altitude, horizontal velocity, and vertical velocity, and generates time-series data for each variable. Therefore, the actual input data for each batch are represented as a three-dimensional tensor with a shape of B × 40 × N, where B is the batch size, 40 is the number of time steps in the input window, and N = 5 denotes the number of features. To improve prediction accuracy, we perform a comparative analysis of the prediction errors across time steps from 5 to 50, thereby determining the optimal time step for the LSTM-DQN-APF network in quadrotor UAV trajectory prediction. To evaluate the performance of the LSTM-DQN-APF model, we use mean absolute error (MAE) as the evaluation metric. MAE quantifies the closeness between the predicted UAV trajectory points and their actual values; a lower MAE indicates that the LSTM-DQN-APF-based quadrotor UAV trajectory prediction model has higher accuracy.
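The sketch below mirrors the reported configuration under the assumption of a PyTorch implementation: a two-layer LSTM with 50 units feeding three fully connected layers of 128 neurons, inputs of shape B × 40 × 5, and MAE (L1 loss) as the evaluation metric; the output dimensionality and the coupling to the DQN decision head are simplified assumptions.

import torch
import torch.nn as nn

class LSTMDQNAPFNet(nn.Module):
    def __init__(self, n_features=5, lstm_units=50, fc_units=128, n_outputs=3):
        super().__init__()
        # Two stacked LSTM layers with 50 neurons each, as described above
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=lstm_units,
                            num_layers=2, batch_first=True)
        # Three fully connected hidden layers of 128 neurons for the decision head
        self.head = nn.Sequential(
            nn.Linear(lstm_units, fc_units), nn.ReLU(),
            nn.Linear(fc_units, fc_units), nn.ReLU(),
            nn.Linear(fc_units, fc_units), nn.ReLU(),
            nn.Linear(fc_units, n_outputs))   # e.g., predicted x/y/z offsets (assumed)

    def forward(self, x):                     # x: (B, 40, 5) time-series batch
        out, _ = self.lstm(x)
        return self.head(out[:, -1])          # use the last time step's representation

model = LSTMDQNAPFNet()
batch = torch.randn(8, 40, 5)                 # B x 40 x N with N = 5 features
pred = model(batch)
mae = nn.functional.l1_loss(pred, torch.zeros_like(pred))  # MAE evaluation metric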
As can be seen from Table 2, when the step size is set to 40, the prediction errors for both flight altitude and latitude are minimized. The optimal step size for predicting longitude is 45, which yields the smallest measurement error. Therefore, the optimal step sizes for predicting the flight height, latitude, and longitude of a quadrotor UAV are 40, 40, and 45, respectively. Table 2 also indicates that as the time step length increases, the prediction accuracy improves, suggesting that LSTM-DQN-APF networks perform better with longer time-series data. However, beyond a certain step size, increasing the step size further leads to a decrease in prediction accuracy.
As shown in Table 3, under the given network structure and time step conditions for the LSTM-DQN-APF model, the multi-step prediction results for the quadrotor UAV trajectory demonstrate that one-step and two-step predictions fall within an acceptable error range, while the error for three-step predictions increases significantly.
(2)
Verification of the prediction errors of the LSTM-DQN-APF, LSTM, BP, and CNN models along the x, y, and z axes
As illustrated in Figure 7, the predictions of the four models for the three coordinate axes (x, y, z) generally follow the trend of the actual trajectory. However, the prediction curves of the BP (backpropagation algorithm) and CNN (convolutional neural network) models exhibit significant deviations from the actual trajectory. In contrast, the LSTM and LSTM-DQN-APF models demonstrate smaller prediction errors. Notably, the prediction error for the z-axis is marginally higher compared to the x and y axes across all models, with the BP and CNN models showing considerable fluctuations at various positions. Although the LSTM model’s prediction error for the z-axis height remains within an acceptable range, it is still greater than that of the LSTM-DQN-APF model. The figure clearly indicates that the trajectory predicted by the LSTM-DQN-APF model most closely aligns with the actual trajectory, exhibiting the minimum error and the highest accuracy.

4.3.4. Experiment 3: Algorithm Validation Test for Urban Warfare Scenario

In this paper, we simulate a complex urban battlefield scenario that includes roads, trees, buildings, landscape trees, street lights, built-up areas, and walking pedestrians. As shown in Figure 8, it is a panoramic dynamic threat map of the urban scene.
The main obstacle information in the urban scenario is shown in Table 4.
In order to increase sample diversity, this paper conducts three different experiments: (1) the presence of static obstacles; (2) the presence of dynamic obstacles; and (3) the introduction of wind effects. These experiments help provide richer training data for the agent, thus enhancing its adaptive ability under different environmental conditions. The size of the experience pool in this paper is 10,000, and these experiments make the samples stored in the experience pool more diverse, thus improving the performance of the model in various complex scenes. In order to further verify the performance of this method, the LSTM-DQN-APF path planning method was compared with APF and GE-APF [29,30,31] under the same testing environment, where the attraction and repulsion of the GE-APF algorithm can be defined as:
$$F_a = \frac{q_{t,m}}{|q_{t,m}|} \left[ k_{a1} \left( 1 - e^{-b_1 q_{t,m}^2} \right) + k_{a2} \left( 1 - e^{-b_2 v_{t,m}^2} \right) \right]$$
$$F_r = \frac{q_{o,m}}{|q_{o,m}|} \left[ k_{r1}\, e^{-b_3 q_{o,m}^2} + k_{r2} \left( 1 - e^{-b_4 v_{o,m}^2} \right) \right], \quad v_{o,m} < 0$$
where k_{a1} and k_{a2} are the gain factors of the attractive force, whose sum gives the maximum attractive force. The gain factors of the repulsive force are denoted k_{r1} and k_{r2}, whose sum gives the maximum repulsive force. b_1, b_2, b_3, and b_4 are positive constants that help determine the minimum velocity and minimum displacement that generate the maximum attractive and repulsive forces.
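The GE-APF attraction and repulsion above can be evaluated numerically as in the sketch below; the gain values are placeholders and the vector handling follows the formulas as reconstructed here, so it should be treated as an assumption-laden illustration rather than the reference implementation.

import numpy as np

def ge_apf_forces(q_t, v_t, q_o, v_o,
                  k_a1=1.0, k_a2=0.5, k_r1=2.0, k_r2=1.0,
                  b1=0.1, b2=0.1, b3=0.1, b4=0.1):
    # Attractive force: magnitude saturates at k_a1 + k_a2
    dist_t, speed_t = np.linalg.norm(q_t), np.linalg.norm(v_t)
    f_a = (q_t / max(dist_t, 1e-6)) * (k_a1 * (1 - np.exp(-b1 * dist_t**2))
                                       + k_a2 * (1 - np.exp(-b2 * speed_t**2)))
    # Repulsive force: applied only while the obstacle is closing in (v_{o,m} < 0)
    dist_o, speed_o = np.linalg.norm(q_o), np.linalg.norm(v_o)
    closing = np.dot(v_o, q_o) / max(dist_o, 1e-6)
    f_r = np.zeros_like(q_o)
    if closing < 0:
        f_r = (q_o / max(dist_o, 1e-6)) * (k_r1 * np.exp(-b3 * dist_o**2)
                                           + k_r2 * (1 - np.exp(-b4 * speed_o**2)))
    return f_a, f_r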
(1)
Static obstacles
In order to ensure the comprehensiveness and accuracy of the experiment, two task points were set up in the scene, and each was tested in independent experiments to evaluate the navigation performance of the UAV in different situations. During the experiments, the UAV adopts the APF, GE-APF, and LSTM-DQN-APF methods, respectively, and is considered to have completed a flight mission when it starts from the starting position (0,0,0) and successfully reaches the task point while avoiding all obstacles in the scene. In order to ensure the reliability of the experimental results, 10 independent experiments were conducted for each task point. The performance of the three algorithms is evaluated by comparing their flight trajectories, flight times, and flight distances in accomplishing the tasks.
The detailed information of the setup task points is shown in Table 5. The task points are set from near to far, covering different spatial locations and obstacle distributions, aiming to comprehensively test the navigation performance and adaptability of the artificial potential field method.
As shown in Figure 9, the experimental trajectory data of the three algorithms for Task 1 on the UAV virtual–real integration verification platform are presented. The blue curve represents LSTM-DQN-APF, the green curve represents GE-APF, and the red curve represents APF. It can be seen that, compared to APF and GE-APF, the trajectory of the LSTM-DQN-APF algorithm is smoother and does not exhibit severe fluctuations or abrupt changes.
From Figure 10, it can be seen that the flight time of LSTM-DQN-APF, represented in blue, is significantly shorter in all 10 experiments than that of APF, represented in red, and GE-APF, represented in green. The average flight time of the LSTM-DQN-APF algorithm is 12.87 s, while the average flight times of GE-APF and APF are 16.65 s and 19.16 s, respectively. The difference in flight times reflects the significant improvement in flight efficiency of LSTM-DQN-APF. The shorter flight time not only means that the UAV can complete flight missions faster but also reduces energy consumption, thereby extending the UAV's endurance. Figure 10 also compares the performance of the three algorithms in terms of flight distance. The average flight distance of the LSTM-DQN-APF algorithm to the target point (20,20,5) is 32.89 m, while those of the GE-APF and APF algorithms are 36.73 m and 39.81 m, respectively. The average variance of the flight time for APF is 15.424 s, for GE-APF it is 6.72 s, and for LSTM-DQN-APF it is 1.336 s; the average variance of the flight distance for APF is 67.942 m, for GE-APF it is 9.41 m, and for LSTM-DQN-APF it is 9.896 m.
Figure 11 shows the analysis of the trajectory data of the UAV flying to the target point (40,20,5) in Task 2. From the figure, it can be seen that the main obstacle is building 1, located at (20,10,0) with a height of 10 m. In the face of this obstacle, LSTM-DQN-APF, GE-APF, and APF chose different avoidance directions. Compared to GE-APF and APF, the trajectory of LSTM-DQN-APF was smoother, and the distance consumed to avoid the obstacle was smaller.
In Figure 12, it can be seen that LSTM-DQN-APF outperforms GE-APF and APF in both flight time and flight distance. The average flight time of LSTM-DQN-APF is 21.15 s and its average flight distance is 51.23 m; the average flight time of GE-APF is 26.98 s with an average flight distance of 65.42 m; and the average flight time of APF is 30.33 s with an average flight distance of 71.6 m. For flight time, the average variance of APF is 15.424 s, that of GE-APF is 6.87 s, and that of LSTM-DQN-APF is 1.121 s; for flight distance, the average variance of APF is 67.942 m, that of GE-APF is 10 m, and that of LSTM-DQN-APF is 9.896 m.
(2)
Dynamic obstacles
The starting point coordinates of the UAV are (0,0,0), and the target point coordinates are (10,50,0). The motion parameters of the three spheres are shown in Table 6. As can be seen from Figure 13, there are dynamic obstacles in the scene, including moving spheres and a pedestrian, and the movement directions of the spheres are marked in the figure. Figure 13a shows the static state; Figure 13b shows the state after 15 s of motion.
As shown in Figure 14a, the task is segmented into three target waypoints: the first waypoint is located at coordinates (4,20,5), the second at (12,25,5), and the third at (10,50,0). As illustrated in Figure 14a, when the UAV approaches the first dynamic obstacle, it begins to adjust its trajectory due to the perceived relative velocity and the repulsive force exerted by the obstacle. Figure 14b depicts the second phase of trajectory planning in a dynamic environment, demonstrating the UAV’s successful avoidance of both the dynamic sphere and pedestrians as it continues toward the target waypoint. Upon entering the influence zone of the second dynamic obstacle, the flight path planning system initiates adjustments to the UAV’s course based on the obstacle’s impact. Figure 14c illustrates the final stage of trajectory planning in a dynamic environment, where the UAV successfully reaches its designated target.
As shown in Figure 15, the curves of reward value versus number of iterations for DQN and LSTM-DQN-APF are presented. From the figure, it can be seen that DQN shows fast learning ability in the initial phase of the experiment, and its reward value rises rapidly, which implies that it can adapt to the environment relatively quickly and find a suitable path planning strategy. However, in the middle and late stages, the reward value of DQN shows a clear decreasing trend, which may indicate that the algorithm encounters problems in the subsequent iterations, such as falling into a local optimum or losing adaptability to environmental changes.
In contrast, LSTM-DQN-APF is able to rise quickly to a positive reward value at the beginning of the experiment, which demonstrates that the algorithm has stronger fast-learning ability and can find an effective obstacle-avoidance strategy faster than the DQN algorithm. LSTM-DQN-APF not only performs well at the beginning of the learning process but also shows better stability, with relatively small fluctuations in its reward value throughout the iteration process, which further demonstrates the superior performance of the LSTM-DQN-APF algorithm in handling this task.
(3)
Wind effects
As shown in Figure 16, a horizontal westward wind field with a wind speed of 5 m/s is added to the simulation environment, and the mission target point is set to (10,0,5) for the virtual UAV to fly towards the target point with a speed of 1 m/s. The green line is the flight path of the virtual UAV without disturbing wind, and the purple line is the actual flight trajectory of the virtual UAV under the influence of wind.
As shown in Figure 17, the UAV starts from (0,0,0), the target point is located at (5,25,0), and the flight speed is 1 m/s. The red curve represents LSTM-DQN-APF moving towards the target point; GE-APF and APF initially keep circling under the influence of the wind. In real situations, such flight behavior of the UAV is very dangerous, and APF falls into a local minimum after 15 s.
Table 7 compares the average flight time, average path length, and task completion rate of APF, GE-APF, and LSTM-DQN-APF under windy and windless conditions. The LSTM-DQN-APF algorithm shows significant advantages in all three metrics under both conditions. In contrast, the APF algorithm has many issues in both environments. Under windless conditions, when applying APF, the UAV entered an unreachable-target state after flying for 20 s, increasing the experimental time and path length. Under windy conditions, the APF algorithm causes the UAV to be affected by the wind field and deviate from the target point; after flying for 15 s, it becomes trapped in a local minimum. Under the influence of the wind field, the virtual UAV in the simulation environment experiences multiple collisions and fails to find the correct path for an extended period, which not only increases the experimental time and flight path length but also introduces potential safety risks in practical applications. While the GE-APF algorithm outperforms APF in certain respects, its overall performance is still inferior to LSTM-DQN-APF. This indicates that the LSTM-DQN-APF algorithm has better performance and adaptability in UAV path planning and flight control, especially in complex environments such as wind fields, providing more reliable algorithmic support for the application of UAVs in real-world complex environments.

5. Discussion

In this paper, through the combination of virtual and real experiments, the proposed LSTM-DQN-APF algorithm is verified to effectively solve the local minima, target unreachability, and time-varying problems that the traditional artificial potential field method faces in UAV trajectory planning in complex battlefield environments. The virtual–real interaction system represents the physical entity digitally with a multi-dimensional, multi-space, multi-scale model and uses multi-source heterogeneous data as the link that connects the physical entity with the virtual space in real time, mapping and refreshing it continuously. This continuously refines the models and parameters of the simulation environment, gradually improves its realism, increases the credibility of the simulation results, and reduces the difficulty of migrating from the simulation environment to the real environment. The ultimate goal is to further improve the verification capability for the physical entity based on digital twin technology by means of simulation and visualization in virtual space.

Author Contributions

Conceptualization, Y.L. and Y.F.; methodology, X.W.; software, X.W.; validation, Y.L.; formal analysis, Y.L.; investigation, S.Q.; resources, Y.F. and Y.Y.; data curation, Y.Y. and M.D.; writing—original draft preparation, X.W.; writing—review and editing, Y.L.; visualization, M.D.; supervision, Y.L.; project administration, Y.F.; funding acquisition, Y.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Fang, Y.; Lu, L.; Zhang, B.; Liu, X.; Zhang, H.; Fan, D.; Yang, H. Motion control of obstacle avoidance for the robot arm via improved path planning algorithm. J. Braz. Soc. Mech. Sci. Eng. 2024, 46, 727. [Google Scholar] [CrossRef]
  2. Dong, Z.; Chen, Z.; Zhou, R.; Zhang, R. A hybrid approach of virtual force and A∗ search algorithm for UAV path re-planning. In Proceedings of the 2011 6th IEEE Conference on Industrial Electronics and Applications, Beijing, China, 21–23 June 2011; pp. 1140–1145. [Google Scholar]
  3. Zhuge, C.; Wang, Q.; Liu, J.; Yao, L. An Improved Q-RRT* Algorithm Based on Virtual Light. Comput. Syst. Sci. Eng. 2021, 39, 107–119. [Google Scholar] [CrossRef]
  4. Ye, L.; Li, J.; Li, P. Improving path planning for mobile robots in complex orchard environments: The continuous bidirectional Quick-RRT* algorithm. Front. Plant Sci. 2024, 15, 1337638. [Google Scholar] [CrossRef] [PubMed]
  5. Wang, C.; Cheng, C.; Yang, D.; Pan, G.; Zhang, F. Path planning in localization uncertaining environment based on Dijkstra method. Front. Neurorobotics 2022, 16, 821991. [Google Scholar] [CrossRef]
  6. Çakır, E.; Ulukan, Z.; Acarman, T. Time-dependent Dijkstra’s algorithm under bipolar neutrosophic fuzzy environment. J. Intell. Fuzzy Syst. 2022, 42, 227–236. [Google Scholar] [CrossRef]
  7. Wang, J.; Wu, Z.; Yan, S.; Tan, M.; Yu, J. Real-time path planning and following of a gliding robotic dolphin within a hierarchical framework. IEEE Trans. Veh. Technol. 2021, 70, 3243–3255. [Google Scholar] [CrossRef]
  8. Zhang, H.; Tao, Y.; Zhu, W. Global Path Planning of Unmanned Surface Vehicle Based on Improved A-Star Algorithm. Sensors 2023, 23, 6647. [Google Scholar] [CrossRef]
  9. Wang, J. A Simulated Annealing Algorithm and Grid Map-Based UAV Coverage Path Planning Method for 3D Reconstruction. Electronics 2021, 10, 853. [Google Scholar] [CrossRef]
  10. Geng, R.; Ji, R.; Zi, S. Research on task allocation of UAV cluster based on particle swarm quantization algorithm. Math. Biosci. Eng. 2023, 20, 18–33. [Google Scholar] [CrossRef]
  11. Yang, Y.; Chen, Z. Optimization of Dynamic Obstacle Avoidance Path of Multirotor UAV Based on Ant Colony Algorithm. Wirel. Commun. Mob. Comput. 2022, 2022, 1299434. [Google Scholar] [CrossRef]
  12. Ma, W.; Fang, Y.; Fu, W.; Liu, S.; Guo, E. Cooperative localisation of UAV swarm based on adaptive SA-PSO algorithm. Aeronaut. J. 2023, 127, 57–75. [Google Scholar] [CrossRef]
  13. Guo, T.; Jiang, N.; Li, B.; Zhu, X.; Wang, Y.; DU, W. UAV navigation in high dynamic environments: A deep reinforcement learning approach. Chin. J. Aeronaut. 2021, 34, 479–489. [Google Scholar] [CrossRef]
  14. Nian, R.; Liu, J.; Huang, B. A review On reinforcement learning: Introduction and applications in industrial process control. Comput. Chem. Eng. 2020, 139, 106886. [Google Scholar] [CrossRef]
  15. Li, Y.; Wang, Y.; Zhou, Y. Multiagent Deep Reinforcement Learning Algorithms in StarCraft II: A Review. IEEE Access 2024, 12, 167452–167470. [Google Scholar] [CrossRef]
  16. Tang, X.; Li, X.; Yu, R.; Wu, Y.; Ye, J.; Tang, F.; Chen, Q. Digital-Twin-Assisted Task Assignment in Multi-UAV Systems: A Deep Reinforcement Learning Approach. IEEE Internet Things J. 2023, 10, 15362–15375. [Google Scholar] [CrossRef]
  17. Watkins, C.J.C.H. Learning from Delayed Rewards. Doctoral Dissertation, University of Cambridge, Cambridge, UK, 1989. [Google Scholar]
  18. Lee, J.; Kim, D.W. An effective initialization method for genetic algorithm-based robot path planning using a directed acyclic graph. Inf. Sci. 2016, 332, 1–18. [Google Scholar] [CrossRef]
  19. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing Atari with Deep Reinforcement Learning. arXiv 2013, arXiv:1312.5602. [Google Scholar]
  20. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  21. Sun, Y.; Fesenko, H.; Kharchenko, V.; Zhong, L.; Kliushnikov, I.; Illiashenko, O.; Morozova, O.; Sachenko, A. UAV and IoT-based systems for the monitoring of industrial facilities using digital twins: Methodology, reliability models, and application. Sensors 2022, 22, 6444. [Google Scholar] [CrossRef]
  22. Yoon, S.; Lee, S.; Kye, S.; Kim, I.-H.; Jung, H.-J.; Spencer, B.F. Seismic fragility analysis of deteriorated bridge structures employing a UAV inspection-based updated digital twin. Struct. Multidiscip. Optim. 2022, 65, 346. [Google Scholar] [CrossRef]
  23. Shen, G.; Lei, L.; Li, Z.; Cai, S.; Zhang, L.; Cao, P.; Liu, X. Deep Reinforcement Learning for Flocking Motion of Multi-UAV Systems: Learn From a Digital Twin. IEEE Internet Things J. 2022, 9, 11141–11153. [Google Scholar] [CrossRef]
  24. Wang, B.; Sun, Y.; Jung, H.; Nguyen, L.D.; Vo, N.-S.; Duong, T.Q. Digital Twin-Enabled Computation Offloading in UAV-Assisted MEC Emergency Networks. IEEE Wirel. Commun. Lett. 2023, 12, 1588–1592. [Google Scholar] [CrossRef]
  25. Grigoropoulos, N.; Lalis, S. Simulation and Digital Twin Support for Managed UAV Applications. In Proceedings of the 2020 IEEE/ACM 24th International Symposium on Distributed Simulation and Real Time Applications (DS-RT), Prague, Czech Republic, 14–16 September 2020. [Google Scholar]
  26. San, O. The digital twin revolution. Nat. Comput. Sci. 2021, 1, 307–308. [Google Scholar] [CrossRef] [PubMed]
  27. Koubaa, A.; Allouch, A.; Alajlan, M.; Javed, Y.; Belghith, A.; Khalgui, M. Micro Air Vehicle Link (MAVlink) in a Nutshell: A Survey. IEEE Access 2019, 7, 87658–87680. [Google Scholar] [CrossRef]
  28. Shah, S.; Dey, D.; Lovett, C.; Kapoor, A. AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles. In Field and Service Robotics; Hutter, M., Siegwart, R., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 621–635. [Google Scholar]
  29. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar]
  30. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971. [Google Scholar]
  31. Zhang, X.; Li, Y. Data-Driven Learning for H∞ Control of Adaptive Cruise Control Systems. IEEE Trans. Intell. Transp. Syst. 2023, 24, 1234–1245. [Google Scholar]
Figure 1. (a) UAV target-unreachability problem; (b) UAV trapped in a local minimum.
Figure 2. LSTM-based trajectory prediction model for the quadrotor UAV.
Figure 3. LSTM-DQN-APF flowchart.
Figure 4. (a) Twin platform monitoring interface; (b) schematic of the digital–real system.
Figure 5. Communication diagram of the digital twin platform.
Figure 6. (a) 3D and (b) 2D trajectories of the physical and virtual UAVs flying a quadrilateral.
Figure 7. Comparison of the (a) 2D and (b) 3D real and predicted trajectories for the quadrotor UAV dataset.
Figure 8. (a) Panoramic view and (b) dynamic threat map of the urban scene.
Figure 9. (a) 3D and (b) 2D obstacle-avoidance trajectories of the UAV for Task 1 (blue: LSTM-DQN-APF, green: GE-APF, red: APF).
Figure 10. Comparison of (a) flight time and (b) flight distance for Task 1 (blue: LSTM-DQN-APF, green: GE-APF, red: APF).
Figure 11. (a) 3D and (b) 2D obstacle-avoidance trajectories of the UAV for Task 2 (blue: LSTM-DQN-APF, green: GE-APF, red: APF).
Figure 12. Comparison of (a) flight time and (b) flight distance for Task 2 (blue: LSTM-DQN-APF, green: GE-APF, red: APF).
Figure 13. (a) Static and (b) dynamic states in the dynamic environment.
Figure 14. Motion trajectory in the dynamic environment.
Figure 15. Evolution of reward value versus number of iterations.
Figure 16. Effect of the generated wind on the virtual UAV.
Figure 17. UAV flight path.
Table 1. Reward function components.
UAV status | Reward
r1: reward for reaching the target | 100
r2: penalty for a collision or going out of bounds | −100
r3: negative reward for staying put | −5
Exploration reward | 1/max(N(s), ε) − 1/max(N(s′), ε)
Heuristic reward | D(s, g) − D(s′, g)
F(s) | Combined force
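For illustration, the scalar reward terms listed in Table 1 can be combined at each training step roughly as in the following minimal Python sketch. The function signature, the helper callables visit_count and dist, and the epsilon value are assumptions made for this example; they are not the authors' implementation, and the combined-force term F(s) (which enters the APF side of the algorithm) is deliberately omitted here.

EPS = 1e-6  # assumed epsilon guarding the visit-count denominators

def compute_reward(s, s_next, goal, reached, collided, stayed_put,
                   visit_count, dist):
    """Combine the reward terms of Table 1 (illustrative sketch only)."""
    r = 0.0
    if reached:          # r1: reaching the target
        r += 100.0
    if collided:         # r2: collision or out of bounds
        r -= 100.0
    if stayed_put:       # r3: negative reward for staying put
        r -= 5.0
    # Exploration reward: 1/max(N(s), eps) - 1/max(N(s'), eps)
    r += 1.0 / max(visit_count(s), EPS) - 1.0 / max(visit_count(s_next), EPS)
    # Heuristic reward: D(s, g) - D(s', g), positive when the UAV moves toward the goal
    r += dist(s, goal) - dist(s_next, goal)
    return r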
Table 2. Comparison of prediction errors (MAE) for different step lengths.
Step length | Height/m | Latitude/° | Longitude/°
5 | 2.72 | 8.73 | 7.99
10 | 2.90 | 9.04 | 8.52
15 | 2.79 | 7.98 | 9.59
20 | 2.67 | 7.28 | 7.37
25 | 2.99 | 6.60 | 6.91
30 | 2.88 | 6.83 | 7.01
35 | 2.49 | 8.62 | 6.82
40 | 2.33 | 6.45 | 7.00
45 | 2.66 | 8.42 | 6.04
50 | 2.50 | 7.58 | 6.61
Table 3. Comparison of prediction errors (MAE) for different numbers of prediction steps.
Predicted number of steps | 1 | 2 | 3
Height/m | 2.99 | 4.40 | 31.77
Latitude/° | 6.60 | 20.20 | 50.82
Longitude/° | 6.91 | 11.10 | 59.49
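As a point of reference, the per-coordinate MAE values reported in Tables 2 and 3 can be obtained from predicted and ground-truth trajectories with a short NumPy computation such as the sketch below; the array names and shapes are assumptions made for this example.

import numpy as np

def per_coordinate_mae(y_true, y_pred):
    """Mean absolute error per output dimension.

    y_true, y_pred: arrays of shape (num_samples, 3) holding
    [height, latitude, longitude] for each predicted point.
    Returns a length-3 vector: MAE of height, latitude, longitude.
    """
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)), axis=0)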
Table 4. Obstacle parameters.
Obstacle | Parameters | Position (x, y, z)/m
Building 1 | 5 m × 5 m × 10 m, rectangular collision body | (0,8,0), (10,5,0), (20,10,0), (30,10,0)
Building 2 | 5 m × 8 m × 30 m, rectangular collision body | (30,0,0), (10,40,0), (35,30,0)
Tree 1 | 1 m × 1 m × 5 m, rectangular collision body | (10,20,0), (20,0,0), (40,10,0)
Tree 2 | 1 m × 1 m × 10 m, rectangular collision body | (30,40,0), (30,20,0), (40,35,0)
Signage 1 | 1 m × 1 m × 0.1 m, rectangular collision body | (15,5,10), (25,30,10)
Signage 2 | 1 m × 2 m × 0.1 m, rectangular collision body | (25,10,15), (5,10,15)
Table 5. Task point locations.
Task | Target point position (x, y, z)/m
Task I | (20, 20, 5)
Task II | (40, 20, 5)
Table 6. Parameters of dynamic obstacles in the 3D environment.
No. | Initial position (x, y, z) | Radius | X-axis step size | Y-axis step size | Z-axis step size | Motion cycles
1 | (−1, 1, 5) | 4 | 0.02 | −0.04 | 0.03 | 100
2 | (3, 15, 5) | 2 | 0.03 | 0.06 | 0.02 | 100
3 | (5, 12, 5) | 4 | 0.04 | 0.05 | 0.02 | 100
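A minimal Python sketch of how the dynamic obstacles parameterized in Table 6 might be advanced in the simulation is shown below, assuming each obstacle moves by its per-axis step every tick and reverses direction after the stated number of motion cycles; the class and method names are illustrative and are not part of the twin platform's API.

from dataclasses import dataclass, field

@dataclass
class DynamicObstacle:
    position: list   # current centre (x, y, z) in metres
    radius: float    # collision-sphere radius in metres
    step: tuple      # per-axis displacement per tick (dx, dy, dz)
    cycles: int      # ticks before the motion direction reverses (assumed behaviour)
    _tick: int = field(default=0, init=False)

    def advance(self):
        """Move the obstacle one tick; reverse direction every `cycles` ticks."""
        self.position = [p + s for p, s in zip(self.position, self.step)]
        self._tick += 1
        if self._tick % self.cycles == 0:
            self.step = tuple(-s for s in self.step)

# Obstacle 1 from Table 6: start (-1, 1, 5), radius 4, steps (0.02, -0.04, 0.03), 100 cycles
obstacle = DynamicObstacle([-1, 1, 5], 4.0, (0.02, -0.04, 0.03), 100)
obstacle.advance()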
Table 7. Flight performance comparison of APF, GE-APF, and LSTM-DQN-APF in windless and windy conditions.
Algorithm | Condition | Mean flight time/s | Average path length/m | Task completion rate
APF | windless | 57.27 | 40.5 | 36%
GE-APF | windless | 49.21 | 36.4 | 78%
LSTM-DQN-APF | windless | 26.62 | 28.2 | 99.2%
APF | windy | 71.1 | 48 | 23%
GE-APF | windy | 65.43 | 42.3 | 64%
LSTM-DQN-APF | windy | 34.35 | 35.3 | 97.3%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.