Swarm Maneuver Decision Method Based on Learning-Aided Evolutionary Pigeon-Inspired Optimization for UAV Swarm Air Combat

Sun, Yongbin; Chen, Yu; Wei, Chen; Li, Bin; Fan, Yanming

doi:10.3390/drones9030218

Open AccessArticle

Swarm Maneuver Decision Method Based on Learning-Aided Evolutionary Pigeon-Inspired Optimization for UAV Swarm Air Combat

by

Yongbin Sun

^1,*

,

Yu Chen

¹,

Chen Wei

¹

,

Bin Li

² and

Yanming Fan

³

¹

National Key Laboratory of Aircraft Integrated Flight Control, School of Automation Science and Electrical Engineering, Beihang University, Beijing 100081, China

²

School of Aeronautics and Astronautics, Sichuan University, Chengdu 610017, China

³

AVIC Shenyang Aircraft Design and Research Institute, Shenyang 110035, China

^*

Author to whom correspondence should be addressed.

Drones 2025, 9(3), 218; https://doi.org/10.3390/drones9030218

Submission received: 20 January 2025 / Revised: 5 March 2025 / Accepted: 13 March 2025 / Published: 18 March 2025

(This article belongs to the Special Issue Biological UAV Swarm Control)

Download

Browse Figures

Versions Notes

Abstract

:

Unmanned aerial vehicle (UAV) swarm dynamic combat poses significant challenges due to its complexity and dynamism. This study introduces a novel approach that addresses these challenges through the development of a swarm maneuver decision method based on the Learning-Aided Evolutionary Pigeon-Inspired Optimization (LAEPIO) algorithm. This research proceeds systematically as follows: First, a nonlinear model of fixed-wing UAVs and a decision-making system for swarm air combat are established. Next, a situation function is applied to characterize the battlefield environment and quantify the strategic advantages of each side during the engagement. The LAEPIO algorithm is then advanced to tackle sub-tasks in swarm air combat by incorporating a learning-aided evolutionary mechanism. Building upon this foundation, a swarm maneuver decision method is designed, enabling UAV swarms to select optimal strategies from a library of maneuvers after thoroughly assessing the battlefield scenario. Finally, the efficacy and superiority of the proposed method are demonstrated through comprehensive simulations across diverse air combat scenarios. The results show that the average win rate of the proposed algorithm is 36.7% higher than that of similar algorithms.

Keywords:

UAV swarm; air combat; maneuver decision; LAEPIO algorithm; swarm intelligence

1. Introduction

Recently, UAVs have evolved from solitary operational units into sophisticated swarm formations that are capable of carrying out complex missions either autonomously or semi-autonomously [1]. However, this shift towards swarm warfare requires an unprecedented level of individual intelligence among the UAV components and complex swarm control methods, which pose significant technological challenges and continue to expand the boundaries of modern drone capabilities.

The swarm intelligence decision-making approach of UAV swarm pertains to the decision-making technology that enables multiple UAVs to autonomously collaborate for the accomplishment of complex tasks without centralized control and plays a vital role in UAV swarm air combat. In order to solve the difficulty of search space explosion in a cluster air combat decision problem, researchers have made a lot of attempts that could be broadly divided into the following three categories of methods: first, the maneuver decision-making method based on an expert system [2,3,4]; second, swarm autonomous decision making based on machine learning with deep neural networks [5,6,7]; last, autonomous decision making for UAV swarm in an air combat approach based on swarm bionic intelligence [8]. The solution based on an expert system is generally reliable. Actually, it has become the mainstream solution to a UAV swarm air combat problem in the past few years. However, this scheme relies too much on the priori information from experts, and this kind of theoretical knowledge is often difficult to describe all decision-making scenarios in air combat decision making, so a general decision-making expert system could hardly be constructed [9]. As regards swarm autonomous decision making based on deep learning, it has a certain degree of universality after the successful training of a deep neural network, but the difficulty of network training and the amount of computing power consumption make it difficult to leave the laboratory and be truly applied to practical application scenarios [10]. In the realm of the theory of bionic intelligent algorithm, UAV swarms attain diverse combat capabilities by controlling individual UAV decisions and manifesting complex behaviors at the group level. During this process, the swarm intelligence algorithm derived by simulating various swarm behaviors in nature is extensively employed in the decision making of UAV swarm game confrontation due to its excellent scalability, parallelism, and straightforward implementation [11]. There are numerous attempts of bionic intelligent algorithms to explore the UAV swarm air combat game problem and find a practical swarm intelligence control method from different institutions and scholars. For example, the attention-enhanced bidirectional gate recurrent unit based on the tuna swarm optimization algorithm is proposed to identify the intention of enemy UAVs in the beyond visual range air combat situation in [12]. Inspired by the hierarchical structure of wolves’ social division of labor, Zhou and Chen et al. [13] improve the traditional wolf colony optimization algorithm by imitating the information transmission mechanism of wolves with a different division of labor. It is applied to the UAV swarm target assignment problem. Among swarm intelligence algorithms, the pigeon-inspired optimization (PIO) algorithm is adopted to satisfy the rapid convergence requirement of a maneuver decision problem for its swift convergence in this paper. The competitive learning pigeon-inspired optimization algorithm mentioned in [14] is used to search for the optimal decision in the air combat game. Duan et al. [15] propose an autonomous maneuver decision for an unmanned aerial vehicle via improved pigeon-inspired optimization, in which the PIO algorithm is improved by creating a new individual evolutionary form. Although these improved methods indeed perform better than then original PIO algorithm in some cases, the architecture of an optimization algorithm is not really changed, which means that it is difficult for these methods to substantially outperform the traditional PIO algorithms.

Summarily, this paper contains the following contributions:

By imitating the process of human cognition and learning behavior, a new optimization algorithm structure named LAEPIO algorithm is proposed, which combines learning-aided evolution for an optimization (LEO) mechanism and a PIO algorithm. Compared with the previous algorithm improvement [16], the LAEPIO algorithm, combined with an artificial neural network, shows certain advantages in complex decision tree search problems, which greatly increases the robustness of the algorithm while improving the convergence speed to a certain extent.
Considering the complex conditions in the battlefield, the precise dynamic model of the UAV is adopted in this paper, and a complex and comprehensive situation function is established to describe the battlefield advantage, and the results of the situation function are used as the key basis for the autonomous decision making of the UAV.
The swarm maneuver decision method based on the LAEPIO algorithm is proposed to deal with the dynamic performance requirement of a complex battlefield environment. In an air combat, the UAV used a swarm maneuver decision method based on the LAEPIO algorithm, which is able to predict the enemy’s next action mode and quickly adopt an optimal strategy to stay in a better situation.

The remainder of this paper is organized as follows: Section 2 presents the problem statement. The LAEPIO algorithm is proposed in Section 3. The UAV swarm confrontation game method and the autonomous maneuver decision method based on LAEPIO are designed in Section 4. Simulation results and analysis compared with [17,18,19,20] are presented in Section 5. A detailed discussion about the method is in Section 6, and the paper concludes in Section 7.

2. Problem Statements

The objective of air combat decision making is to identify the optimal action strategy within a fleeting decision window to create superior offensive advantages for UAVs, ultimately securing victory in the entire engagement. Therefore, the real-time requirement of the air combat decision-making method is extremely high, and the conventional algorithm is difficult to directly meet the requirements. Consequently, a more precise nonlinear dynamic model and a decision system capable of responding promptly are indispensable.

2.1. Dynamic Model of UAV

During the simulation of air combat decision scenarios, it relies on finely detailed UAV models to achieve real-time decision making and enhance combat capability. Therefore, a UAV nonlinear model based on aircraft dynamics is proposed. We describe the fixed-wing model used in this paper in a body coordinate system [21].

The controlled variables have 12 state variables as the following Equation (1):

S^{T} = [x_{g}, y_{g}, z_{g}, ϕ, θ, γ, u, v, w, ω_{x}, ω_{y}, ω_{z}]

(1)

where

x_{g}, y_{g}, z_{g}

are the position state of the UAV;

ϕ, θ, γ

are the roll angle, pitch angle, and yaw angle; u, v, and w are the components of the velocity along the axis; and

ω_{x}, ω_{y}, ω_{z}

are the angle velocity of the Euler angle. The variables

u, v, w

represent the velocity component. The dynamic model is given as follows:

\{\begin{matrix} {\dot{x}}_{g} = u cos θ cos ψ \\ + v (sin θ sin ϕ cos ψ - cos ϕ sin ψ) \\ + w (sin ϕ sin ψ + cos ϕ sin θ cos ψ) \\ {\dot{y}}_{g} = u cos θ sin ψ \\ + v (sin θ sin ϕ sin ψ + cos ϕ sin ψ) \\ + w (- sin ϕ cos ψ + cos ϕ sin θ sin ψ) \\ {\dot{z}}_{g} = u sin θ - v sin ϕ cos θ - w cos ϕ cos θ \end{matrix}

(2)

\{\begin{matrix} \dot{u} = v ω_{z} - w ω_{y} - g s i n θ + \frac{F_{x}}{m} \\ \dot{v} = - u ω_{z} + w ω_{x} + g c o s θ s i n ϕ + \frac{F_{y}}{m} \\ \dot{w} = u ω_{y} - v ω_{x} + g c o s θ c o s ϕ + \frac{F_{z}}{m} \end{matrix}

(3)

where

F_{x}, F_{y}

, and

F_{z}

are the force components along the axis; g is the gravity; and m is the mass of the model.

\{\begin{matrix} \dot{ϕ} = ω_{x} + (ω_{z} cos ϕ + ω_{y} sin ϕ) tan θ \\ \dot{θ} = ω_{y} cos ϕ - ω_{z} sin ϕ \\ \dot{γ} = \frac{1}{cos θ} (ω_{z} cos ϕ + ω_{y} sin ϕ) \end{matrix}

(4)

\{\begin{matrix} {\dot{ω}}_{x} = \frac{1}{I_{x} I_{y} - I_{x y}^{2}} [I_{y} L + I_{x y} M + (I_{z} - \\ I_{x} - I_{y}) I_{x y} ω_{x} ω_{y} + (I_{y}^{2} - I_{y} I_{z} + I_{x y}^{2}) ω_{y} ω_{z}] \\ {\dot{ω}}_{y} = \frac{1}{I_{x} I_{y} - I_{x y}^{2}} [I_{x y} L + I_{x} M + (I_{x} + \\ I_{y} - I_{z}) I_{x y} ω_{y} ω_{z} + (I_{x} I_{z} - {I_{x}}^{2} - I_{x y}^{2}) ω_{x} ω_{z}] \\ {\dot{ω}}_{z} = \frac{I_{x} - I_{y}}{I_{z}} ω_{x} ω_{y} + \frac{I_{x y}}{I_{z}} ({ω_{x}}^{2} - {ω_{y}}^{2}) + \frac{N}{I_{z}} \end{matrix}

(5)

where

I_{x}, I_{y}, I_{z}

, and

I_{x y}

represent the coordinate components of inertia moment, and

L, M, N

are the moments along the axis in the body-fixed reference frame.

2.2. Systematic Architecture of UAV Swarm Maneuver Decision Method

If the continuous state variables of the UAV are directly used as the action space of a decision-making agent, it is complex and difficult to realize. Many flight actions in the process of air combat, such as serpentine maneuver and high-speed dive, are difficult to be modeled directly. Therefore, it is necessary to analyze and dissect the prior knowledge of the air combat game process so that a series of basic actions (meta-actions) can be obtained, which greatly simplifies the difficulty of modeling and discards the search space of agent decision making, so as to make the application of bionic intelligence possible. The collection of these basic actions is called the maneuver library. The maneuvering action library used in this paper includes 21 basic actions; the specific actions are shown in Table 1.

The autonomous decision-making system of the UAV air combat designed in this paper is composed of the following parts: Sensors obtain the state information of our UAV swarm and adversarial UAV, and then perform attack target allocation and attack resource calculation based on state information gathered before. When the attack conditions are met, the attack mission starts. If the target is destroyed, the target information is returned; otherwise, the reinforcements are requested. Then determine if the task is complete. If the task is finished, output the result and return; otherwise, carry out the next round of target allocation. The air combat decision system is designed as shown in Figure 1.

2.3. Situation Function of UAV Swarm Maneuver Decision Method

The situation function

f_{s}

is used to describe the battlefield environment and the advantages of both sides in the air combat game process determined by the position information of us and the enemy. According to the situation function

f_{s}

, we can predict the next maneuver action of the enemy and provide the basis for the next decision of the agent. The definition of the situation function is given below. The situation function consists of four parts, which respectively represent the influence mechanism of the angle relationship, the position relationship, the speed relationship, and the relative altitude on the air battlefield situation. The situation value is calculated to design the decision objective, which is to maximize the situation advantage of the own side [22,23,24,25]. First, define the state description as Equation (6).

\{\begin{matrix} R = [x_{t} - x_{u}, y_{t} - y_{u}, z_{t} - z_{u}] \\ V_{u} = [\begin{matrix} v_{u} cos γ_{u} cos φ_{u} \\ \begin{matrix} v_{u} cos γ_{u} sin φ_{u} \\ v_{u} sin γ_{u} \end{matrix} \end{matrix}] \\ V_{t} = [\begin{matrix} v_{t} cos γ_{t} cos φ_{t} \\ \begin{matrix} v_{t} cos γ_{t} sin φ_{t} \\ v_{t} sin γ_{t} \end{matrix} \end{matrix}] \\ q_{u} = arccos \frac{R V_{u}}{∥R∥ ∥V_{u}∥}, q_{u} \in [0, π] \\ q_{t} = arccos \frac{R V_{t}}{∥R∥ ∥V_{t}∥}, q_{t} \in [0, π] \end{matrix}

(6)

where R is a vector pointing from the current position to the target position, q is the angle of vision,

γ

is the angle of track,

ϕ

is the angle of track inclination, and V is the vector of velocity.

The situation value is calculated through the relationship between the angle, position, velocity, and altitude of both sides, and the decision objective is designed to maximize the utilization of the situation. The angle factor of the situation function

η_{A}

is as follows:

η_{A} = \{\begin{matrix} (1 - \frac{q_{u}}{180}) {(f_{q} (q_{t}) / f_{q} (0))}^{0.4}, 0 \leq q_{u} \leq k_{θ} θ \\ 1.5 (1 - \frac{q_{u}}{180}) (1 - \frac{q_{t}}{180}), q_{u} > k_{θ} θ \end{matrix}

(7)

where

θ

is the limitation of an angle normally set as 80,

k_{θ}

is a correct coefficient,

k_{θ} = 0.8

, and

f_{q} (q_{t})

is a function missile attack distance. From the formula, we can see that when our angle of vision

q_{u}

is smaller than the limit angle

θ

, the angle advantage factor

η_{A}

is significantly proportional to the attack distance and inversely proportional to the

q_{u}

; when

q_{u}

is larger than the limit angle,

η_{A}

is inversely proportional to both

q_{u}

and

q_{t}

. Therefore, it is necessary to minimize

q_{u}

in the decision-making process.

The distance factor of the situation function

η_{D}

is shown as follows:

η_{D} = \{\begin{matrix} (1 - m_{h}^{2}) e^{\frac{k_{d} f_{q} (q_{t}) - D}{k_{d} f_{q} (q_{t})}}, D > k_{d} f_{q} (q_{t}) \\ (1 - m_{h}^{2}), k_{d} f_{q} (q_{t}) - d_{τ} < D < k_{d} f_{q} (q_{t}) \\ (1 - m_{h}^{2}) e^{\frac{1.2 (D - k_{d} f_{q} (q_{t}))}{k_{d} f_{q} (q_{t})}} R_{min} < D < k_{d} f_{q} (q_{t}) - d_{τ} \\ 0, D < R_{min} \end{matrix}

(8)

where D is the distance between our position and the target position, the stable firing range

d_{τ}

k_{d}

is a coefficient,

m_{h}

is a standard height correct coefficient, and

R_{min}

is a constant equal to 1000 m. As we can see, ignoring the influence of an altitude factor, the situation assessment value is larger when the distance between two aircraft is closer to the limit distance

R_{min}

. Therefore, in actual combat, getting as close as possible to the enemy aircraft outside the enemy warning range will effectively improve the situation assessment value.

The velocity factor of the situation function

η_{V}

is as follows:

v_{d} = \{\begin{matrix} v_{t} + (v_{max} - v_{t}) (1 - e^{\frac{D - k_{d} f_{q} (q_{t})}{k_{d} f_{q} (q_{t})}}), D \geq k_{d} f_{q} (q_{t}) \\ v_{t} + (v_{min} - v_{t}) (1 - e^{\frac{2 (D - k_{d} f_{q} (q_{t}))}{k_{d} f_{q} (q_{t})}}), R_{min} < D < k_{d} f_{q} (q_{t}) \\ v_{t} + 0.8 (v_{min} - v_{t}) e^{{(D / R_{min})}^{2} - 1}, D < R_{min} \\ v_{max} = 300 (2 - e^{- h_{u} / 1000}) \end{matrix}

(9)

η_{V} = (\frac{v_{u}}{v_{d}} e^{- \frac{|2 v_{u} - v_{d}| - v_{d}}{v_{d}}})

(10)

where

v_{max}

and

v_{min}

are the velocity limitation of the UAV,

v_{d}

is the desire velocity,

h_{u}

is the current height, and D is the distance between us and the enemy.

k_{d} f_{d} (q_{t})

represents the actual far boundary of this type of missile when the entry angle is

q_{t}

. Obviously, only when the speed tracking performance of the controller is satisfied that the desired speed is larger and the situation assessment value is higher. However, a very large desired velocity brings great challenges to the robustness of a flight controller. Additionally, the desire velocity is coupled with distance between a friend and a foe and the attack distance.

The height factor of the situation function

η_{H}

is as shown below:

h_{d} = \{\begin{matrix} h_{t} + (h_{max} - h_{t}) (1 - e^{\frac{D - k_{d} f_{q} (q_{t})}{k_{d} f_{q} (q_{t})}}), D \geq k_{d} f_{q} (q_{t}) \\ h_{t} + (h_{min} - h_{t}) (1 - \frac{1}{2} e^{\frac{D - k_{d} f_{q} (q_{t})}{k_{d} f_{q} (q_{t})}}), D < k_{d} f_{q} (q_{t}) \end{matrix}

(11)

η_{H} = \{\begin{matrix} {(\frac{h_{u}}{h_{d}} e^{\frac{|2 h_{u} - h_{d}| - h_{d}}{2 h_{d}}})}^{\frac{1}{2}}, h_{u} \in [h_{min}, h_{max}] \\ 0, h_{u} \notin [h_{min}, h_{max}] \end{matrix}

(12)

where

h_{max}

and

h_{min}

are the height limitation of the UAV,

h_{t}

represents the current height of the target, and

h_{d}

is the desired height. Being similar to the definition of velocity factor, only when the height tracking performance of the controller is satisfied that the desired height is higher and the situation assessment value is larger. Meanwhile, the desired height is also coupled with the distance between a friend and a foe and the attack distance.

The final situation function can be obtained by normalizing and weighting the above four situation assessment functions as illustrated in function (13).

f_{s} = ω_{1} η_{A} + ω_{2} η_{D} + ω_{3} η_{V} + ω_{4} η_{H}

(13)

3. Learning-Aided Evolutionary Pigeon-Inspired Optimization Algorithm

Inspired by the process of human cognition and learning, a novel optimization algorithm structure, designated as the LAEPIO algorithm, is proposed in this section. It combines the learning-aided evolution for optimization (LEO) mechanism and the PIO algorithm.

3.1. Pigeon-Inspired Optimization Algorithm

The pigeon-inspired optimization algorithm is a global optimization method that simulates biological behavior. Duan et al. [15] developed the PIO algorithm inspired by a pigeon swarm’s unique ability to solve an optimal problem that needs to quickly converge. The PIO algorithm mainly uses the map–compass operators and landmark operators to update the position and velocity of the pigeon flock. Pigeons have magnetic induction structures on their beak structures, sense the geomagnetic field using magnetic objects in flight, and then form a map in their mind. In the pigeon-inspired optimization algorithm, a virtual pigeon is used to simulate the navigation process, and the position and velocity of the pigeon are initialized. In the multi-dimensional search space, the respective position and velocity are updated in each iteration as Equation (14). The speed of the i-th pigeon is determined by the speed of its previous generation and the best position and location of the current pigeon.

\{\begin{matrix} V_{i} (t) = V_{i} (t - 1) e^{- R t} + r a n d (X_{g} - X_{i} (t - 1)) \\ X_{i} (t) = X_{i} (t - 1) + V_{i} (t) \end{matrix}

(14)

where R is the map factor, rand is a random number, and t is the number of generations. The position of the i-th pigeon is determined by its previous position and its current speed. The flight of all pigeons is guaranteed by the map, and the best position of pigeons can be obtained by comparison.

The landmark operator is used to model the influence of landmarks on pigeons in the navigation tool. When flying close to the destination, pigeons rely more on nearby landmarks. In the landmark model, the number of half pigeons is recorded in each generation. Those pigeons far from the destination are not familiar with the surface, so they will no longer have the ability to distinguish the path. At this stage, the pigeon flock optimizes its flight direction and speed by looking for the surrounding landmarks. The size of the population becomes half of the size of the last iteration. After the population size is halved, the population center position is calculated, and the individual flight direction is updated based on the center position as Equation (15).

\{\begin{matrix} N_{p} (t) = \frac{N_{p} (t - 1)}{2} \\ X_{c} (t) = \frac{\sum X_{i} (t) f i t n e s s (X_{i} (t))}{N_{p} \sum f i t n e s s (X_{i} (t))} \\ X_{i} (t) = X_{i} (t - 1) + r a n d (X_{c} (t) - X_{i} (t - 1)) \end{matrix}

(15)

where

f i t n e s s (x)

is the adaptability value of every individual,

X_{c}

is the center of the position, and

N_{p}

is the size of the population.

The PIO algorithm has the characteristics of fast search speed and strong evolution ability, but it also has some limitations. For example, the algorithm is easy to fall into a local optimal solution with the increase in the number of iterations. To solve this problem, the learning-aided evolutionary optimization (LEO) framework is introduced into the PIO algorithm.

3.2. Learning-Aided Evolution Pigeon-Inspired Optimization Algorithm

A learning-aided evolutionary optimization framework [26] plus learning and evolution for solving optimization problems is introduced into the PIO algorithm in this paper. The LAEPIO algorithm, as shown in Figure 2, is inspired by the origin mechanism of human intelligence, imitates human cognition and the learning process, obtains information about the objective function from the algorithm operation process to train the neural network, and finally realizes the effect of auxiliary intelligent optimization algorithm evolution.

Following the human learning process, in the early stage of the algorithm, the individual evolution only depends on the update formula of the traditional optimization algorithm. In this process, a lot of information about the objective function will be accumulated, which will not be used in the traditional algorithm process. However, in the LEO mechanism, this information is initially filtered to form a successful evolution pair (SEP), shown as follows:

(X_{g, i}, X_{g, i + 1})

(16)

In order to make use of the knowledge about the objective function accumulated along with the exploration of the objective function during the individual updating process, the previously accumulated SEPs are used to train the ANN network after the pooling operation in the middle of the algorithm, and the loss function

L_{M S E}

of the neural network is taken as follows:

L_{M S E} = \frac{1}{b a t c h} \sum_{X_{i}, X_{i}^{'}} {|L (X_{i}) - X_{i}^{'}|}_{2}

(17)

In the latter stage of the algorithm, the training of the neural network is basically completed, which means that the cognition is basically formed and the learning step is basically completed. At this time, a reasonable way to use the neural network is needed. The LEO mechanism provides two operation methods to assist in the evolution: learning mutation (

L_{M}

) and learning crossover (

L_{C}

) operation. Their definitions are given as follows:

\{\begin{matrix} L_{M} (X_{m 1}, X_{m 2}, X_{m 3}) = L (X_{m 1}) + α (X_{m 2} - X_{m 3}) \\ X_{c 1} = L M (X_{m 1}, X_{m 2}, X_{m 3}) \\ L_{C} (X_{c 1}, X_{c 2}) = β X_{c 1} + (1 - β) X_{c 2} \end{matrix}

(18)

where

α

is the range of the variation rate

(0, 1)

;

β

is the crossover rate calculated by the following Equation (19) in which

k_{β}

is a constant; and

X_{m 2}

,

X_{m 3}

, and

X_{c 2}

are all randomly chosen individual best positions

P_{b e s t}

.

β = \{\begin{matrix} 1, if rand (1) \leq k_{β} \\ 0, otherwise \end{matrix}

(19)

To address the limitations of traditional pigeon-inspired optimization (PIO) algorithms, including premature convergence, insufficient global exploration, and parameter sensitivity, this paper integrates the LEO mechanism to the PIO algorithm to develop a new PIO algorithm based on the learning evolution assistant mechanism. The proposed framework is specifically designed for solving UAV air combat game optimization problems. The LEO mechanism enhances algorithmic intelligence through the following three critical improvements:

Concurrent multi-threaded neural network training preserves the algorithm’s initial rapid convergence characteristics.
Dynamic parameter adaptation eliminates manual tuning requirements.
Guided evolutionary strategies significantly improve convergence accuracy within constrained computational budgets.

Notably, while conventional PIO demonstrates accelerated convergence during initial iterations, its population diversity deteriorates progressively, leading to diminished global search capabilities. The introduced LEO mechanism effectively compensates for these deficiencies with the help of a large number of previous accumulated learning experience about the function to be optimized. This synergistic integration not only reduces local optimum entrapment probability but also enhances algorithmic stability while maintaining computational efficiency advantages inherent to PIO architectures. The pseudocode for LAEPIO is given as Algorithm 1.

Algorithm 1 Algorithm of LAEPIO

Input: Variable X to be optimized X (This is Inputs)
Output: Best variable and best fitness

X_{b} e s t

,

f i t n e s s_{b} e s t

(This is Outputs)

1:: initialization
2:: for each $i \in [1, N_{c 1}]$ do
3:: if the ANN is trained over then
4:: update position as Equation (18)
5:: else
6:: update position as Equation (14)
7:: end if
8:: if current position $X_{i}$ and $X_{g b e s t}$ satisfy the condition of SEPs then
9:: put $(X_{i}, X_{g b e s t})$ into replaybuffer
10:: end if
11:: update $X_{g b e s t}$ , ${f i t n e s s}_{b e s t}$ and train ANN as Equation (17)
12:: end for
13:: for each $i \in [1, N_{c 2}]$ do
14:: if the ANN is trained over then
15:: update position as Equation (18)
16:: else
17:: update position as Equation (15)
18:: end if
19:: if current position and $X_{c}$ satisfy the condition of SEPs then
20:: put $(X_{i}, X_{c})$ into replaybuffer
21:: end if
22:: update $X_{c}$ , ${f i t n e s s}_{b e s t}$ and train ANN as Equation (17)
23:: end for
24:: return Outputs

In the early iteration of the algorithm, the PIO algorithm has fast convergence speed, which is suitable for quickly obtaining a large number of successful evolution samples. By screening the successful evolution pairs in the algorithm iteration process to form the training data set, and using a parallel computing method, the auxiliary network can be trained in the algorithm iteration process. In the latter iteration of the algorithm, the PIO algorithm tends to be stable and the exploration ability is insufficient, so the training auxiliary network is introduced, and the prediction results of the auxiliary network are randomly mixed into the algorithm update link through the method of crossover and mutation to improve the exploration of the algorithm. Regarding the analysis of time complexity, the algorithm can be considered to have the same time complexity

O_{n^{2}}

as the PIO algorithm due to the influence of parallel computing.

4. Swarm Maneuver Decision Method Based on LAEPIO Algorithm

In this section, the swarm maneuver decision method based on the LAEPIO algorithm is proposed, which is able to predict the enemy’s next action mode and quickly adopt an optimal strategy to stay in a better situation. In order to deal with the problem of unmanned aerial vehicle (UAV) swarm air combat, the attacking targets should be dynamically allocated. By constructing the efficiency function (22) of multi-objective allocation, feasible allocation results can be obtained in a very short time by using the LAEPIO algorithm mentioned in this paper. Then, for each UAV in each local battlefield after allocation, the decision chain is obtained by using the decision optimization algorithm proposed in this paper. Finally, we will expand the advantage of our UAV cluster in the overall battlefield.

4.1. UAV Swarm Attack Target Allocation

UAV attack target allocation refers to the reasonable allocation of tasks to each UAV in the scenario of multi-UAV cooperative operations to maximize the effect of cooperative operations. This is one of the key technologies for UAV swarm to realize the efficient execution of combat tasks. Inspired by the group cooperative predation behavior of the gray wolf, in [27], th authors propose a task allocation method based on the concept of bionics. Mapping the coyote group’s hunting behavior to UAVs’ attack allocation, we define total attack resource as follows:

E_{s u m} = \sum_{z = 1}^{n} E_{z}

(20)

where n is maximum quantity of UAVs and

E_{z}

is the combat resource of any UAV. The minimum requirement for a total attack resource is as follows:

E_{m i n} = (1 + ρ) \sum E (u, t)

(21)

where

ρ

is a redundancy coefficient

(ρ > 0)

and

E (u, t)

is the least consumption of the current UAV u for the target t. If the feasible attack formation meets the total attack resource requirements, it will be added to the pre-attack formation until all feasible formations are traversed. All feasible formations that meet the minimum attack resource requirements are searched by the LAEPIO algorithm. When the attack capability of one UAV cannot kill the target, multiple UAVs can coordinate to complete the mission.

When any UAV meets the minimum attack resource requirement, it decides whether to participate in the mission according to the probability function shown as Equation (22).

P_{i k} (t) = \frac{{[R_{k} (t)]}^{2}}{{[R_{k} (t)]}^{2} + α {(R (u, t))}^{2} + {(Δ t_{i k})}^{2}}

(22)

where

R_{k} (t)

is the reward function of the task objective k,

α

is a constant, and

Δ t_{i k}

is the time required for the i-th UAV to fly from the current position to the position of the mission target, which is determined by the following equation:

\{\begin{matrix} R_{k} (t) = \frac{R_{k}^{0}}{1 + φ t} \\ E (i, k) = b_{1} - b_{2} θ_{i k} \\ θ_{i k} = P (i, k), (P (i, k) \geq P_{min}) \\ Δ t_{i k} = ⌈\frac{1}{T_{s}} \frac{Δ d_{i k}}{V}⌉ \\ Δ d_{i k} = \sqrt{{(x_{i} - x_{k})}^{2} + {(y_{i} - y_{k})}^{2} + {(z_{i} - z_{k})}^{2}} \end{matrix}

(23)

where

R_{k}^{0}

is the initial payoff of the task target k, decaying over time depending on the factor

ϕ

;

b_{1}

and

b_{2}

are model parameters; and

θ_{i k}

is the execution of the i-th UAV on the mission target k.

P_{min}

is the minimum allowed execution probability of the task.

Δ d_{i k}

is the distance from the i-th UAV to the mission target k.

Traditional auction algorithms obtain the most optimal solution by one-to-one matching, which, however, ignores the possibility of many to one or one to many in real air combat scenarios. With the help of the powerful search ability of the LAEPIO algorithm, Equation (23) is used as the objective function to search for the possible optimal allocation, which fully considers the complex battlefield environment of unmanned cluster air combat.

4.2. Maneuver Decision Method Based on LAEPIO Algorithm

The UAV maneuver decision method based on the LAEPIO algorithm allows a UAV to execute coordinated actions autonomously or semi-autonomously so that a swarm can quickly complete a siege strangulation on the battlefield. We design the maneuver decision method based on the LAEPIO algorithm, whose architecture is shown in Figure 3, where our UAV is on the red side and the adversarial UAV is on the blue side. Each individual in the formation shares the full battlefield information sensed by the swarm. After collecting the battlefield information, the agent via LAEPIO evaluates the state of the two UAVs and predicts the adversarial UAVs’ maneuvers in the future according to game theory. Finally, a maneuver decision chain is selected from the maneuver decision library according to the minimax rule.

Every time, the future situation is calculated from the beginning and the blue side’s actions are predicted again. Eventually, we will obtain a decision chain, which will make it possible for the red side to win the battle. However, the huge search space makes it an NP-hard problem in theory, so it is necessary to introduce the LAEPIO algorithm to rapidly achieve a search in the limited space.

The whole flowchart of a UAV swarm maneuver decision method via LAEPIO is given in Figure 4.

As Figure 4 shows, first, UAV swarm enters the small air combat link after the attack target allocation based on the LAEPIO algorithm. Then using the method designed in this paper to select the maneuver strategy to achieve the maximum value of the objective function by predicting the enemy maneuver action, the blue side obtains the next action through other methods. Then, the next maneuver of the blue side is predicted by the situation function based on state information. The LAEPIO algorithm is used to rapidly search the optimal decision from a decision tree under the current state information and prediction information. Finally, we need to judge if the task is complete. If the attack mission is successfully completed, exit the combat state; otherwise, request nearby friendly units for support for the next round of attack mission.

5. Simulation Results and Analysis

In this section, a swarm maneuver decision method based on the LAEPIO algorithm and other algorithms are used by both sides in the UAV air combat simulation. Then, we record the simulation results and analyze excellence between different algorithms.

5.1. Comparative Analysis of Algorithm Performance in Air Combat Simulation

In order to test the UAV swarm maneuver decision method of close air combat based on the LAEPIO algorithm, the simulation experiment of UAV close air combat based on a nonlinear dynamic model designed in Section 1 is carried out. The parameter settings of PIO, particle swarm optimization(PSO), genetic algorithm (GA), sparrow search algorithm (SSA), and LAEPIO algorithms according to the need of the decision-making problem in the test are shown in Table 2. The parameters of the different algorithms are all set in a suitable range.

In order to verify the performance of the LAEPIO algorithm, different optimization algorithms are used in a certain maneuver decision process, and their respective fitness values are plotted in Figure 5.

As we can see, in the optimization process of this maneuver decision, the GA and PIO algorithms need a longer iteration time before the fitness value decreases significantly, and the final convergence value is also very large. Although the fitness value of the SSA algorithm decreases quickly and even exceeds that of the LAEPIO algorithm in the initial stage, the final convergence value is not ideal and only lies between the PIO and GA algorithms. The only one that can be compared with the LAEPIO algorithm is the PSO algorithm; however, not only the proposed LAEPIO algorithm converges faster, but also it has a better value of convergence compared with PSO algorithms.

The following is the result of the comparison between LAEPIO and other algorithms tested by benchmark functions.

From the horizontal comparison of information in Table 3 and Figure 6, it can be concluded that the LAEPIO algorithm designed in this paper has excellent optimization ability and strong stability.

5.2. Air Combat Simulation and Result Analysis

The UAV swarm attack target allocation method based on LAEPIO is simulated under three different initial conditions (6V10, 10V10, 15V10), and the results are shown in Figure 7, where the red square used the agent based on the LAEPIO algorithm and the blue square is the agent of other algorithms.

As we can see in Figure 7, the target allocation algorithm based on the LAEPIO algorithm can always quickly find a relatively good allocation scheme, whether under the conditions of advantage (15V10), disadvantage (6V10), or even power (10V10). Combined with the maneuver decision scheme based on the LAEPIO algorithm, the resource advantages are continuously accumulated in the local cooperative combat, and the overall victory in the battlefield is finally achieved.

Further, the UAV using the LAEPIO algorithm and the UAV using the traditional matrix game algorithm and the traditional PIO algorithm are simulated in a Matlab simulation. Table 4 shows four kinds of initial conditions designed for testing the performance of the agent proposed in this paper.

The results of the simulation are shown in Figure 8 and Figure 9. Compared with the traditional matrix game algorithm and the traditional PIO algorithm, the agent based on the LAEPIO algorithm can quickly shoot down the blue side in the dominant or equilibrium situation. Even in the case of the initial state disadvantage, the red side can also achieve complex maneuvers by selecting the correct maneuver strategy to create advantages for itself and escape or even use the terrain to turn back the defeat. Even in the case of the initial state disadvantage, the red side can also achieve complex maneuvers by selecting the correct maneuver strategy to create advantages for itself and escape or even use the terrain to turn back the defeat.

In Figure 8, as the figure shows, on the left side are air combat results in four different initial conditions, while on the right side is a plot of the scores of the red and blue teams over time for each condition. The specific calculation of the score is obtained by averaging the situation function values of the UAVs. As mentioned above, this is an evaluation index that is complex coupled with many factors such as the attack angle of the UAV, the missile launch distance, the speed limit of the UAV, and the height of the UAV. It can clearly indicate the situation of both sides in the process of UAV confrontation.

There are many different results in the simulation. In Figure 8a, the initial conditions of both sides are roughly the same, and the red UAV induces the blue UAV to crash into the obstacle through the dive and jerk pull strategy. In Figure 8c, the red side obtains enough advantage after complicated maneuvers and fires to shoot down the blue-side UAV. In Figure 8g, despite the disadvantage of the initial conditions, the red team gets rid of the lock of the blue team’s UAV through efficient decision making, and finally, the two sides draw and leave the battlefield.

In Figure 9, two other different simulation results are shown. In Figure 9e, the red has a certain advantage in the initial conditions and quickly shoots down the blue UAV. In Figure 9g, the red UAV is shown to escape from the enemy’s attack range through complex maneuvers and induce it to crash into an obstacle in the absolute disadvantage situation.

In order to further verify the adaptability of the method in this paper to the complex battlefield, the simulation experiment of 1Vn and nVn is designed, and the advantage of nV1 is not considered. The agents using the matrix game and the PIO algorithm are simulated against 1Vn and nVn, respectively, and the initial conditions are set as general conditions. The experimental results are shown in Figure 10 and Figure 11.

As we can see in Figure 10 and Figure 11, the simulation combat verifies the effectiveness of multi-UAV collaboration, under the condition of 1V2, the blue UAVs fighting independently so that the red UAVs are easy to get out of the siege situation through a large angle maneuver, resulting in the red side taking advantage of the plane to fight in 1V1 and finally turning the tide. However, under the initial condition of 2V2, the red UAVs can quickly solve a target by forming a scissor-shaped strangling action to create a greater advantage for the swarm.

In order to further verify the efficiency and superiority of the algorithm in the air combat maneuver decision problem, we order the UAV using the maneuver decision method based on LAEPIO to conduct 100 simulated confrontations with the UAV using other methods at the initial conditions of equilibrium. The result of the combat simulation is shown in Table 5. In the combat simulation, each UAV has a special shot-down mark, which is marked as shot down when the UAV is within the enemy’s attack range for a certain amount of time. When the UAV is judged to be shot down, its situation value will clear to zero and reduce the score of its formation. When all the enemy UAVs are judged to be shot down, the simulation stops, and our victory is immediately determined. If there are still UAVs not shot down by both sides until the end of the simulation, the final result will be determined according to the final scores of both sides. Only when the distance between the two sides is large that the winner with the higher score will be judged; otherwise, it will be regarded as a draw.

From the data in the table, we can see that the UAV swarm maneuver decision method via LAEPIO used in this paper has significant advantages in the 1v1 simulation confrontation with the maneuver decision agents based on the PSO, PIO, SSA and GA algorithms.

6. Discussion

Based on the analysis of the experimental outcomes, it is demonstrated that the proposed autonomous decision-making system efficiently addresses the UAV swarm combat issue. Within the identical system framework, when compared with conventional optimization algorithms, the present approach exhibits superior efficiency and robustness. Nonetheless, the swarm maneuver decision problem possesses an extensive solution space and incorporates the dynamic attributes of the real-world environment, thereby presenting a significant challenge to the algorithm’s convergence rate. Consequently, the PIO algorithm is employed in this study due to its rapid convergence characteristic, which meets the stringent requirements of the maneuver decision problem.

However, in the complex electromagnetic environment on the real battlefield, incomplete environmental information is the norm, which is undoubtedly a great test for the robustness of the unmanned system. It is undoubtedly fatal for swarm intelligence algorithms that rely on environmental awareness to construct situation functions. Therefore, we introduce the LEO mechanism into the PIO algorithm to enhance the robustness of a swarm maneuver decision method. All of the traditional attempts try to find a suitable optimization algorithm and make it perform better when applying to their problems by making improvements on the basis of the original algorithm. However, even if this approach is effective most of the time, it always costs researchers a lot of time and effort to design improved combinations of various optimization algorithms and design complex test flows to verify the effectiveness of algorithms. Therefore, we introduced a new optimization algorithm structure with adaptive learning ability into a traditional swarm intelligence decision-making approach to meet the needs of a variety of complex UAV swarm maneuver decision problems in this paper.

Then, although we have found a suitable optimization algorithm to deal with a swarm maneuver decision problem, how to carry out intelligent swarm warfare is still an unknown goal. Therefore, we propose a systematic architecture of a UAV swarm maneuver decision method as shown in Figure 1, where we divide the swarm combat problem into two subproblems, the dynamic allocation of attack targets and the small-scale swarm maneuver decision. For these two subproblems, we respectively refer to [27] and reference [22,23,24,25] to establish the optimization objective function of the problem.

Finally, comprehensive simulations across diverse air combat scenarios are designed to verify the feasibility of the proposed method. In addition, the swarm maneuver decision method based on the LAEPIO algorithm, along with other algorithms, is implemented in the UAV air combat simulation. This implementation aims to further demonstrate the superiority of the employed algorithm.

7. Conclusions and Future Research

Starting from the air combat maneuver decision-making problem of UAVs, this paper combines the bionic concept, incorporates human learning and cognitive methods into the design concept of bionic intelligent computing methods, and establishes a new LEO mechanism and a decision-optimization algorithm named LAEPIO algorithm with the PIO algorithm.

Second, the LAEPIO algorithm is applied to the attack target allocation process and the air combat maneuver decision-making process of UAVs. Compared with the traditional matrix game algorithm, the standard PIO algorithm, the SSA algorithm, the PSO algorithm, and the GA, the efficiency and superiority of the swarm maneuver decision-making method based on LAEPIO are verified. In this experiment, the LAEPIO algorithm outperforms the above-mentioned optimization algorithms.

Finally, a simulated air combat simulation is designed, and the superiority of the proposed algorithm is further verified by simulating confrontations with the matrix game algorithm and the traditional PIO algorithm in 1V1, 1Vn, and nVn scenarios.

Although we have designed many simulation experiments to verify the rationality and superiority of the proposed method, it cannot represent performance in the real battlefield; therefore, more realistic battlefield environment models, more complex tactical options, and a more detailed modeling of UAV reconnaissance and strike capabilities will be established in the future to further improve the method proposed in this paper.

Author Contributions

Conceptualization, Y.S. and Y.C.; methodology, Y.S. and Y.C.; software, Y.C. and C.W.; validation, Y.C., Y.S. and C.W.; formal analysis, Y.C. and Y.F.; investigation, Y.C. and C.W.; resources, Y.C. and C.W.; data curation, Y.C. and C.W.; writing—original draft preparation, Y.S., B.L. and Y.C.; writing—review and editing, Y.S., Y.F., B.L. and Y.C.; visualization, Y.S. and Y.C.; supervision, Y.S. and C.W.; project administration, Y.S. and Y.C.; funding acquisition, Y.S. and C.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under grant numbers 62473025, U24B20156, and 62103040.

Data Availability Statement

The parameters used in this paper are given in the paper. If any researchers need to obtain more details about the simulation or want to engage in academic communication, please contact us.

DURC Statement

The current research is restricted in the range of air combat decision, which is beneficial for enhancing technological advancements, increasing efficiency across autonomous maneuver making of UAV swarm, and improving the adaptability of UAV swarm to complex environments. This research does not pose a threat to public health or national security. The authors acknowledge the dual-use potential of research involving UAV swarm and confirm that all necessary precautions have been taken to prevent potential misuse. As an ethical responsibility, the authors strictly adhere to relevant national and international laws concerning Dual Use Research of Concern (DURC). The authors advocate for responsible deployment, ethical considerations, regulatory compliance, and transparent reporting to mitigate misuse risks and foster beneficial outcomes.

Acknowledgments

The authors would like to thank the editors and the reviewers for their constructive comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Wu, C.; Guo, Z.; Zhang, J.; Mao, K.; Luo, D. Cooperative Path Planning for Multiple UAVs Based on APF B-RRT* Algorithm. Drones 2025, 9, 177. [Google Scholar] [CrossRef]
Wang, X.; Wang, W.J.; Song, K.P.; Wang, M. UAV Air-Combat Decision-Making Technology Based on Evolutionary Expert System Tree. Ordnance Ind. Autom. 2019, 38, 42–47. [Google Scholar]
Chin, H.H. Knowledge-based system of supermaneuver selection for pilot aiding. J. Aircr. 1989, 26, 1111–1117. [Google Scholar] [CrossRef]
Bechtel, R.J. Air Combat Maneuvering Expert System Trainer; Air ForceSystems Command: San Antonio, TX, USA, 1992. [Google Scholar]
Zhang, J.D.; Yang, Q.M.; Shi, G.Q.; Lu, Y.; Wu, Y. UAV cooperative air combat maneuver decision based on multi-agent reinforcement learning. J. Syst. Eng. Electron. 2021, 32, 1421–1438. [Google Scholar]
Wang, L.; Zheng, S.; Tai, S.; Liu, H.; Yue, T. UAV air combat autonomous trajectory planning method based on robust adversarial reinforcement learning. Aerosp. Sci. Technol. 2024, 153, 109402. [Google Scholar] [CrossRef]
Gao, X.; Zhang, Y.; Wang, B.; Leng, Z.; Hou, Z. The Optimal Strategies of Maneuver Decision in Air Combat of UCAV Based on the Improved TD3 Algorithm. Drones 2024, 8, 501. [Google Scholar] [CrossRef]
Dong, Z.; Zhao, M.; Jiang, L.; Wang, Z. Review of Key Technologies for Autonomous Collaboration in Heterogeneous Unmanned System Clusters. Telem. Telecontrol 2024, 45, 111. [Google Scholar]
Zhang, Y.; Tu, Y.G.; Zhang, L.; Cui, H.; Wang, J.Y. Current Situation and Prospect of Deep Reinforcement Decision-making Methods in Intelligent Air Combat. Aero Weapon. 2024, 31, 21–31. [Google Scholar]
Xu, Y.F.; Zhou, Z.D.; Song, Z.F.; Ji, W.T.; Wang, J.W.; Zhou, Y.F. Research on Improved Maneuvering Decision-making Algorithm of Deep Reinforcement Learning for Close-range Air Combat. In Proceedings of the 7th National Conference on Swarm Intelligence and Cooperative Control in 2023, Nanjing, China, 24–27 November 2023. [Google Scholar]
Li, W.; Huang, S.Y.; Liu, H.M.; Sun, Z.J. Review of Research on UAV Swarm Countermeasure Decision-making Algorithms. Aeronaut. Sci. Technol. 2024, 35, 9–17. [Google Scholar]
Xie, L.; Deng, S.; Tang, S.; Huang, C.; Dong, K.; Zhang, Z. Beyond visual range maneuver intention recognition based on attention enhanced tuna swarm optimization parallel BiGRU. Complex Intell. Syst. 2023, 10, 2151–2172. [Google Scholar]
Zhou, T.L.; Chen, M.; Han, Z.L.; Wang, Q. Multi-UAV Cooperative Multi-target Assignment Based on Improved Wolf Pack Algorithm. Navig. Position. Timing 2022, 9, 46–55. [Google Scholar]
Yu, Y.P.; Liu, J.C.; Chen, W. Hawk and pigeon’s intelligence for UAV swarm dynamic combat game via competitive learning pigeon-inspired optimization. Sci. China (Technol. Sci.) 2022, 65, 10721086. [Google Scholar] [CrossRef]
Duan, H.B.; Lei, Y.Q.; Xia, J.; Deng, Y.; Shi, Y. Autonomous maneuver decision for unmanned aerial vehicle via improved pigeon-inspired optimization. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 3156–3170. [Google Scholar] [CrossRef]
Li, C.; Duan, H.B. Target detection approach for UAVs via improved Pigeon-inspired Optimization and Edge Potential Function. Aerosp. Sci. Technol. 2014, 39, 352–360. [Google Scholar] [CrossRef]
Yao, Z.X.; Li, M.; Chen, Z.J. A Multi-Aircraft Cooperative Counter-Multiple-Target Mission Decision-Making Method Based on Game Theory Model. Aeronaut. Comput. Tech. 2007, 37, 7–11. [Google Scholar]
Yuan, T.Y.; Fang, Y.C. Multi-step decision-making target assignment method for multi-UAV cooperative air combat based on IIP-GA. In Proceedings of the Chinese Control Conference (CCC), Kunming, China, 28–31 July 2024. [Google Scholar]
Zheng, Z.Q.; Duan, H.B. Maneuver decision-making for close-range air combat of unmanned aerial vehicles based on pigeon-inspired optimizer with limited patience. J. Comput. Appl. 2024, 44, 1401–1407. [Google Scholar]
Wang, L.M.; Wang, Y.H.; Chen, M.; Liu, H.T. Research on Incomplete Information Game Strategy Based on Improved Sparrow Algorithm. J. Jilin Univ. (Inf. Sci. Ed.) 2022, 40, 589–599. [Google Scholar]
Li, Y.F.; Lyu, Y.G.; Shi, J.; Li, W. Autonomous Maneuver Decision of Air Combat Based on Simulated Operation Command and FRV-DDPG Algorithm. Aerospace 2022, 9, 658. [Google Scholar] [CrossRef]
Wang, Y.; Ding, D.L.; Zhang, P.; Xie, L.; Zhang, X.W. Research on Adaptive Situation Assessment Method for UCAV Close-Range Air Combat. Unmanned Syst. Technol. 2023, 6, 85–94. [Google Scholar]
Liu, Y.; Wei, X.L.; Qu, H.; Gan, X.S. UAV Air Combat Situation Analysis and Tactical Optimization Based on STPA Method. J. Command. Control 2023, 9, 651–659. [Google Scholar]
Zhao, K.; Huang, C. Air combat situation assessment for UAV based on improved decision tree. In Proceedings of the 2018 Chinese Control And Decision Conference (CCDC), Shenyang, China, 9–11 June 2018. [Google Scholar]
Meng, X.F.; Du, H.W.; Feng, P.W. Study on situation assessment in air combat based on Gaussian cloudy Bayes-ian network. Comput. Eng. Appl. 2016, 52, 249–253. [Google Scholar]
Zhan, Z.H.; Li, J.Y.; Kwong, S.; Zhang, J. Learning-Aided Evolution for Optimization. IEEE Trans. Evol. Comput. 2023, 27, 1794–1808. [Google Scholar] [CrossRef]
Peng, Y.L.; Duan, H.B.; Zhang, D.F.; Wei, C. Dynamic task allocation for unmanned aerial vehicle swarms inspired by grey wolf cooperative predation behavior. Control Theory Appl. 2021, 38, 1855–1862. [Google Scholar]

Figure 1. Systematic architecture of UAV swarm maneuver decision method.

Figure 2. Structure of the LAEPIO algorithm.

Figure 3. Architecture of UAV swarm maneuver strategy library search method, red is our side and blue is the enemy.

Figure 4. Flowchart of the UAV swarm maneuver decision method.

Figure 5. Iteration curves of maneuver decision based on different optimization algorithms.

Figure 6. The convergence curves of the studied techniques for seven benchmark functions.

Figure 7. Air combat simulation of attack object allocation based on LAEPIO.

Figure 8. Air combat simulation between LAEPIO agent and matrix game method.

Figure 9. Air combat simulation between LAEPIO and PIO method.

Figure 10. Air combat simulation between LAEPIO and matrix game method: (a) 1V2 simulation between LAEPIO and matrix game method, (b) comparison of scores, (c) 2V2 simulation between LAEPIO and matrix game method, and (d) comparison of scores.

Figure 11. Air combat simulation between LAEPIO and PIO.

Table 1. Trial maneuver library.

No.	Normal Load Factor	Roll Angle (Deg)
1	$1 + Δ f_{z}$	0
2	$1 + Δ f_{z} - 0.8 Δ f_{c 1}$	0
3	1	0
4	$1 + Δ f_{z} - 0.8 Δ f_{c 1}$	$ϕ$
5	$1 + Δ f_{z} - 0.8 Δ f_{c 2}$	$ϕ$
6	$1 + Δ f_{z} - 0.8 Δ f_{c 1}$	$ϕ + 10^{\circ}$
7	$1 + Δ f_{z} - 0.8 Δ f_{c 1}$	$ϕ - 10^{\circ}$
8	$1 + Δ f_{z} - 0.8 Δ f_{c 1}$	$γ_{c}$
9	1	$γ_{c}$
10	$1 + Δ f_{z} - 0.8 Δ f_{c 1}$	$γ_{c} + 90^{\circ}$
11	$1 + Δ f_{z} - 0.8 Δ f_{c 1}$	$γ_{c} - 90^{\circ}$
12	$1 + Δ f_{z} - 0.3$	$ϕ$
13	$1 + Δ f_{z} - 0.8 Δ f_{c 1}$	$90^{\circ}$
14	$1 + Δ f_{z} - 0.8 Δ f_{c 2}$	$90^{\circ}$
15	$1 + Δ f_{z} - 0.8 Δ f_{c 1}$	$- 90^{\circ}$
16	$1 + Δ f_{z} - 0.8 Δ f_{c 2}$	$- 90^{\circ}$
17	$1 + Δ f_{z} - 0.8 Δ f_{c 1}$	$γ_{c} + 180^{\circ}$
18	$1 + Δ f_{z} - 0.8 Δ f_{c 2}$	$γ_{c} + 180^{\circ}$
19	$1 + Δ f_{z} - 0.8 Δ f_{c 1}$	$γ_{c} - 180^{\circ}$
20	$1 + Δ f_{z} - 0.8 Δ f_{c 2}$	$γ_{c} - 180^{\circ}$
21	$1 + Δ f_{z} + {\dot{f}}_{z}$	0

Table 2. Parameters of PIO, PSO, GA, SSA, and LAEPIO algorithm.

Algorithms	Parameters	Meanings	Values (Dimensionless)
PIO	$N_{c 1, m a x}$	Maximum number of iterations of map and compass operators	150
	$N_{c 2, m a x}$	Maximum number of iterations of landmark operator	50
	$N_{p}$	Population size	100
	R	Map and compass constant	$0.15$
PSO	$c_{1}, c_{2}$	Learning factor	$2.0$
	$N_{c, m a x}$	Maximum number of iterations	200
	$N_{p}$	Population size	100
	$ω$	Mass factor	$0.7$
GA	$α$	Mutation probability	$0.1$
	$β$	Crossover probability	$\frac{f i t n e s s}{s u m (f i t n e s s)}$
	$N_{c, m a x}$	Maximum number of iterations	200
	$N_{p}$	Population size	100
SSA	$S T$	Safety threshold	$0.6$
	$α$	Seeker probability	$0.2$
	$β$	Follower probability	$0.8$
	$N_{c, m a x}$	Maximum number of iterations	200
	$N_{p}$	Population size	100
LAEPIO	$N_{c 1, m a x}$	Maximum number of iterations of map and compass operators	150
	$N_{c 2, m a x}$	Maximum number of iterations of landmark operator	50
	$N_{p}$	Population size	100
	R	Map and compass constant	$0.15$
	$α$	Rate of variation	$0.1$

Table 3. The statistical results of benchmark functions by the LAEPIO algorithm and other recent methods.

Function	Statistic	LAEPIO	PIO	PSO	GA	SSA
F1	best	$2.85 \times 10^{- 37}$	$4.89 \times 10^{- 1}$	$2.12 \times 10^{- 7}$	$8.70 \times 10^{- 2}$	$9.40 \times 10^{- 10}$
	mean	$1.42 \times 10^{- 34}$	$6.21 \times 10^{1}$	$1.64 \times 10^{- 5}$	1.55	$8.11 \times 10^{- 6}$
	median	$3.46 \times 10^{- 35}$	$1.88 \times 10^{1}$	$8.20 \times 10^{- 6}$	1.36	$1.35 \times 10^{- 6}$
	worst	$1.51 \times 10^{- 33}$	$6.08 \times 10^{2}$	$1.86 \times 10^{- 4}$	4.01	$1.64 \times 10^{- 4}$
	std	$8.50 \times 10^{- 68}$	$1.44 \times 10^{4}$	$8.61 \times 10^{- 10}$	1.17	$5.85 \times 10^{- 10}$
F2	best	$4.96 \times 10^{- 10}$	2.38	$2.07 \times 10^{- 2}$	$8.34 \times 10^{- 1}$	$2.11 \times 10^{- 4}$
	mean	$2.07 \times 10^{- 9}$	4.87	$6.46 \times 10^{- 2}$	2.59	$7.42 \times 10^{- 3}$
	median	$1.87 \times 10^{- 9}$	4.66	$6.25 \times 10^{- 2}$	2.76	$5.67 \times 10^{- 3}$
	worst	$7.40 \times 10^{- 9}$	9.53	$1.51 \times 10^{- 1}$	3.36	$2.87 \times 10^{- 2}$
	std	$1.66 \times 10^{- 18}$	2.39	$9.22 \times 10^{- 4}$	$3.09 \times 10^{- 1}$	$4.02 \times 10^{- 5}$
F3	best	$2.84 \times 10^{- 14}$	8.87	$5.37 \times 10^{- 3}$	7.61	$1.11 \times 10^{- 5}$
	mean	$5.18 \times 10^{- 1}$	$2.67 \times 10^{1}$	$1.99 \times 10^{- 1}$	$1.96 \times 10^{1}$	$1.03 \times 10^{- 1}$
	median	$3.62 \times 10^{- 3}$	$2.50 \times 10^{1}$	$1.41 \times 10^{- 1}$	2.04	$2.18 \times 10^{- 3}$
	worst	1.99	$5.29 \times 10^{1}$	1.21	$2.69 \times 10^{1}$	2.70
	std	$3.73 \times 10^{- 1}$	$8.29 \times 10^{1}$	$3.90 \times 10^{- 2}$	$2.48 \times 10^{1}$	$1.97 \times 10^{- 1}$
F4	best	0.00	$4.68 \times 10^{- 2}$	$6.15 \times 10^{- 5}$	$3.26 \times 10^{- 2}$	$1.26 \times 10^{- 8}$
	mean	$2.57 \times 10^{- 2}$	$1.63 \times 10^{- 1}$	$1.33 \times 10^{- 2}$	$6.65 \times 10^{- 2}$	$6.10 \times 10^{- 6}$
	median	$2.46 \times 10^{- 2}$	$1.53 \times 10^{- 1}$	$1.25 \times 10^{- 2}$	$6.65 \times 10^{- 2}$	$2.87 \times 10^{- 6}$
	worst	$7.15 \times 10^{- 2}$	$2.85 \times 10^{- 1}$	$3.45 \times 10^{- 2}$	$1.08 \times 10^{- 1}$	$6.73 \times 10^{- 5}$
	std	$2.13 \times 10^{- 4}$	$3.54 \times 10^{- 3}$	$1.03 \times 10^{- 4}$	$2.88 \times 10^{- 4}$	$1.24 \times 10^{- 10}$
F5	best	$2.05 \times 10^{- 4}$	$1.03 \times 10^{2}$	$1.16 \times 10^{- 1}$	7.56	$3.34 \times 10^{- 4}$
	mean	$5.15 \times 10^{- 1}$	$1.39 \times 10^{3}$	1.71	$9.00 \times 10^{1}$	$9.56 \times 10^{- 1}$
	median	$1.69 \times 10^{- 1}$	$4.71 \times 10^{2}$	1.65	$8.11 \times 10^{1}$	$2.53 \times 10^{- 1}$
	worst	7.67	$1.04 \times 10^{4}$	4.53	$2.66 \times 10^{2}$	3.96
	std	1.58	$5.47 \times 10^{6}$	1.43	$3.98 \times 10^{3}$	1.89

Table 4. Initial state of UAVs in simulation.

Condition	Side	Value of State
general	red side	$s = [500 m, 1000 m, 4000 m, 260^{\circ}, 0^{\circ}, - 45^{\circ}]$
general	blue side	$s = [1000 m, 2000 m, 4000 m, 260^{\circ}, 0^{\circ}, 45^{\circ}]$
balance	red side	$s = [1000 m, 2000 m, 4000 m, 260^{\circ}, 0^{\circ}, - 90^{\circ}]$
balance	blue side	$s = [1000 m, 1000 m, 4000 m, 260^{\circ}, 0^{\circ}, 90^{\circ}]$
advantage	red side	$s = [500 m, 2000 m, 3500 m, 260^{\circ}, 0^{\circ}, - 90^{\circ}]$
advantage	blue side	$s = [1000 m, 1000 m, 4000 m, 260^{\circ}, 0^{\circ}, 0^{\circ}]$
disadvantage	red side	$s = [1000 m, 1000 m, 4000 m, 260^{\circ}, 0^{\circ}, 0^{\circ}]$
disadvantage	blue side	$s = [500 m, 2000 m, 3500 m, 260^{\circ}, 0^{\circ}, - 90^{\circ}]$

Table 5. Engagement statistics of air combat simulations.

Algorithm	Number of Victories	Number of Failures	Number of Draws	Average Score of LAEPIO	Average Score of Others
LAEPIO vs. PIO	50	22	28	0.635	0.427
LAEPIO vs. PSO	51	25	24	0.647	0.413
LAEPIO vs. SSA	62	13	25	0.681	0.390
LAEPIO vs. GA	58	14	28	0.712	0.491

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Sun, Y.; Chen, Y.; Wei, C.; Li, B.; Fan, Y. Swarm Maneuver Decision Method Based on Learning-Aided Evolutionary Pigeon-Inspired Optimization for UAV Swarm Air Combat. Drones 2025, 9, 218. https://doi.org/10.3390/drones9030218

AMA Style

Sun Y, Chen Y, Wei C, Li B, Fan Y. Swarm Maneuver Decision Method Based on Learning-Aided Evolutionary Pigeon-Inspired Optimization for UAV Swarm Air Combat. Drones. 2025; 9(3):218. https://doi.org/10.3390/drones9030218

Chicago/Turabian Style

Sun, Yongbin, Yu Chen, Chen Wei, Bin Li, and Yanming Fan. 2025. "Swarm Maneuver Decision Method Based on Learning-Aided Evolutionary Pigeon-Inspired Optimization for UAV Swarm Air Combat" Drones 9, no. 3: 218. https://doi.org/10.3390/drones9030218

APA Style

Sun, Y., Chen, Y., Wei, C., Li, B., & Fan, Y. (2025). Swarm Maneuver Decision Method Based on Learning-Aided Evolutionary Pigeon-Inspired Optimization for UAV Swarm Air Combat. Drones, 9(3), 218. https://doi.org/10.3390/drones9030218

Article Menu

Swarm Maneuver Decision Method Based on Learning-Aided Evolutionary Pigeon-Inspired Optimization for UAV Swarm Air Combat

Abstract

1. Introduction

2. Problem Statements

2.1. Dynamic Model of UAV

2.2. Systematic Architecture of UAV Swarm Maneuver Decision Method

2.3. Situation Function of UAV Swarm Maneuver Decision Method

3. Learning-Aided Evolutionary Pigeon-Inspired Optimization Algorithm

3.1. Pigeon-Inspired Optimization Algorithm

3.2. Learning-Aided Evolution Pigeon-Inspired Optimization Algorithm

4. Swarm Maneuver Decision Method Based on LAEPIO Algorithm

4.1. UAV Swarm Attack Target Allocation

4.2. Maneuver Decision Method Based on LAEPIO Algorithm

5. Simulation Results and Analysis

5.1. Comparative Analysis of Algorithm Performance in Air Combat Simulation

5.2. Air Combat Simulation and Result Analysis

6. Discussion

7. Conclusions and Future Research

Author Contributions

Funding

Data Availability Statement

DURC Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI