1. Introduction
Recently, UAVs have evolved from solitary operational units into sophisticated swarm formations capable of carrying out complex missions either autonomously or semi-autonomously [1]. However, this shift towards swarm warfare requires an unprecedented level of individual intelligence among the UAV components and complex swarm control methods, which pose significant technological challenges and continue to expand the boundaries of modern drone capabilities.
The swarm intelligence decision-making approach for UAV swarms refers to the decision-making technology that enables multiple UAVs to collaborate autonomously to accomplish complex tasks without centralized control, and it plays a vital role in UAV swarm air combat. To address the search space explosion in the swarm air combat decision problem, researchers have made many attempts, which can be broadly divided into three categories: first, maneuver decision-making methods based on expert systems [2,3,4]; second, swarm autonomous decision making based on machine learning with deep neural networks [5,6,7]; and last, autonomous decision making for UAV swarm air combat based on swarm bionic intelligence [8]. The expert system solution is generally reliable and has been the mainstream solution to the UAV swarm air combat problem in the past few years. However, it relies heavily on a priori information from experts, and such theoretical knowledge can rarely describe all decision-making scenarios in air combat, so a general decision-making expert system is hard to construct [9]. Swarm autonomous decision making based on deep learning attains a certain degree of generality once a deep neural network is successfully trained, but the difficulty of network training and the computing power it consumes make it hard to leave the laboratory and be applied in practical scenarios [10]. In the realm of bionic intelligent algorithms, UAV swarms attain diverse combat capabilities by controlling individual UAV decisions and manifesting complex behaviors at the group level. During this process, swarm intelligence algorithms derived by simulating various swarm behaviors in nature are extensively employed in the decision making of UAV swarm game confrontation due to their excellent scalability, parallelism, and straightforward implementation [11]. Numerous institutions and scholars have applied bionic intelligent algorithms to the UAV swarm air combat game problem in search of a practical swarm intelligence control method. For example, an attention-enhanced bidirectional gated recurrent unit based on the tuna swarm optimization algorithm is proposed in [12] to identify the intention of enemy UAVs in beyond-visual-range air combat. Inspired by the hierarchical structure of wolves' social division of labor, Zhou and Chen et al. [13] improve the traditional wolf colony optimization algorithm by imitating the information transmission mechanism of wolves with different divisions of labor and apply it to the UAV swarm target assignment problem. Among swarm intelligence algorithms, the pigeon-inspired optimization (PIO) algorithm is adopted in this paper for its swift convergence, which satisfies the rapid convergence requirement of the maneuver decision problem. The competitive learning pigeon-inspired optimization algorithm in [14] is used to search for the optimal decision in the air combat game. Duan et al. [15] propose an autonomous maneuver decision method for an unmanned aerial vehicle via improved pigeon-inspired optimization, in which the PIO algorithm is improved by creating a new individual evolutionary form. Although these improved methods indeed perform better than the original PIO algorithm in some cases, the architecture of the optimization algorithm is not really changed, which means that it is difficult for these methods to substantially outperform the traditional PIO algorithm.
In summary, this paper makes the following contributions:
By imitating the process of human cognition and learning behavior, a new optimization algorithm structure named the LAEPIO algorithm is proposed, which combines the learning-aided evolution for optimization (LEO) mechanism and the PIO algorithm. Compared with previous algorithm improvements [16], the LAEPIO algorithm, combined with an artificial neural network, shows clear advantages in complex decision tree search problems, greatly increasing the robustness of the algorithm while improving the convergence speed to a certain extent.
Considering the complex conditions of the battlefield, a precise dynamic model of the UAV is adopted in this paper, and a comprehensive situation function is established to describe the battlefield advantage; the results of the situation function serve as the key basis for the autonomous decision making of the UAV.
A swarm maneuver decision method based on the LAEPIO algorithm is proposed to meet the dynamic performance requirements of a complex battlefield environment. In air combat, a UAV using this method is able to predict the enemy's next action mode and quickly adopt an optimal strategy to stay in a better situation.
The remainder of this paper is organized as follows: Section 2 presents the problem statement. The LAEPIO algorithm is proposed in Section 3. The UAV swarm confrontation game method and the autonomous maneuver decision method based on LAEPIO are designed in Section 4. Simulation results and analysis compared with [17,18,19,20] are presented in Section 5. A detailed discussion of the method is given in Section 6, and the paper concludes in Section 7.
2. Problem Statements
The objective of air combat decision making is to identify the optimal action strategy within a fleeting decision window to create superior offensive advantages for UAVs, ultimately securing victory in the entire engagement. The real-time requirement of an air combat decision-making method is therefore extremely high, and conventional algorithms struggle to meet it directly. Consequently, a more precise nonlinear dynamic model and a decision system capable of responding promptly are indispensable.
2.1. Dynamic Model of UAV
Simulating air combat decision scenarios relies on finely detailed UAV models to achieve real-time decision making and enhance combat capability. Therefore, a UAV nonlinear model based on aircraft dynamics is proposed. We describe the fixed-wing model used in this paper in a body coordinate system [21].
The model has the following 12 controlled state variables, given as Equation (1):

$$\mathbf{X} = [x, y, z, \phi, \theta, \psi, u, v, w, p, q, r]^{T} \qquad (1)$$

where $x$, $y$, $z$ are the position states of the UAV; $\phi$, $\theta$, $\psi$ are the roll angle, pitch angle, and yaw angle; $u$, $v$, and $w$ are the components of the velocity along the body axes; and $p$, $q$, $r$ are the angular velocities about the body axes. The translational dynamic model is given as follows:

$$\begin{aligned} \dot{u} &= rv - qw - g\sin\theta + F_x/m \\ \dot{v} &= pw - ru + g\cos\theta\sin\phi + F_y/m \\ \dot{w} &= qu - pv + g\cos\theta\cos\phi + F_z/m \end{aligned} \qquad (2)$$
where $F_x$, $F_y$, and $F_z$ are the force components along the body axes; $g$ is the gravitational acceleration; and $m$ is the mass of the model. The rotational dynamics are as follows:

$$\begin{aligned} \dot{p} &= \left[(I_y - I_z)\,qr + M_x\right]/I_x \\ \dot{q} &= \left[(I_z - I_x)\,pr + M_y\right]/I_y \\ \dot{r} &= \left[(I_x - I_y)\,pq + M_z\right]/I_z \end{aligned} \qquad (3)$$

where $I_x$, $I_y$, and $I_z$ represent the coordinate components of the moment of inertia, and $M_x$, $M_y$, $M_z$ are the moments along the axes in the body-fixed reference frame.
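To make the model concrete, the following is a minimal sketch of how the translational dynamics in Equation (2) can be stepped forward with a simple Euler integrator. The force inputs, mass, time step, and initial state are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def translational_rates(state, forces, m, g=9.81):
    """Body-frame accelerations (u_dot, v_dot, w_dot) from Equation (2).

    state  = (phi, theta, u, v, w, p, q, r), a subset of the 12-state model
    forces = (Fx, Fy, Fz), body-axis force components (assumed inputs)
    """
    phi, theta, u, v, w, p, q, r = state
    Fx, Fy, Fz = forces
    du = r * v - q * w - g * np.sin(theta) + Fx / m
    dv = p * w - r * u + g * np.cos(theta) * np.sin(phi) + Fy / m
    dw = q * u - p * v + g * np.cos(theta) * np.cos(phi) + Fz / m
    return np.array([du, dv, dw])

# One Euler step over a 10 ms interval with illustrative values.
state = (0.0, 0.05, 120.0, 0.0, 2.0, 0.0, 0.01, 0.0)
uvw_next = np.array(state[2:5]) + 0.01 * translational_rates(
    state, (500.0, 0.0, -50.0), m=800.0)
```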
2.2. Systematic Architecture of UAV Swarm Maneuver Decision Method
If the continuous state variables of the UAV were used directly as the action space of a decision-making agent, the problem would be complex and difficult to realize. Many flight actions in the process of air combat, such as the serpentine maneuver and the high-speed dive, are difficult to model directly. Therefore, it is necessary to analyze and dissect the prior knowledge of the air combat game process so that a series of basic actions (meta-actions) can be obtained, which greatly simplifies the modeling and prunes the search space of agent decision making, making the application of bionic intelligence possible. The collection of these basic actions is called the maneuver library. The maneuver library used in this paper includes 21 basic actions; the specific actions are shown in Table 1, and a sketch of how such a library can be represented and searched is given below.
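As a concrete illustration, each maneuver-library entry can be represented as a commanded normal load factor and roll angle held over one decision interval, and a trial decision can be made by simulating each meta-action and scoring the result. The listed values are assumed examples (Table 1's numeric entries are not reproduced here), and `rollout` and `situation` are placeholder hooks for the dynamic model and the situation function of Section 2.3.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetaAction:
    """One maneuver-library entry: commanded normal load factor and roll angle."""
    load_factor: float
    roll_deg: float

# A hypothetical subset of the 21-action library in Table 1 (values assumed).
MANEUVER_LIBRARY = [
    MetaAction(1.0, 0.0),     # steady level flight
    MetaAction(5.0, 0.0),     # maximum pull-up
    MetaAction(5.0, 60.0),    # hard climbing turn
    MetaAction(5.0, -60.0),   # mirror-image climbing turn
    MetaAction(-1.0, 0.0),    # push-over / dive
]

def best_action(state, rollout, situation):
    """Greedy one-step selection: simulate every meta-action from `state`
    and keep the one whose predicted state scores highest."""
    return max(MANEUVER_LIBRARY, key=lambda a: situation(rollout(state, a)))
```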
The autonomous decision-making system for UAV air combat designed in this paper is composed of the following parts: sensors obtain the state information of our UAV swarm and the adversarial UAVs, and then attack target allocation and attack resource calculation are performed based on the gathered state information. When the attack conditions are met, the attack mission starts. If the target is destroyed, the target information is returned; otherwise, reinforcements are requested. The system then determines whether the task is complete: if so, it outputs the result and returns; otherwise, it carries out the next round of target allocation. The air combat decision system is designed as shown in Figure 1.
2.3. Situation Function of UAV Swarm Maneuver Decision Method
The situation function $S$ is used to describe the battlefield environment and the advantages of both sides during the air combat game, as determined by the position information of our side and the enemy. According to the situation function $S$, we can predict the next maneuver of the enemy and provide the basis for the next decision of the agent. The definition of the situation function is given below. It consists of four parts, which respectively represent the influence of the angle relationship, the position relationship, the speed relationship, and the relative altitude on the air battlefield situation. The situation value is calculated to design the decision objective, which is to maximize the situational advantage of our own side [22,23,24,25]. First, define the state description as Equation (6):

$$s = \{\,\mathbf{R},\ q,\ \varphi,\ \gamma,\ \mathbf{V}\,\} \qquad (6)$$

where $\mathbf{R}$ is a vector pointing from the current position to the target position, $q$ is the angle of vision, $\varphi$ is the track angle, $\gamma$ is the track inclination angle, and $\mathbf{V}$ is the velocity vector.
The situation value is calculated through the relationship between the angle, position, velocity, and altitude of both sides, and the decision objective is designed to make maximum use of the situation. The angle factor $S_A$ of the situation function is as follows:

$$S_A = \begin{cases} k_A \, \dfrac{D_m}{q}, & q \le q_{\max} \\[6pt] \dfrac{k_A}{q \, D_m}, & q > q_{\max} \end{cases}$$

where $q_{\max}$ is the angle limit, normally set as 80°; $k_A$ is a correction coefficient; and $D_m$ is the missile attack distance function. From the formula, we can see that when our angle of vision $q$ is smaller than the limit angle $q_{\max}$, the angle advantage factor $S_A$ is proportional to the attack distance and inversely proportional to $q$; when $q$ is larger than the limit angle, $S_A$ is inversely proportional to both $q$ and $D_m$. Therefore, it is necessary to minimize $q$ in the decision-making process.
The distance factor $S_D$ of the situation function depends on the distance $D$ between our position and the target position, the stable firing range coefficient $D_0$, a standard height correction coefficient $k_H$, and a constant $H_0$ equal to 1000 m. As we can see, ignoring the influence of the altitude factor, the situation assessment value is larger when the distance between the two aircraft is closer to the limit distance $D_0$. Therefore, in actual combat, getting as close as possible to the enemy aircraft while staying outside the enemy warning range will effectively improve the situation assessment value.
The velocity factor $S_V$ of the situation function is built from the velocity limits $V_{\min}$ and $V_{\max}$ of the UAV, the desired velocity $V^{*}$, the current height $h$, and the distance $D$ between us and the enemy; $D_f$ represents the actual far boundary of this type of missile at the given entry angle. Obviously, only when the speed tracking performance of the controller is satisfied will a larger desired speed yield a higher situation assessment value. However, a very large desired velocity brings great challenges to the robustness of the flight controller. Additionally, the desired velocity is coupled with the distance between friend and foe and with the attack distance.
The height factor $S_H$ of the situation function is built from the height limits $H_{\min}$ and $H_{\max}$ of the UAV, the current height $h_t$ of the target, and the desired height $H^{*}$. Similarly to the velocity factor, only when the height tracking performance of the controller is satisfied will a higher desired height yield a larger situation assessment value. Meanwhile, the desired height is also coupled with the distance between friend and foe and with the attack distance.
The final situation function is obtained by normalizing and weighting the above four situation assessment functions, as illustrated in Equation (13):

$$S = \omega_A S_A + \omega_D S_D + \omega_V S_V + \omega_H S_H, \qquad \omega_A + \omega_D + \omega_V + \omega_H = 1 \qquad (13)$$
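The aggregation in Equation (13) is straightforward to implement once the four factors are available. The sketch below assumes the factors have already been normalized to [0, 1]; the weight values are illustrative assumptions, since the paper does not fix them in this section.

```python
import numpy as np

def situation_value(s_angle, s_dist, s_vel, s_height,
                    weights=(0.35, 0.30, 0.20, 0.15)):
    """Weighted aggregation of the four normalized factors, Equation (13).
    Weights are renormalized so they always sum to one."""
    factors = np.clip([s_angle, s_dist, s_vel, s_height], 0.0, 1.0)
    w = np.asarray(weights, dtype=float)
    return float(np.dot(w / w.sum(), factors))

# Example: a geometry that is good in angle and distance, mediocre elsewhere.
print(situation_value(0.9, 0.8, 0.5, 0.4))  # -> 0.715
```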
3. Learning-Aided Evolutionary Pigeon-Inspired Optimization Algorithm
Inspired by the process of human cognition and learning, a novel optimization algorithm structure, designated as the LAEPIO algorithm, is proposed in this section. It combines the learning-aided evolution for optimization (LEO) mechanism and the PIO algorithm.
3.1. Pigeon-Inspired Optimization Algorithm
The pigeon-inspired optimization algorithm is a global optimization method that simulates biological behavior. Duan et al. [15] developed the PIO algorithm, inspired by the pigeon swarm's homing ability, to solve optimization problems that demand rapid convergence. The PIO algorithm mainly uses the map-and-compass operator and the landmark operator to update the position and velocity of the pigeon flock. Pigeons have magnetic induction structures in their beaks, sense the geomagnetic field with these structures in flight, and then form a map in their minds. In the pigeon-inspired optimization algorithm, a virtual pigeon is used to simulate the navigation process, and the position and velocity of each pigeon are initialized. In the multi-dimensional search space, the position and velocity are updated in each iteration as Equation (14); the speed of the $i$-th pigeon is determined by the speed of its previous generation and the current best position:

$$\begin{aligned} V_i(t) &= V_i(t-1)\, e^{-Rt} + \mathrm{rand} \cdot \left(X_g - X_i(t-1)\right) \\ X_i(t) &= X_i(t-1) + V_i(t) \end{aligned} \qquad (14)$$

where $R$ is the map factor, $\mathrm{rand}$ is a random number, $t$ is the number of generations, and $X_g$ is the current global best position. The position of the $i$-th pigeon is determined by its previous position and its current speed. The flight of all pigeons is guided by the map, and the best position of the pigeons can be obtained by comparison.
The landmark operator is used to model the influence of landmarks on pigeons during navigation. When flying close to the destination, pigeons rely more on nearby landmarks. In the landmark model, half the number of pigeons is retained in each generation. Those pigeons far from the destination are not familiar with the terrain and no longer have the ability to distinguish the path. At this stage, the flock optimizes its flight direction and speed by looking for surrounding landmarks, and the population size becomes half of that of the last iteration. After the population is halved, the population center position is calculated, and the individual flight direction is updated based on the center position as Equation (15):

$$\begin{aligned} N_p(t) &= \frac{N_p(t-1)}{2} \\ X_c(t) &= \frac{\sum_i X_i(t)\, F(X_i(t))}{N_p(t) \sum_i F(X_i(t))} \\ X_i(t) &= X_i(t-1) + \mathrm{rand} \cdot \left(X_c(t) - X_i(t-1)\right) \end{aligned} \qquad (15)$$

where $F(\cdot)$ is the fitness value of each individual, $X_c$ is the center position, and $N_p$ is the size of the population.
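For reference, a compact sketch of the two operators in Equations (14) and (15) is given below for a minimization problem. The search bounds and the fitness weighting used for the center position are assumptions; parameter defaults mirror Table 2.

```python
import numpy as np

def pio_minimize(f, dim, n_pop=100, n_map=150, n_land=50, R=0.2, seed=0):
    """Minimal PIO sketch following Equations (14)-(15)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-10.0, 10.0, (n_pop, dim))   # assumed search bounds
    V = np.zeros_like(X)
    # Map-and-compass stage, Equation (14).
    for t in range(1, n_map + 1):
        Xg = X[np.argmin([f(x) for x in X])]               # global best
        V = V * np.exp(-R * t) + rng.random((len(X), 1)) * (Xg - X)
        X = X + V
    # Landmark stage, Equation (15): halve the flock, pull toward the center.
    for _ in range(n_land):
        fit = np.array([f(x) for x in X])
        X = X[np.argsort(fit)][: max(2, len(X) // 2)]      # keep better half
        w = 1.0 / (np.sort(fit)[: len(X)] + 1e-12)         # fitness weights
        center = (X * w[:, None]).sum(axis=0) / w.sum()
        X = X + rng.random((len(X), 1)) * (center - X)
    return min(X, key=f)

best = pio_minimize(lambda x: float(np.sum(x ** 2)), dim=5)
```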
The PIO algorithm has the characteristics of fast search speed and strong evolutionary ability, but it also has limitations. For example, the algorithm easily falls into a local optimum as the number of iterations increases. To solve this problem, the learning-aided evolutionary optimization (LEO) framework is introduced into the PIO algorithm.
3.2. Learning-Aided Evolution Pigeon-Inspired Optimization Algorithm
A learning-aided evolutionary optimization framework [26], which couples learning and evolution for solving optimization problems, is introduced into the PIO algorithm in this paper. The LAEPIO algorithm, as shown in Figure 2, is inspired by the origin mechanism of human intelligence: it imitates human cognition and the learning process, obtains information about the objective function from the run of the algorithm to train a neural network, and finally uses the network to assist the evolution of the intelligent optimization algorithm.
Following the human learning process, in the early stage of the algorithm, the individual evolution depends only on the update formulas of the traditional optimization algorithm. In this process, a lot of information about the objective function is accumulated that the traditional algorithm would not use. In the LEO mechanism, this information is filtered to form successful evolution pairs (SEPs), shown as follows:

$$\mathrm{SEP} = \left\langle X_i(t-1),\ X_i(t) \right\rangle \quad \text{if } F(X_i(t)) < F(X_i(t-1)) \qquad (16)$$
In order to exploit the knowledge about the objective function accumulated during the individual updating process, the previously accumulated SEPs are used to train the ANN after a pooling operation in the middle stage of the algorithm, and the loss function $L$ of the neural network is taken as the error between the network output and the superior half of each SEP:

$$L = \frac{1}{N} \sum_{j=1}^{N} \left\| \mathrm{ANN}\!\left(X_j^{\mathrm{in}}\right) - X_j^{\mathrm{out}} \right\|^{2} \qquad (17)$$
In the latter stage of the algorithm, the training of the neural network is basically completed, which means that cognition has basically formed and the learning step is essentially done. At this point, a reasonable way to use the neural network is needed. The LEO mechanism provides two operations to assist the evolution: the learning mutation ($LM$) and learning crossover ($LC$) operations, defined as follows:

$$X_i^{LM} = \mathrm{ANN}\!\left(P_{r_1}\right) + r_m \cdot \left(P_{r_2} - P_{r_3}\right) \qquad (18)$$

$$x_{i,d}^{LC} = \begin{cases} x_{i,d}^{LM}, & \mathrm{rand} \le CR \\ x_{i,d}, & \text{otherwise} \end{cases}, \qquad CR = c \cdot \mathrm{rand} \qquad (19)$$

where $[0, 1]$ is the range of the variation rate $r_m$; $CR$ is the crossover rate calculated by Equation (19), in which $c$ is a constant; and $P_{r_1}$, $P_{r_2}$, and $P_{r_3}$ are randomly chosen individual best positions.
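To make the two learning operators concrete, here is a minimal sketch in which `ann` stands for the trained network's forward pass and `pbest` is the array of individual best positions. The operator forms follow the reconstruction in Equations (18) and (19) and should be read as an assumption about the exact scheme rather than a verbatim implementation.

```python
import numpy as np

def learning_mutation(pbest, ann, r_m=0.5, rng=None):
    """Learning mutation, Equation (18): refine one randomly chosen individual
    best with the ANN, then perturb it with a scaled difference of two others."""
    rng = rng or np.random.default_rng()
    r1, r2, r3 = rng.choice(len(pbest), size=3, replace=False)
    return ann(pbest[r1]) + r_m * (pbest[r2] - pbest[r3])

def learning_crossover(x, x_learned, cr, rng=None):
    """Learning crossover, Equation (19): take each dimension from the learned
    vector with probability cr, otherwise keep the current position."""
    rng = rng or np.random.default_rng()
    return np.where(rng.random(x.shape) <= cr, x_learned, x)
```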
To address the limitations of traditional pigeon-inspired optimization (PIO) algorithms, including premature convergence, insufficient global exploration, and parameter sensitivity, this paper integrates the LEO mechanism into the PIO algorithm to develop a new PIO algorithm based on the learning-aided evolution mechanism. The proposed framework is specifically designed for solving UAV air combat game optimization problems. The LEO mechanism enhances algorithmic intelligence through the following three critical improvements:
Concurrent multi-threaded neural network training preserves the algorithm’s initial rapid convergence characteristics.
Dynamic parameter adaptation eliminates manual tuning requirements.
Guided evolutionary strategies significantly improve convergence accuracy within constrained computational budgets.
Notably, while conventional PIO demonstrates accelerated convergence during initial iterations, its population diversity deteriorates progressively, leading to diminished global search capability. The introduced LEO mechanism effectively compensates for these deficiencies with the help of a large amount of previously accumulated learning experience about the function to be optimized. This synergistic integration not only reduces the probability of entrapment in local optima but also enhances algorithmic stability while maintaining the computational efficiency advantages inherent to PIO architectures. The pseudocode for LAEPIO is given as Algorithm 1.
Algorithm 1 LAEPIO

Input: variable $X$ to be optimized. Output: best variable $X_{best}$ and best fitness $F_{best}$.

1: initialization
2: for $t = 1, \dots, N_{c1\max}$ do (map-and-compass stage)
3:  if the ANN is trained then
4:   update position as Equation (18)
5:  else
6:   update position as Equation (14)
7:  end if
8:  if the current and previous positions satisfy the condition of SEPs then
9:   put the SEP into the replay buffer
10:  end if
11:  update $X_{best}$, $F_{best}$, and train the ANN as Equation (17)
12: end for
13: for $t = 1, \dots, N_{c2\max}$ do (landmark stage)
14:  if the ANN is trained then
15:   update position as Equation (18)
16:  else
17:   update position as Equation (15)
18:  end if
19:  if the current and previous positions satisfy the condition of SEPs then
20:   put the SEP into the replay buffer
21:  end if
22:  update $X_{best}$, $F_{best}$, and train the ANN as Equation (17)
23: end for
24: return $X_{best}$, $F_{best}$
In the early iterations of the algorithm, the PIO algorithm converges quickly, which is suitable for rapidly obtaining a large number of successful evolution samples. The successful evolution pairs screened during the iterations form the training data set, and with parallel computing the auxiliary network can be trained while the algorithm continues to iterate. In the later iterations, the PIO algorithm tends to stabilize and its exploration ability becomes insufficient, so the trained auxiliary network is introduced, and its predictions are randomly mixed into the algorithm's update step through crossover and mutation to improve exploration. Regarding time complexity, thanks to parallel computing, the algorithm can be considered to have the same time complexity as the PIO algorithm.
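The parallel arrangement can be sketched with a single background worker: the optimizer keeps iterating while the network trains, then switches to ANN-guided updates once training finishes. `pio_step` and `train_ann` are hypothetical hooks standing in for the operators of this section and the loss of Equation (17).

```python
from concurrent.futures import ThreadPoolExecutor

def laepio_loop(pio_step, train_ann, n_iter, train_at):
    """Run PIO iterations while the auxiliary ANN trains concurrently.

    pio_step(t, ann) -> list of new SEPs; uses the ANN once it is available.
    train_ann(seps)  -> trained forward-pass callable.
    """
    seps, ann, job = [], None, None
    with ThreadPoolExecutor(max_workers=1) as pool:
        for t in range(n_iter):
            if t == train_at:                  # launch training in the background
                job = pool.submit(train_ann, list(seps))
            if ann is None and job is not None and job.done():
                ann = job.result()             # cognition formed: switch operators
            seps.extend(pio_step(t, ann))
    return ann
```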
4. Swarm Maneuver Decision Method Based on LAEPIO Algorithm
In this section, the swarm maneuver decision method based on the LAEPIO algorithm is proposed, which is able to predict the enemy's next action mode and quickly adopt an optimal strategy to stay in a better situation. To deal with the UAV swarm air combat problem, the attack targets should be dynamically allocated. By constructing the efficiency function of multi-objective allocation in Equation (22), feasible allocation results can be obtained in a very short time using the LAEPIO algorithm. Then, for each UAV in each local battlefield after allocation, the decision chain is obtained using the decision optimization algorithm proposed in this paper. Finally, the advantage of our UAV cluster is expanded across the overall battlefield.
4.1. UAV Swarm Attack Target Allocation
UAV attack target allocation refers to the reasonable allocation of tasks to each UAV in a multi-UAV cooperative operation scenario so as to maximize the effect of cooperative operations. It is one of the key technologies enabling a UAV swarm to execute combat tasks efficiently. Inspired by the cooperative group predation behavior of the gray wolf, the authors of [27] propose a task allocation method based on the concept of bionics. Mapping the coyote group's hunting behavior to UAV attack allocation, we define the total attack resource as follows:

$$R_{\mathrm{total}} = \sum_{i=1}^{n} r_i \qquad (20)$$

where $n$ is the maximum number of UAVs and $r_i$ is the combat resource of each UAV. The minimum requirement on the total attack resource (Equation (21)) is defined with a redundancy coefficient $\alpha$ ($\alpha \ge 1$) and $c_{u,t}$, the least consumption of the current UAV $u$ for the target $t$. If a feasible attack formation meets the total attack resource requirement, it is added to the pre-attack formation set until all feasible formations are traversed. All feasible formations that meet the minimum attack resource requirement are searched by the LAEPIO algorithm. When the attack capability of one UAV cannot kill the target, multiple UAVs can coordinate to complete the mission.
When any UAV meets the minimum attack resource requirement, it decides whether to participate in the mission according to the probability function shown as Equation (22), where $r_k$ is the reward function of the task objective $k$, $c$ is a constant, and $t_{ik}$ is the time required for the $i$-th UAV to fly from its current position to the position of the mission target, determined by $t_{ik} = d_{ik} / V_i$ with $V_i$ the flight speed of the $i$-th UAV. Here, $r_k^{0}$ is the initial payoff of the task target $k$, decaying over time with the factor $\lambda$; $\mu_1$ and $\mu_2$ are model parameters; $p_{ik}$ is the execution probability of the $i$-th UAV on the mission target $k$; $p_{\min}$ is the minimum allowed execution probability of the task; and $d_{ik}$ is the distance from the $i$-th UAV to the mission target $k$.
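Since the exact functional form of Equation (22) is not reproduced above, the sketch below uses an assumed logistic shape that respects the described behavior: the payoff decays over time, a longer flight time lowers the participation probability, and the probability is floored at $p_{\min}$.

```python
import math

def participation_probability(r0, lam, t_elapsed, d_ik, v_i,
                              c=1.0, p_min=0.05):
    """Illustrative stand-in for Equation (22); the logistic form is an assumption."""
    reward = r0 * math.exp(-lam * t_elapsed)   # time-decaying payoff of target k
    t_ik = d_ik / v_i                          # flight time of UAV i to target k
    p = 1.0 / (1.0 + math.exp(-(reward - c * t_ik)))
    return max(p, p_min)

# A nearby, still-valuable target yields a high participation probability.
print(participation_probability(r0=5.0, lam=0.1, t_elapsed=3.0,
                                d_ik=300.0, v_i=150.0))
```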
Traditional auction algorithms obtain a solution by one-to-one matching, which, however, ignores the possibility of many-to-one or one-to-many assignments in real air combat scenarios. With the help of the powerful search ability of the LAEPIO algorithm, Equation (23) is used as the objective function to search for the optimal allocation, which fully considers the complex battlefield environment of unmanned cluster air combat.
4.2. Maneuver Decision Method Based on LAEPIO Algorithm
The UAV maneuver decision method based on the LAEPIO algorithm allows a UAV to execute coordinated actions autonomously or semi-autonomously so that a swarm can quickly complete a siege and strangulation on the battlefield. We design the maneuver decision method based on the LAEPIO algorithm, whose architecture is shown in Figure 3, where our UAV is on the red side and the adversarial UAV is on the blue side. Each individual in the formation shares the full battlefield information sensed by the swarm. After collecting the battlefield information, the agent via LAEPIO evaluates the state of the two sides and predicts the adversarial UAVs' future maneuvers according to game theory. Finally, a maneuver decision chain is selected from the maneuver decision library according to the minimax rule.
At every decision step, the future situation is recalculated from the current state and the blue side's actions are predicted anew. Eventually, we obtain a decision chain that makes it possible for the red side to win the battle. However, the huge search space makes this an NP-hard problem in theory, so the LAEPIO algorithm is introduced to search the limited space rapidly, as sketched below.
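A single ply of that search can be written as a plain minimax scan over the maneuver library; in the full method the LAEPIO algorithm replaces the exhaustive loops when the decision chain grows deep. `step` and `situation` are placeholder hooks for the engagement model and the situation function.

```python
def minimax_decision(state, red_actions, blue_actions, step, situation):
    """One ply of the minimax rule: pick the red maneuver whose worst-case
    situation value over all predicted blue replies is highest."""
    def worst_case(a_red):
        return min(situation(step(state, a_red, a_blue))
                   for a_blue in blue_actions)
    return max(red_actions, key=worst_case)
```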
The whole flowchart of the UAV swarm maneuver decision method via LAEPIO is given in Figure 4. As Figure 4 shows, first, the UAV swarm enters the local air combat phase after the attack target allocation based on the LAEPIO algorithm. The red side then uses the method designed in this paper to select the maneuver strategy that maximizes the objective function by predicting the enemy's maneuver, while the blue side obtains its next action through other methods. Next, the blue side's next maneuver is predicted by the situation function based on the state information, and the LAEPIO algorithm is used to rapidly search the decision tree for the optimal decision under the current state and prediction information. Finally, we judge whether the task is complete: if the attack mission is successfully completed, the UAV exits the combat state; otherwise, it requests support from nearby friendly units for the next round of the attack mission.
5. Simulation Results and Analysis
In this section, the swarm maneuver decision method based on the LAEPIO algorithm and methods based on other algorithms are used by the two sides in UAV air combat simulations. We then record the simulation results and analyze the relative performance of the different algorithms.
5.1. Comparative Analysis of Algorithm Performance in Air Combat Simulation
In order to test the UAV swarm maneuver decision method for close air combat based on the LAEPIO algorithm, a simulation experiment of UAV close air combat based on the nonlinear dynamic model designed in Section 2 is carried out. The parameter settings of the PIO, particle swarm optimization (PSO), genetic algorithm (GA), sparrow search algorithm (SSA), and LAEPIO algorithms, chosen according to the needs of the decision-making problem in the test, are shown in Table 2. The parameters of the different algorithms are all set within suitable ranges.
In order to verify the performance of the LAEPIO algorithm, different optimization algorithms are used in the same maneuver decision process, and their respective fitness values are plotted in Figure 5. As we can see, in this maneuver decision optimization, the GA and PIO algorithms need a longer iteration time before the fitness value decreases significantly, and their final convergence values are also very large. Although the fitness value of the SSA algorithm decreases quickly and even undercuts that of the LAEPIO algorithm in the initial stage, its final convergence value is not ideal and only lies between those of the PIO and GA algorithms. The only algorithm comparable with the LAEPIO algorithm is PSO; however, the proposed LAEPIO algorithm not only converges faster but also reaches a better convergence value than the PSO algorithm.
The following is the result of the comparison between LAEPIO and other algorithms tested on benchmark functions. From the horizontal comparison of the information in Table 3 and Figure 6, it can be concluded that the LAEPIO algorithm designed in this paper has excellent optimization ability and strong stability.
5.2. Air Combat Simulation and Result Analysis
The UAV swarm attack target allocation method based on LAEPIO is simulated under three different initial conditions (6V10, 10V10, 15V10), and the results are shown in Figure 7, where the red side uses the agent based on the LAEPIO algorithm and the blue side uses agents based on other algorithms. As we can see in Figure 7, the target allocation algorithm based on the LAEPIO algorithm can always quickly find a relatively good allocation scheme, whether under conditions of advantage (15V10), disadvantage (6V10), or equal strength (10V10). Combined with the maneuver decision scheme based on the LAEPIO algorithm, resource advantages are continuously accumulated in local cooperative combat, and overall victory on the battlefield is finally achieved.
Further, the UAV using the LAEPIO algorithm is simulated in MATLAB against UAVs using the traditional matrix game algorithm and the traditional PIO algorithm. Table 4 shows the four kinds of initial conditions designed for testing the performance of the agent proposed in this paper.
The results of the simulation are shown in Figure 8 and Figure 9. Compared with the traditional matrix game algorithm and the traditional PIO algorithm, the agent based on the LAEPIO algorithm can quickly shoot down the blue side in dominant or balanced situations. Even in the case of an initial-state disadvantage, the red side can still execute complex maneuvers by selecting the correct maneuver strategy to create advantages for itself and escape, or even use the terrain to reverse the defeat.
In Figure 8, the left side shows the air combat results in four different initial conditions, while the right side plots the scores of the red and blue teams over time for each condition. The score is obtained by averaging the situation function values of the UAVs. As mentioned above, this is an evaluation index coupled with many factors, such as the attack angle of the UAV, the missile launch distance, the speed limits of the UAV, and the height of the UAV. It clearly indicates the situation of both sides during the UAV confrontation.
The simulations produce a variety of outcomes. In Figure 8a, the initial conditions of both sides are roughly the same, and the red UAV induces the blue UAV to crash into an obstacle through a dive and sharp pull-up strategy. In Figure 8c, the red side gains enough advantage after complicated maneuvers and fires to shoot down the blue-side UAV. In Figure 8g, despite the disadvantageous initial conditions, the red side shakes off the lock of the blue side's UAV through efficient decision making, and finally, the two sides draw and leave the battlefield.
In Figure 9, two other simulation results are shown. In Figure 9e, the red side has a certain advantage in the initial conditions and quickly shoots down the blue UAV. In Figure 9g, the red UAV escapes from the enemy's attack range through complex maneuvers and, from an absolutely disadvantageous situation, induces the enemy to crash into an obstacle.
To further verify the adaptability of the method to a complex battlefield, 1Vn and nVn simulation experiments are designed; the advantageous nV1 case is not considered. Agents using the matrix game and the PIO algorithm are simulated against our method in 1Vn and nVn engagements, respectively, with the initial conditions set to the general condition. The experimental results are shown in Figure 10 and Figure 11.
As we can see in Figure 10 and Figure 11, the simulated combat verifies the effectiveness of multi-UAV collaboration. Under the 1V2 condition, the blue UAVs fight independently, so the red UAV can easily break out of the encirclement through a large-angle maneuver, allowing the red side to fight on favorable 1V1 terms and finally turn the tide. Under the 2V2 initial condition, however, the red UAVs can quickly eliminate one target by forming a scissor-shaped strangling maneuver, creating a greater advantage for the swarm.
To further verify the efficiency and superiority of the algorithm in the air combat maneuver decision problem, the UAV using the maneuver decision method based on LAEPIO conducts 100 simulated confrontations with UAVs using other methods under balanced initial conditions. The results of the combat simulations are shown in Table 5. In the combat simulation, each UAV has a shot-down mark: a UAV is marked as shot down when it stays within the enemy's attack range for a certain amount of time. When a UAV is judged to be shot down, its situation value is cleared to zero, reducing the score of its formation. When all enemy UAVs are judged to be shot down, the simulation stops, and our victory is immediately declared. If UAVs on both sides remain at the end of the simulation, the final result is determined by the final scores: only when the score gap between the two sides is large is the side with the higher score judged the winner; otherwise, the engagement is regarded as a draw.
From the data in the table, we can see that the UAV swarm maneuver decision method via LAEPIO has significant advantages in 1V1 simulated confrontations with the maneuver decision agents based on the PSO, PIO, SSA, and GA algorithms.
6. Discussion
Based on the analysis of the experimental outcomes, it is demonstrated that the proposed autonomous decision-making system efficiently addresses the UAV swarm combat issue. Within the identical system framework, when compared with conventional optimization algorithms, the present approach exhibits superior efficiency and robustness. Nonetheless, the swarm maneuver decision problem possesses an extensive solution space and incorporates the dynamic attributes of the real-world environment, thereby presenting a significant challenge to the algorithm’s convergence rate. Consequently, the PIO algorithm is employed in this study due to its rapid convergence characteristic, which meets the stringent requirements of the maneuver decision problem.
However, in the complex electromagnetic environment of a real battlefield, incomplete environmental information is the norm, which is undoubtedly a great test for the robustness of an unmanned system, and it is particularly damaging for swarm intelligence algorithms that rely on environmental awareness to construct situation functions. Therefore, we introduce the LEO mechanism into the PIO algorithm to enhance the robustness of the swarm maneuver decision method. Traditional attempts try to find a suitable optimization algorithm and make it perform better on a given problem by improving the original algorithm. Even though this approach is effective most of the time, it always costs researchers a great deal of time and effort to design improved combinations of optimization algorithms and complex test flows to verify their effectiveness. Therefore, in this paper we introduce a new optimization algorithm structure with adaptive learning ability into a traditional swarm intelligence decision-making approach to meet the needs of a variety of complex UAV swarm maneuver decision problems.
Then, although we have found a suitable optimization algorithm for the swarm maneuver decision problem, how to conduct intelligent swarm warfare remains an open question. Therefore, we propose a systematic architecture for the UAV swarm maneuver decision method, as shown in Figure 1, in which we divide the swarm combat problem into two subproblems: the dynamic allocation of attack targets and the small-scale swarm maneuver decision. For these two subproblems, we refer to [27] and [22,23,24,25], respectively, to establish the optimization objective functions.
Finally, comprehensive simulations across diverse air combat scenarios are designed to verify the feasibility of the proposed method. In addition, the swarm maneuver decision method based on the LAEPIO algorithm, along with other algorithms, is implemented in the UAV air combat simulation. This implementation aims to further demonstrate the superiority of the employed algorithm.
7. Conclusions and Future Research
Starting from the air combat maneuver decision-making problem of UAVs, this paper combines the bionic concept, incorporates human learning and cognitive methods into the design of bionic intelligent computing methods, and establishes the LEO mechanism together with the PIO algorithm in a new decision-optimization algorithm named the LAEPIO algorithm.
Second, the LAEPIO algorithm is applied to the attack target allocation process and the air combat maneuver decision-making process of UAVs. Compared with the traditional matrix game algorithm, the standard PIO algorithm, the SSA algorithm, the PSO algorithm, and the GA, the efficiency and superiority of the swarm maneuver decision-making method based on LAEPIO are verified. In this experiment, the LAEPIO algorithm outperforms the above-mentioned optimization algorithms.
Finally, a series of air combat simulations is designed, and the superiority of the proposed algorithm is further verified through simulated confrontations with the matrix game algorithm and the traditional PIO algorithm in 1V1, 1Vn, and nVn scenarios.
Although we have designed many simulation experiments to verify the rationality and superiority of the proposed method, they cannot fully represent performance on a real battlefield. Therefore, more realistic battlefield environment models, more complex tactical options, and more detailed modeling of UAV reconnaissance and strike capabilities will be developed in the future to further improve the method proposed in this paper.
Author Contributions
Conceptualization, Y.S. and Y.C.; methodology, Y.S. and Y.C.; software, Y.C. and C.W.; validation, Y.C., Y.S. and C.W.; formal analysis, Y.C. and Y.F.; investigation, Y.C. and C.W.; resources, Y.C. and C.W.; data curation, Y.C. and C.W.; writing—original draft preparation, Y.S., B.L. and Y.C.; writing—review and editing, Y.S., Y.F., B.L. and Y.C.; visualization, Y.S. and Y.C.; supervision, Y.S. and C.W.; project administration, Y.S. and Y.C.; funding acquisition, Y.S. and C.W. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Natural Science Foundation of China under grant numbers 62473025, U24B20156, and 62103040.
Data Availability Statement
The parameters used in this paper are given in the paper. If any researchers need to obtain more details about the simulation or want to engage in academic communication, please contact us.
DURC Statement
The current research is restricted in the range of air combat decision, which is beneficial for enhancing technological advancements, increasing efficiency across autonomous maneuver making of UAV swarm, and improving the adaptability of UAV swarm to complex environments. This research does not pose a threat to public health or national security. The authors acknowledge the dual-use potential of research involving UAV swarm and confirm that all necessary precautions have been taken to prevent potential misuse. As an ethical responsibility, the authors strictly adhere to relevant national and international laws concerning Dual Use Research of Concern (DURC). The authors advocate for responsible deployment, ethical considerations, regulatory compliance, and transparent reporting to mitigate misuse risks and foster beneficial outcomes.
Acknowledgments
The authors would like to thank the editors and the reviewers for their constructive comments.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Wu, C.; Guo, Z.; Zhang, J.; Mao, K.; Luo, D. Cooperative Path Planning for Multiple UAVs Based on APF B-RRT* Algorithm. Drones 2025, 9, 177. [Google Scholar] [CrossRef]
- Wang, X.; Wang, W.J.; Song, K.P.; Wang, M. UAV Air-Combat Decision-Making Technology Based on Evolutionary Expert System Tree. Ordnance Ind. Autom. 2019, 38, 42–47. [Google Scholar]
- Chin, H.H. Knowledge-based system of supermaneuver selection for pilot aiding. J. Aircr. 1989, 26, 1111–1117. [Google Scholar] [CrossRef]
- Bechtel, R.J. Air Combat Maneuvering Expert System Trainer; Air Force Systems Command: San Antonio, TX, USA, 1992. [Google Scholar]
- Zhang, J.D.; Yang, Q.M.; Shi, G.Q.; Lu, Y.; Wu, Y. UAV cooperative air combat maneuver decision based on multi-agent reinforcement learning. J. Syst. Eng. Electron. 2021, 32, 1421–1438. [Google Scholar]
- Wang, L.; Zheng, S.; Tai, S.; Liu, H.; Yue, T. UAV air combat autonomous trajectory planning method based on robust adversarial reinforcement learning. Aerosp. Sci. Technol. 2024, 153, 109402. [Google Scholar] [CrossRef]
- Gao, X.; Zhang, Y.; Wang, B.; Leng, Z.; Hou, Z. The Optimal Strategies of Maneuver Decision in Air Combat of UCAV Based on the Improved TD3 Algorithm. Drones 2024, 8, 501. [Google Scholar] [CrossRef]
- Dong, Z.; Zhao, M.; Jiang, L.; Wang, Z. Review of Key Technologies for Autonomous Collaboration in Heterogeneous Unmanned System Clusters. Telem. Telecontrol 2024, 45, 111. [Google Scholar]
- Zhang, Y.; Tu, Y.G.; Zhang, L.; Cui, H.; Wang, J.Y. Current Situation and Prospect of Deep Reinforcement Decision-making Methods in Intelligent Air Combat. Aero Weapon. 2024, 31, 21–31. [Google Scholar]
- Xu, Y.F.; Zhou, Z.D.; Song, Z.F.; Ji, W.T.; Wang, J.W.; Zhou, Y.F. Research on Improved Maneuvering Decision-making Algorithm of Deep Reinforcement Learning for Close-range Air Combat. In Proceedings of the 7th National Conference on Swarm Intelligence and Cooperative Control in 2023, Nanjing, China, 24–27 November 2023. [Google Scholar]
- Li, W.; Huang, S.Y.; Liu, H.M.; Sun, Z.J. Review of Research on UAV Swarm Countermeasure Decision-making Algorithms. Aeronaut. Sci. Technol. 2024, 35, 9–17. [Google Scholar]
- Xie, L.; Deng, S.; Tang, S.; Huang, C.; Dong, K.; Zhang, Z. Beyond visual range maneuver intention recognition based on attention enhanced tuna swarm optimization parallel BiGRU. Complex Intell. Syst. 2023, 10, 2151–2172. [Google Scholar]
- Zhou, T.L.; Chen, M.; Han, Z.L.; Wang, Q. Multi-UAV Cooperative Multi-target Assignment Based on Improved Wolf Pack Algorithm. Navig. Position. Timing 2022, 9, 46–55. [Google Scholar]
- Yu, Y.P.; Liu, J.C.; Chen, W. Hawk and pigeon's intelligence for UAV swarm dynamic combat game via competitive learning pigeon-inspired optimization. Sci. China (Technol. Sci.) 2022, 65, 1072–1086. [Google Scholar] [CrossRef]
- Duan, H.B.; Lei, Y.Q.; Xia, J.; Deng, Y.; Shi, Y. Autonomous maneuver decision for unmanned aerial vehicle via improved pigeon-inspired optimization. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 3156–3170. [Google Scholar] [CrossRef]
- Li, C.; Duan, H.B. Target detection approach for UAVs via improved Pigeon-inspired Optimization and Edge Potential Function. Aerosp. Sci. Technol. 2014, 39, 352–360. [Google Scholar] [CrossRef]
- Yao, Z.X.; Li, M.; Chen, Z.J. A Multi-Aircraft Cooperative Counter-Multiple-Target Mission Decision-Making Method Based on Game Theory Model. Aeronaut. Comput. Tech. 2007, 37, 7–11. [Google Scholar]
- Yuan, T.Y.; Fang, Y.C. Multi-step decision-making target assignment method for multi-UAV cooperative air combat based on IIP-GA. In Proceedings of the Chinese Control Conference (CCC), Kunming, China, 28–31 July 2024. [Google Scholar]
- Zheng, Z.Q.; Duan, H.B. Maneuver decision-making for close-range air combat of unmanned aerial vehicles based on pigeon-inspired optimizer with limited patience. J. Comput. Appl. 2024, 44, 1401–1407. [Google Scholar]
- Wang, L.M.; Wang, Y.H.; Chen, M.; Liu, H.T. Research on Incomplete Information Game Strategy Based on Improved Sparrow Algorithm. J. Jilin Univ. (Inf. Sci. Ed.) 2022, 40, 589–599. [Google Scholar]
- Li, Y.F.; Lyu, Y.G.; Shi, J.; Li, W. Autonomous Maneuver Decision of Air Combat Based on Simulated Operation Command and FRV-DDPG Algorithm. Aerospace 2022, 9, 658. [Google Scholar] [CrossRef]
- Wang, Y.; Ding, D.L.; Zhang, P.; Xie, L.; Zhang, X.W. Research on Adaptive Situation Assessment Method for UCAV Close-Range Air Combat. Unmanned Syst. Technol. 2023, 6, 85–94. [Google Scholar]
- Liu, Y.; Wei, X.L.; Qu, H.; Gan, X.S. UAV Air Combat Situation Analysis and Tactical Optimization Based on STPA Method. J. Command. Control 2023, 9, 651–659. [Google Scholar]
- Zhao, K.; Huang, C. Air combat situation assessment for UAV based on improved decision tree. In Proceedings of the 2018 Chinese Control And Decision Conference (CCDC), Shenyang, China, 9–11 June 2018. [Google Scholar]
- Meng, X.F.; Du, H.W.; Feng, P.W. Study on situation assessment in air combat based on Gaussian cloudy Bayesian network. Comput. Eng. Appl. 2016, 52, 249–253. [Google Scholar]
- Zhan, Z.H.; Li, J.Y.; Kwong, S.; Zhang, J. Learning-Aided Evolution for Optimization. IEEE Trans. Evol. Comput. 2023, 27, 1794–1808. [Google Scholar] [CrossRef]
- Peng, Y.L.; Duan, H.B.; Zhang, D.F.; Wei, C. Dynamic task allocation for unmanned aerial vehicle swarms inspired by grey wolf cooperative predation behavior. Control Theory Appl. 2021, 38, 1855–1862. [Google Scholar]
Figure 1. Systematic architecture of UAV swarm maneuver decision method.
Figure 2. Structure of the LAEPIO algorithm.
Figure 3. Architecture of UAV swarm maneuver strategy library search method; red is our side and blue is the enemy.
Figure 4. Flowchart of the UAV swarm maneuver decision method.
Figure 5. Iteration curves of maneuver decision based on different optimization algorithms.
Figure 6. The convergence curves of the studied techniques for seven benchmark functions.
Figure 7. Air combat simulation of attack object allocation based on LAEPIO.
Figure 8. Air combat simulation between LAEPIO agent and matrix game method.
Figure 9. Air combat simulation between LAEPIO and PIO method.
Figure 10. Air combat simulation between LAEPIO and matrix game method: (a) 1V2 simulation between LAEPIO and matrix game method, (b) comparison of scores, (c) 2V2 simulation between LAEPIO and matrix game method, and (d) comparison of scores.
Figure 11. Air combat simulation between LAEPIO and PIO.
Table 1. Trial maneuver library.
No. | Normal Load Factor | Roll Angle (Deg) |
---|---|---|
1 | | 0 |
2 | | 0 |
3 | 1 | 0 |
4 | | |
5 | | |
6 | | |
7 | | |
8 | | |
9 | 1 | |
10 | | |
11 | | |
12 | | |
13 | | |
14 | | |
15 | | |
16 | | |
17 | | |
18 | | |
19 | | |
20 | | |
21 | | 0 |
Table 2. Parameters of the PIO, PSO, GA, SSA, and LAEPIO algorithms.
Algorithms | Parameters | Meanings | Values (Dimensionless) |
---|---|---|---|
PIO | $N_{c1\max}$ | Maximum number of iterations of map and compass operators | 150 |
| $N_{c2\max}$ | Maximum number of iterations of landmark operator | 50 |
| $N_p$ | Population size | 100 |
| $R$ | Map and compass constant | |
PSO | $c_1, c_2$ | Learning factor | |
| $T_{\max}$ | Maximum number of iterations | 200 |
| $N_p$ | Population size | 100 |
| $\omega$ | Mass factor | |
GA | $p_m$ | Mutation probability | |
| $p_c$ | Crossover probability | |
| $T_{\max}$ | Maximum number of iterations | 200 |
| $N_p$ | Population size | 100 |
SSA | $ST$ | Safety threshold | |
| $PD$ | Seeker probability | |
| $SD$ | Follower probability | |
| $T_{\max}$ | Maximum number of iterations | 200 |
| $N_p$ | Population size | 100 |
LAEPIO | $N_{c1\max}$ | Maximum number of iterations of map and compass operators | 150 |
| $N_{c2\max}$ | Maximum number of iterations of landmark operator | 50 |
| $N_p$ | Population size | 100 |
| $R$ | Map and compass constant | |
| $r_m$ | Rate of variation | |
Table 3. The statistical results of benchmark functions by the LAEPIO algorithm and other recent methods.
Function | Statistic | LAEPIO | PIO | PSO | GA | SSA |
---|---|---|---|---|---|---|
F1 | best | | | | | |
mean | | | | 1.55 | |
median | | | | 1.36 | |
worst | | | | 4.01 | |
std | | | | 1.17 | |
F2 | best | | 2.38 | | | |
mean | | 4.87 | | 2.59 | |
median | | 4.66 | | 2.76 | |
worst | | 9.53 | | 3.36 | |
std | | 2.39 | | | |
F3 | best | | 8.87 | | 7.61 | |
mean | | | | | |
median | | | | 2.04 | |
worst | 1.99 | | 1.21 | | 2.70 |
std | | | | | |
F4 | best | 0.00 | | | | |
mean | | | | | |
median | | | | | |
worst | | | | | |
std | | | | | |
F5 | best | | | | 7.56 | |
mean | | | 1.71 | | |
median | | | 1.65 | | |
worst | 7.67 | | 4.53 | | 3.96 |
std | 1.58 | | 1.43 | | 1.89 |
Table 4. Initial state of UAVs in simulation.
Condition | Side | Value of State |
---|---|---|
general | red side | |
blue side | |
balance | red side | |
blue side | |
advantage | red side | |
blue side | |
disadvantage | red side | |
blue side | |
Table 5. Engagement statistics of air combat simulations.
Algorithm | Number of Victories | Number of Failures | Number of Draws | Average Score of LAEPIO | Average Score of Others |
---|---|---|---|---|---|
LAEPIO vs. PIO | 50 | 22 | 28 | 0.635 | 0.427 |
LAEPIO vs. PSO | 51 | 25 | 24 | 0.647 | 0.413 |
LAEPIO vs. SSA | 62 | 13 | 25 | 0.681 | 0.390 |
LAEPIO vs. GA | 58 | 14 | 28 | 0.712 | 0.491 |
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).