Article

Bio-Inspired Swarm Confrontation Algorithm for Complex Hilly Terrains

1 School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, China
2 Key Laboratory of Autonomous Systems and Networked Control, Ministry of Education, Guangzhou 510641, China
3 Guangdong Engineering Technology Research Center of Unmanned Aerial Vehicle Systems, Guangzhou 510641, China
* Author to whom correspondence should be addressed.
Biomimetics 2025, 10(5), 257; https://doi.org/10.3390/biomimetics10050257
Submission received: 20 February 2025 / Revised: 21 March 2025 / Accepted: 26 March 2025 / Published: 22 April 2025
(This article belongs to the Section Bioinspired Sensorics, Information Processing and Control)

Abstract

This paper explores a bio-inspired swarm confrontation algorithm specifically designed for complex hilly terrains in the context of electronic games. The novelty of the proposed algorithm lies in its utilization of biologically inspired strategies to facilitate adaptive and efficient decision-making in dynamic environments. Drawing from the collective hunting behaviors of various animal species, this paper distills two key confrontation strategies: focused fire for target selection and flanking encirclement for movement coordination and attack execution. These strategies are embedded into a decentralized swarm decision-making framework, enabling agents to exhibit enhanced responsiveness and coordination in complex gaming landscapes. To validate its effectiveness, extensive experiments were conducted, comparing the proposed approach against three established algorithms. The results demonstrate that this method achieves a confrontation win rate exceeding 80%, outperforming existing techniques in both engagement efficiency and survivability. Additionally, two novel performance indices, namely the average agent quantity loss rate and the average health loss rate, are introduced to provide a more comprehensive assessment of algorithmic effectiveness. Furthermore, the impact of key algorithmic parameters on performance indices is analyzed, offering insights into the adaptability and robustness of the proposed algorithm.

1. Introduction

With the advancement of technology, there is a growing demand for applications involving unmanned swarm cooperation and confrontational scenarios, not only in real-world robotics but also in virtual environments such as electronic games. Swarm confrontation represents a novel tactical paradigm that leverages the coordinated behavior of multiple unmanned aerial vehicles (UAVs) [1,2,3,4,5]. These algorithms have also found widespread application in gaming simulations, such as in StarCraft II [6], where complex agent coordination and strategic planning play a critical role in gameplay.
To enhance task execution efficiency and the success rate of agent swarms in complex and dynamic confrontational environments, a series of simulation methodologies for swarm confrontation strategies has been developed [7,8]. Evolutionary algorithms, such as particle swarm optimization and differential evolution, play a pivotal role in this research. These algorithms iteratively refine candidate solutions by emulating biological evolutionary mechanisms, including selection, crossover, and mutation, to converge toward optimal solutions. Multi-agent reinforcement learning (MARL), a subfield of reinforcement learning [9], focuses on developing strategies in environments where multiple agents coexist and interact. Each agent learns to perform optimal actions through collaboration or confrontation with other agents to achieve its objectives.
In recent years, swarm intelligence algorithms have seen substantial advancements [10] and have played a pivotal role in swarm confrontation. Reference [11] presented the mayfly algorithm, an optimization method inspired by mayflies’ behavior, combining swarm intelligence and evolutionary principles. Reference [12] presented a mathematical model that captures red fox behaviors such as foraging, hunting, population dynamics, and evading predators. By combining local and global optimization strategies with a reproduction mechanism, this model forms the basis of the red fox optimization algorithm. Reference [13] introduced the Flying Foxes Optimization (FFO) algorithm, drawing inspiration from the adaptive survival strategies of flying foxes in heatwaves. By incorporating fuzzy logic for dynamic parameter adjustment, FFO functions as a self-adaptive, parameter-free optimization technique. Reference [14] offered an innovative approach in swarm robotics, drawing inspiration from the foraging behavior of fish schools. By employing a bio-inspired neural network and a self-organizing map, the swarm replicates fish-like behaviors, including collision-free navigation and dynamic subgroup formation. Reference [15] explored the critical role of UAV swarms in the modern world, highlighting the urgent need for attack–defense-capable swarms. It introduced a bio-inspired decision-making method for UAV swarm confrontations using MARL, addressing the challenges of exponential training time as swarm size increases, drawing inspiration from natural group hunting behaviors.
This paper presents a bio-inspired confrontation algorithm aimed at improving success rates in swarm-based confrontations, particularly within the context of electronic games. Specifically, in a hilly environment, the undulating terrain obstructs the agents’ field of view, preventing them from fully acquiring real-time information about the opponent. Inspired by the hunting behaviors of various animal groups, such as lion prides and wild dog packs, two confrontation strategies are explored: the focused-fire strategy and flanking encirclement strategy. These strategies are integrated within hilly environments to develop a novel biomimetic swarm confrontation algorithm.
The contributions of this work are as follows:
  • In contrast to purely 2D or 3D confrontation environments [15,16,17,18,19,20,21], this is the first time that a semi-3D confrontation environment, i.e., hilly terrain, has been considered for the swarm confrontation problem, which brings many challenges. First, the ability of the agent to gather information about opponents is limited. Second, virtual projectiles or actions executed by agents may be blocked by the terrain. Furthermore, the terrain constrains the agents’ postures, adding even more complexity to decision-making.
  • Compared to agents that employ a particle model for movement [8,16,22,23,24], to suit the semi-3D confrontation environment, this paper adopts the unicycle model as a kinematic model of agents, which is more realistic yet complicated for confrontation scenarios. In addition, the rotating module responsible for targeting can freely spin on its supporting plane, while the elevation unit is capable of vertical adjustment. As a result, incorporating the additional degrees of freedom introduced by these rotational components leads to a more complex kinematic model compared to the standard unicycle model.
  • Drawing on the behavioral characteristics exhibited by prides of lions and packs of wild dogs during their hunts, this paper proposes key algorithms suited to swarm confrontations. Compared with algorithms based on reinforcement learning or target-based assignment [15,25,26], the proposed approach focuses on specific behaviors throughout the confrontation, enhancing its interpretability and practical applicability—particularly in simulation-based environments such as electronic games. In direct comparisons against the aforementioned algorithms, the proposed method achieves a win rate exceeding 80%.
  • For the evaluation of confrontation algorithms, in addition to traditional win rate assessment [24,25,27,28,29], two more performance indices are adopted, i.e., the agents’ quantity loss rate and the agents’ health loss rate. These two indices reflect the cost paid by the swarm confrontation algorithm to win from different perspectives, and the test results further highlight the superiority of the proposed bio-inspired swarm confrontation algorithm.

2. Related Work

2.1. Optimization Algorithms

In terms of evolutionary algorithms, reference [16] proposed an evolutionary algorithm (EA)-based attack strategy for swarm robots in denied environments, eliminating reliance on global positioning and communication. Each robot optimizes its movement using local sensing, evaluating threats and benefits through an EA-driven fitness function. With integrated collision avoidance, the swarm achieves effective collaboration and confrontation. Reference [30] introduced an evolutionary task allocation method for optimizing drone task distribution based on collaborative behavior, alongside a collaborative control method for UAVs to maintain formation during task execution. Reference [31] developed an optimized multi-UAV cooperative path planning approach for complex confrontation scenarios. A realistic threat model was developed, incorporating threat levels and fuel consumption constraints within a multi-objective optimization framework. Reference [32] proposed an evolutionary expert system tree for managing unexpected situations in aerial combat, while reference [33] introduced an enhanced particle swarm optimization algorithm that improves global search capabilities without adding computational complexity. Reference [34] examined strategic choices in a game-theoretic model of UAVs using a strategy evolution game, and reference [35] proposed an evolutionary optimization algorithm addressing the limitations of particle swarm optimization. Reference [36] expanded torch, a heterogeneous–homogeneous swarm coevolution method designed to enhance the evolutionary capabilities of swarm robots. Addressing the challenges of balancing evolutionary efficiency and strategy performance, torch employs a swarm coevolution mechanism to accelerate adaptation. A behavior expression tree is incorporated to expand the strategy search space, enabling more flexible and effective evolution. Reference [37] presented an improved differential evolution method based on Pareto optimal matching for multi-objective binary optimization problems. However, further optimization is required for complex environments with obstacles and multi-region challenges, and for integrating task allocation and collaborative control.

2.2. Multi-Agent Reinforcement Learning

MARL has seen significant advancements in recent years  [38,39]. Reference [40] proposed the hierarchical attention actor–critic (HAAC) algorithm to enhance decision-making in large-scale UAV swarm confrontations. By integrating a hierarchical actor policy with a centralized critic network based on the hierarchical two-stage attention network, HAAC captures UAV interactions and optimizes coordination. It effectively reduces state and action space complexity, improving scalability and outperforming existing methods in large-scale scenarios. Reference [41] proposed a one-vs-one within-visual-range air combat strategy generation algorithm based on a multi-agent deep deterministic policy gradient (MADDPG). The combat scenario is modeled as a two-player zero-sum Markov game, incorporating a target position prediction method to enhance decision-making. To bypass the constraints of basic fighter maneuvers, a continuous action space is adopted. Additionally, a potential-based reward shaping method improves learning efficiency. Reference [42] introduced a learning-based interception strategy for UAV territorial defense against invaders from various directions and speeds. The initial state’s impact on interception success was analyzed to define viable defense boundaries. Given the continuous action and state spaces, conventional decision methods face dimensionality issues. To address this, a fuzzy logic-enhanced actor–critic algorithm was proposed, effectively reducing computational complexity. To manage group situational complexity, reference [43] proposed a multi-agent transformer integrated with a virtual object network. Furthermore, reference [44] established two non-cooperative game models within the multi-agent deep reinforcement learning paradigm, successfully achieving Nash equilibrium in a five-on-five drone confrontation scenario. Reference [45] validated task allocation and decision-making in a simulation environment with mobile threats and targets. Reference [28] proposed a MARL approach that integrates macro actions and human expertise for UAV swarm decision-making. By modeling the swarm as a multi-agent system and using macro actions to address sparse rewards and large state-action spaces, the method enhances learning efficiency. Human-designed actions further optimize policies, enabling superior performance in complex confrontation scenarios. Lastly, reference [46] explored pursuit–evasion using deep reinforcement learning, where multiple homogeneous agents pursue an omnidirectional target under unicycle kinematics. A shared experience approach trains a policy for a fixed number of pursuers, executed independently at runtime.
Compared to the aforementioned algorithms, the proposed algorithm seamlessly integrates behaviors observed in animal confrontations into the confrontation process. It eliminates the need for model training and complex iterative computations while still delivering high performance.
Notation: R denotes the set of real numbers. For two vectors, a and b, a · b and a × b denote the inner product and cross product of a and b, respectively. For a nonzero vector a, M ( a ) is defined by
$$M(a) = \frac{a}{\lVert a \rVert}.$$
For θ ∈ R, define the following rotation matrix R ( θ ) :
$$R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
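As a quick reference for the sketches used later in the paper, these two operators can be written in NumPy as follows (a minimal sketch; the function names are ours):

```python
import numpy as np

def unit(a: np.ndarray) -> np.ndarray:
    """M(a): normalize a nonzero vector to unit length."""
    return a / np.linalg.norm(a)

def rot_z(theta: float) -> np.ndarray:
    """R(theta): rotation by theta about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])
```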

3. Problem Description

In this paper, we consider the swarm confrontation problem of two swarms of agents in hilly terrain. In particular, the two swarms of agents have equal quantities and abilities. This setting is especially relevant to electronic game simulations, where agents frequently engage in symmetric confrontational tasks within terrain-rich environments. In this section, descriptions of the hilly terrain and the agent model are given first, which are then followed by a description of the swarm confrontation problem.

3.1. Confrontation Environment

A representative example of the hilly terrain used in this study for electronic game simulations is illustrated in Figure 1. Let L 1 and L 2 denote the length and width of the map, respectively, and let H represent the maximum height of the terrain. Note that agents can only move along the surface of the hilly terrain, which brings three challenges that have never been faced before. First, the ability of the agent to gather information about opponents is limited, as the hills may block the agent’s field of view, as shown by Figure 1. Second, the shells fired by the agents may be blocked by the terrain. Third, the terrain constrains the agents’ posture, making it difficult to aim.
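For illustration only, the hilly surface bounded by L 1 , L 2 , and H can be represented as a heightmap that agents and rays query. The study itself builds its terrain in Unity, so the smooth height function below is a hypothetical stand-in with illustrative parameter values:

```python
import numpy as np

L1, L2, H = 2000.0, 2000.0, 120.0  # map length, width, and maximum height (illustrative values)

def terrain_height(x: float, y: float) -> float:
    """Hypothetical rolling-hill heightmap; the actual study uses a Unity terrain."""
    z = 0.5 * H * (np.sin(2 * np.pi * x / L1) * np.cos(2 * np.pi * y / L2) + 1.0)
    return float(np.clip(z, 0.0, H))
```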

3.2. Agent Model

In this paper, agents are categorized into red and blue teams. Suppose each team consists of N agents. For i = 1 , , N , r i represents the ith agent on the red team, while b i represents the ith agent on the blue team. By default, the red team is equipped with the bio-inspired swarm confrontation algorithm, while the blue team is equipped with other existing swarm confrontation algorithms.

3.2.1. Kinematics

The kinematic equations for agent r i are given by
$$
\begin{aligned}
x_{r_i}(t+1) &= x_{r_i}(t) + v\cos(\theta_{r_i}(t))\sin(\varphi_{r_i}(t))\,\Delta t\\
y_{r_i}(t+1) &= y_{r_i}(t) + v\sin(\theta_{r_i}(t))\sin(\varphi_{r_i}(t))\,\Delta t\\
z_{r_i}(t+1) &= z_{r_i}(t) + v\cos(\varphi_{r_i}(t))\,\Delta t\\
\theta_{r_i}(t+1) &= \theta_{r_i}(t) + \omega_{r_i}(t)\,\Delta t\\
\vartheta_{r_i}(t+1) &= \vartheta_{r_i}(t) + \Omega_{r_i}(t)\,\Delta t\\
\sigma_{r_i}(t+1) &= \sigma_{r_i}(t) + \Phi_{r_i}(t)\,\Delta t
\end{aligned}
$$
where p r i ( t ) = [ x r i ( t ) , y r i ( t ) , z r i ( t ) ] T ∈ R 3 represents the position of agent r i at time t; v ∈ R represents the linear speed; θ r i ( t ) ∈ R and ω r i ( t ) ∈ R represent the heading angle and angular velocity of the body, respectively; φ r i ( t ) ∈ R represents the pitch angle of the body, which is determined by the topography; ϑ r i ( t ) ∈ R and Ω r i ( t ) ∈ R denote the heading angle and angular velocity of the rotating module, respectively, while σ r i ( t ) ∈ R and Φ r i ( t ) ∈ R represent the pitch angle and angular velocity of the elevation unit, respectively; and Δ t denotes the sampling time. The various aforementioned angles are illustrated by Figure 2. Additionally, ω m , Ω m , and Φ m denote the maximum rotational speed for ω r i ( t ) , Ω r i ( t ) , and Φ r i ( t ) , respectively. In this paper, ω r i ( t ) , Ω r i ( t ) , Φ r i ( t ) are considered to be the control inputs of agent r i . These control inputs will be specified later by the proposed bio-inspired swarm confrontation algorithm. At the onset of the confrontation, the initial parameters are configured as follows: x r i ( 0 ) ∈ [ 0 , L 1 ] , y r i ( 0 ) ∈ [ 0 , L 2 ] , z r i ( 0 ) ∈ [ 0 , H ] , θ r i ( 0 ) = 0 , φ r i ( 0 ) = 0 , ϑ r i ( 0 ) = 0 , σ r i ( 0 ) = 0 . Similarly, we can define p b i ( t ) , θ b i ( t ) , ω b i ( t ) , ϑ b i ( t ) , Ω b i ( t ) , σ b i ( t ) , Φ b i ( t ) for agent b i , and the details are omitted. Note that the agents of the blue team have the same linear speed v and maximal rotational speeds ( ω m , Ω m , and Φ m ) as the agents of the red team.
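A minimal sketch that mirrors these update equations (NumPy; the default speed is the value used later in Section 5.2, while the sampling time is an assumption) is given below:

```python
import numpy as np

def step_agent(state, omega, Omega, Phi, phi, v=20.0, dt=0.1):
    """One discrete-time update of the agent model in Section 3.2.1.
    state = (x, y, z, theta, vartheta, sigma); omega, Omega, Phi are the control
    inputs; phi is the body pitch imposed by the terrain at the current position."""
    x, y, z, theta, vartheta, sigma = state
    x += v * np.cos(theta) * np.sin(phi) * dt   # horizontal motion along the heading
    y += v * np.sin(theta) * np.sin(phi) * dt
    z += v * np.cos(phi) * dt                   # vertical motion set by the body pitch
    theta += omega * dt                         # body heading
    vartheta += Omega * dt                      # rotating-module heading
    sigma += Phi * dt                           # elevation-unit pitch
    return (x, y, z, theta, vartheta, sigma)
```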

3.2.2. Information Acquisition

During confrontation, an agent detects opponents by uniformly emitting rays, as shown by Figure 3. Define d b j r i ( t ) = ‖ p r i ( t ) − p b j ( t ) ‖ . The set of opponents whose information can be obtained by agent r i at time t is defined as follows:
$$N_{r_i}(t) = \left\{\, b_j \;\middle|\; d_{b_j}^{r_i}(t) \le d_{mv} \ \text{and the ray from } r_i \text{ to } b_j \text{ is not obstructed by the terrain} \,\right\}$$
where d m v denotes the maximum detection range of the ray. Similarly, N b i ( t ) denotes the set of opponents whose information is accessible to agent b i at time t.
Note that on the one hand, the rays can only detect within the maximum detection range d m v , and on the other hand, the rays can be obstructed by hills. For agent r i , it can acquire the following information at time t.
  • The positions of all the surviving agents of the red team at time t.
  • The positions of all the surviving agents of the blue team belonging to the set N r i ( t ) .
The method of information acquisition for the agents of the blue team is the same.
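A simple sketch of this visibility test is given below, sampling a terrain height function (such as the hypothetical one from the Section 3.1 sketch) along the ray; the sampling resolution is an assumption:

```python
import numpy as np

def can_detect(p_ri, p_bj, height_fn, d_mv=1200.0, n_samples=50):
    """Return True if b_j is within range of r_i and the connecting ray clears the terrain."""
    p_ri, p_bj = np.asarray(p_ri, float), np.asarray(p_bj, float)
    if np.linalg.norm(p_bj - p_ri) > d_mv:
        return False                              # beyond the maximum detection range
    for s in np.linspace(0.0, 1.0, n_samples):
        point = (1.0 - s) * p_ri + s * p_bj       # sample point along the ray
        if height_fn(point[0], point[1]) > point[2]:
            return False                          # ray obstructed by a hill
    return True
```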

3.2.3. Attack and Damage

Agents engage opponents by launching projectiles with an initial speed of v p , which subsequently follow a ballistic trajectory under the influence of gravity. The direction of the projectile is determined by the posture of the elevation unit. Each agent starts with an initial health point ( H P ), and once hit by a shell, the health point is reduced by h d . Let h r i ( t ) denote the health point of r i at time t. If h r i ( t ) ≤ 0 , agent r i is considered destroyed. Moreover, the minimum firing interval between two attacks is t c m , and the swarm confrontation algorithm will decide when to fire. Similarly, we can define h b i ( t ) for b i .
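The shell model can be sketched in the same style; the gravity constant, time step, and maximum flight time below are assumptions, and the firing direction would be supplied by the elevation unit’s axis:

```python
import numpy as np

def simulate_shell(p0, direction, height_fn, v_p=1400.0, g=9.81, dt=0.01, t_max=5.0):
    """Integrate a ballistic shell until it meets the terrain; returns the impact point or None.
    direction: unit vector along the elevation unit's axis at the moment of firing."""
    pos = np.asarray(p0, float)
    vel = v_p * np.asarray(direction, float)      # initial speed v_p along the firing direction
    for _ in range(int(t_max / dt)):
        vel[2] -= g * dt                          # gravity acts on the vertical component
        pos = pos + vel * dt
        if pos[2] <= height_fn(pos[0], pos[1]):
            return pos                            # shell blocked/absorbed by the terrain
    return None
```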

3.3. Winning of the Confrontation

At the beginning of the confrontation, the red and blue teams are positioned at opposite corners of the map. Winning is declared for the side that annihilates all the agents of the opposing side within the time limit t m . If all the agents are destroyed within t m , or neither team wins within t m , it is called a tie.

3.4. Algorithm Performance Indices

To assess the performance of the algorithm, three algorithm performance indices are considered in this paper, namely, the win rate, average agent quantity loss rate, and average agent health loss rate, which are detailed as follows. Consider a series of M matches between the red and blue teams. For the red team, let M w r denote the number of matches won by the red team, and H s r represent the initial health points of all members of the red team. For k = 1 , … , M w r , define n k r and h s k r as the number of agents lost by the red team and the total health points lost by the red team in the kth winning match, respectively. Then, the performance indices for the red team’s algorithm are established as follows:
  • Winning rate W r :
    $$W_r = \frac{M_w^r}{M}$$
  • Average agent quantity loss rate ξ r :
    $$\xi_r = \frac{1}{M_w^r} \sum_{k=1}^{M_w^r} \frac{n_k^r}{N}$$
  • Average agent health loss rate λ r :
    $$\lambda_r = \frac{1}{M_w^r} \sum_{k=1}^{M_w^r} \frac{h_{s_k}^r}{H_s^r}$$
The parameters W b , ξ b , and λ b can be similarly defined for the blue team following the same approach.
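Given per-match records, these three indices can be computed directly; a minimal sketch follows, in which the field names are assumptions:

```python
def red_team_indices(matches, N, H_s_r):
    """matches: list of dicts with keys 'red_won' (bool), 'n_lost' (agents lost by red),
    and 'hp_lost' (red health points lost). N: team size; H_s_r: initial red health points."""
    wins = [m for m in matches if m["red_won"]]
    M, M_w = len(matches), len(wins)
    W_r = M_w / M                                                    # win rate
    xi_r = sum(m["n_lost"] / N for m in wins) / M_w if M_w else float("nan")
    lam_r = sum(m["hp_lost"] / H_s_r for m in wins) / M_w if M_w else float("nan")
    return W_r, xi_r, lam_r
```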

4. Bio-Inspired Swarm Confrontation Algorithm Design

Based on biologically inspired algorithms, agents must primarily address two key issues during the swarm confrontation process: selecting attack targets and making decisions regarding movement during the confrontation. This chapter begins by analyzing animal group behavior, summarizing the corresponding confrontation algorithms, and then connecting these algorithms to real confrontation scenarios for implementation.

4.1. Bio-Inspired Rules

We employ the following analysis to address the problem of target selection during each agent’s confrontation process. As illustrated in Figure 4, a pack of wild dogs spots a group of wildebeests and swiftly closes in, attempting to scatter them. The wildebeests initially cluster together to confront the predators but soon become startled and begin to flee, with the wild dogs in pursuit. During the chase, a smaller, isolated individual emerges from the group, becoming the focus of the wild dogs’ attention. The pack then concentrates its efforts on launching an attack on the vulnerable wildebeest.
For the wild dogs, each individual is smaller in size and weaker in strength compared to a wildebeest. When the wildebeests cluster together, it becomes difficult for the wild dogs to inflict damage. Therefore, when an isolated individual appears within the wildebeest group, the wild dogs quickly shift their target, creating a situation where the many overpower the few, effectively completing the hunt. Drawing on the collective hunting behavior of a wild dog pack, agents in a hilly-terrain confrontation can switch attack targets based on the opponent’s position. If an opponent is far from its group, it becomes the priority target. This tactic results in a localized numerical advantage, allowing agents to eliminate the target quickly. We refer to this behavior as the focused-fire strategy.
Efficient confrontation algorithms must select targets judiciously and make real-time decisions during the confrontation, adjusting their movement direction based on the evolving situation. This section further analyzes the group attack behavior of animals. As shown in Figure 5, three lions seize the opportunity to attack a buffalo, approaching it in a triangular formation. The central lion confronts the buffalo head-on, while the lions on both sides maneuver to flank it, forming a pincer movement. After completing the encirclement, the lions launch their attack and complete the hunt.
If the lion pride were to attack head-on as a group, the buffalo, sensing danger, would likely counterattack or flee, which could result in casualties among the lions or allow the buffalo to escape. The pride increases its chances of a successful hunt by attacking from multiple directions. In an agent-based confrontation, if two or more agents target the same opponent, one agent can engage the opponent head-on while the others flank from the sides, efficiently neutralizing the target. We refer to this behavior as the flanking encirclement strategy.

4.2. Design of Swarm Confrontation Algorithm

After analyzing and adapting bio-inspired rules, these principles need to be applied to practical confrontation algorithms. The design of the confrontation algorithm is mainly divided into three parts: target selection, motion planning, and automatic aiming. Taking the red agent r i as an example, the following sections detail the design of these three components.

4.2.1. Target Selection

Inspired by the hunting behavior of wild dogs in nature, the target selection algorithm employs the focused-fire strategy. Define d r k r i ( t ) = ‖ p r i ( t ) − p r k ( t ) ‖ . Let n b a r i ( t ) represent the number of surviving opponents detectable by r i , and let p c b r i ( t ) denote the central position of these opponents. Let I x r i ( t ) denote the label of the xth closest surviving opponent to r i , and let T r i ( t ) denote the label of the attack target chosen by r i . Let c t be a positive integer and d f be a positive real number. The target selection algorithm is described by Algorithm 1.
Algorithm 1 Target Selection Algorithm
1: input: n b a r i ( t ) — the number of surviving opponents detectable by r i ,
2:        p c b r i ( t ) — the central position of these detected opponents.
3: output: T r i ( t ) — the label of the attack target chosen by r i .
4: set x = 1
5: if n b a r i ( t ) = 0 then
6:     T r i ( t ) = null
7: else
8:     C = true
9:     set o j r i = 0 , ∀ j ∈ { 1 , 2 , … , N }
10:    while x ≤ n b a r i ( t ) and C do
11:        l = I x r i ( t )
12:        if ‖ p c b r i ( t ) − p b l ( t ) ‖ > d f then
13:            T r i ( t ) = l
14:            C = false
15:        else
16:            set c = 0 , k = 1
17:            while k ≤ N do
18:                if r k is surviving and k ≠ i then
19:                    if o k r i = 0 then
20:                        if d b l r i ( t ) > d b l r k ( t ) then
21:                            c = c + 1
22:                            o k r i = 1
23:            if c > c t then
24:                x = x + 1
25:            else
26:                T r i ( t ) = l
27:                C = false
28:        if x > n b a r i ( t ) then
29:            T r i ( t ) = I 1 r i ( t )
According to Algorithm 1, n b a r i ( t ) and p c b r i ( t ) serve as input parameters, while T r i ( t ) functions as the output parameter. The target selection algorithm follows a multi-level decision-making process. First, after obtaining I x r i ( t ) , r i evaluates the spatial distribution of its opponents. If the distance between b l and the center of visible opponents within r i ’s range exceeds d f , b l is considered to have deviated from its team formation, and r i prioritizes attacking b l . Second, as indicated in steps 10 to 27 of Algorithm 1, these steps involve an iterative computation process, with c t playing a crucial role in the iteration. If b l is positioned closer to its own team, r i determines its relative ranking within the team based on proximity to b l . If r i ranks beyond c t , it must recalculate I x r i ( t ) and repeat this process iteratively until its ranking falls within c t . This design helps prevent the excessive concentration of attack targets among red agents, thereby reducing resource overflow and minimizing wastage. Finally, if no opponent within r i ’s field of view meets the above conditions, r i selects the nearest opponent as its attack target, denoted as I 1 r i ( t ) . As described above, the algorithm not only prevents an excessive number of agents from attacking the same target, thereby avoiding unnecessary concentration of launched projectiles, but also creates a local numerical advantage. This demonstrates the focused-fire strategy proposed in this paper, and a flowchart of the algorithm is shown in Figure 6.
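A simplified reading of Algorithm 1 in code is sketched below; it omits the per-teammate bookkeeping flags, and the data layout and distance computations are assumptions:

```python
import numpy as np

def select_target(i, red_pos, red_alive, visible_blue, blue_pos, d_f=100.0, c_t=3):
    """Focused-fire target selection for red agent i (simplified reading of Algorithm 1).
    visible_blue: labels of opponents currently detectable by agent i."""
    if not visible_blue:
        return None
    centroid = np.mean([blue_pos[j] for j in visible_blue], axis=0)
    # opponents ordered from closest to farthest from agent i
    ordered = sorted(visible_blue, key=lambda j: np.linalg.norm(red_pos[i] - blue_pos[j]))
    for l in ordered:
        # an opponent that strayed from its own group is attacked immediately
        if np.linalg.norm(centroid - blue_pos[l]) > d_f:
            return l
        # otherwise, engage only if not too many teammates are already closer to this opponent
        closer = sum(1 for k, alive in enumerate(red_alive)
                     if alive and k != i
                     and np.linalg.norm(red_pos[k] - blue_pos[l]) < np.linalg.norm(red_pos[i] - blue_pos[l]))
        if closer <= c_t:
            return l
    return ordered[0]  # fall back to the nearest opponent
```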

4.2.2. Motion Planning

Incorporating the competitive behavior of biological swarms into an agent’s confrontation process primarily involves planning its trajectory. Given that the field is undulating and there are no complex obstacles, we implement the agent’s path planning using the artificial potential field method. Considering that the agent also needs to avoid obstacles presented by teammates in the environment, the agent’s direction of movement can be decomposed into the sum of two vectors.
(1) Consider the motion planning of r i in the absence of obstacles. When T r i ( t ) = null , r i selects the nearest hilltop, denoted as p m r i ( t ) , as its movement target to facilitate opponent searching. Conversely, when T r i ( t ) null , r i selects p b T r i ( t ) as its movement target. Here, p b T r i ( t ) denotes the position of the opponent labeled T r i , which has been assigned according to Algorithm 1. The movement direction toward the target is defined as follows:
$$G_o^{r_i}(t) = \begin{cases} p_m^{r_i}(t) - p_{r_i}(t), & T^{r_i}(t) = \text{null} \\ p_{b_T}^{r_i}(t) - p_{r_i}(t), & T^{r_i}(t) \neq \text{null} \end{cases}$$
In a hunt, a pride of lions typically attacks prey from multiple directions. The lions at the front often feint to distract the prey while the lions on the flanks wait for an opportunity to strike. Inspired by this behavior, agents can employ a flanking encirclement strategy during confrontations by setting different movement directions.
The following section introduces the method for determining the relative position of r i within the team. Let ρ r i ( t ) represent the relative position of r i within the friendly team that shares the same opponent. When ρ r i ( t ) = 0 , r i is in the middle; when ρ r i ( t ) = 1 , r i is on the left side; and when ρ r i ( t ) = − 1 , r i is on the right side. The method for obtaining ρ r i ( t ) is presented as follows:
$$l^{r_i} = M\left(p_{b_T}^{r_i}(t) - p_c^{r_i}(t)\right)$$
$$d^{r_i}(t) = \left(p_{r_i}(t) - p_c^{r_i}(t)\right) \cdot \frac{l^{r_i} \times l_z}{\lVert l^{r_i} \times l_z \rVert}$$
$$\rho^{r_i}(t) = \begin{cases} 1, & d^{r_i}(t) > \epsilon_1, \\ 0, & -\epsilon_1 \le d^{r_i}(t) \le \epsilon_1, \\ -1, & d^{r_i}(t) < -\epsilon_1. \end{cases}$$
where p c r i ( t ) represents the position of the agent closest to p b T r i ( t ) among the group of agents sharing the same attack target. Meanwhile, d r i ( t ) denotes the projected offset of r i within the team, and ϵ 1 is the reference value used to determine the position interval. l z denotes the unit direction vector along the z-axis. The actual movement direction G r i ( t ) of r i in an obstacle-free environment is obtained by multiplying ρ r i ( t ) by the rotation angle θ s and applying the resulting rotation matrix to G o r i ( t ) . In the case where T r i ( t ) = null , G r i ( t ) is directly equivalent to G o r i ( t ) .
(2) Calculate the vector X k r i ( t ) between teammates p r k ( t ) and p r i ( t ) within the obstacle avoidance range d a . Since closer teammates require stronger obstacle avoidance force, the resulting vector should be larger. Therefore, it is necessary to normalize the vector and apply weighting. This algorithm selects 1 / d r k r i ( t ) as the weight for each vector, and finally, the sum of all vectors, denoted as X r i ( t ) , is obtained:
$$X_k^{r_i}(t) = \begin{cases} \dfrac{p_{r_i}(t) - p_{r_k}(t)}{\left(d_{r_k}^{r_i}(t)\right)^2}, & \text{if } 0 < d_{r_k}^{r_i}(t) \le d_a \text{ and } r_k \text{ is alive}, \\ 0, & \text{otherwise}. \end{cases}$$
$$X^{r_i}(t) = \sum_{k=1}^{N} X_k^{r_i}(t)$$
Since the influence of each vector on the agent’s movement is different, each vector needs to be normalized and weighted to obtain the final direction of movement F r i ( t ) :
F r i ( t ) = k 1 M ( G r i ( t ) ) + k 2 M ( X r i ( t ) )
where k 1 and k 2 denote the weight coefficients assigned to each vector.
Let t c r i represent the time elapsed since r i fired its last shell. d b c 1 represents the maximum distance threshold for r i to execute a retreating flanking encirclement strategy, while d b c 2 represents the minimum distance threshold for r i to execute a flanking maneuver during an advance, as well as the minimum retreat distance for flanking when t c r i < t c m . d a represents the distance for avoiding teammates. θ F r i ( t ) represents the heading angle of F r i ( t ) . e θ 1 r i ( t ) and e θ 2 r i ( t ) denote the deviation angles between the current movement direction and the final target direction in the clockwise and counterclockwise directions, respectively. The detailed implementation is presented in Algorithm 2.
Algorithm 2 Motion Planning Algorithm
1: input G o r i ( t )
2: output ω r i ( t )
3: if T r i ( t ) = null then
4:     G r i ( t ) = G o r i ( t )
5: else
6:     calculate the ρ r i ( t ) by Equations (9)–(11)
7:     if ( d b T r i ( t ) < d b c 1 ) or ( d b c 1 < d b T r i ( t ) ≤ d b c 2 and t c r i < t c m ) then
8:         G r i ( t ) = − R ( θ s ρ r i ( t ) ) G o r i ( t )
9:     else
10:        G r i ( t ) = R ( θ s ρ r i ( t ) ) G o r i ( t )
11: calculate the X r i ( t ) by Equations (12) and (13)
12: calculate the F r i ( t ) by Equation (14)
13: e θ 1 r i ( t ) = ( θ r i ( t ) − θ F r i ( t ) ) mod ( 2 π ) , e θ 2 r i ( t ) = ( θ F r i ( t ) − θ r i ( t ) ) mod ( 2 π )
14: ω r i ( t ) = sign ( e θ 2 r i ( t ) − e θ 1 r i ( t ) ) ω m
According to Algorithm 2, when r i detects opponents, it first computes G o r i ( t ) and then determines its relative position ρ r i ( t ) among teammates that share the same attack target. Based on ρ r i ( t ) , r i adjusts the direction of G o r i ( t ) . If r i is positioned on the right side of the formation, G o r i ( t ) is rotated clockwise by θ s degrees; if it is on the left side, the rotation is counterclockwise by θ s degrees. If r i is centrally positioned within the formation, its movement direction remains unchanged. In scenarios where only two red agents share the same attack target, it suffices to determine the relative position of the agent positioned farther from the target and assign it the appropriate movement direction. When r i is within a distance of d b c 1 from the attack target, or if its firing cooldown is active while being within d b c 2 , its movement direction is set to retreat. Based on the previous steps, agents can be assigned to either direct confrontation or flanking maneuvers, enabling them to attack opponents from multiple angles. This approach is referred to as the flanking encirclement strategy. The critical steps of this strategy are outlined in steps 6 to 10 of Algorithm 2. As a result, G r i ( t ) is determined. Then, incorporating the obstacle avoidance vector X r i ( t ) yields the final movement direction F r i ( t ) . The corresponding flowchart of the algorithm is shown in Figure 7.
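A simplified reading of Algorithm 2, combining the rotated attraction term with the teammate-repulsion term, might look as follows; the data layout and helper names are assumptions:

```python
import numpy as np

def movement_direction(p_i, p_target, rho, teammate_positions, retreating,
                       theta_s=np.pi / 4, d_a=30.0, k1=1.0, k2=0.5):
    """Flanking-plus-avoidance direction for one red agent (simplified reading of Algorithm 2).
    rho in {-1, 0, 1} is the agent's side within the group sharing the same target;
    retreating is True when the opponent is too close or the firing cooldown is active."""
    unit = lambda a: a / np.linalg.norm(a)
    c, s = np.cos(theta_s * rho), np.sin(theta_s * rho)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # R(theta_s * rho)
    g = rot @ unit(p_target - p_i)                # rotated attraction toward the opponent
    if retreating:
        g = -g                                    # reverse direction to open distance
    x = np.zeros(3)                               # teammate-avoidance term
    for p_k in teammate_positions:
        d = np.linalg.norm(p_i - p_k)
        if 0.0 < d <= d_a:
            x += (p_i - p_k) / d**2               # closer teammates repel more strongly
    f = k1 * g + (k2 * unit(x) if np.linalg.norm(x) > 0 else 0.0)
    return unit(f)
```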

4.2.3. Automatic Aiming Algorithm

In the following, taking r i as an example, the motion process of the rotating module and the elevation unit after determining the attack target T r i ( t ) is introduced. Upon identifying T r i ( t ) , r i adjusts ϑ r i ( t ) and σ r i ( t ) , based on the relative angle between the target and its position, thereby achieving target aiming. When r i calculates the vector u o r i ( t ) from itself to the opponent, it then computes the angle θ t u r r i ( t ) between u o r i ( t ) and the rotating module’s direction vector u t u r r i ( t ) in the X O Y plane, rotating the rotating module left or right to make θ t u r r i ( t ) approach 0. Additionally, r i determines the angle θ b a r r i ( t ) between u o r i ( t ) and the unit direction vector of the elevation unit u b a r r i ( t ) , simultaneously rotating the elevation unit up or down to make θ b a r r i ( t ) approach 0. ϵ 2 denotes the deviation range between the target angle and the actual angle. f a r i ( t ) serves as a flag indicating whether r i is actively aiming at an opponent. The specific implementation process is shown in Algorithm 3.
Algorithm 3 Automatic Aiming Algorithm
1: input T r i ( t ) , u t u r r i ( t ) , u b a r r i ( t )
2: output Ω r i ( t ) , Φ r i ( t ) , f a r i ( t )
3: if T r i ( t ) ≠ null then
4:     u o r i ( t ) = M ( p b T r i ( t ) − p r i ( t ) )
5:     u o 1 r i ( t ) = ( u o x r i , u o y r i , 0 ) , u t u r 1 r i ( t ) = ( u t u r x r i , u t u r y r i , 0 )
6:     θ t u r r i ( t ) = arccos ( u o 1 r i ( t ) · u t u r 1 r i ( t ) / ( ‖ u o 1 r i ( t ) ‖ ‖ u t u r 1 r i ( t ) ‖ ) ) sign ( l z · ( u t u r 1 r i ( t ) × u o 1 r i ( t ) ) )
7:     Ω r i ( t ) = θ t u r r i ( t ) Ω m / π
8:     if | θ t u r r i ( t ) | < ϵ 2 then
9:         θ b a r r i ( t ) = arccos ( u o r i ( t ) · u b a r r i ( t ) / ( ‖ u o r i ( t ) ‖ ‖ u b a r r i ( t ) ‖ ) ) sign ( u o y r i ( t ) − u b a r y r i ( t ) )
10:        Φ r i ( t ) = θ b a r r i ( t ) Φ m / π
11:        if | θ t u r r i ( t ) | < ϵ 2 and | θ b a r r i ( t ) | < ϵ 2 then
12:            f a r i ( t ) = 1
13:        else
14:            f a r i ( t ) = 0
15:    else
16:        f a r i ( t ) = 0
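A simplified reading of Algorithm 3 is sketched below; note that the pitch-error sign here uses the vertical components of the vectors, which is a simplification of the expression in the pseudocode:

```python
import numpy as np

def aiming_commands(u_o, u_tur, u_bar, Omega_m, Phi_m, eps2=0.01):
    """Rotating-module and elevation-unit commands (simplified reading of Algorithm 3).
    u_o: unit vector from the agent to its target; u_tur, u_bar: current unit direction
    vectors of the rotating module and the elevation unit."""
    l_z = np.array([0.0, 0.0, 1.0])
    u_o1 = np.array([u_o[0], u_o[1], 0.0])         # projections onto the XOY plane
    u_tur1 = np.array([u_tur[0], u_tur[1], 0.0])
    cos_yaw = np.clip(np.dot(u_o1, u_tur1) / (np.linalg.norm(u_o1) * np.linalg.norm(u_tur1)), -1.0, 1.0)
    theta_tur = np.arccos(cos_yaw) * np.sign(np.dot(l_z, np.cross(u_tur1, u_o1)))
    Omega = theta_tur * Omega_m / np.pi            # turn rate proportional to the yaw error
    cos_pitch = np.clip(np.dot(u_o, u_bar), -1.0, 1.0)
    theta_bar = np.arccos(cos_pitch) * np.sign(u_o[2] - u_bar[2])
    Phi = theta_bar * Phi_m / np.pi                # elevation rate proportional to the pitch error
    aimed = abs(theta_tur) < eps2 and abs(theta_bar) < eps2
    return Omega, Phi, aimed
```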

4.2.4. Bio-Inspired Swarm Confrontation Algorithm

At the beginning of the confrontation, each agent determines its attack target using Algorithm 1. Then, it calculates its actual movement direction using Algorithm 2. Finally, Algorithm 3 is executed to precisely align with the target. During movement, the agent continuously assesses whether the conditions for firing are met and proceeds with the attack when appropriate. If all opponents are eliminated, the confrontation ends. Otherwise, Algorithms 1–3 are re-executed to recalculate the strategy.
By integrating the algorithm designs discussed above, the final pseudo-code and a flowchart of the bio-inspired swarm confrontation algorithm are established, and are presented in Algorithm 4 and Figure 8. n r a ( t ) and n b a ( t ) represent the total number of surviving agents for the red and blue teams, respectively. Additionally, the entire process is carried out sequentially within time step t.
Algorithm 4 Bio-inspired Swarm Confrontation Algorithm
1: for step t do
2:     execute Algorithm 1
3:     execute Algorithm 2
4:     execute Algorithm 3
5:     if t c r i > t c m and f a r i ( t ) = 1 then
6:         agent starts firing
7:         t c r i = 0
8:     else
9:         t c r i = t c r i + 1
10:    if t > t m or n r a ( t ) = 0 or n b a ( t ) = 0 then
11:        algorithm terminates
12:    else
13:        t = t + 1
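Structurally, the overall loop can be sketched as follows; the agent objects and the three injected policy callables are hypothetical placeholders standing in for Algorithms 1–3:

```python
def run_confrontation(red_agents, blue_agents, select_target, plan_motion, aim,
                      t_m=200, t_cm=2):
    """Structural skeleton of Algorithm 4. select_target, plan_motion, and aim are
    hypothetical callables implementing Algorithms 1-3 for a single agent."""
    t, cooldown = 0, {id(a): t_cm + 1 for a in red_agents}
    while (t <= t_m and any(a.alive for a in red_agents)
           and any(a.alive for a in blue_agents)):
        for agent in (a for a in red_agents if a.alive):
            agent.target = select_target(agent)        # Algorithm 1: focused fire
            agent.heading = plan_motion(agent)         # Algorithm 2: flanking encirclement
            Omega, Phi, aimed = aim(agent)             # Algorithm 3: automatic aiming
            if cooldown[id(agent)] > t_cm and aimed:
                agent.fire()                           # hypothetical firing hook
                cooldown[id(agent)] = 0
            else:
                cooldown[id(agent)] += 1
        t += 1
```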

4.3. Algorithm Complexity Analysis

The bio-inspired confrontation algorithm presented in this paper consists primarily of three components: target selection, motion planning, and automatic aiming. The computational complexity of the automatic aiming algorithm is O ( 1 ) , while the complexities of the other components are as follows:
(1) Target selection: Calculating the closest opponent to the agent has a complexity of O ( N ) . Recalculating the opponent based on local principles has a complexity of O ( m N ) , where m represents the number of recalculations required, m ∈ [ 1 , N ] . Calculating the centroid of opponents within the agent’s field of view has a complexity of O ( N ) .
(2) Motion planning: Determining the agent’s position relative to the same opponent group has a complexity of O ( N ) . Calculating the combined vector for teammate obstacle avoidance also has a complexity of O ( N ) . Similarly, calculating the combined vector for opponent obstacle avoidance is O ( N ) .
The overall algorithmic complexity is O ( N ) (best case) to O ( N ^ 2 ) (worst case).

5. Result Analysis

Comparative algorithms within the current environment must be introduced and adapted to evaluate the effectiveness of the swarm confrontation algorithm proposed in this paper. The comparative algorithms selected are the MARL Based on Biomimetic Action Space algorithm [15], the Consensus-Based Auction Algorithm (CBAA) [25], and the Assign Nearest (AN) algorithm [26].

5.1. Results Analysis for a Single Match

To more intuitively demonstrate the bio-inspired algorithm of agents during the confrontation process, this paper uses the AN algorithm as the opponent and selects a 10v10 confrontation scale for a detailed analysis of the confrontation process. The sequence of events is depicted in Figure 9.
In Figure 9d, blue agent b 8 becomes separated from the rest of its team during the confrontation, prompting red agents r 4 , r 7 , and r 8 to prioritize launching coordinated attacks on b 8 . This process exemplifies the focused-fire strategy employed in the bio-inspired approach. Similarly, in Figure 9e, blue agent b 7 is also isolated, leading red agents r 5 , r 9 , and r 10 to direct their attacks toward it in accordance with the same focused-attack strategy.
In Figure 9a,b, without knowledge of the opponent’s positions, the red team disperses its formation in preparation for launching attacks from multiple directions. In Figure 9c–f, the red agents in different positions exhibit varying retreat directions, forming both a frontal containment and flanking maneuvers. Additionally, the red agents actively move to flank the opponent, as seen with agents r 5 and r 10 in Figure 9c,d, and agents r 4 and r 8 in Figure 9d,e. These coordinated attacks from different directions demonstrate the flanking encirclement strategy.

5.2. Analysis of Results Under Different Scenarios

The confrontation scenarios in this paper are constructed using the Unity platform, a widely adopted development tool in the electronic gaming industry. A total of 100 simulation tests are conducted against three opponents under varying algorithm parameters, confrontation scales, and map configurations to comprehensively evaluate the performance of the proposed algorithm. Before the confrontation begins, the environmental parameters are initialized with values of v p = 1400 m / s , t m = 200 s , t c m = 2 s , d m v = 1200 m , H P = 3 , h d = 1 , v = 20 m / s , ω m = 25 ° / s , Ω m = 20 ° / s , Φ m = 15 ° / s , d f = 100 m , c t = 3 , θ s = π / 4 , d a = 30 m , ϵ 1 = 30 m , ϵ 2 = 0.01 , k 1 = 1 , and k 2 = 0.5 . After each confrontation, the win rates and indices are recorded and analyzed to assess the impact of different parameter values on these outcomes.

5.2.1. Analysis of Results Under Different Algorithm Parameters

The algorithm in this study includes two critical parameters, d b c 1 and d b c 2 . Here, d b c 1 represents the minimum distance for maneuvering and containment; if the distance between an agent and an opponent is less than d b c 1 , the agent will immediately maneuver backward. d b c 2 represents the minimum distance for flanking during advancement and the maximum trigger distance for retreating and flanking maneuvers. When the distance between an agent and an opponent exceeds d b c 2 , agents on both sides will implement a flanking encirclement strategy. If the agent is in a firing cooldown state and the distance is less than d b c 2 , it will maneuver backward based on its position.
First, when both teams are in close proximity, agents may become overly clustered, leading to teammates obstructing the line of sight to opponents and diminishing the effectiveness of localized focused fire. This issue is further exacerbated when destroyed agents remain stationary at their last positions, increasing occlusion and reducing overall combat efficiency. To mitigate this, a minimum retreat distance threshold d b c 1 is introduced to ensure adequate spacing between agents and opponents, thereby facilitating the execution of confrontation strategies. Second, since projectiles require a cooldown period after each attack, agents are temporarily unable to inflict damage on opponents. To enhance agent safety, a retreat trigger is activated based on the distance threshold d b c 2 , ensuring that agents maintain a safe distance from opponents while their attack systems are in cooldown. In summary, these two parameters play a crucial role in the proposed confrontation algorithm, balancing attack efficiency and spatial positioning to optimize engagement outcomes.
When d b c 2 = 500 m , d b c 1 is varied from 100 m to 500 m in increments of 100 m . Conversely, when d b c 1 = 100 m , d b c 2 is varied from 100 m to 800 m in increments of 100 m . The confrontation win rates and indices for varying d b c 1 are shown in Figure 10a–c. The win rates and indices for varying d b c 2 are shown in Figure 10d–f.
First, we discuss the parameter d b c 1 . As shown in the figure, the algorithm’s win rate consistently exceeds 90%, indicating that variations in the value of d b c 1 have little effect on the algorithm’s win rate. However, indices ξ r and λ r increase with the growth of d b c 1 , which indicates a decline in the algorithm’s performance. The agent’s backward maneuvering behavior is closely related to d b c 1 . When d b c 1 = 500 m , the agent retreats when it is still far from the opponent, leading to a more dispersed formation. Even if the agent is in a favorable attack position, it cannot quickly regroup to eliminate the opponent and may become isolated, resulting in concentrated enemy fire.
Next, we analyze d b c 2 . The win rate graph shows that d b c 2 significantly impacts the algorithm’s performance. This is primarily due to the fast shell speed of the agents—when d b c 2 is set to a smaller value, agents are more likely to engage in direct encounters, with a high probability of being hit. A smaller d b c 2 also leads to prolonged exposure in the line of sight, limiting the ability to fully leverage terrain for tactical maneuvering. As a result, the attack pattern often degenerates into direct firefights. However, as d b c 2 increases, the win rate improves. For example, against AN, the win rate W r increases from 0.60 at d b c 2 = 100 m to 0.96 at d b c 2 = 500 m . Both ξ r and λ r initially decrease as d b c 2 rises within the range [100, 600] m, but increase again when d b c 2 > 600 m . For instance, when facing AN, as d b c 2 increases from 100 m to 600 m, ξ r decreases from 0.56 to 0.45, while λ r drops from 0.70 to 0.61. When an agent is positioned closer to an opponent, the duration of direct engagement increases, reducing the agent’s maneuverability and making it more susceptible to concentrated opponent attacks. An agent will initiate a retreat only when its distance to the opponent falls below d b c 2 and its attack system is in a cooldown state. When d b c 2 is relatively large, the retreat trigger zone falls within the interval [ d b c 1 , d b c 2 ] , which may cause the team to become overly dispersed, thereby weakening the effectiveness of the flanking encirclement strategy. Although retreat-oriented behavior can help maintain a high win rate, agents become more likely to be targeted and defeated through focused opponent attacks, ultimately degrading the algorithm’s overall performance.

5.2.2. Analysis of Results Under Different Confrontation Scales

The results under different confrontation scales are shown in Figure 11. From the confrontations at different scales, it is evident that larger confrontation scales lead to higher win rates for the algorithm, a trend especially pronounced when the opponent is AN. In 5v5 scenarios, the total health of the team is relatively lower compared to larger scales, and fewer agents are involved in flanking and localized focused-fire maneuvers. As a result, even when a flanking formation is established, if an agent on one side encounters the opponent head-on and is in a disadvantageous position, it may be quickly eliminated, causing the flanking encirclement strategy to collapse. Consequently, the win rate in such cases is only 0.81. However, as the scale increases, the bio-inspired strategy allows for a more complete formation. The increase in the number of agents on each side improves the margin for error, provides more firing points, and enables the agents to eliminate targets more quickly. On a scale of 20v20, the win rate consistently exceeds 95%.
The indices of the algorithm also vary with the confrontation scale. When facing AN and CBAA, the algorithm’s indices improve as the confrontation scale increases. Both algorithms are based on target selection, making the flanking encirclement strategy proposed in this paper highly effective. An increased confrontation scale leads to a greater number of attack positions and dilutes the opponent’s offensive intensity, thereby accelerating the elimination of opponents and mitigating team losses. From 5v5 to 20v20, both ξ r and λ r drop by over 10%. However, when facing RL, the results for ξ r and λ r increase by more than 30% from 5v5 to 20v20. This is because the RL algorithm defaults to targeting the nearest opponent, and once an attack target is locked, agents using RL tend to charge aggressively. If agents equipped with the BIO algorithm fail to form a proper formation in time, they cluster together, increasing agent and HP losses and thereby reducing the algorithm’s overall performance.

5.2.3. Result Analysis on Different Maps

In addition to the current confrontation map, we conducted tests on another map. Compared to the previous map, this one has a gentler slope, and the specific terrain is shown in Figure 12. Furthermore, an additional comparative algorithm, the Evolutionary Algorithm-Based Attack (EABA) Strategy [16], was introduced in the other map. The confrontation scale was 10v10, with d b c 1 = 100 m and d b c 2 = 500 m . The confrontation results are shown in Figure 13.
From the results, it can be observed that the win rate of the algorithm in this paper remains above 90%. When facing AN and CBAA opponents, both ξ r and λ r show slight increases. For example, in the confrontation against AN, ξ r increases from 0.46 to 0.61, and λ r rises from 0.61 to 0.73. Due to the flatter terrain, the probability of shells being obstructed by the ground during flight is lower, which increases the likelihood that red team agents may be hit by opponent shells while spreading out to form a flanking formation, resulting in higher losses for their team. Conversely, when facing RL, both ξ r and λ r show a slight decrease, which can be attributed to the RL model’s weaker adaptability to the new map, leading to lower confrontation performance. When confronting the EABA algorithm, the proposed approach yields a lower w r , while both performance indices, ξ r and λ r , show noticeable increases. This phenomenon primarily results from the flatter terrain, which improves the likelihood of acquiring opponent position information. With enhanced visibility, the EABA algorithm can better exploit its fitness function through iterative optimization, thereby strengthening its confrontation capabilities and negatively impacting the performance of the proposed algorithm. In summary, this paper’s algorithm maintains a high win rate on the new confrontation map and achieves better ξ r and λ r results compared to its opponents, demonstrating the algorithm’s advantages in different environments.

6. Conclusions

From the perspective of electronic game scenarios, this paper explores a swarm confrontation algorithm designed for complex hilly terrains. A highly dynamic hilly confrontation environment is constructed, where intelligent agent swarms from both a red and a blue team possess equal numbers and identical capabilities, with each agent’s movements constrained by kinematic limitations. Drawing inspiration from the hunting confrontation behaviors of wild dog packs and lion prides in nature, two key strategies are proposed: a focused-fire strategy for target selection and a flanking encirclement strategy for motion planning. The former improves local performance by aggregating agent behaviors toward a shared objective, while the latter improves overall confrontation efficiency through coordinated movement and positioning. To comprehensively evaluate the algorithm’s performance, the proposed approach is benchmarked against three existing confrontation algorithms. A total of 100 confrontation tests are conducted across different algorithm parameters, confrontation scales, and environmental conditions. The experimental results demonstrate that the proposed algorithm achieves a confrontation win rate exceeding 80% against baseline algorithms while maintaining lower average agent loss rates and a reduced average agent health loss rate. In conclusion, this biologically inspired confrontation algorithm not only offers a straightforward and practical solution but also exhibits superior performance in swarm-based confrontations.
For future work, we suggest an in-depth exploration of opponent searching in environments with denied information to enhance the algorithm’s confrontation capabilities under limited visibility. Additionally, examining the impact of communication constraints, such as delays and packet loss, on swarm coordination and overall performance will be essential. Developing robust algorithms to mitigate these challenges will be a key focus moving forward.

Author Contributions

Conceptualization, H.G. and H.C.; methodology, F.M. and H.C.; software, F.M. and W.X.; validation, F.M.; investigation, H.G., F.M. and H.C.; writing—original draft preparation, F.M. and H.C.; visualization, F.M. and R.N.; supervision, H.C.; funding acquisition, H.G. and H.C. All authors have read and agreed to the published version of the manuscript.

Funding

Funding for this work was provided in part by the National Natural Science Foundation of China under grant numbers 62276104 and U22A2062, and in part by Fundamental Research Funds for the Central Universities.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data of this paper are available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

N The total number of agents in each team
r i The ith agent of the red team
b i The ith agent of the blue team
p r i ( t ) The position of agent r i
v The linear speed
ω r i ( t ) The angular velocity of agent r i
θ r i ( t ) The heading angle of agent r i
φ r i ( t ) The pitch angle of agent r i
ϑ r i ( t ) The heading angle of agent r i ’s turret
Ω r i ( t ) The rotation speed of agent r i ’s turret
σ r i ( t ) The heading angle of agent r i ’s barrel
Φ r i ( t ) The rotation speed of agent r i ’s barrel
Δ t The sampling time
d m v The maximum detection range of the agents
N r i ( t ) The set of opponents whose information can be obtained by agent r i at time t
ω m The maximum speed of ω r i ( t )
Ω m The maximum speed of Ω r i ( t )
Φ m The maximum speed of Φ r i ( t )
v p The initial speed of the shells
h d The damage inflicted on an agent upon being hit by a single shell
h r i ( t ) The health point of r i at time t
t m The maximum execution time per confrontation
M The total number of matches played between the red and blue teams
M w r The total number of matches won by the red team
H s r The initial health points of all members of the red team
n k r The number of agents lost by the red team in the kth winning match
h s k r The total health points lost by the red team in the kth winning match
W r The winning rate of the red team
ξ r The average agent quantity loss rate of the red team
λ r The average agent health loss rate of the red team
n b a r i ( t ) The number of surviving opponents detectable by r i
n b a ( t ) The number of surviving agents on the blue team
p c b r i ( t ) The central position of these opponents
I x r i ( t ) The label of the xth closest surviving opponent to r i
T r i ( t ) The final attack target selected by r i
p b l r i ( t ) The position of r i ’s current attack target
p m r i ( t ) The position of the nearest hilltop to r i
G r i ( t ) The movement direction of r i without obstacle avoidance
p b T r i ( t ) The position of the opponent labeled T r i
X k r i ( t ) The obstacle avoidance vector induced by teammate r k on r i
X r i ( t ) The sum of obstacle avoidance vectors exerted by all teammates on r i
F r i ( t ) The final desired movement direction of r i
ρ r i ( t ) The relative position of r i within the friendly team that shares the same opponent
p c r i ( t ) The position of the agent closest to p b T r i ( t ) among the group of agents sharing the same attack target
l r i The unit direction vector from p c r i ( t ) to p b T r i ( t )
l z The unit direction vector along the z-axis
d r i ( t ) The projected offset within a team sharing the same attack target
ϵ 1 The reference value used to determine the position interval
d r k r i ( t ) The distance between r i and r k
d b j r i ( t ) The distance between r i and b j
t c r i The time elapsed since r i fired its last shell
t c m The minimum firing interval between two attacks
d b c 1 The maximum distance threshold for r i to execute a retreating flanking encirclement strategy
d b c 2 The minimum distance threshold for r i to execute a flanking maneuver during an advance, as well as the minimum retreat distance for flanking when t c r i < t c m
d a The distance for avoiding teammates
θ F r i ( t ) The heading angle of F r i ( t )
e θ 1 r i ( t ) The clockwise angle between the current movement direction and the final target direction
e θ 2 r i ( t ) The counterclockwise angle between the current movement direction and the final target direction
u t u r r i ( t ) The unit direction vector of r i ’s turret
u b a r r i ( t ) The unit direction vector of r i ’s barrel
f a r i ( t ) The flag indicating whether r i is actively aiming at an opponent
u o r i ( t ) The unit vector from r i to T r i ( t )
u o 1 r i ( t ) The projection of u o r i ( t ) onto the X O Y plane
u t u r 1 r i ( t ) The projection of u t u r r i ( t ) onto the X O Y plane
θ t u r r i ( t ) The angle formed between u o 1 r i ( t ) and u t u r 1 r i ( t )
θ b a r r i ( t ) The angle formed between u o r i ( t ) and u b a r r i ( t )
ϵ 2 The deviation range between the target angle and the actual angle
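The last group of symbols above (u_o^{r_i}(t), its projection u_{o1}^{r_i}(t), the turret and barrel directions, the angles θ_{tur}^{r_i}(t) and θ_{bar}^{r_i}(t), and the tolerance ϵ_2) together specify the aiming check performed before firing. The following is a minimal illustrative sketch of that computation; it is not the paper's implementation, and the function and argument names are hypothetical.

```python
import numpy as np


def aiming_errors(u_o, u_tur, u_bar):
    """Return (theta_tur, theta_bar) as defined in the Nomenclature.

    u_o:   unit vector from r_i to its target T_{r_i}(t)
    u_tur: unit direction vector of r_i's turret
    u_bar: unit direction vector of r_i's barrel
    """
    def angle(a, b):
        # Angle between two vectors, clipped for numerical safety.
        cosang = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        return float(np.arccos(np.clip(cosang, -1.0, 1.0)))

    # Projections onto the XOY plane (drop the z component).
    u_o1 = np.array([u_o[0], u_o[1], 0.0])
    u_tur1 = np.array([u_tur[0], u_tur[1], 0.0])

    theta_tur = angle(u_o1, u_tur1)  # turret heading error
    theta_bar = angle(u_o, u_bar)    # barrel pointing error in 3D
    return theta_tur, theta_bar

# The aiming flag f_a^{r_i}(t) could then be set to 1 when both errors are below epsilon_2.
```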

References

  1. Ayamga, M.; Akaba, S.; Nyaaba, A.A. Multifaceted applicability of drones: A review. Technol. Forecast. Soc. Change 2021, 167, 120677. [Google Scholar] [CrossRef]
  2. Day, M. Multi-Agent Task Negotiation Among UAVs to Defend Against Swarm Attacks. Ph.D. Thesis, Naval Postgraduate School, Monterey, CA, USA, 2012. [Google Scholar]
  3. Kong, L.; Liu, Z.; Pang, L.; Zhang, K. Research on UAV Swarm Operations. In Man-Machine-Environment System Engineering; Long, S., Dhillon, B.S., Eds.; Springer: Singapore, 2023; pp. 533–538. [Google Scholar]
  4. Niu, W.; Huang, J.; Miu, L. Research on the concept and key technologies of unmanned aerial vehicle swarm concerning naval attack. Command. Control. Simul. 2018, 40, 20–27. [Google Scholar]
  5. Xiaoning, Z. Analysis of military application of UAV swarm technology. In Proceedings of the IEEE 2020 3rd International Conference on Unmanned Systems (ICUS), Harbin, China, 27–28 November 2020; pp. 1200–1204. [Google Scholar]
  6. Vinyals, O.; Babuschkin, I.; Czarnecki, W.M.; Mathieu, M.; Dudzik, A.; Chung, J.; Choi, D.H.; Powell, R.; Ewalds, T.; Georgiev, P.; et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 2019, 575, 350–354. [Google Scholar] [CrossRef] [PubMed]
  7. Xia, W.; Zhou, Z.; Jiang, W.; Zhang, Y. Dynamic UAV Swarm Confrontation: An Imitation Based on Mobile Adaptive Networks. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 7183–7202. [Google Scholar] [CrossRef]
  8. Zhang, L.; Yu, X.; Zhang, S. Research on Collaborative and Confrontation of UAV Swarms Based on SAC-OD Rules. In Proceedings of the 4th International Conference on Information Management and Management Science, Association for Computing Machinery, Chengdu, China, 27–29 August 2021; pp. 273–278. [Google Scholar]
  9. Raslan, H.; Schwartz, H.; Givigi, S.; Yang, P. A Learning Invader for the “Guarding a Territory” Game. J. Intell. Robot. Syst. 2016, 83, 55–70. [Google Scholar] [CrossRef]
  10. Yang, X.S.; Deb, S.; Zhao, Y.X.; Fong, S.; He, X. Swarm intelligence: Past, present and future. Soft Comput. 2018, 22, 5923–5933. [Google Scholar] [CrossRef]
  11. Zervoudakis, K.; Tsafarakis, S. A mayfly optimization algorithm. Comput. Ind. Eng. 2020, 145, 106559. [Google Scholar] [CrossRef]
  12. Połap, D.; Woźniak, M. Red fox optimization algorithm. Expert Syst. Appl. 2021, 166, 114107. [Google Scholar] [CrossRef]
  13. Zervoudakis, K.; Tsafarakis, S. A global optimizer inspired from the survival strategies of flying foxes. Eng. Comput. 2023, 39, 1583–1616. [Google Scholar] [CrossRef]
  14. Li, J.; Yang, S.X. Intelligent Fish-Inspired Foraging of Swarm Robots with Sub-Group Behaviors Based on Neurodynamic Models. Biomimetics 2024, 9, 16. [Google Scholar] [CrossRef]
  15. Chi, P.; Wei, J.; Wu, K.; Di, B.; Wang, Y. A Bio-Inspired Decision-Making Method of UAV Swarm for Attack-Defense Confrontation via Multi-Agent Reinforcement Learning. Biomimetics 2023, 8, 222. [Google Scholar] [CrossRef]
  16. Liu, H.; Zhang, J.; Zu, P.; Zhou, M. Evolutionary Algorithm-Based Attack Strategy With Swarm Robots in Denied Environments. IEEE Trans. Evol. Comput. 2023, 27, 1562–1574. [Google Scholar] [CrossRef]
  17. Wang, Y.; Bai, P.; Liang, X.; Wang, W.; Zhang, J.; Fu, Q. Reconnaissance Mission Conducted by UAV Swarms Based on Distributed PSO Path Planning Algorithms. IEEE Access 2019, 7, 105086–105099. [Google Scholar] [CrossRef]
  18. Yu, Y.; Liu, J.; Wei, C. Hawk and pigeon’s intelligence for UAV swarm dynamic combat game via competitive learning pigeon-inspired optimization. Sci. China Technol. Sci. 2022, 65, 1072–1086. [Google Scholar] [CrossRef]
  19. Zhang, T.; Chai, L.; Wang, S.; Jin, J.; Liu, X.; Song, A.; Lan, Y. Improving Autonomous Behavior Strategy Learning in an Unmanned Swarm System Through Knowledge Enhancement. IEEE Trans. Reliab. 2022, 71, 763–774. [Google Scholar] [CrossRef]
  20. Xiang, L.; Xie, T. Research on UAV Swarm Confrontation Task Based on MADDPG Algorithm. In Proceedings of the 2020 5th International Conference on Mechanical, Control and Computer Engineering (ICMCCE), Harbin, China, 25–27 December 2020; pp. 1513–1518. [Google Scholar] [CrossRef]
  21. Finand, B.; Loeuille, N.; Bocquet, C.; Fédérici, P.; Monnin, T. Solitary foundation or colony fission in ants: An intraspecific study shows that worker presence and number increase colony foundation success. Oecologia 2024, 204, 517–527. [Google Scholar] [CrossRef]
  22. Liu, F.; Dong, X.; Yu, J.; Hua, Y.; Li, Q.; Ren, Z. Distributed Nash Equilibrium Seeking of N-Coalition Noncooperative Games With Application to UAV Swarms. IEEE Trans. Netw. Sci. Eng. 2022, 9, 2392–2405. [Google Scholar] [CrossRef]
  23. Guo, Z.; Li, Y.; Wang, Y.; Wang, L. Group motion control for UAV swarm confrontation using distributed dynamic target assignment. Aerosp. Syst. 2023, 6, 689–701. [Google Scholar] [CrossRef]
  24. Wang, L.; Qiu, T.; Pu, Z.; Yi, J.; Zhu, J.; Zhao, Y. A Decision-making Method for Swarm Agents in Attack-defense Confrontation. IFAC-PapersOnLine 2023, 56, 7858–7864. [Google Scholar] [CrossRef]
  25. Xing, D.; Zhen, Z.; Gong, H. Offense-defense confrontation decision making for dynamic UAV swarm versus UAV swarm. Proc. Inst. Mech. Eng. Part G J. Aerosp. Eng. 2019, 233, 5689–5702. [Google Scholar] [CrossRef]
  26. Gergal, E.K. Drone Swarming Tactics Using Reinforcement Learning and Policy Optimization; Trident Scholar Report 506; U.S. Naval Academy: Annapolis, MD, USA, 2021. [Google Scholar]
  27. Cai, H.; Luo, Y.; Gao, H.; Chi, J.; Wang, S. A Multiphase Semistatic Training Method for Swarm Confrontation Using Multiagent Deep Reinforcement Learning. Comput. Intell. Neurosci. 2023, 2023, 2955442. [Google Scholar] [CrossRef] [PubMed]
  28. Wang, B.; Li, S.; Gao, X.; Xie, T. UAV Swarm Confrontation Using Hierarchical Multiagent Reinforcement Learning. Int. J. Aerosp. Eng. 2021, 2021, 3360116. [Google Scholar] [CrossRef]
  29. Shahid, S.; Zhen, Z.; Javaid, U.; Wen, L. Offense-Defense Distributed Decision Making for Swarm vs. Swarm Confrontation While Attacking the Aircraft Carriers. Drones 2022, 6, 271. [Google Scholar] [CrossRef]
  30. Wang, J.; Duan, S.; Ju, S.; Lu, S.; Jin, Y. Evolutionary Task Allocation and Cooperative Control of Unmanned Aerial Vehicles in Air Combat Applications. Robotics 2022, 11, 124. [Google Scholar] [CrossRef]
  31. Xu, C.; Xu, M.; Yin, C. Optimized multi-UAV cooperative path planning under the complex confrontation environment. Comput. Commun. 2020, 162, 196–203. [Google Scholar] [CrossRef]
  32. Xuan, W.; Weijia, W.; Kepu, S.; Minwen, W. UAV Air Combat Decision Based on Evolutionary Expert System Tree. Ordnance Ind. Autom. 2019, 38, 42–47. [Google Scholar]
  33. Sun, B.; Zeng, Y.; Zhu, D. Dynamic task allocation in multi autonomous underwater vehicle confrontational games with multi-objective evaluation model and particle swarm optimization algorithm. Appl. Soft Comput. 2024, 153, 111295. [Google Scholar] [CrossRef]
  34. Hu, S.; Ru, L.; Lv, M.; Wang, Z.; Lu, B.; Wang, W. Evolutionary game analysis of behaviour strategy for UAV swarm in communication-constrained environments. IET Control. Theory Appl. 2024, 18, 350–363. [Google Scholar] [CrossRef]
  35. Shefaei, A.; Mohammadi-Ivatloo, B. Wild Goats Algorithm: An Evolutionary Algorithm to Solve the Real-World Optimization Problems. IEEE Trans. Ind. Inform. 2018, 14, 2951–2961. [Google Scholar] [CrossRef]
  36. Wu, M.; Zhu, X.; Ma, L.; Wang, J.; Bao, W.; Li, W.; Fan, Z. Torch: Strategy evolution in swarm robots using heterogeneous–homogeneous coevolution method. J. Ind. Inf. Integr. 2022, 25, 100239. [Google Scholar] [CrossRef]
  37. Bai, Z.; Zhou, H.; Shi, J.; Xing, L.; Wang, J. A hybrid multi-objective evolutionary algorithm with high solving efficiency for UAV defense programming. Swarm Evol. Comput. 2024, 87, 101572. [Google Scholar] [CrossRef]
  38. Xu, D.; Chen, G. Autonomous and cooperative control of UAV cluster with multi-agent reinforcement learning. Aeronaut. J. 2022, 126, 1–20. [Google Scholar] [CrossRef]
  39. Xu, D.; Chen, G. The research on intelligent cooperative combat of UAV cluster with multi-agent reinforcement learning. Aerosp. Syst. 2022, 5, 107–121. [Google Scholar] [CrossRef]
  40. Nian, X.; Li, M.; Wang, H.; Gong, Y.; Xiong, H. Large-scale UAV swarm confrontation based on hierarchical attention actor-critic algorithm. Appl. Intell. 2024, 54, 3279–3294. [Google Scholar] [CrossRef]
  41. Kong, W.; Zhou, D.; Yang, Z.; Zhang, K.; Zeng, L. Maneuver Strategy Generation of UCAV for within Visual Range Air Combat Based on Multi-Agent Reinforcement Learning and Target Position Prediction. Appl. Sci. 2020, 10, 5198. [Google Scholar] [CrossRef]
  42. Zhou, K.; Wei, R.; Zhang, Q.; Wu, Z. Research on Decision-making Method for Territorial Defense Based on Fuzzy Reinforcement Learning. In Proceedings of the 2019 Chinese Automation Congress (CAC), Hangzhou, China, 22–24 November 2019; pp. 3759–3763. [Google Scholar] [CrossRef]
  43. Jiang, F.; Xu, M.; Li, Y.; Cui, H.; Wang, R. Short-range air combat maneuver decision of UAV swarm based on multi-agent transformer introducing virtual objects. Eng. Appl. Artif. Intell. 2023, 123, 106358. [Google Scholar] [CrossRef]
  44. Wang, Z.; Liu, F.; Guo, J.; Hong, C.; Chen, M.; Wang, E.; Zhao, Y. UAV Swarm Confrontation Based on Multi-agent Deep Reinforcement Learning. In Proceedings of the 2022 41st Chinese Control Conference (CCC), Hefei, China, 25–27 July 2022; pp. 4996–5001. [Google Scholar] [CrossRef]
  45. Fang, J.; Han, Y.; Zhou, Z.; Chen, S.; Sheng, S. The collaborative combat of heterogeneous multi-UAVs based on MARL. J. Phys. Conf. Ser. 2021, 1995, 012023. [Google Scholar] [CrossRef]
  46. Kouzeghar, M.; Song, Y.; Meghjani, M.; Bouffanais, R. Multi-Target Pursuit by a Decentralized Heterogeneous UAV Swarm using Deep Multi-Agent Reinforcement Learning. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 3289–3295. [Google Scholar] [CrossRef]
  47. YouTube. Wild Dogs vs. Wildebeests. Available online: https://www.youtube.com/watch?v=h4SlAc2U1A4 (accessed on 7 February 2025).
  48. YouTube. Lions vs. Buffalo—A Wild Encounter. Available online: https://www.youtube.com/watch?v=t7KMsIdlx1E (accessed on 7 February 2025).
Figure 1. A panoramic view of the hilly terrain. As illustrated in the figure, red and blue agents labeled 1 to 5 are positioned at opposite corners of the terrain. The terrain comprises multiple rolling hills, with elevations ranging from 40 m to 60 m, and spans an area of 1500 m by 1500 m. The undulating topography includes lower-lying valleys and prominent hilltops.
Figure 2. The angles of the agent. (a) The schematic illustration of the angles σ_{r_i}(t) and φ_{r_i}(t). (b) The schematic illustration of the angles θ_{r_i}(t) and ϑ_{r_i}(t).
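Taken together with the position and health entries of the Nomenclature, the four angles sketched in Figure 2 suggest a compact per-agent state record. The snippet below is only an illustrative grouping, assuming a Python simulation; all field names are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class AgentState:
    """Hypothetical per-agent state mirroring the Nomenclature (illustrative names)."""
    position: np.ndarray      # p_{r_i}(t), 3D position on the terrain [m]
    heading: float            # theta_{r_i}(t), heading angle of the hull [rad]
    pitch: float              # phi_{r_i}(t), pitch angle of the hull [rad]
    turret_heading: float     # vartheta_{r_i}(t), heading angle of the turret [rad]
    barrel_angle: float       # sigma_{r_i}(t), angle of the barrel [rad]
    health: float             # h_{r_i}(t), remaining health points
    last_fire_time: float     # used to track t_c^{r_i}, the time since the last shell was fired

    @property
    def alive(self) -> bool:
        return self.health > 0.0
```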
Figure 3. Agent r_1 uniformly emits rays for detection. As illustrated in the figure, red agent r_1 emits rays toward blue agents b_1, b_2, and b_3 for detection. These rays successfully detect b_1 but fail to reach b_2 and b_3. Specifically, b_2 is beyond the maximum detection range, while b_3 is obstructed by the terrain. Therefore, as defined by Equation (4), b_1 is a member of the set N_{r_i}(t), whereas b_2 and b_3 are not included.
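The detection rule of Figure 3 (an opponent enters N_{r_i}(t) only if it lies within the maximum detection range d_{mv} and the connecting ray is not blocked by the terrain) can be sketched as follows. This is an assumption-level illustration: terrain_blocks is a hypothetical stand-in for the ray casting performed in the simulation environment.

```python
import numpy as np


def detectable_opponents(p_self, opponents, d_mv, terrain_blocks):
    """Return the labels of opponents in N_{r_i}(t): within range and not occluded.

    p_self:         3D position of the detecting agent
    opponents:      dict mapping opponent label -> 3D position of a surviving opponent
    d_mv:           maximum detection range of the agents
    terrain_blocks: callable (p_from, p_to) -> bool, True if the terrain blocks the ray
                    (hypothetical helper standing in for ray casting)
    """
    visible = []
    for j, p_opp in opponents.items():
        if np.linalg.norm(np.asarray(p_opp) - np.asarray(p_self)) > d_mv:
            continue            # beyond the maximum detection range (like b_2 in Figure 3)
        if terrain_blocks(p_self, p_opp):
            continue            # occluded by a hill (like b_3 in Figure 3)
        visible.append(j)       # detectable (like b_1 in Figure 3)
    return visible
```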
Figure 4. The focused-fire strategy is illustrated by the confrontation between a pack of wild dogs and a group of wildebeests [47]. (a) The wild dogs spot the wildebeest group and quickly close in, attempting to scatter them. (b) The wildebeests begin to flee, with the wild dogs in pursuit. (c) During the escape, a smaller, isolated individual emerges from the wildebeest group, becoming the target for the wild dogs. (d) The wild dogs then concentrate their efforts and launch an attack.
Figure 5. The flanking encirclement strategy is illustrated by the confrontation between a lion pride and a buffalo herd [48]. (a) Three lions seize the opportunity to attack a buffalo, with the pride approaching in a triangular formation. (b) The central lion confronts the buffalo head-on while the lions on both sides flank the buffalo to form a pincer. (c) The lions complete the encirclement and launch an attack. (d) The lions successfully complete the hunt.
Figure 6. A flowchart of the target selection algorithm.
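The flowchart of Figure 6 is not reproduced here, but one plausible reading of the focused-fire target selection it describes is sketched below: among the detectable opponents, prefer the one already engaged by the most teammates and break ties by distance, so that fire concentrates on a single opponent as in the hunt of Figure 4. The exact branching of the paper's flowchart may differ; all names here are illustrative.

```python
import numpy as np


def select_target(p_self, visible, opponent_pos, teammate_targets):
    """A hedged sketch of focused-fire target selection (T_{r_i}(t)).

    p_self:           3D position of r_i
    visible:          labels of opponents currently detectable by r_i (N_{r_i}(t))
    opponent_pos:     dict label -> 3D position of each surviving opponent
    teammate_targets: labels currently targeted by teammates of r_i
    """
    if not visible:
        return None  # no opponent detected; the agent keeps searching

    def score(j):
        backers = sum(1 for t in teammate_targets if t == j)  # teammates already firing at j
        dist = np.linalg.norm(np.asarray(opponent_pos[j]) - np.asarray(p_self))
        return (-backers, dist)  # more backers first, then the closer opponent

    return min(visible, key=score)
```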
Figure 7. A flowchart of the motion planning algorithm.
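Similarly, the motion-planning flowchart of Figure 7 can be read as composing a target-approach direction, a lateral flanking offset, and a teammate-avoidance term into the final desired direction F_{r_i}(t). The sketch below makes two explicit assumptions that are not taken verbatim from the paper: the flanking offset is taken along l × l_z (perpendicular to the line of sight in the horizontal plane), and teammate avoidance is a simple inverse-distance repulsion inside the radius d_a.

```python
import numpy as np


def desired_direction(p_self, p_target, teammates_pos, flank_sign, d_a, flank_gain=0.5):
    """An illustrative sketch of the motion-planning step producing F_{r_i}(t).

    p_self, p_target: 3D positions of r_i and of its attack target
    teammates_pos:    3D positions of surviving teammates
    flank_sign:       -1, 0, or +1, the side assigned to r_i within the group sharing the
                      same target (in the paper this follows from the projected offset
                      d_i^r(t); here it is simply passed in)
    d_a:              teammate-avoidance distance
    flank_gain:       assumed weighting of the lateral offset (not from the paper)
    """
    l = np.asarray(p_target, float) - np.asarray(p_self, float)
    l = l / (np.linalg.norm(l) + 1e-9)          # unit vector toward the target
    l_z = np.array([0.0, 0.0, 1.0])             # unit vector along the z-axis
    flank = np.cross(l, l_z)                    # horizontal flanking direction
    G = l + flank_gain * flank_sign * flank     # advance with a lateral bias, cf. G_{r_i}(t)

    X = np.zeros(3)                             # teammate-avoidance term, cf. X_{r_i}(t)
    for p_k in teammates_pos:
        diff = np.asarray(p_self, float) - np.asarray(p_k, float)
        d = np.linalg.norm(diff)
        if 0.0 < d < d_a:
            X += diff / d * (d_a - d) / d_a     # push away from teammates closer than d_a

    F = G + X
    return F / (np.linalg.norm(F) + 1e-9)       # final desired movement direction F_{r_i}(t)
```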
Figure 8. A flowchart of the bio-inspired swarm confrontation algorithm.
Figure 9. A diagram of the confrontation process. (a) Agents from both the red and blue teams are generated at diagonal positions on the map. (b) Both teams advance toward the higher hills on the map in search of opponents, while the red agents disperse in multiple directions. This maneuver leads to a direct confrontation between the two sides, prompting the red agents to initiate an attack. (c) After a round of attacks, the red agents begin to maneuver towards the rear. (d) The red agents form an encirclement around the opponent from multiple directions, employing strategies such as focused fire and flanking encirclement. (e) The red agents tighten the encirclement. (f) The match ends with all opponents eliminated, and the red team wins this round.
Figure 10. With one of the parameters d_{bc1} and d_{bc2} fixed and the other varied, the following results are obtained. (a) Results of W^r with varying d_{bc1}. (b) Results of W^r with varying d_{bc2}. (c) Results of ξ with varying d_{bc1}. (d) Results of ξ with varying d_{bc2}. (e) Results of λ with varying d_{bc1}. (f) Results of λ with varying d_{bc2}.
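The three performance indices plotted in Figure 10 follow directly from the quantities in the Nomenclature: the winning rate is W^r = M_w^r / M, while ξ^r and λ^r average the per-match losses n_k^r / N and h_{sk}^r / H_s^r over the winning matches. The bookkeeping below is a small sketch of this reading (averaging over winning matches only, since n_k^r and h_{sk}^r are defined per winning match); the record field names are hypothetical.

```python
def confrontation_metrics(match_records, N, H_sr):
    """Compute W^r, xi^r, and lambda^r from a list of match records.

    match_records: one dict per match, e.g. {"won": bool, "agents_lost": int, "health_lost": float}
                   (illustrative field names); M = len(match_records)
    N:             number of agents per team
    H_sr:          initial total health points of the red team (H_s^r)
    """
    M = len(match_records)
    wins = [m for m in match_records if m["won"]]
    M_wr = len(wins)                                                        # M_w^r
    W_r = M_wr / M if M else 0.0                                            # winning rate W^r
    xi_r = sum(m["agents_lost"] / N for m in wins) / M_wr if M_wr else 0.0  # xi^r
    lam_r = sum(m["health_lost"] / H_sr for m in wins) / M_wr if M_wr else 0.0  # lambda^r
    return W_r, xi_r, lam_r
```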
Figure 11. Results under different confrontation scales. (a) Results of W^r under different confrontation scales. (b) Results of ξ under different confrontation scales. (c) Results of λ under different confrontation scales.
Figure 12. A panoramic view of the second hilly terrain, seen from a hilltop. This map shares the same dimensions as the first one but exhibits steeper overall slopes with less pronounced elevation variations.
Figure 13. Results using different confrontation maps. (a) Results of W^r under different confrontation maps. (b) Results of ξ under different confrontation maps. (c) Results of λ under different confrontation maps.