Article

Multi-UAV Collaborative Search and Attack Mission Decision-Making in Unknown Environments

Zibin Liang, Qing Li and Guodong Fu
1 Beijing Key Laboratory of High Dynamic Navigation Technology, Beijing 100192, China
2 Ministry of Education Key Laboratory of Modern Measurement & Control Technology, Beijing 100101, China
3 School of Automation, Beijing Information Science & Technology University, Beijing 100192, China
* Author to whom correspondence should be addressed.
Sensors 2023, 23(17), 7398; https://doi.org/10.3390/s23177398
Submission received: 17 July 2023 / Revised: 21 August 2023 / Accepted: 21 August 2023 / Published: 24 August 2023
(This article belongs to the Section Vehicular Sensing)

Abstract

To address the challenge of coordinated combat involving multiple UAVs in reconnaissance and search-attack missions, we propose the Multi-UAV Distributed Self-Organizing Cooperative Intelligence Surveillance and Combat (CISCS) strategy. The strategy employs distributed control to avoid the control and communication difficulties of a centralized scheme, introduces a time-constrained formation controller to address unstable multi-UAV formations and long formation times, and designs a multi-task allocation algorithm that assigns multiple tasks to individual UAVs, enabling autonomous decision-making at the local level. CISCS consists of three main components. First, a multi-UAV finite-time formation controller enables the rapid formation of a mission-specific formation within a finite period. Second, a multi-task goal assignment module generates a task sequence for each UAV using an improved distributed Ant Colony Optimization (ACO) algorithm based on Q-Learning; this module incorporates a colony disorientation strategy to expand the search range and a search transition strategy to prevent premature convergence of the algorithm. Third, a UAV obstacle avoidance module accounts for collisions within the formation and provides real-time obstacle avoidance paths for multiple UAVs. Finally, we verify the practicality and reliability of the strategy through simulations.

1. Introduction

In recent years, with the increase in mission complexity and the enhancement of UAV autonomy, unmanned aircraft swarm systems (UAS) have gradually attracted the attention of researchers. Drone swarms have been used for target search and rescue [1,2], cluster operations [3,4], and target tracking [5,6]. Within the military domain, coordinated reconnaissance search attacks by UAV swarms in unknown environments are an important application. Drone swarm reconnaissance operations can provide more information, faster response times, greater accuracy, and lower costs. UAV swarm missions are characterized by complexity and dynamics, making multi-objective tasking of multiple UAVs a challenge [7,8].
Task allocation is the assignment of tasks to an agent or a group of agents, with the goal of finding an optimal or near-optimal mapping between agents and tasks [9]. Auction- or market-based approaches are commonly used in multi-robot task allocation problems [10,11], including AsyMTRE-D [12], Murdoch [13], M+ [14], and TraderBots [15]. Graph-theoretic particle swarm optimization for military applications [16], Sandholm's algorithm with K-means clustering [17], efficiency-based task assignment [18], and many other approaches focus on minimizing the distance traveled by the robots to complete the task while satisfying each robot's resource constraints. Borja Fernandez-Gauna et al. developed distributed round-robin Q-Learning for the transportation of hoses using UGVs [19]. Mathew et al. developed path planning algorithms for miniature UAVs and UGVs performing parcel delivery tasks [20]; the problem is modeled as a multi-warehouse delivery problem, transformed into a generalized traveling salesman problem, and solved in polynomial time using a kernel sequence enumeration algorithm. Bayesian Reinforcement Learning (RL) allows agents to learn the capabilities of other agents through interaction and transforms repeated coalition formation problems into sequential decision problems [21]. This approach was validated in the soccer team formation problem [22], used to model dynamic robot formations for the area coverage problem via weighted voting games and Q-Learning, and extended to formation-based navigation problems [23,24]. In that line of work, the formation structure is pruned using Shapley values and marginal contributions, and the transition of robots from one formation to another is represented as a Markov process, with the optimal structure of the formation space searched through a Markov probability distribution. Sayan D. Sen et al. proposed a multi-criteria decision-making algorithm based on influence diagrams to select the best formation algorithm for a given realistic scenario [25]. Pengxing Zhu et al. constructed a heterogeneous UAV multi-task assignment model based on multiple realistic constraints and proposed an improved semi-randomized Q-Learning algorithm; simulations demonstrated that it improves task execution performance compared to heuristic algorithms such as standard Q-Learning, including the reasonableness of the task assignment, with a gain improvement of 12.12% [26]. Han Qie et al. proposed a multi-agent reinforcement learning method that jointly optimizes multi-UAV target assignment and path planning [27]. Minchan Li et al. proposed an effective stability quantum particle swarm optimization algorithm (SQPSO) and designed an efficient coalition-building quantum particle swarm optimization algorithm (EQPSO), which incorporates coalition similarity judgments and reduces coalition formation time overheads [28]. Heba Kurdi et al. proposed a bacteria-inspired heuristic algorithm for UAV task assignment [29]. Xiang Ma et al. proposed an improved virtual force method to obtain the optimal damage-effectiveness control strategy, solving the numerical difficulty caused by the high complexity of the damage function [30]. Xia Chen et al. proposed a systematic framework that solves the combinatorial optimization model with a new particle swarm optimization algorithm based on a bootstrapping mechanism and, for the trajectory planning problem, an ant colony optimization algorithm based on adaptive parameter tuning and bidirectional search [31]. Taehoon Yoo et al. proposed a reinforcement learning-based intelligent formation controller that enables each UAV to communicate with the other UAVs [32]. Yuchong Gao et al. proposed a scalable, bearing-only passive distributed control algorithm that maintains a multi-UAV formation using pure angular information and only a small amount of inter-UAV communication [33].
In this paper, a multi-UAV distributed self-organized collaborative intelligent reconnaissance operation strategy is proposed for the problem of cooperative reconnaissance operations against multiple mission targets by multiple UAVs. We have performed the following main tasks:
  • A distributed self-organized multi-UAV cooperative intelligent reconnaissance combat strategy (CISCS) is proposed for the multi-UAV cooperative reconnaissance combat problem;
  • A formation controller with time constraints is proposed, extending the multi-layer artificial potential field to avoid internal collisions among multiple UAVs during fast formation and to address the tendency of the traditional artificial potential field method to fall into local optima;
  • Reinforcement learning is utilized to accelerate the convergence of the ACO algorithm and expand its search capability, and a colony disorientation strategy and a search-switching strategy are incorporated to prevent premature convergence.
The remainder of this paper is organized as follows: Section 2 presents the multi-UAV multi-task target scenario and the task allocation decision model; Section 3 designs the overall structure of CISCS, including a formation controller with time constraints for fast formation, a distributed improved ant colony algorithm based on Q-Learning for generating multi-UAV task sequences, and an extended artificial potential field algorithm for collision avoidance within the drone formation; Section 4 presents the simulation and demonstration of the algorithm; and conclusions are given in Section 5.

2. Scene Description and Modeling

In this section, the search-attack task in an uncertain environment is modeled: the search-attack task planning problem is defined, and the constraints of the planning model are given.
Assumption 1.
Each UAV is capable of performing reconnaissance search and attack missions.
$V = \{V_1, V_2, \ldots, V_{N_v}\}$ denotes a formation of $N_v$ UAVs. $T = \{T_1, T_2, \ldots, T_{N_t}\}$ is the set of $N_t$ unknown targets distributed within the task area.

2.1. Collaborative Search Attack Constraints

The multi-UAV coordinated search-attack mission is an online planning problem whose goal is to discover and destroy as many targets as possible in a dynamic and complex environment. The primary task is therefore to develop a distributed optimization model for search-attack missions. The constraints are described in detail below [34]:
(1)
Maximum range constraint
A single UAV has a maximum mission range $D_{max}$:
$$D_{max} - D_i(t) \geq 0 \quad (i = 1, 2, 3, \ldots, N_v)$$
where $D_i(t)$ denotes the cumulative distance traveled by the ith UAV before moment t, and $D_{max}$ denotes the maximum allowed range;
(2)
Maximum turning angle constraint
The drone has a maximum turning angle $\theta_{max}$:
$$\theta_{max} - \theta_i(t) \geq 0 \quad (i = 1, 2, 3, \ldots, N_v)$$
where $\theta_i(t)$ denotes the turning angle of the ith UAV at moment t. The maximum turn angle limits the UAV turn radius and turn rate, $\theta_{max} \in (0, \pi)$. A desired turn angle of 0 means the drone does not need to turn; a desired turn angle of $\pi$ means the drone is expected to fly in the reverse direction;
(3)
Minimum flight distance between drones
In order to ensure the flight safety of multi-UAV formations, there exists a safety distance for multi-UAVs to perform their tasks:
$$d_{ij}(t) - d_{min} \geq 0 \quad (i, j = 1, 2, 3, \ldots, N_v,\ i \neq j)$$
where $d_{ij}(t)$ denotes the distance between the ith UAV and the jth UAV at time t, and $d_{min}$ denotes the minimum flight distance between any two drones;
(4)
Risk area avoidance constraints
UAVs need to maneuver to avoid the risk zone during combat, so there is a minimum risk radius between the UAV and the center of the risk zone:
$$d_{il}(t) - R_{min} \geq 0 \quad (i = 1, 2, 3, \ldots, N_v;\ l = 1, 2, 3, \ldots, N_l)$$
where dil(t) denotes the distance between the ith UAV and the lth risk zone at time t; Rmin denotes the minimum risk radius.
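For implementers, these four constraints translate directly into a per-time-step feasibility check. The following Python sketch illustrates one way to code it; the function name and the limit values are illustrative placeholders, not parameters taken from the paper.

```python
import numpy as np

def feasible(D_i, theta_i, pos, danger_centers,
             D_max=1000.0, theta_max=np.pi / 3, d_min=1.0, R_min=5.0):
    """Per-time-step check of the four search-attack constraints.
    D_i: (N_v,) cumulative distances; theta_i: (N_v,) turn angles;
    pos: (N_v, 3) UAV positions; danger_centers: (N_l, 3) risk-zone centers.
    All limit values here are illustrative, not taken from the paper."""
    pos = np.asarray(pos, dtype=float)
    if np.any(np.asarray(D_i) > D_max):           # maximum range
        return False
    if np.any(np.abs(theta_i) > theta_max):       # maximum turning angle
        return False
    dist = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                # ignore self-distances
    if dist.min() < d_min:                        # minimum inter-UAV distance
        return False
    if len(danger_centers):
        dz = np.linalg.norm(pos[:, None] - np.asarray(danger_centers)[None, :],
                            axis=-1)
        if dz.min() < R_min:                      # risk-zone avoidance
            return False
    return True
```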

2.2. Multi-UAV Collaborative Mission Description

(1)
Description of reconnaissance coverage
During reconnaissance missions, each UAV covers the ground within its reconnaissance radius.
Reconnaissance coverage:
$$P_r = \frac{\sum_{i=1}^{N_v} S_i}{S_{Area}}$$
where Si denotes the reconnaissance area of the ith UAV; SArea denotes the total area of the combat zone.
Reconnaissance search benefit function:
Jr = Pr
(2)
Description of the attack mission
Attack gain function:
$$J_a = \sum_{i=1}^{N_v} \sum_{j=1}^{N_t} a_{ij} \cdot score_j$$
where $a_{ij} \in \{0, 1\}$ indicates whether the ith UAV attacked the jth target: $a_{ij} = 0$ means the attack mission on the jth target was not accomplished, and $a_{ij} = 1$ means it was accomplished; $score_j$ denotes the gain of attacking the jth target.

2.3. Mathematical Model for Tasking

The coordinated multi-UAV search attack needs to maximize the benefit function under the above constraints, so the following centralized multi-UAV task allocation constraint model can be established:
$$U = \arg\max_{U}\ \big( \eta J_r + (1 - \eta) J_a \big) \quad \text{s.t.} \quad \begin{cases} D_{max} - D_i(t) \geq 0 \\ \theta_{max} - \theta_i(t) \geq 0 \\ d_{ij}(t) - d_{min} \geq 0 \\ d_{il}(t) - R_{min} \geq 0 \end{cases} \quad (i, j = 1, 2, \ldots, N_v,\ i \neq j) \tag{8}$$
where η ∈ {0, 1}: η = 0 denotes that the UAV performs an attack mission, and η = 1 denotes that the UAV performs a reconnaissance mission. U denotes the UAV decision. Assuming each UAV is an independent individual capable of independent decision-making, the centralized optimization model (8) can be decomposed into a distributed optimization model:
$$U_i = \arg\max_{U_i}\ \big( \eta_i J_r^i(X_i, \tilde{X}_j) + (1 - \eta_i) J_a^i(X_i, \tilde{X}_j) \big) \quad \text{s.t.} \quad \begin{cases} D_{i\,max} - D_i(t) \geq 0 \\ \theta_{i\,max} - \theta_i(t) \geq 0 \\ d_{ij}(t) - d_{ij\,min} \geq 0 \\ d_{il}(t) - R_{i\,min} \geq 0 \end{cases} \quad (i, j = 1, 2, \ldots, N_v,\ i \neq j) \tag{9}$$
where $U_i$ denotes the decision of the ith drone; $J_r^i$ denotes the reconnaissance gain of the ith UAV; $J_a^i$ denotes the gain from the ith drone's attack; $X_i$ denotes the state of the ith drone; and $\tilde{X}_j = \{X_j \mid j \in T_{adjacent}^i\}$, where $T_{adjacent}^i$ denotes the set of UAVs that can communicate with the ith UAV.
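As a small illustration of the objective being maximized in models (8) and (9), the payoff for a candidate decision can be scored as below; the helper name and argument layout are our own, not the paper's.

```python
def mission_payoff(S_i, S_area, a, score, eta):
    """Score of a candidate decision under the objective of models (8)-(9).
    S_i: per-UAV reconnaissance areas; S_area: total combat-zone area;
    a: N_v x N_t matrix with a[i][j] = 1 if UAV i destroyed target j;
    score: per-target attack gains; eta in {0, 1} selects reconnaissance
    (eta = 1) or attack (eta = 0)."""
    J_r = sum(S_i) / S_area                                   # coverage P_r
    J_a = sum(a_ij * s_j for row in a for a_ij, s_j in zip(row, score))
    return eta * J_r + (1 - eta) * J_a

# e.g., two UAVs and three targets:
# mission_payoff([40.0, 55.0], 1000.0, [[1, 0, 0], [0, 0, 1]], [3, 5, 2], eta=0)
```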

3. Algorithms for Multi-Drone Collaborative Intelligent Warfare

Based on the distributed optimization model (9), multiple UAVs exchange information with the battlefield and perform real-time formation control, task assignment, and path planning. In this section, we design a multi-UAV cooperative intelligent combat algorithm for the real-time multi-UAV target search and attack problem.

3.1. Formation Controller with Time Constraints

When multiple UAVs fly in formation, the Leader sends position information to the Followers, and each Follower adjusts its desired position coordinates in real time according to the Leader's position, eventually forming a stable formation [35]. To improve the rapidity of multi-UAV formation, we propose a finite-time multi-UAV formation control algorithm, shown in Equation (10).
$$u_i^f(t) = k_f\, P\big(\theta_i(t), \varphi_i(t)\big) \left[ \big(d_f - d_i(t)\big) + k_i \int_{t_0}^{t_1} \big(d_f - d_i(t)\big)\,dt + k_d\, \frac{d\big(d_f - d_i(t)\big)}{dt} \right] \tag{10}$$
$$k_f = \frac{w}{t_w - t}, \quad t \in [0, t_w) \tag{11}$$
In Equation (10), $u_i^f(t)$ is the control input of UAV i; $k_f$ is the finite-time gain variable; $t_w$ indicates the limited time; $d_f$ is the desired distance between the drone and the formation target; $d_i(t)$ is the distance between UAV i and the formation target at moment t; $\theta_i(t)$ is the angle between the leader-follower line and the positive z-axis, and $\varphi_i(t)$ is the angle between the projection of this line on the XOY plane and the positive x-axis. $P(\theta, \varphi) = [\sin\theta\cos\varphi,\ \sin\theta\sin\varphi,\ \cos\theta]^T$ converts $u_i^f(t)$ into output quantities along the x, y, and z directions.
In Equation (11), w is a positive constant gain, and $t_w$ is the finite-time parameter.
Theorem 1.
For the multi-UAV system, if the finite-time gain $k_f$ in Equation (11) satisfies that w is a positive constant and $t_w$ is an arbitrarily given finite time, then the multi-UAVs are guaranteed to form a leader-follower flight formation within the time $t_w$.
Proof. 
The Lyapunov function is now chosen as
$$V\big(d_i(t)\big) = \frac{1}{2}\big(d_f - d_i(t)\big)^2$$
Differentiating yields
$$\dot{V}\big(d_i(t)\big) = -\big(d_f - d_i(t)\big)\,\dot{d}_i(t)$$
Substituting the controller of Equation (10),
$$\dot{V}\big(d_i(t)\big) = -\big(d_f - d_i(t)\big) \cdot \frac{w}{t_w - t} \left[ P\big(d_f - d_i(t)\big) + k_i \int_{t_0}^{t_1} \big(d_f - d_i(t)\big)\,dt + k_d\, \frac{d}{dt}\big(d_f - d_i(t)\big) \right] \leq 0$$
Therefore, according to Lyapunov stability analysis, the given UAV distance controller is stable and the actual flight distance d i t converges to the desired flight distance d f . This proves the convergence and stability of the controller. □
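To make Equations (10) and (11) concrete, the following sketch computes one control step for a follower, assuming a simple kinematic follower model, a forward-Euler integral, and illustrative gains k_i and k_d; only w = 1.0 and t_w = 20 come from Table 2.

```python
import numpy as np

def formation_control_step(d_f, d_i, err_int, err_prev, theta, phi,
                           t, t_w=20.0, w=1.0, k_i=0.1, k_d=0.05, dt=0.01):
    """One step of the finite-time formation controller, Eqs. (10)-(11).
    d_f: desired leader-follower distance; d_i: current distance;
    err_int / err_prev: integral and previous value of the distance error;
    theta, phi: bearing of the leader-follower line (theta from +z,
    phi from +x in the XOY plane). k_i, k_d, dt are illustrative values."""
    if t >= t_w:
        raise ValueError("finite-time gain is defined only for t < t_w")
    err = d_f - d_i
    err_int += err * dt                       # forward-Euler integral term
    err_dot = (err - err_prev) / dt           # finite-difference derivative
    k_f = w / (t_w - t)                       # finite-time gain, Eq. (11)
    P = np.array([np.sin(theta) * np.cos(phi),   # map the scalar command
                  np.sin(theta) * np.sin(phi),   # onto the x, y, z axes
                  np.cos(theta)])
    u = k_f * P * (err + k_i * err_int + k_d * err_dot)   # Eq. (10)
    return u, err_int, err
```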

3.2. Reconnaissance Search Attack Tasking Algorithm

We propose an improved ant colony multi-task multi-objective solving method based on Q-Learning with constraints, exploiting the powerful feedback and memory capability of Q-Learning [36]. The use of this algorithm for UAV cooperation is described in detail below.

3.2.1. Q-Learning State Space

The ant colony position information is used as the state space of Q-Learning, which guides later ant colonies toward the best solution through the historical optimal path and accelerates the convergence of the ant colony algorithm. The Euclidean distance between the current colony state $x_t$ and the optimal solution of the previous state is taken as the current Q-Learning state $S_t$, i.e.,
$$S_t = \left\| x_t - x_{t-1}^* \right\|$$
where $x_{t-1}^*$ denotes the optimal position at moment t − 1.
$S_t$ is then normalized into five levels:
$$S_t = \begin{cases} HH, & 0.8 \leq S_t \\ HM, & 0.6 \leq S_t < 0.8 \\ MM, & 0.4 \leq S_t < 0.6 \\ LM, & 0.2 \leq S_t < 0.4 \\ LL, & 0 \leq S_t < 0.2 \end{cases}$$
where HH, HM, MM, LM, and LL denote the five ant colony states.
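One possible implementation of this state discretization is sketched below; the normalization constant s_max is an assumption, since the paper does not specify how S_t is scaled into [0, 1].

```python
import numpy as np

def discretize_state(x_t, x_best_prev, s_max=1.0):
    """Map the colony's displacement from the previous optimum onto one of
    the five Q-Learning states. s_max, the distance used for normalization,
    is an assumption (problem-specific)."""
    s = np.linalg.norm(np.asarray(x_t) - np.asarray(x_best_prev))
    s = min(s / s_max, 1.0)                    # normalize into [0, 1]
    for label, lo in (("HH", 0.8), ("HM", 0.6), ("MM", 0.4), ("LM", 0.2)):
        if s >= lo:
            return label
    return "LL"
```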

3.2.2. Q-Learning Action Space

Ant colony search mainly involves three parameters [37]: the pheromone heuristic factor α, the expectation heuristic factor β, and the pheromone evaporation factor ρ. In this paper, we adaptively adjust these three parameters according to the environment and divide the search capability into five grades, namely, full-domain search, local search, hold search, slow-step convergence, and fast convergence, which serve as the action space of Q-Learning. These five search modes correspond to preset parameter combinations, as shown in Table 1.

3.2.3. Reward Function

In order to speed up the ant colony optimization process, setting reasonable rewards can guide the colony to search in a more optimal direction. Considering the multiple objectives of a multi-objective optimization problem, the sum of the fitness differences before and after the colony update is taken as the reward value, i.e.,
$$r_{t,t+1} = \sum_{i=1}^{n} \left( Fit_i(x_t) - Fit_i(x_{t+1}) \right)$$
where $r_{t,t+1}$ denotes the reward value of the ant colony after updating; $x_t$ and $x_{t+1}$ denote the positions of the colony before and after updating; n denotes the number of objective functions; and $Fit_i(x_t)$ denotes the fitness of the ith objective function at $x_t$.
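A minimal sketch of this reward, assuming all n objectives are minimized so that a positive reward indicates improvement:

```python
def colony_reward(fitness_fns, x_t, x_next):
    """Reward as the summed fitness change over all objectives.
    fitness_fns: list of objective functions; assumes minimization,
    so a positive reward means the colony improved."""
    return sum(f(x_t) - f(x_next) for f in fitness_fns)
```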

3.2.4. Q-Value Update

Based on the above design of the states, actions, and rewards, Q-Learning proceeds as follows. By judging the current state of the colony, the action corresponding to the maximum Q-value for that state is selected from the Q-table [38]. The parameter combination associated with this action is applied; the state information of the ant colony is updated under the new parameters; the reward obtained from the update is calculated, and the Q-table is updated with the colony's new state and action in order to track, learn, and dynamically adjust the optimization process [39].
The update function for the Q-value is:
$$Q(S_t, a_t) \leftarrow (1 - \alpha)\,Q(S_t, a_t) + \alpha \left[ r_{t,t+1} + \gamma \max_{a} Q(S_{t+1}, a) \right]$$
where α is the learning rate; γ is the discount factor; $a_t$ denotes the action taken at moment t, i.e., the parameter update at moment t; $r_{t,t+1}$ denotes the reward value after the colony update, and $Q(S_t, a_t)$ denotes the action value of taking action $a_t$ in state $S_t$.
The algorithm flow of Q-learning is shown in Figure 1.
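The loop can be sketched as a small tabular Q-Learning class over the five colony states and the five search modes of Table 1; α = 0.2 and γ = 0.9 follow Table 2, while the ε-greedy exploration is our addition.

```python
import random
from collections import defaultdict

ACTIONS = ["global", "local", "hold", "slow_conv", "fast_conv"]  # Table 1 modes

class QTable:
    """Tabular Q-Learning over the five colony states and five search modes
    (a sketch of the Q-value update; epsilon-greedy is our addition)."""
    def __init__(self, alpha=0.2, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        if random.random() < self.epsilon:            # explore
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])  # exploit

    def update(self, s, a, reward, s_next):
        best_next = max(self.q[(s_next, a2)] for a2 in ACTIONS)
        self.q[(s, a)] = ((1 - self.alpha) * self.q[(s, a)]
                          + self.alpha * (reward + self.gamma * best_next))
```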

3.2.5. Ant Colony Lost

In order to prevent the ant colony algorithm from converging prematurely into a local optimum, this paper proposes an ant colony lost strategy to increase the local search ability of the colony, as shown in Figure 2. When an ant selects the next node according to the transfer probability, the strategy mimics ants becoming lost by setting a lost rate τ: the ant does not arrive at the agreed next node but strays to another nearby node, which increases the local search ability of the colony.
$$Pos_{next} = \begin{cases} Pos_i, & rand \leq \tau \\ Pos_j, & \text{otherwise} \end{cases}$$
where $Pos_j$ is the node selected by the transfer probability and $Pos_i$ is a nearby alternative node.
Now suppose the colony has entered a local optimum but is still in motion. With a certain probability (the lost rate), an ant in motion does not follow the originally established route but is disoriented onto another path, such as the ants in the blue circle in Figure 2. If this disorientation yields a new solution superior to the previous one, the colony has jumped out of the local optimum.
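A minimal sketch of the lost strategy with τ = 0.4 as in Table 2; how the neighboring nodes are enumerated and sampled is an assumption.

```python
import random

def next_node(agreed_node, neighbors, tau=0.4):
    """Ant colony lost strategy: with probability tau the ant strays to a
    random neighboring node instead of the node chosen by the transfer
    probability (tau = 0.4 as in Table 2)."""
    if random.random() <= tau and neighbors:
        return random.choice(neighbors)   # the ant gets lost
    return agreed_node                    # follow the agreed route
```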

3.2.6. Search Conversion Strategy

In multi-task objective search optimization, the ant colony is prone to premature convergence and local optima, which prevent the algorithm from obtaining the optimal solution and cause it to lose part of the search samples. To increase the diversity of the sample search, this paper designs a multi-objective search transformation strategy that enhances the algorithm's optimization ability and balances global exploration against local exploitation. On the basis of multi-task objective ant colony optimization, a mutation strategy for jumping out of local optima is set for the colony. First, the colony enters the global optimization phase to obtain the transient optimal solution set G. It then enters the local search phase and performs random mutation based on the Cauchy distribution on all solutions in G, obtaining the mutated colony positions. Finally, the newly generated colony is evaluated for fitness and ranked against the original colony in G, thus updating G. The two search modes switch continuously during optimization to update the optimal set G [30].
In order to fully increase the diversity of search samples, this paper adopts a mutation based on the Cauchy distribution to increase the local search capability of the ant colony; the heavy tails of the Cauchy distribution make it well suited to random exploration.
Therefore, the Cauchy operator is combined with the ant colony transfer probability function and introduced into the ant transition, so that the optimal solution set is updated through Cauchy-based mutation, improving local exploration and increasing sample diversity. The search transformation strategy improves the search ability of the colony and leads it to escape a local optimum with a certain probability. The transfer probability with Cauchy-based mutation is as follows:
$$p_{ij}^k(t) = \begin{cases} \dfrac{\tau_{ij}^{\alpha} \cdot \eta_{ij}^{\beta}}{\sum_{j \in C_i} \tau_{ij}^{\alpha} \cdot \eta_{ij}^{\beta}} + \left( \dfrac{1}{2} + \dfrac{1}{\pi} \tan^{-1}\dfrac{t}{\tau_{ij}} \right) p_{ij}, & j \in C_i \\[6pt] 0, & \text{otherwise} \end{cases}$$
where i, j are the start and end nodes, respectively; α, β are the weighting factors; $\tau_{ij}$ is the pheromone concentration; $\eta_{ij}$ is the visibility, the inverse of the distance between nodes i and j; and $C_i$ is the set of unvisited nodes.
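The Cauchy-based mutation of the transient optimal set G might be implemented as follows; the mutation scale and the elementwise keep-the-better rule are our assumptions.

```python
import numpy as np

def cauchy_mutation(G, fitness, scale=0.1):
    """Search-conversion step: perturb every solution in the transient
    optimal set G with Cauchy noise and keep whichever of each pair is
    fitter. `fitness` maps an (n, d) array to n costs (minimization);
    `scale` is an illustrative parameter, not taken from the paper."""
    G = np.array(G, dtype=float)
    mutated = G + scale * np.random.standard_cauchy(G.shape)
    better = fitness(mutated) < fitness(G)     # element-wise comparison
    G[better] = mutated[better]
    return G
```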

3.2.7. Improved Ant Colony Multi-Task Objective Optimization Algorithm Based on Q-Learning

The ant colony serves as the agent of Q-Learning, updating its own action parameters according to its behavioral state. This expands the search range, accelerates the convergence of the algorithm, solves the multi-task objective problem, and overcomes the algorithm's tendency to fall into local optima and converge prematurely. The algorithm flow is shown in Algorithm 1.
Algorithm 1: Improved Ant Colony Multi-Task Objective Optimization Algorithm Based on Q-Learning
Step 1: Initialization
         Set the maximum number of iterations tmax, maximum number of ants mmax, α, β, ρ parameters
         Initialize the total amount of pheromones, Q table, and global optimal solution set G
         Initialize the ant colony and randomly allocate starting points for the ants
         Set initial pheromone concentration
         Set evaluation function and multi-task goal planning task
Step 2: Ant colony search
         For t = 1 to tmax do
           For m = 1 to mmax do
                  Calculate the probability of movement for each ant
                  Choose the next moving node
                   Apply the ant lost strategy: with probability τ, stray to a nearby node instead of the chosen one
                  Update the pheromone concentration
                  Evaluate the ant colony, update the optimal solution set G
                  Calculate ant reward, update the action value function
           End For
           Generate a transient optimal solution set G
           Perform search transformation strategy, update G
         End For
Step 3: Output the results
         Output the global optimal solution set G
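To make the flow concrete, the following toy, single-objective instance of Algorithm 1 orders task points into a short visiting tour. It reuses the QTable class (and its ACTIONS list) from the sketch in Section 3.2.4, maps rewards onto only two of the five states, and omits the Cauchy search-conversion step; it is an illustration, not the paper's implementation.

```python
import random
import numpy as np

PARAMS = {"global": (1.0, 3.0, 0.15), "local": (1.25, 3.5, 0.25),
          "hold": (1.5, 4.0, 0.45), "slow_conv": (1.75, 4.5, 0.65),
          "fast_conv": (2.0, 5.0, 0.75)}          # Table 1 presets

def tour_length(tour, D):
    return sum(D[tour[k], tour[k + 1]] for k in range(len(tour) - 1))

def run_qaco(points, n_ants=20, t_max=100, tau_lost=0.4, seed=0):
    """Toy single-objective instance of Algorithm 1: order task points into
    a short visiting tour, with Q-Learning choosing the ACO search mode."""
    rng = random.Random(seed)
    points = np.asarray(points, dtype=float)
    D = np.linalg.norm(points[:, None] - points[None, :], axis=-1) + 1e-9
    n = len(points)
    pher = np.ones((n, n))
    qt, state = QTable(), "LL"
    best_tour, best_len = None, float("inf")
    for _ in range(t_max):
        action = qt.choose(state)                 # pick a search mode
        alpha, beta, rho = PARAMS[action]
        tours = []
        for _ in range(n_ants):
            start = rng.randrange(n)
            tour, todo = [start], set(range(n)) - {start}
            while todo:
                i, cand = tour[-1], sorted(todo)
                w = [pher[i, j] ** alpha / D[i, j] ** beta for j in cand]
                j = rng.choices(cand, weights=w)[0]
                if rng.random() <= tau_lost and len(cand) > 1:  # lost strategy
                    j = rng.choice([c for c in cand if c != j])
                tour.append(j)
                todo.discard(j)
            tours.append(tour)
        pher *= (1 - rho)                         # pheromone evaporation
        it_best = min(tours, key=lambda tr: tour_length(tr, D))
        it_len = tour_length(it_best, D)
        for k in range(n - 1):                    # deposit on the best tour
            pher[it_best[k], it_best[k + 1]] += 1.0 / it_len
        reward = max(best_len - it_len, 0.0) if best_len < float("inf") else 0.0
        if it_len < best_len:
            best_tour, best_len = it_best, it_len
        next_state = "HH" if reward > 0 else "LL"  # crude two-state mapping
        qt.update(state, action, reward, next_state)
        state = next_state
    return best_tour, best_len
```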

3.2.8. Extended Artificial Potential Field Algorithm (EAPF)

To ensure the flight safety of multi-UAVs during the mission and avoid internal collisions [40], we design a multi-UAV avoidance control algorithm based on the traditional artificial potential field method. The algorithm extends the potential field function into multiple layers to improve UAV avoidance performance. The attractive (gravitational) potential field is
$$U_{att}(p) = \begin{cases} \alpha_1 \|p - p_0\|^2, & d \in (0, d_0) \\ 2 d_0 \alpha_1 \|p - p_0\|, & d \in (d_0, +\infty) \end{cases}$$
where $U_{att}(p)$ is the attractive potential field function; $\alpha_1$ denotes the gravitational coefficient; p denotes the current position of the UAV; $p_0$ is the position of the target; d is the Euclidean distance between the UAV and the target, and $d_0$ is the distance threshold. When the distance between the UAV and the target is greater than $d_0$, the attractive force is constant in magnitude; when it is smaller than $d_0$, the UAV can be considered to be moving within the influence distance of the target point.
The repulsive potential field function is:
$$U_{rep}(p) = \begin{cases} 0, & d \in (d_1, +\infty) \\ \beta_1 \left( \dfrac{1}{\|p - p_1\|} - \dfrac{1}{d_0} \right)^2 \|p - p_0\|^{\gamma_1}, & d \in (d_0, d_1] \\[6pt] 2\beta_1 \left( \dfrac{1}{\|p - p_1\|} - \dfrac{1}{d_0} \right)^2 \|p - p_0\|^2, & d \in (0, d_0] \end{cases}$$
where $U_{rep}(p)$ denotes the repulsive potential field function; $\beta_1$ denotes the repulsive coefficient; $\gamma_1$ denotes the repulsive factor; $p_1$ is the position of the danger zone; and $d_1$ denotes the repulsion distance threshold.
The gravitational (attractive) force function is
$$F_{att}(p) = -\nabla U_{att}(p) = \begin{cases} -2\alpha_1 (p - p_0), & d \in (0, d_0) \\ -2 d_0 \alpha_1 \dfrac{p - p_0}{\|p - p_0\|}, & d \in (d_0, +\infty) \end{cases}$$
The repulsive force function is
$$F_{rep}(p) = -\nabla U_{rep}(p) = \begin{cases} 0, & d \in (d_1, +\infty) \\ F_1 + F_2, & d \in (d_0, d_1] \\ F_1 + F_2, & d \in (0, d_0] \end{cases}$$
where $F_1$ and $F_2$ are
$$F_1(p) = \begin{cases} 2\beta_1 \left( \dfrac{1}{\|p - p_1\|} - \dfrac{1}{d_0} \right) \dfrac{\|p - p_0\|^{\gamma_1}}{\|p - p_1\|^2}, & d \in (d_0, d_1] \\[6pt] 2\beta_1 \left( \dfrac{1}{\|p - p_1\|} - \dfrac{1}{d_0} \right) \dfrac{\|p - p_0\|^2}{\|p - p_1\|^2}, & d \in (0, d_0] \end{cases}$$
$$F_2(p) = \begin{cases} \gamma_1 \beta_1 \left( \dfrac{1}{\|p - p_1\|} - \dfrac{1}{d_0} \right)^2 \|p - p_0\|^{\gamma_1 - 1}, & d \in (d_0, d_1] \\[6pt] 2\beta_1 \left( \dfrac{1}{\|p - p_1\|} - \dfrac{1}{d_0} \right)^2 \|p - p_0\|, & d \in (0, d_0] \end{cases}$$
where $F_1$ and $F_2$ denote the two components of the repulsive force; the repulsion is designed piecewise across the layers.
The schematic diagram of the extended artificial potential field is shown in Figure 3. “!” denotes the danger zone; “T” denotes the target point.
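The whole field can be evaluated in a single function, as sketched below with the Table 2 values for α₁, β₁, γ₁, d₀, and d₁; the sign conventions, the unit-vector decomposition of F₁ and F₂, and the use of d₁ as the repulsion influence distance are our assumptions.

```python
import numpy as np

def eapf_force(p, p_target, p_danger, a1=9.0, b1=13.0, g1=1.5,
               d0=0.1, d1=0.5):
    """One evaluation of the extended-APF force on a UAV at position p.
    a1, b1, g1, d0, d1 follow Table 2 (alpha_1, beta_1, gamma_1, d_0, d_1).
    Sign conventions and the repulsion influence distance are assumptions."""
    to_goal = p - p_target
    dg = np.linalg.norm(to_goal)
    # attraction: quadratic near the target, linear (bounded) farther away
    if dg < d0:
        F_att = -2.0 * a1 * to_goal
    else:
        F_att = -2.0 * d0 * a1 * to_goal / dg
    to_obs = p - p_danger
    do = np.linalg.norm(to_obs)
    if do > d1:
        return F_att                          # outside the repulsion layer
    rho = 1.0 / do - 1.0 / d1                 # >= 0 inside the influence range
    n_obs = to_obs / do                       # unit vector away from danger
    dg_safe = max(dg, 1e-9)
    n_goal = -to_goal / dg_safe               # unit vector toward the target
    exp = g1 if do > d0 else 2.0              # layer-dependent exponent
    F1 = 2.0 * b1 * rho * dg_safe ** exp / do ** 2      # push from the zone
    F2 = exp * b1 * rho ** 2 * dg_safe ** (exp - 1.0)   # modulate to target
    return F_att + F1 * n_obs + F2 * n_goal

# e.g.: eapf_force(np.array([1.0, 0.5, -1.0]), np.array([2.0, 0.0, -1.0]),
#                  np.array([1.2, 0.4, -1.0]))
```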

4. Experimentation and Simulation

We utilized MATLAB/Simulink to test the performance of the CISCS algorithm from different angles.

4.1. Formation Controller Simulation

In the following, the effectiveness and reliability of the proposed multi-UAV distributed self-organized collaborative intelligent reconnaissance combat strategy (CISCS) are verified through example solutions. The simulation platform is a multi-UAV simulation platform built in MATLAB R2019a/Simulink, as shown in Figures 4-6.
In this simulation platform, the UAV body model consists of a power model, a control efficiency model, and a rigid-body model. The UAV controller consists of a position controller and an attitude controller, both PID controllers. The controller input is the desired position, and the output is the PWM command for the UAV motors; this command is fed into the UAV body model to control the UAV's position and attitude, which are in turn fed back to the controller.
Formation control analysis: to evaluate formation control performance, we compare the two control schemes, with and without the formation controller, and set up comparison experiments with dynamic and static flight formations to verify the rapidity and stability of the formation controller. The results show that the formation controller helps multiple UAVs form stable flight formations faster.
In Table 2, we give the list of parameters; their values were selected through repeated experiments to obtain the best possible performance.
First, a formation simulation experiment is carried out with one Leader and two Followers; the position of each UAV in 3D space is given in Table 3. The drones are placed in a horizontal row on the ground with a spacing of 1 m. In the triangular formation, each Follower forms a 45° angle with the Leader at a distance of 2 m; in the circular formation, the Followers form an enclosing circle of radius 2 m around the Leader. The UAVs are simulated first in the triangular shape and then in the circular shape. The simulation results are shown in Figures 7-10.
Extracting the simulation data, the formation time with the formation controller is significantly shorter than without it. Without changing the UAV's own controller parameters, the formation controller helps the UAVs form a triangular static formation 3.5080 s faster than without it. In Figure 8b, the UAV spacing jitters because of the parameter tuning of the formation controller, but the maximum jitter is ±18.79% of the controlled quantity, whereas the maximum UAV spacing without the formation controller is 2.7663 m, an overshoot of 95.64%. It can be concluded that in the static three-UAV formation, the formation controller yields a faster and more stable formation.
In Figure 9a, without the formation controller, the paths of the two Follower drones do not overlap, and Figure 9b shows that the distance control of the UAVs is unstable when they circle the Leader. After adding the formation controller, the Follower paths in Figure 10a essentially overlap, and the UAV spacing in Figure 10b stabilizes near 2 m, with a formation time of 16.3720 s, longer than the 15.84 s of the static formation, indicating that dynamic formation is more difficult. Without changing the UAVs' own controller parameters, the three-UAV dynamic formation without a formation controller performs poorly and the UAV spacing jitters violently; with the formation controller, the spacing fluctuates slightly but remains stable overall, with an overshoot of 27.71%. Therefore, in the three-UAV dynamic formation, the formation controller helps the UAVs form a more stable formation.
To verify the scalability of the formation controller, formation simulation experiments with five UAVs are added.
In the five-UAV static formation experiment, the contrast between Figure 11a and Figure 12a is obvious: with the formation controller, the UAV formation forms faster and with less error, 3.992 s faster than without it. In Figure 11b, the formation can be stabilized, but the overshoot is clearly larger than in Figure 12b and the formation time much longer. Although there is some jitter in Figure 12b, it is caused by the UAV airframe model, and its range is small.
In the five-UAV dynamic formation experiment, the Follower drones in Figure 13a show a large formation error, likely because the spacing of multiple UAVs is difficult to stabilize during the formation process, as Figure 13b also indicates. After adding the formation controller, the Followers in Figure 14a form a stable circular formation around the Leader (Figure 14b), and the drone spacing in Figure 14 is clearly much more stable than in Figure 13, indicating that the formation controller helps the drones form a better formation.

4.2. Multi-UAV Cooperative Mission Simulation

To verify the effectiveness of CISCS, we deploy one, three, and five UAVs to perform the same mission and carry out simulation experiments following the flow of takeoff, formation, formation flight, mission execution, and departure from the mission area; the multi-UAV mission sequences obtained by CISCS are shown in Figures 15-18.
Task point coordinates are shown in Figure 15 and Table 4.
The mission execution sequences and completion times obtained from CISCS are shown in Table 5. The mission time is reduced by 52.55% and 58.27% for three and five UAVs, respectively, compared to a single UAV.

5. Conclusions

We propose a multi-UAV distributed self-organized collaborative intelligent reconnaissance combat strategy. In this strategy, a multi-UAV formation controller with time constraints is designed for fast formation; EAPF is used to prevent collisions within the multi-UAV formation; and a distributed improved ant colony algorithm based on Q-Learning generates the task sequences executed by the UAVs. Simulation experiments show that the finite-time formation controller helps multiple UAVs form stable formations faster, that the task assignment algorithm significantly improves execution efficiency by assigning multiple task points to multiple UAVs, and that the Extended Artificial Potential Field (EAPF) method prevents collisions within the formation, guaranteeing the UAVs' flight safety.
CISCS assumes that each UAV carries sufficient attack munitions and reconnaissance equipment to reconnoiter and attack arbitrary targets. Future research should take into account the actual reconnaissance and combat capabilities of UAVs, add dynamic constraints, and address multi-dimensional, multi-platform collaborative combat involving multiple UAV types and ground equipment.

Author Contributions

Conceptualization, Z.L. and Q.L.; methodology, Z.L. and Q.L.; formal analysis, Z.L. and Q.L.; investigation, Z.L. and G.F.; writing—original draft preparation, Z.L. and Q.L.; writing—review and editing, Z.L., Q.L., and G.F.; supervision, Q.L. and G.F.; project administration, Q.L. and G.F. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 61971048), the National Key Research and Development Program of China (Nos. 2020YFC1511702 and 2022YFF0607400), and the Beijing Scholars Program.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Khalil, H.; Rahman, S.U.; Ullah, I.; Khan, I.; Alghadhban, A.J.; Al-Adhaileh, M.H.; Ali, G.; ElAffendi, M. A UAV-Swarm-Communication Model Using a Machine-Learning Approach for Search-and-Rescue Applications. Drones 2022, 6, 372. [Google Scholar] [CrossRef]
  2. Dah-Achinanon, U.; Bajestani, S.E.M.; Lajoie, P.-Y.; Beltrame, G. Search and rescue with sparsely connected swarms. Auton. Robot. 2023. [Google Scholar] [CrossRef]
  3. Chen, W.; Meng, X.; Liu, J.; Guo, H.; Mao, B. Countering Large-Scale Drone Swarm Attack by Efficient Splitting. IEEE Trans. Veh. Technol. 2022, 71, 9967–9979. [Google Scholar] [CrossRef]
  4. Jinqiang, H.; Husheng, W.; Renjun, Z.; Rafik, M.; Xuanwu, Z. Self-organized search-attack mission planning for UAV swarm based on wolf pack hunting behavior. J. Syst. Eng. Electron. 2021, 32, 1463–1476. [Google Scholar] [CrossRef]
  5. Zhou, L.; Leng, S.; Liu, Q.; Wang, Q. Intelligent UAV Swarm Cooperation for Multiple Targets Tracking. IEEE Internet Things J. 2022, 9, 743–754. [Google Scholar] [CrossRef]
  6. Zhou, W.; Li, J.; Liu, Z.; Shen, L. Improving multi-target cooperative tracking guidance for UAV swarms using multi-agent reinforcement learning. Chin. J. Aeronaut. 2022, 35, 100–112. [Google Scholar] [CrossRef]
  7. Liu, L.; Xu, M.; Wang, Z.; Fang, C.; Li, Z.; Li, M.; Sun, Y.; Chen, H. Delay-Informed Intelligent Formation Control for UAV-Assisted IoT Application. Sensors 2023, 23, 6190. [Google Scholar] [CrossRef]
  8. Chen, Y.; Jiao, Y.; Wu, M.; Ma, H.; Lu, Z. Group Target Tracking for Highly Maneuverable Unmanned Aerial Vehicles Swarms: A Perspective. Sensors 2023, 23, 4465. [Google Scholar] [CrossRef]
  9. Rizk, Y.; Awad, M.; Tunstel, E. Cooperative Heterogeneous Multi-Robot Systems: A Survey. ACM Comput. Surv. 2019, 52, 29. [Google Scholar] [CrossRef]
  10. Campbell, A.; Wu, A.S. Multi-agent role allocation: Issues, approaches, and multiple perspectives. Auton. Agents Multi-Agent Syst. 2011, 22, 317–355. [Google Scholar] [CrossRef]
  11. Tang, S.-Y.; Zhu, Y.-F.; Li, Q.; Lei, Y.-L. Survey of task allocation in multi agent systems. Syst. Eng. Electron. 2010, 32, 2155–2161. [Google Scholar]
  12. Parker, L.; Tang, F. Building multirobot coalitions through automated task solution synthesis. Proc. IEEE 2006, 94, 1289–1305. [Google Scholar] [CrossRef]
  13. Gerkey, B.P.; Matarić, M.J. Sold!: Auction methods for multirobot coordination. IEEE Trans. Robot. Autom. 2002, 18, 758–768. [Google Scholar] [CrossRef]
  14. Botelho, S.C.; Alami, R. M+: A scheme for multi-robot cooperation through negotiated task allocation and achievement. In Proceedings of the IEEE International Conference on Robotics and Automation, Detroit, MI, USA, 10–15 May 1999; Volume 2, pp. 1234–1239. [Google Scholar]
  15. Zlot, R.; Stentz, A. Complex task allocation for multiple robots. In Proceedings of the IEEE International Conference on Robotics and Automation, Barcelona, Spain, 18–22 April 2005; pp. 1515–1522. [Google Scholar]
  16. Oh, G.; Kim, Y.; Ahn, J.; Choi, H.-L. PSO-Based Optimal Task Allocation for Cooperative Timing Missions; Elsevier Ltd.: Amsterdam, The Netherlands, 2016; pp. 314–319. [Google Scholar]
  17. Elango, M.; Kanagaraj, G.; Ponnambalam, S.G. Sandholm algorithm with K-means clustering approach for multi-robot task allocation. Int. Conf. Softw. Process 2013, 8298, 14–22. [Google Scholar]
  18. Abukhalil, T.; Patil, M.; Patel, S.; Sobh, T. Coordinating a Heterogeneous Robot Swarm Using Robot Utility-based Task Assignment (RUTA). In Proceedings of the IEEE 14th International Workshop on Advanced Motion Control, Auckland, New Zealand, 22–24 April 2016; pp. 57–62. [Google Scholar]
  19. Fernandez-Gauna, B.; Etxeberria-Agiriano, I.; Graña, M. Learning Multirobot Hose Transportation and Deployment by Distributed Round-Robin Q-Learning. PLoS ONE 2015, 10, e127129. [Google Scholar] [CrossRef]
  20. Mathew, N.; Smith, S.L.; Waslander, S.L. Planning Paths for Package Delivery in Heterogeneous Multirobot Teams. IEEE Trans. Autom. Sci. Eng. 2015, 12, 1298–1308. [Google Scholar] [CrossRef]
  21. Chalkiadakis, G.; Boutilier, C. Sequentially optimal repeated coalition formation under uncertainty. Auton. Agents Multi-Agent Syst. 2012, 24, 441–484. [Google Scholar] [CrossRef]
  22. Matthews, T.; Ramchurn, S.; Chalkiadakis, G. Competing with humans at fantasy football: Team formation in large partially-observable domains. In Proceedings of the 26th AAAI Conference on Artificial Intelligence, Toronto, ON, Canada, 22–26 July 2012; pp. 1394–1400. [Google Scholar]
  23. Dasgupta, P.; Cheng, K.; Banerjee, B. Adaptive multi-robot team reconfiguration using a policy-reuse reinforcement learning approach. In Advanced Agent Technology; Springer: Berlin/Heidelberg, Germany, 2012; pp. 330–345. [Google Scholar]
  24. Liao, S.S.; Zhang, J.-D.; Lau, R.; Wu, T. Coalition formation based on marginal contributions and the Markov process. Decis. Supp. Syst. 2014, 57, 355–363. [Google Scholar] [CrossRef]
  25. Sen, S.D.; Adams, J.A. An influence diagram based multi-criteria decision making framework for multirobot coalition formation. Auton. Agents Multi-Agent Syst. 2015, 29, 1061–1090. [Google Scholar] [CrossRef]
  26. Zhu, P.; Fang, X. Multi-UAV Cooperative Task Assignment Based on Half Random Q-Learning. Symmetry 2021, 13, 2417. [Google Scholar] [CrossRef]
  27. Qie, H.; Shi, D.; Shen, T.; Xu, X.; Li, Y.; Wang, L. Joint Optimization of Multi-UAV Target Assignment and Path Planning Based on Multi-Agent Reinforcement Learning. IEEE Access 2019, 7, 146264–146272. [Google Scholar] [CrossRef]
  28. Li, M.; Liu, C.; Li, K.; Liao, X.; Li, K. Multi-task allocation with an optimized quantum particle swarm method. Appl. Soft Comput. 2020, 96, 106603. [Google Scholar] [CrossRef]
  29. Kurdi, H.; AlDaood, M.F.; Al-Megren, S.; Aloboud, E.; Aldawood, A.S.; Youcef-Toumi, K. Adaptive task allocation for multi-UAV systems based on bacteria foraging behavior. Appl. Soft Comput. 2019, 83, 105643. [Google Scholar] [CrossRef]
  30. Ma, X.; Dai, K.; Li, M.; Yu, H.; Shang, W.; Ding, L.; Zhang, H.; Wang, X. Optimal-Damage-Effectiveness Cooperative-Control Strategy for the Pursuit–Evasion Problem with Multiple Guided Missiles. Sensors 2022, 22, 9342. [Google Scholar] [CrossRef]
  31. Chen, X.; Liu, Y.; Yin, L.; Qi, L. Cooperative Task Assignment and Track Planning For Multi-UAV Attack Mobile Targets. J. Intell. Robot. Syst. 2020, 100, 1383–1400. [Google Scholar]
  32. Yoo, T.; Lee, S.; Yoo, K.; Kim, H. Reinforcement Learning Based Topology Control for UAV Networks. Sensors 2023, 23, 921. [Google Scholar] [CrossRef] [PubMed]
  33. Gao, Y.; Feng, H.; Chen, J.; Li, J.; Wei, Z. A Scalable Distributed Control Algorithm for Bearing-Only Passive UAV Formation Maintenance. Sensors 2023, 23, 3849. [Google Scholar] [CrossRef] [PubMed]
  34. Zhen, Z.; Xing, D.; Gao, C. Cooperative search-attack mission planning for multi-UAV based on intelligent self-organized algorithm. Aerosp. Sci. Technol. 2018, 76, 402–411. [Google Scholar] [CrossRef]
  35. Liu, Y.; Jiang, C.; Zhang, T.; Zhao, Z.Y.; Deng, Z.L. Multi-UAV finite-time circular formation control considering internal collision avoidance. J. Mech. Eng. 2022, 58, 61–68. [Google Scholar]
  36. Han, H.; Xu, Z.; Wang, J. A Q-learning-based Multi-task Multi-objective Particle Swarm Optimization Algorithm. Control. Decis. Mak. 2023, 1–9. [Google Scholar] [CrossRef]
  37. Huang, X.; Lei, B.; Ji, G.; Zhang, B. Energy Criticality Avoidance-Based Delay Minimization Ant Colony Algorithm for Task Assignment in Mobile-Server-Assisted Mobile Edge Computing. Sensors 2023, 23, 6041. [Google Scholar] [CrossRef] [PubMed]
  38. Vimal, S.; Khari, M.; Crespo, R.G.; Kalaivani, L.; Dey, N.; Kaliappan, M. Energy enhancement using Multi objective Ant colony optimization with Double Q learning algorithm for IoT based cognitive radio networks. Comput. Commun. 2020, 154, 481–490. [Google Scholar] [CrossRef]
  39. Fu, Y.; Shen, Y.; Tang, L. A Dynamic Task Allocation Framework in Mobile Crowd Sensing with D3QN. Sensors 2023, 23, 6088. [Google Scholar] [CrossRef] [PubMed]
  40. Zheng, L.; Yu, W.; Li, G.; Qin, G.; Luo, Y. Particle Swarm Algorithm Path-Planning Method for Mobile Robots Based on Artificial Potential Fields. Sensors 2023, 23, 6082. [Google Scholar] [CrossRef]
Figure 1. Q-Learning algorithm flow.
Figure 2. Ant colony lost strategy.
Figure 3. Schematic diagram of EAPF.
Figure 4. Multi-UAV simulation module.
Figure 5. Stand-alone control model.
Figure 6. Quadcopter UAV control model.
Figure 7. (a) A 3D view without the formation controller. (b) Distance between UAVs without the formation controller. Formation time = 15.84 s.
Figure 8. (a) A 3D view with the formation controller. (b) Distance between UAVs with the formation controller. Formation time = 12.3320 s.
Figure 9. (a) A 3D view without the formation controller. (b) Distance between UAVs without the formation controller.
Figure 10. (a) A 3D view with the formation controller. (b) Distance between UAVs with the formation controller. Formation time = 16.3720 s.
Figure 11. (a) A 3D view without the formation controller. (b) Distance between UAVs without the formation controller. Formation time = 17.5360 s.
Figure 12. (a) A 3D view with the formation controller. (b) Distance between UAVs with the formation controller. Formation time = 13.5440 s.
Figure 13. (a) A 3D view without the formation controller. (b) Distance between UAVs without the formation controller.
Figure 14. (a) A 3D view with the formation controller. (b) Distance between UAVs with the formation controller.
Figure 15. Map of take-off points and mission points.
Figure 16. Three-dimensional trajectory of a single-UAV mission flight.
Figure 17. Three-dimensional trajectory of a three-UAV mission flight.
Figure 18. Three-dimensional trajectory of a five-UAV mission flight.
Table 1. Ant colony update parameters.

Search Mode            α      β      ρ
Global Search          1.0    3.0    0.15
Local Search           1.25   3.50   0.25
Maintenance Search     1.50   4.0    0.45
Slow Convergence       1.75   4.50   0.65
Fast Convergence       2.0    5.0    0.75
Table 2. Parameter settings.

Parameter Meaning           Parameter   Value
Finite time parameter       t_w         20
Finite time gain            w           1.0
Learning rate               α           0.2
Discount factor             γ           0.9
Lost rate                   τ           0.4
Gravitational coefficient   α_1         9
Repulsion coefficient       β_1         13
Arrival distance            d_0         0.1
Distance threshold          d_1         0.5
Repulsive factor            γ_1         1.5
Table 3. Initial positions of the drones.

UAV     Location
UAV1    (0, 0, 0)
UAV2    (0, 1, 0)
UAV3    (0, −1, 0)
UAV4    (0, 2, 0)
UAV5    (0, −2, 0)
Table 4. Task point information.

Number   Location            Number   Location
1        (100, 50, −1.2)     6        (175, 0, −0.9)
2        (200, 50, −0.8)     7        (125, 0, −1.5)
3        (150, 25, −1.2)     8        (100, −50, −0.7)
4        (150, 0, −0.7)      9        (150, −25, −1.0)
5        (200, 0, −0.9)      10       (200, −50, −1.1)
Table 5. Mission sequences and completion times.

Number of UAVs   Mission Sequence                                       Mission Time
1                T1 → T2 → T5 → T10 → T8 → T7 → T3 → T6 → T9 → T4       223.584 s
3                T1 → T3 → T2; T7 → T4 → T6 → T5; T8 → T9 → T10         106.096 s
5                T3 → T4; T2 → T5; T7 → T9; T1 → T6; T8 → T10           93.312 s

