Dynamic Task Allocation for Heterogeneous Multi-UAVs in Uncertain Environments Based on 4DI-GWO Algorithm

Huang, Hanqiao; Jiang, Zijian; Yan, Tian; Bai, Yu

doi:10.3390/drones8060236

¹ National Key Laboratory of Unmanned Aerial Vehicle Technology, Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an 710072, China

² Northwest Institute of Nuclear Technology, Xi’an 710024, China

* Author to whom correspondence should be addressed.

Drones 2024, 8(6), 236; https://doi.org/10.3390/drones8060236

Submission received: 3 April 2024 / Revised: 27 May 2024 / Accepted: 27 May 2024 / Published: 1 June 2024

Abstract:

As the missions and environments of unmanned aerial vehicles (UAVs) become increasingly complex in both space and time, it is essential to investigate the dynamic task assignment problem of heterogeneous multi-UAVs attacking ground targets in an uncertain environment. Because most existing task assignment methods are limited to static allocation in a deterministic environment, this paper first constructs a fuzzy multiconstraint programming model for heterogeneous multi-UAV dynamic task assignment based on binary interval theory, taking into account uncertain factors such as target location information, mission execution time, and the survival probability of UAVs. Then, a dynamic task allocation strategy is designed, consisting of two components: dynamic time slice setting and the four-dimensional information grey wolf optimization (4DI-GWO) algorithm. The dynamic time slices allow the solving frequency, and hence the allocation effect, to be adjusted dynamically. The 4DI-GWO algorithm improves on grey wolf optimization through a four-dimensional information strategy that expands population diversity and enhances global search capability, together with several other strategies. The numerical analysis shows that the proposed strategy can effectively solve the dynamic task assignment problem of heterogeneous multi-UAVs in an uncertain environment, with fitness values improved by 5–30% in comparison with other optimization algorithms.

Keywords:

multiple heterogeneous unmanned aerial vehicles; dynamic task allocation; uncertain environment; four-dimensional information grey wolf optimization algorithm

1. Introduction

With the rapid rise of artificial intelligence and aircraft technology, unmanned aerial vehicles (UAVs) are playing an increasing role in engineering applications by virtue of their low cost and versatile functions [1,2]. The coordinated operation of multiple UAVs has become an important mode for adapting to more complex environments and achieving better results in application [3,4,5]. Within this mode, cooperative task allocation of multiple UAVs is one of the key technologies in the top-level design of UAV cooperative operations [6,7]. However, most existing UAV task allocation studies assume a static process and a deterministic mission environment, without considering the uncertain environments and dynamic changes that arise in actual applications.

It is not enough for multi-UAV task allocation research to stop at static target allocation in deterministic environments. On the one hand, task allocation for cooperating multi-UAVs is inevitably a dynamic process [8], and a static allocation scheme cannot match the flexible and changing situation in real applications. On the other hand, the task allocation process is influenced by environmental parameters, such as changes in the target location information, the uncertain duration of task execution at a target, the success rate of the UAVs’ strikes against ground targets, and the threat that ground positions pose to the UAVs. If these are not considered, the “optimal” task allocation scheme formulated in advance may fail to achieve the desired effect or even to complete the mission [9]. Both static target assignment and deterministic environments are idealized treatments of practical task assignment problems, so further research on dynamic task assignment for multiple UAVs in uncertain environments is necessary.

Dynamic task allocation is a multistage allocation process that is much more complex than static task allocation [10]. For one thing, changing environment information needs to be taken into account throughout the decision-making process, requiring an “observe–fire–observe” strategy. For another, the purpose of dynamic target allocation is to find the global optimal solution of the whole confrontation process, not a single stage.

Although most of the current research focuses on static task allocation, some scholars have realized the importance of dynamic task allocation and conducted research.

The work in [11] designed a compromise task assignment method for multiple UAVs operating in rescue and search scenarios under multiple constraints, in which a compromised dynamic performance impact is introduced to handle new tasks that may arise during task execution and thus accommodate dynamic events. Considering the limitation and wastage of computational resources, the study in [12] proposed a multitask particle swarm optimization algorithm based on a dynamic on-demand allocation strategy to achieve dynamic allocation. The research in [13] proposed an improved particle swarm optimization algorithm and introduced constraints such as the weapon deflagration time window to verify the effectiveness of the particle swarm algorithm in solving the weapon target allocation problem with multiple missiles. In their research on the heterogeneous UAV task allocation problem, the researchers in [14] constructed a dynamic collaborative task allocation model and then improved the contract net algorithm by using psychological coefficients, the blackboard model, and the buffer pool mechanism, which improved the efficiency of task allocation and produced more reasonable collaborative allocation results. Several researchers [15,16,17] proposed a multiobjective simplified swarm optimization algorithm to address the long computation time and low solving efficiency of weapon target allocation algorithms and demonstrated its high computational efficiency and solving accuracy. Some studies [18,19] proposed a multiobjective particle swarm control allocation method based on adaptive probability guidance and increased the variation factor according to the convergence index to realize the multiobjective control allocation of the manipulation surface.
For the multiarmed bandit (MAB) problem in nonstationary environments, the work in [20] utilized the predictive power of large language models (LLMs) to deal with dynamic environments and opened new avenues for applying LLMs to enhance traditional decision-making strategies in dynamic environments. The objective of this paper is to present a dynamic strategy for the allocation of tasks within the context of engineering practice for the Blue side’s multilayered defense system. Consequently, the scheme of the task allocation at each level can be effectively aligned by introducing dynamic time slices. Beyond that, the rest of the literature mentioned above addressed the allocation scheme to some extent in a dynamic process. However, there is still a lack of studies that allow for quantitative adjustment of the allocation frequency. The correlation between the level of dynamics and the efficacy of the entire system requires further characterization.

In an uncertain environment, the actual tasking problem often faces partial information uncertainty as the number of disturbing factors in the mission environment increases. Information about one’s own heterogeneous multi-UAVs can be easily obtained, but information about the ground targets to be attacked may be imprecise, or force majeure may be encountered during the flight, so that critical parameters in multi-UAV tasking become fuzzy numbers.

The literature on task allocation in uncertain environments is similarly limited and the mainstream approaches can be categorized into robust optimization vs. fuzzy planning. Some scholars have proposed the use of robust optimization methods to solve the problem of multi-UAV task allocation in uncertain environments. In the work in [21], a two-stage robust optimization model was proposed to solve the problem of drone scheduling under the uncertainty of demand. In the study in [22], considering the uncertainty of drone flight fuel consumption parameters, a robust optimization method was used to construct a robust equivalent of the original model, which achieves an effective balance between the robustness and the optimality of the task assignment. A robust optimization module was proposed in the research in [23] to improve the robustness of the task assignment scheme under time–cost uncertainty. However, the task allocation schemes solved by robust optimization methods suffer from the problem of over-conservatism, which makes it difficult to achieve global optimization under dynamic task allocation problems. Meanwhile, aiming at solving the problem of task assignment under uncertain conditions, the fuzzy planning method describes the uncertainty parameter in the form of a fuzzy set. In the study in [24], a fuzzy resource scheduling model with time–cost constraints was developed based on the fuzzy planning theory for the resource scheduling problem with uncertain execution time, and an improved chaotic ant colony algorithm was proposed to solve the model. In the research in [25], a multitarget fuzzy machine constraint model based on the credibility measurement theory was constructed for the vehicle routing problem with fuzzy demand and fuzzy time windows, and an improved hybrid algorithm was used to solve the model. 
In the study in [26], a multistage fuzzy multiobjective task allocation planning model was developed to address the fuzzy target threat in the problem of coordinated ground attack by a drone, and a Nash equilibrium solution based on a multi-strategy fusion algorithm was proposed. The fuzzy planning approach emphasizes constraint satisfaction by transforming the objective function and constraints of a fuzzy plan into a general planning model suitable for UAV tasking.

At the level of intelligent algorithmic solutions, there is a growing body of literature that applies the grey wolf optimization (GWO) algorithm to address the optimization problems associated with UAVs. The works in [27,28,29,30,31] demonstrated the efficacy of the GWO algorithm in addressing optimization issues across diverse domains. Among these, literature [27] conducted research on the specific trajectory planning problem of bridge inspection, while literature [28,29,30,31] proposed different improvement strategies for the convergence speed and search capability of the GWO algorithm. Among them, the work in [28] improved the grey wolf search strategy to enhance the convergence speed and combined it with the implementation of variability in the differential evolution algorithm to promote the search capability. And the work in [29] put forward a novel relative distance adaptation strategy with the objective of enhancing the convergence speed. This was combined with the simulated annealing algorithm for alternative position updates, with the intention of improving the search process with different capabilities. In contrast, the work in [30] employed a Gaussian variation strategy and a spiral function to enhance the search capability, and the work in [31] enhanced the convergence speed by integrating two improved grey wolf algorithms, wherein reinforcement learning is employed for policy switching.

Motivated by the above research, this paper not only improves the convergence speed and search ability of the algorithm but also designs a novel four-dimensional grey wolf information strategy that enhances global optimization ability, a known weakness of the classical GWO algorithm, through spatial coverage of the solution vector.

In conclusion, this paper addresses the dynamic task assignment problem for heterogeneous multi-UAVs in uncertain environments. Firstly, based on the binary interval number theory, a fuzzy multiconstraint planning model is constructed for the uncertain environment with the optimization objective of minimizing cost. In addition, this paper sets up dynamic time slices to realize dynamic task allocation with the adjustment of time slices. Finally, the four-dimensional information grey wolf optimization (4DI-GWO) algorithm is designed to search for the global optimal solution.

The main novelties of this study are as follows:

(1) Aiming at the task allocation problem in uncertain environments, a fuzzy multiconstraint planning model is constructed based on binary interval number theory, covering a series of uncertainties such as deviations in the target location information and the uncertain duration of task execution at a target.
(2) Dynamic task allocation is carried out while multi-UAVs detect and attack ground targets. The time factor in dynamic task allocation is fully considered, and time slices are designed on this basis to adjust the number of assignments according to the decision maker’s preference.
(3) A four-dimensional information grey wolf optimization algorithm is proposed that greatly enhances the global search capability and the initial packaged information of the grey wolf population with adaptive tuning. Other aspects of performance, such as convergence speed, are ensured by the remaining improvement strategies.

The remainder of this paper is organized as follows: the dynamic task allocation problem in uncertain environments is defined and described in Section 2. The dynamic allocation method based on the 4DI-GWO algorithm for heterogeneous multi-UAVs is described in Section 3. In Section 4, simulations are conducted to validate the algorithms and methods. Finally, we conclude the paper in Section 5.

2. Task Allocation Model for Uncertain Environments

2.1. Problem Description

This paper examines the task allocation of multiple heterogeneous UAVs for coordinated missions against ground targets in an uncertain environment. The scenario considered is a real application environment where target-related parameters, such as the target location information, the mission sustainment time, and the probability of UAV survival, are uncertain. In the proposed scenario, there are no no-fly zones, terrain obstacles, or sudden threats. All UAVs fly at the same altitude and have access to the same information. The task assignment scheme must be optimized to minimize task failure rates and improve scheme robustness with multiple constraints under these uncertainties. Figure 1 illustrates the confrontation between Red’s heterogeneous multi-UAV swarms and Blue’s ground targets.

There are three types of UAVs, namely Reconnaissance UAVs, Strike UAVs, and Reconnaissance–Strike UAVs, with a total number of $M$, $U = [U_{1}, U_{2}, \dots, U_{M}]$. Reconnaissance UAVs $U^{R}$ are designed solely for targeted reconnaissance missions, while Strike UAVs $U^{S}$ are designed solely for targeted strike missions. Reconnaissance–Strike UAVs $U^{RS}$ are capable of performing both reconnaissance and strike missions. The attributes of the $i$th UAV are represented by the tuple $\langle Uid_{i}, Utype_{i}, Uval_{i}, Usp_{i}, Upos_{i}, Uv_{i} \rangle$, whose elements are the code, the type, the value of the UAV itself, the probability of striking the target, the position of the UAV, and the cruising speed of the UAV. The cruising speed of the UAV is set to be constant during the mission. In this paper, it is also assumed that both the Reconnaissance–Strike and Strike UAVs carry enough strike resources to accomplish the strike mission for all targets.

With respect to the target parameters, this paper sets $N$ targets, $T = [T_{1}, T_{2}, \dots, T_{N}]$, in a two-dimensional uncertain mission environment. Each target requires a detection mission and an attack mission, i.e., there are $2N$ missions in the mission environment. The tuple $\langle Tid_{j}, Ttype_{j}, Tval_{j}, Tsp_{j}, Tpos_{j}, Ct_{j}^{R}, Ct_{j}^{S} \rangle$ represents the parameters of the $j$th ground target: the code, the type, the intrinsic value, the probability that the target destroys the UAV, the position coordinates, the duration of the reconnaissance mission, and the duration of the strike mission. In this paper, it is assumed that the benefit of destroying a target, given in Equation (5), is obtained only after the target is destroyed.

2.2. Fuzzy Variables Handling

In a complex application environment, the Red side can easily obtain information about its own heterogeneous UAVs. However, the information about the Blue targets to be attacked may be imprecise, including the relative distance, $D$, between the UAVs and the targets, the time to execute the reconnaissance or strike mission, $Ct^{R}, Ct^{S}$, and the probability, $Sp$, of UAV survival during the mission. Compared with a clear mission environment, these imprecise key parameters make it difficult to solve the mission assignment problem accurately. Therefore, this paper abstracts the uncertain variables into interval numbers and constructs a fuzzy-constrained planning model based on binary interval number theory.

\begin{cases} D = [D^{L}, D^{U}] \\ Ct^{R} = [Ct^{RL}, Ct^{RU}] \\ Ct^{S} = [Ct^{SL}, Ct^{SU}] \\ Sp = [Sp^{L}, Sp^{U}] \end{cases}

(1)

where $D^{L}, D^{U}, Ct^{RL}, Ct^{RU}, Ct^{SL}, Ct^{SU}, Sp^{L}, Sp^{U} \in \mathbb{R}$, $D^{L} < D^{U}$, $Ct^{RL} < Ct^{RU}$, $Ct^{SL} < Ct^{SU}$, and $Sp^{L} < Sp^{U}$.

Interval numbers are a powerful tool for describing uncertain information and have been widely used in multiattribute uncertain decision-making and fuzzy control. When uncertain information is described with interval numbers, an interval ordering criterion must be designed in addition to solving the basic interval number synthesis problem. Therefore, this paper applies an interval ordering method based on the possibility degree, which defines a measure of the extent to which one interval is larger than another and derives an ordering between intervals from this measure. The specific ordering rules are as follows.

Given the interval numbers $a = [a^{L}, a^{U}]$ and $b = [b^{L}, b^{U}]$, $p(a \geq b)$ represents the degree of likelihood that $a$ is greater than or equal to $b$. This degree of likelihood measures the relationship between the two intervals and takes a value between 0 and 1, determined by the chosen likelihood formula. Here, values in the intervals $a$ and $b$ are assumed to be uniformly distributed and independent of each other, so calculating $p(a \geq b)$ is equivalent to calculating the probability that $u > v$, where $u$ and $v$ are random values in the intervals $a = [a^{L}, a^{U}]$ and $b = [b^{L}, b^{U}]$, respectively. The results of the calculation are as follows:

p(a \geq b) = \begin{cases} 1, & b^{L} \leq b^{U} \leq a^{L} \leq a^{U} \\ 1 - \frac{(b^{U} - a^{L})^{2}}{2 L(a) L(b)}, & b^{L} \leq a^{L} < b^{U} \leq a^{U} \\ \frac{a^{L} + a^{U} - 2 b^{L}}{2 L(b)}, & b^{L} \leq a^{L} \leq a^{U} < b^{U} \\ \frac{2 a^{U} - b^{U} - b^{L}}{2 L(a)}, & a^{L} \leq b^{L} \leq b^{U} \leq a^{U} \\ \frac{(a^{U} - b^{L})^{2}}{2 L(a) L(b)}, & a^{L} < b^{L} \leq a^{U} \leq b^{U} \\ 0, & a^{L} < a^{U} < b^{L} < b^{U} \end{cases}

(2)

where $L(a) = a^{U} - a^{L}$ and $L(b) = b^{U} - b^{L}$.
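The piecewise rule in Equation (2) can be sketched in a few lines of Python. This is only an illustration: the function name `possibility_degree` and the tuple representation of intervals are assumptions, and the branch order is chosen so that boundary cases shared by two branches of Equation (2) return the same value.

```python
def possibility_degree(a, b):
    """Return p(a >= b) for intervals a = (aL, aU) and b = (bL, bU), following Equation (2).

    Both intervals are assumed to have positive width, as required by Equation (1).
    """
    aL, aU = a
    bL, bU = b
    if not (aL < aU and bL < bU):
        raise ValueError("intervals must satisfy lower < upper")
    La, Lb = aU - aL, bU - bL          # interval widths L(a), L(b)

    if bU <= aL:                       # b lies entirely below a
        return 1.0
    if aU <= bL:                       # a lies entirely below b
        return 0.0
    if bL <= aL and bU <= aU:          # partial overlap, a shifted to the right of b
        return 1.0 - (bU - aL) ** 2 / (2.0 * La * Lb)
    if bL <= aL and aU < bU:           # a nested inside b
        return (aL + aU - 2.0 * bL) / (2.0 * Lb)
    if aL <= bL and bU <= aU:          # b nested inside a
        return (2.0 * aU - bU - bL) / (2.0 * La)
    # remaining case: partial overlap, a shifted to the left of b
    return (aU - bL) ** 2 / (2.0 * La * Lb)


# Complementary pairs should sum to one under the uniform, independent assumption.
p_ab = possibility_degree((1.0, 3.0), (0.0, 2.0))
p_ba = possibility_degree((0.0, 2.0), (1.0, 3.0))
```

Because $u$ and $v$ are continuous, $p(a \geq b) + p(b \geq a) = 1$, which gives a quick sanity check for any implementation.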

In addition, the binary interval theory is used later in the optimization part of the 4DI-GWO algorithm to compute fitness while updating the grey wolf positions. Specifically, the fitness value is first split into an upper and a lower bound, computed from the upper and lower bounds of the corresponding parameters, respectively. The two bounds are then weighted and summed to obtain the final fitness output. In the position update, the possibility degree that orders interval numbers is used as the basis for selecting candidate solutions and updating iterations.
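One way to read the fitness computation just described is sketched below. The text does not state the weighting between the two bounds, so the `weight_upper` parameter (defaulting to 0.5), the function name `interval_fitness`, and the toy cost function are all hypothetical.

```python
def interval_fitness(cost_fn, params_lower, params_upper, weight_upper=0.5):
    """Evaluate a cost at the lower and upper parameter bounds and combine the results.

    Returns the fitness interval and a weighted scalar. The weight is an assumed
    tuning parameter, not a value taken from the paper.
    """
    f_at_lower = cost_fn(params_lower)
    f_at_upper = cost_fn(params_upper)
    # Order the two evaluations so the result is a valid interval [lower, upper].
    lower, upper = min(f_at_lower, f_at_upper), max(f_at_lower, f_at_upper)
    scalar = (1.0 - weight_upper) * lower + weight_upper * upper
    return (lower, upper), scalar


# Toy cost: normalised distance plus threat probability (illustrative only).
toy_cost = lambda p: p["distance"] / 100.0 + p["threat"]
bounds, value = interval_fitness(
    toy_cost,
    {"distance": 40.0, "threat": 0.1},
    {"distance": 60.0, "threat": 0.3},
)
```

The returned interval can be compared with another candidate's interval through the possibility degree of Equation (2), while the scalar serves as a single fitness output.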

2.3. Benefit and Cost Functions

The cost of destroying targets by heterogeneous multi-UAVs consists of two parts: (1) the threat cost incurred while UAVs perform missions, $C_{1}$; (2) the range cost for the UAVs to complete their own mission sets, $C_{2}$.

If the survival probability of the $i$th UAV after passing the $j$th target is $Sp_{i}$ and $Tsp_{j}$ is the probability that the $j$th target destroys the UAV, then $Tsp_{j} = 1 - Sp_{i}$. Therefore, the threat cost of the $i$th UAV executing $m$ missions is

C_{1} = Uval_{i} [1 - \prod_{j = 1}^{m} (1 - Tsp_{j})]

(3)

Assuming that obstacles are not taken into account, the UAV will tend to perform tasks closer to its position, resulting in lower fuel consumption. The range cost can be expressed as

C_{2} = \frac{D_{i, j}}{D_{\max, j}}

(4)

where $D_{\max, j}$ is the farthest distance of all UAVs from the $j$th task, namely $D_{\max, j} = \max_{i \in U} (D_{i, j})$.

The benefit of destroying a target by heterogeneous multi-UAVs refers to the value of the destruction caused to the target, defined as the product of the value of the target and the probability of destruction. Its magnitude reflects the importance of the target and the execution capability of the UAV, and it can guide the final allocation result to maximize mission effectiveness during decision optimization. Let the probability of destruction of the $i$th UAV performing the $j$th mission be $Usp_{i}$; then the benefit of destroying the target is

C_{3} = {Tval}_{j} {Usp}_{i}

(5)
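Equations (3)–(5) translate directly into small functions. The sketch below uses scalar (crisp) inputs for readability; in the interval setting of Section 2.2 each function would be evaluated at the lower and upper bounds. The function and argument names are illustrative assumptions.

```python
from math import prod


def threat_cost(uav_value, target_kill_probs):
    """Equation (3): C1 = Uval_i * [1 - prod_j (1 - Tsp_j)] over the UAV's m missions."""
    survive_all = prod(1.0 - p for p in target_kill_probs)
    return uav_value * (1.0 - survive_all)


def range_cost(dist_ij, dists_to_target_j):
    """Equation (4): C2 = D_ij / D_max,j, where D_max,j is the largest UAV-to-target distance."""
    return dist_ij / max(dists_to_target_j)


def strike_benefit(target_value, uav_strike_prob):
    """Equation (5): C3 = Tval_j * Usp_i."""
    return target_value * uav_strike_prob


# A UAV worth 10 that visits two targets with kill probabilities 0.1 and 0.2.
c1 = threat_cost(10.0, [0.1, 0.2])
c2 = range_cost(30.0, [30.0, 60.0, 45.0])
c3 = strike_benefit(8.0, 0.75)
```

Note that the threat cost grows with every additional mission assigned to the same UAV, since the product of survival probabilities can only shrink.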

2.4. Task Allocation Model

The task allocation problem of heterogeneous multi-UAVs in uncertain environments is modeled as a fuzzy multiconstraint model. First, the problem is transformed into a single-objective fuzzy constraint model by the linear weighted sum method, and the linear scale transformation method is used to normalize each quantity to a value in $[0, 1]$. Different weight vectors are set to balance the influence of each factor on the allocation results, where the weights $ω_{1}$, $ω_{2}$, and $ω_{3}$ are the weight of the threat cost $C_{1}$ of attacking the target, the weight of the flight cost $C_{2}$, and the weight of the gain $C_{3}$ of destroying the target, respectively. The different weights reflect the different decision preferences of commanders and decision-makers. Therefore, based on fuzzy theory, the fuzzy constraint model for the heterogeneous multi-UAV task allocation problem in the above uncertain environment is as follows:

\min f = \sum_{i = 1}^{M} \sum_{j = 1}^{N} \sum_{h = 1}^{2} (ω_{1} \frac{C_{1}}{\max_{i \in U} {Uval}_{i}} + ω_{2} C_{2} - ω_{3} \frac{C_{3}}{\max_{j \in T} {Tval}_{j}}) x_{i j h}

(6)

\text{s.t.} \quad \sum_{i = 1}^{M} \sum_{j = 1}^{N} \sum_{h = 1}^{2} x_{i j h} = 2 N, \forall i \in U, \forall j \in T

(7)

\sum_{i = 1}^{M} x_{i j h} = 1, \forall j \in T, h \in \{1, 2\}

(8)

V_{i} \leq V_{\max i}, \forall i \in U

(9)

x_{i j h} \in \{0, 1\}, \forall i \in U, \forall j \in T, h \in \{1, 2\}

(10)

\sum_{j = 1}^{N} \sum_{h = 1}^{2} | T_{j h} | = 2 N

(11)

C t_{j}^{R} < C t_{j}^{S}

(12)

Equations (7)–(12) define the specific constraints for the execution of each target’s detection and attack mission. Equation (7) ensures that each mission can be executed, while Equation (8) ensures that each mission can only be executed once. Equation (9) specifies the range constraints of each UAV, where $V_{\max i}$ represents the maximum range of the $i$th UAV. Equation (10) defines the value of the decision variable $x_{i j h}$. In Equation (11), $T_{j h}$ is the $h$th type of mission $(h = 1, 2)$ for the $j$th target, with $h = 1$ indicating a reconnaissance mission and $h = 2$ indicating a strike mission, which can only be carried out after the reconnaissance mission has been completed for each target. In Equation (12), $Ct_{j}^{R}$ and $Ct_{j}^{S}$ are the moments when the UAV swarm conducts the reconnaissance and strike missions against the $j$th target, respectively. From the perspective of engineering application, a reconnaissance UAV should first be assigned to reconnoiter the $j$th target, and then a strike UAV should be assigned to carry out a destruction mission on it. Therefore, there is a chronological order when the reconnaissance and strike missions are assigned to the same target, as shown in Equation (12).
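The following sketch shows how the objective of Equation (6) and the counting constraints (7) and (8) might be evaluated for one candidate assignment. Several things here are assumptions made for illustration: the assignment is stored as a list of `(i, j, h)` triples with $x_{ijh} = 1$, the matrices `c1`, `c2`, `c3` hold precomputed per-pair cost and benefit terms, and the type codes `"R"`, `"S"`, `"RS"` encode the capability rule of Section 2.1. The range constraint (9) and the timing constraint (12) are not checked.

```python
def objective(assignment, c1, c2, c3, uav_values, target_values, weights):
    """Equation (6): weighted, normalised cost summed over all triples with x_ijh = 1."""
    w1, w2, w3 = weights
    max_uval, max_tval = max(uav_values), max(target_values)
    total = 0.0
    for i, j, _h in assignment:
        total += (w1 * c1[i][j] / max_uval      # normalised threat cost
                  + w2 * c2[i][j]               # range cost, already in [0, 1]
                  - w3 * c3[i][j] / max_tval)   # normalised benefit, subtracted
    return total


def is_feasible(assignment, n_targets, uav_types):
    """Check Equations (7) and (8) plus the UAV capability rule of Section 2.1."""
    tasks = [(j, h) for _i, j, h in assignment]
    if len(tasks) != 2 * n_targets:                         # Eq. (7): 2N missions in total
        return False
    expected = {(j, h) for j in range(n_targets) for h in (1, 2)}
    if set(tasks) != expected:                              # Eq. (8): each mission exactly once
        return False
    for i, _j, h in assignment:
        if h == 1 and uav_types[i] == "S":                  # strike-only UAV cannot reconnoitre
            return False
        if h == 2 and uav_types[i] == "R":                  # reconnaissance-only UAV cannot strike
            return False
    return True


# Two UAVs, one target: UAV 0 reconnoitres (h = 1), UAV 1 strikes (h = 2).
plan = [(0, 0, 1), (1, 0, 2)]
f = objective(plan,
              c1=[[0.2], [0.4]], c2=[[0.5], [1.0]], c3=[[0.0], [6.0]],
              uav_values=[10.0, 20.0], target_values=[8.0],
              weights=(0.3, 0.3, 0.4))
ok = is_feasible(plan, n_targets=1, uav_types=["R", "S"])
```

Storing only the nonzero decision variables keeps the objective linear in the number of missions rather than in $M \times N \times 2$.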

3. Dynamic Task Allocation Based on 4DI-GWO Algorithm

This paper presents a dynamic task assignment for heterogeneous multi-UAVs in uncertain environments. Section 2 elaborates on the handling of uncertain environments, while Section 3 investigates the concrete dynamic task allocation strategy. The technology roadmap of this paper is outlined below.

The bottom half of Figure 2 shows the dynamic task allocation algorithm based on 4DI-GWO. For achieving dynamic allocation of heterogeneous multi-UAVs, dynamic time slices are set. With respect to the optimization algorithm, multiple strategies are combined to compose the 4DI-GWO algorithm with improved performance. The specifics of the algorithm are demonstrated in the subsequent subsection.

3.1. Dynamic Time Slices Setting

Dynamic tasking is an improvement on static tasking that takes into account the dynamics of the mission progress. It is a multistage problem, where the results of the current phase influence the allocation scheme of the subsequent phases. The decision-making concept can be summarized as a continuous cycle of “observe–shoot–observe”. During each observation, it is essential to identify the current target and weapon. This information is used to allocate and strike targets. After the strikes have been carried out, another observation is conducted to determine the target and weapon for the next stage. This process is repeated until the end of the cycle.

In uncertain environments, heterogeneous multi-UAVs are inherently dynamic when performing tasks. A clear sequential relationship exists between reconnaissance and strike missions for ground targets in heterogeneous UAV swarms, with reconnaissance preceding the strike. Additionally, the UAV swarm’s reconnaissance and strike range is limited, and it will inevitably adopt a mission mode from the outside to the inside layer by layer. The success of the previous phase’s strikes and the availability of drone information for the next phase must be dynamically confirmed.

The core of UAVs’ dynamic assignment lies in the division of phases, which directly affects the specific information of the attacking and defending parties involved in this phase. This paper introduces dynamic time slices by dividing the complete process into different phases in the time dimension. Each time slice corresponds to an allocation phase, and the information of the attacking and defending parties within the phase is obtained through computation. The specific dynamic task allocation process is shown in Figure 3.

Figure 3 illustrates the cyclic process of dynamic task allocation for heterogeneous multi-UAVs. The fuzzy task execution time is divided into a certain number of time slices $w$, which are then connected to form a complete mission process. The number of time slices $w$ also determines the number of dynamic assignments.
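The role of $w$ can be illustrated with a short loop. This sketch assumes equal-width slices over a scalar mission time (for example, one bound of the fuzzy execution time); the paper does not specify how slice boundaries are chosen. The callables `observe`, `allocate`, and `execute` are hypothetical placeholders for the observation step, the optimizer, and the mission simulation.

```python
def time_slices(total_time, w):
    """Split [0, total_time] into w equal slices, returned as (start, end) pairs."""
    if w < 1:
        raise ValueError("w must be at least 1")
    width = total_time / w
    return [(k * width, (k + 1) * width) for k in range(w)]


def run_dynamic_allocation(total_time, w, observe, allocate, execute):
    """One allocation per time slice: observe, allocate, execute, then repeat."""
    state = None
    plans = []
    for start, end in time_slices(total_time, w):
        situation = observe(state, start)    # refresh UAV and target information
        plan = allocate(situation)           # e.g. one optimizer run for this slice
        state = execute(plan, start, end)    # outcome feeds the next observation
        plans.append(plan)
    return plans


slices = time_slices(120.0, 4)
plans = run_dynamic_allocation(
    120.0, 4,
    observe=lambda state, now: now,
    allocate=lambda situation: ("plan", situation),
    execute=lambda plan, start, end: end,
)
```

A larger $w$ means more frequent re-allocation and more optimizer runs, which is the trade-off the decision maker adjusts.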

3.2. Basic GWO Algorithm

The grey wolf optimization (GWO) algorithm [32] is a swarm intelligence optimization algorithm that simulates the social hierarchy and hunting behavior of grey wolves, with the advantages of simple implementation, fast convergence, and strong global search capability. The GWO algorithm treats each grey wolf in a wolf pack as a potential solution and solves the problem through several stages, including initial grading, encircling, hunting, and attacking.

During the initial phase of the GWO algorithm, a set of $N$ grey wolves is randomly distributed within a specified search domain. The fitness value of each grey wolf $X_{i}(t)$ is denoted as $f(X_{i}(t))$. The three grey wolves with the best fitness values are labeled $α$, $β$, and $δ$, while the remaining wolves are labeled $ω$. The model for the encirclement behavior of the grey wolves during the moving process is as follows:

\begin{cases} A = 2 a(t) r_{1} - a(t) \\ C = 2 r_{2} \\ a(t) = 2 - 2 t / MaxIter \end{cases}

(13)

where $A$ and $C$ are the coefficients, and the control parameter $a(t)$ decreases linearly from 2 to 0 during the iteration process. $r_{1}$ and $r_{2}$ are random vectors in $[0, 1]$, and $MaxIter$ is the maximum number of iterations. Then, the distances between an individual grey wolf and the $α$, $β$, and $δ$ wolves in the pack are calculated and combined to determine the direction in which the individual moves toward the prey to attack.

\begin{cases} D_{α} = C_{1} X_{α} - X(t) \\ D_{β} = C_{2} X_{β} - X(t) \\ D_{δ} = C_{3} X_{δ} - X(t) \end{cases}

(14)

\begin{cases} X_{1}(t) = X_{α}(t) - A_{1} D_{α}(t) \\ X_{2}(t) = X_{β}(t) - A_{2} D_{β}(t) \\ X_{3}(t) = X_{δ}(t) - A_{3} D_{δ}(t) \end{cases}

(15)

where $D_{α}, D_{β}, D_{δ}$ denote the distances between the current candidate grey wolf and the $α$, $β$, and $δ$ wolves, $X_{α}, X_{β}, X_{δ}$ are the positions of the $α$, $β$, and $δ$ wolves, and $C_{1}, C_{2}, C_{3}$ as well as $A_{1}, A_{2}, A_{3}$ are obtained from Equation (13).

After a certain number of iterations and position updates for individuals, the wolves initiate the attack and complete the hunt. The value of $a$ decreases gradually as the wolves approach their prey, ultimately resulting in a successful hunt and yielding the optimal solution.
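For orientation, a compact version of the basic GWO loop built from Equations (13)–(15) is sketched below for a minimization problem. Three points go beyond the text above and follow the commonly used GWO formulation instead: the distance terms take an absolute value, the new position is the average of $X_{1}$, $X_{2}$, $X_{3}$, and the three leaders are kept as the best solutions found so far. The function name `gwo`, its default parameters, and the bound clamping are illustrative choices.

```python
import random


def gwo(fitness, dim, bounds, n_wolves=20, max_iter=100, seed=0):
    """Minimise `fitness` over a box with a basic grey wolf optimizer (illustrative sketch)."""
    if n_wolves < 3:
        raise ValueError("need at least three wolves for alpha, beta, delta")
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    leaders = []                                   # best three positions seen so far

    for t in range(max_iter):
        pool = sorted(leaders + wolves, key=fitness)
        leaders = [list(w) for w in pool[:3]]      # alpha, beta, delta
        a = 2.0 - 2.0 * t / max_iter               # Eq. (13): control parameter, 2 -> 0

        new_wolves = []
        for x in wolves:
            new_x = []
            for d in range(dim):
                candidates = []
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a           # Eq. (13)
                    C = 2.0 * rng.random()                   # Eq. (13)
                    D = abs(C * leader[d] - x[d])            # Eq. (14), absolute value assumed
                    candidates.append(leader[d] - A * D)     # Eq. (15)
                value = sum(candidates) / 3.0                # averaging step of standard GWO
                new_x.append(min(max(value, lo), hi))        # keep inside the search domain
            new_wolves.append(new_x)
        wolves = new_wolves

    best = min(leaders + wolves, key=fitness)
    return list(best), fitness(best)


# Example: minimise the sphere function in three dimensions.
sphere = lambda v: sum(x * x for x in v)
best_pos, best_val = gwo(sphere, dim=3, bounds=(-10.0, 10.0), max_iter=200, seed=1)
```

Because every wolf is pulled toward the same three leaders, the population contracts quickly, which is also why the basic algorithm can stall in a local optimum on multimodal problems.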

3.3. Improvement Strategies of 4DI-GWO

Based on the analysis in Section 3.2, the GWO algorithm converges quickly through directional optimization, but it easily falls into local optima [33,34], which is unacceptable when dealing with uncertain, highly dynamic, and complex heterogeneous UAV dynamic task allocation problems. It is necessary to design a series of improvement strategies to enhance the algorithm’s performance in practical applications and guarantee global optimization of the allocation scheme.

To improve algorithm performance, the four-dimensional information grey wolf optimization (4DI-GWO) algorithm is proposed in this paper. Firstly, the algorithm expands the information carried by individual grey wolves into four dimensions. Solving the information in three-dimensional space dramatically improves population diversity while keeping the computational effort of the optimization under control, and introducing a time dimension allows the grey wolf information to be adjusted adaptively with the number of iterations. Secondly, the nonlinear factorial convergence strategy is used to adjust the control parameters in the algorithm to improve its convergence speed. The search direction is then controlled by a weighted combination of position-updating strategies. Finally, mutation vectors are introduced based on the mutation factor strategy to improve the global search ability of the algorithm and avoid falling into local optima. By combining these strategies at various levels, the proposed algorithm exhibits excellent performance and can handle dynamic, uncertain, and complex optimization problems.

3.3.1. Four-Dimensional Information Strategy

To improve the global search capability, this paper proposes a four-dimensional information strategy. The grey wolf’s one-dimensional information is extended to three-dimensional space, significantly expanding the initial population. This expansion broadens the coverage of the initial population in the solution space, enhancing the algorithm’s global search ability. Additionally, disorderly growth of the population is prevented by adaptive adjustment in the time dimension.

During population initialization, the location of each grey wolf is represented in a four-dimensional coordinate system, namely:

$$x_{p} = i I_{p} + j J_{p} + k K_{p} + t T_{p} \tag{16}$$

The four-dimensional information of the $p$-th grey wolf individual, $x_{p}$, is expressed as a vector $(I_{p}, J_{p}, K_{p}, T_{p})$. Here, $i, j, k$ represent the three orientations of the grey wolf individual in the spatial coordinate system, and $t$ represents its orientation in the time dimension, which is equal to the current number of iterations. The coordinates corresponding to the three axes of the grey wolf individual in the spatial coordinate system are represented by $I_{p}, J_{p}, K_{p}$, and the time-consuming threshold of the individual in the time dimension is represented by $T_{p}$, $T_{p} > 0$. In the spatial coordinate system, the function’s variable interval is set to $[A_{L}, B_{L}]$, $L = 1, 2, \dots, M$, where $A_{L}$ and $B_{L}$ are constants. The modulus $ρ_{L}$, pitch angle $ϑ$, and yaw angle $ψ$ are randomly generated. The relationships between the modulus $ρ_{L}$, pitch angle $ϑ$, yaw angle $ψ$, and the four-dimensional information $(I_{p}, J_{p}, K_{p}, T_{p})$ are as follows:

$$ρ_{L} \in [0, \frac{B_{L} - A_{L}}{2}], \quad ϑ \in [- 2 π, 2 π], \quad ψ \in [- 2 π, 2 π] \tag{17}$$

$$i I_{p} + j J_{p} + k K_{p} + t T_{p} = \frac{ρ_{L} (i \sin ϑ \cos ψ + j \sin ϑ \sin ψ + k \cos ϑ)}{1 + e^{t - T_{p}}} \tag{18}$$

A total of $N$ pieces of grey wolf information are obtained in 3D space and are adaptively adjusted according to the current number of iterations $t$. Equation (18) not only characterizes the expansion of grey wolf information from one-dimensional to three-dimensional space but also realizes the adaptive adjustment of population information according to the number of iterations, as shown in Figure 4.
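The initialization in Equations (17) and (18) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions rather than the authors' implementation: the function name `init_wolf_vector`, the variable names `pitch` and `yaw`, and the example bounds are hypothetical, and the pitch and yaw angles are assumed to enter Equation (18) as the polar and azimuthal angles of a spherical coordinate system.

```python
import math
import random


def init_wolf_vector(a_l, b_l, t, t_p, rng):
    """Sample one spatial vector (I_p, J_p, K_p) following Eqs. (17)-(18).

    a_l, b_l : bounds of the variable interval [A_L, B_L]
    t        : current iteration number
    t_p      : time threshold T_p > 0 of the individual
    rng      : a random.Random instance (hypothetical helper argument)
    """
    # Eq. (17): random modulus and angles.
    rho = rng.uniform(0.0, (b_l - a_l) / 2.0)
    pitch = rng.uniform(-2.0 * math.pi, 2.0 * math.pi)
    yaw = rng.uniform(-2.0 * math.pi, 2.0 * math.pi)

    # Eq. (18): logistic factor that shrinks the vector once t exceeds T_p.
    damping = 1.0 / (1.0 + math.exp(t - t_p))

    i_p = rho * math.sin(pitch) * math.cos(yaw) * damping
    j_p = rho * math.sin(pitch) * math.sin(yaw) * damping
    k_p = rho * math.cos(pitch) * damping
    return i_p, j_p, k_p


rng = random.Random(0)
early = init_wolf_vector(0.0, 20.0, t=0, t_p=50.0, rng=rng)
late = init_wolf_vector(0.0, 20.0, t=100, t_p=50.0, rng=rng)
```

Because the direction vector has unit length, the norm of the returned vector equals the modulus multiplied by the logistic factor, so it never exceeds $(B_{L} - A_{L})/2$ and decays toward zero once $t$ is well beyond $T_{p}$.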

The grey wolf information in the three directions is updated according to Equation (19), where the update is performed in the same way in all three directions, namely $H = i, j, k$:

$$\begin{cases} D_{H α} = |C_{1} X_{H α} - X_{H}|, \; D_{H β} = |C_{2} X_{H β} - X_{H}|, \; D_{H δ} = |C_{3} X_{H δ} - X_{H}| \\ X_{H 1} = X_{H α} - A_{1} D_{H α}, \; X_{H 2} = X_{H β} - A_{2} D_{H β}, \; X_{H 3} = X_{H δ} - A_{3} D_{H δ} \\ X_{H} (t + 1) = (X_{H 1} + X_{H 2} + X_{H 3}) / 3 \end{cases} \tag{19}$$

where $C_{m} = 2 r_{2}$ and $A_{m} = 2 a r_{1} - a$ $(m = 1, 2, 3)$. Then, when calculating the fitness function value, the coordinate values are decoded and converted into real numbers as follows:

$$ρ_{n} = \sqrt{X_{I n}^{2} + X_{J n}^{2} + X_{K n}^{2}}, \quad n = 1, 2, \dots, N / 3 \tag{20}$$

$$RV_{n} = ρ_{n} \, sgn (\sin (\frac{\sqrt{X_{I n}^{2} + X_{J n}^{2}}}{ρ_{n}})) + \frac{B_{L} + A_{L}}{2}, \quad n = 1, 2, \dots, N / 3 \tag{21}$$

where $ρ_{n}$ denotes the modulus of the $n$-th dimension, $X_{I n}, X_{J n}, X_{K n}$ are the gene information in the three directions, and $RV_{n}$ is the decoded real variable.

The proposed strategy improves global optimality by increasing population diversity, which also avoids problems related to the curse of dimensionality and excessive computation when solving complex optimization problems.
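The decoding in Equations (20) and (21) can be written out as a short Python sketch. It follows the two equations literally as printed; the function names `sign` and `decode` are hypothetical, and the handling of a zero-length vector (returning the interval midpoint) is an assumption added only to avoid division by zero.

```python
import math


def sign(value):
    """Signum function: -1, 0, or 1."""
    return (value > 0) - (value < 0)


def decode(x_i, x_j, x_k, a_l, b_l):
    """Decode one three-direction triple into a real variable RV_n."""
    midpoint = (b_l + a_l) / 2.0
    # Eq. (20): modulus of the three-direction information.
    rho = math.sqrt(x_i ** 2 + x_j ** 2 + x_k ** 2)
    if rho == 0.0:
        # Assumption: a zero vector decodes to the interval midpoint.
        return midpoint
    # Eq. (21): signed modulus shifted by the interval midpoint.
    ratio = math.sqrt(x_i ** 2 + x_j ** 2) / rho
    return rho * sign(math.sin(ratio)) + midpoint


value = decode(3.0, 4.0, 0.0, a_l=0.0, b_l=20.0)
```

For the triple (3, 4, 0) on the interval [0, 20], the modulus is 5 and the midpoint is 10, so the decoded value is 15.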

3.3.2. Nonlinear Factor Convergence Strategy

The GWO algorithm relies heavily on the control parameter $a$, which directly affects the value of parameter $A$ and plays a crucial role in regulating the algorithm’s global and local optimal search. However, the base algorithm sets the control parameter to decrease linearly, which contradicts the algorithm’s nonlinear convergence process. To enhance the algorithm’s performance, a nonlinear control parameter is introduced as follows:

$$a = a_{0} (2 - \ln (1 + (e^{2} - 1) t / MaxIter)) \tag{22}$$

where $a_{0}$ is the initial constant value of the control parameter, $e$ is the base of the natural logarithm, $t$ is the current number of iterations, and $MaxIter$ is the maximum number of iterations. This nonlinear decreasing search method can not only achieve a better global search by decreasing the degree of decay of the $a$ value in the early stages but also accelerate the degree of decay in the later stages of the search to obtain a better local search. The nonlinear factor convergence strategy thus improves the algorithm’s ability to balance global and local search.
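Equation (22) is straightforward to evaluate numerically. The following Python sketch is an illustration only; the function name `control_parameter` and the default $a_{0} = 1$ are assumptions. Evaluating the formula at its endpoints gives $a = 2 a_{0}$ at $t = 0$ and $a = 0$ at $t = MaxIter$.

```python
import math


def control_parameter(t, max_iter, a0=1.0):
    """Nonlinear control parameter a of Eq. (22)."""
    return a0 * (2.0 - math.log(1.0 + (math.e ** 2 - 1.0) * t / max_iter))


# Sample the schedule over a run with MaxIter = 500 (the value used in Section 4).
schedule = [control_parameter(t, 500) for t in range(0, 501, 100)]
```

The list `schedule` holds the value of $a$ every 100 iterations, which makes it easy to plot or inspect the decay profile for a chosen $a_{0}$.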

3.3.3. Weighted Combinatorial Position Update Strategy

To enhance the algorithm’s convergence speed, this paper adopts a weighted combinatorial position update strategy. The fitness value of the current $n$-th individual, $f_{n}$, is compared with the average fitness value of the grey wolf pack, $f_{avg}$, and different update strategies are applied accordingly. If $f_{n}$ is superior to $f_{avg}$, the original strategy is retained. If $f_{n}$ is inferior to $f_{avg}$, the grey wolf’s position is updated using the weighted form in Equation (23):

$$X_{H} (t + 1) = \begin{cases} (X_{H 1} \frac{f_{add} - f_{n}}{f_{add} - f_{α}} + X_{H 2} \frac{f_{add} - f_{n}}{f_{add} - f_{β}} + X_{H 3} \frac{f_{add} - f_{n}}{f_{add} - f_{δ}}) / S, & f_{n} \leq f_{avg} \\ (X_{H 1} + X_{H 2} + X_{H 3}) / 3, & f_{n} > f_{avg} \end{cases} \tag{23}$$

where $f_{add}$ is the sum of the maximum and minimum fitness values, $f_{add} = f_{α} + f_{\min}$, and $S$ is the sum of the coefficients of $X_{H 1}, X_{H 2}, X_{H 3}$. $f_{α}$, $f_{β}$, and $f_{δ}$ are the optimal, suboptimal, and third-best fitness values, respectively. Note that the weighted combinatorial position update strategy refines the position update of Equation (19) used in the four-dimensional information strategy. The two strategies serve different objectives: the four-dimensional information strategy expands the initial population, whereas the weighted update improves convergence speed.
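The position updates of Equations (19) and (23) can be sketched in Python for a single coordinate. This is a minimal illustration that follows the two equations as printed, not the authors' code; the function names `leader_candidates` and `combine` are hypothetical, and the caller is assumed to supply fitness values for which none of the denominators in Equation (23) is zero.

```python
import random


def leader_candidates(x, leaders, a, rng):
    """Candidates X_H1, X_H2, X_H3 of Eq. (19) for one coordinate x.

    leaders : coordinates of the alpha, beta, and delta wolves
    a       : current control parameter
    """
    candidates = []
    for x_lead in leaders:
        a_m = 2.0 * a * rng.random() - a   # A_m = 2 a r1 - a
        c_m = 2.0 * rng.random()           # C_m = 2 r2
        distance = abs(c_m * x_lead - x)   # D_H
        candidates.append(x_lead - a_m * distance)
    return candidates


def combine(candidates, f_n, f_avg, f_alpha, f_beta, f_delta, f_min):
    """Combine the three candidates as in Eq. (23)."""
    if f_n > f_avg:
        # Second branch: plain average, identical to Eq. (19).
        return sum(candidates) / 3.0
    # First branch: fitness-weighted combination.
    f_add = f_alpha + f_min
    weights = [(f_add - f_n) / (f_add - f) for f in (f_alpha, f_beta, f_delta)]
    s = sum(weights)  # S: sum of the three coefficients
    return sum(w * x for w, x in zip(weights, candidates)) / s


weighted = combine([1.0, 2.0, 3.0], f_n=4.0, f_avg=5.0,
                   f_alpha=10.0, f_beta=8.0, f_delta=6.0, f_min=2.0)
```

In this example $f_{add} = 12$, the three coefficients are 4, 2, and 4/3, and the combined coordinate is $18/11 \approx 1.636$, which lies closer to the first candidate than the plain average of 2.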

3.3.4. Mutation Operator Introducing Strategy

To enhance global search capability, we introduce the first mutation operator and the second mutation operator into the 4DI-GWO algorithm.

The first mutation operator:

The mutation vectors in the first mutation operator are selected randomly from the population to explore the search space around the optimal solution.

$$X_{mut1}^{i} = w (3 X_{α}^{i} - K_{1} X_{r 1}^{i} - K_{2} X_{r 2}^{i}) + (1 - w) K_{3} X_{r 3}^{i} \tag{24}$$

where the first mutation operator $X_{mut1}^{i}$ is generated based on Equation (24), $X_{α}^{i}$ represents the optimal position, and $w$ represents the optimal solution weight, taking values between 0 and 1. $X_{r 1}^{i}, X_{r 2}^{i}, X_{r 3}^{i}$ are three randomly selected mutation positions, and $K_{1}, K_{2}, K_{3}$ are the corresponding coefficients, all taking values between 0 and 1.
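Equation (24) can be illustrated with a short Python sketch that applies the first mutation operator element by element. The function name `first_mutation` and the example vectors are hypothetical; how the random positions and coefficients are drawn is left to the caller, since the text only states that they are selected randomly and lie between 0 and 1.

```python
def first_mutation(x_alpha, x_r1, x_r2, x_r3, w, k1, k2, k3):
    """First mutation vector of Eq. (24), computed element by element.

    x_alpha          : position of the best wolf
    x_r1, x_r2, x_r3 : three randomly selected positions
    w                : weight of the optimal solution, in [0, 1]
    k1, k2, k3       : coefficients in [0, 1]
    """
    return [
        w * (3.0 * xa - k1 * x1 - k2 * x2) + (1.0 - w) * k3 * x3
        for xa, x1, x2, x3 in zip(x_alpha, x_r1, x_r2, x_r3)
    ]


mutant = first_mutation(
    x_alpha=[1.0, 2.0], x_r1=[1.0, 2.0], x_r2=[1.0, 2.0], x_r3=[4.0, 6.0],
    w=0.5, k1=1.0, k2=1.0, k3=0.5,
)
```

With these inputs the first term reduces to the best position itself, so each component is the average of the best coordinate and half of the third random coordinate, giving `[1.5, 2.5]`.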

The second mutation operator:

The second mutation operator searches in various directions using different mutation vectors to converge toward a globally optimal solution.

$$X_{mut2}^{i} = \frac{\sum_{j = 4}^{C} X_{r j}^{i}}{N - 4} + K_{4} (\sum_{j = 4}^{C} X_{r j}^{i} - (C - 4) X_{r 4}^{i}) \tag{25}$$

where the second mutation operator $X_{mut2}^{i}$ is generated based on Equation (25), the constant $C$ is an integer greater than 4, namely $C \in \mathbb{N}$, $C > 4$, $X_{r 4}^{i}$ and $X_{r j}^{i}$ are randomly selected mutation positions with $j \in [4, C]$, and the fourth coefficient $K_{4}$ also takes values between 0 and 1.

The first mutation operator and the second mutation operator represent mutations of the position near the optimal solution and the global random position, respectively. These mutations effectively improve the weak global search capability of the basic GWO algorithm.

3.4. Solution Process of 4DI-GWO

Figure 5 illustrates the flow of the 4DI-GWO algorithm for solving the heterogeneous multi-UAV tasking problem in an uncertain environment. The improvements made in this paper are highlighted in the dotted box and summarized in the following steps.

Step 1: The algorithm parameters are initialized by determining the dimension of the grey wolf position and the upper and lower boundaries of the search field based on the current information on UAVs and targets. The number of grey wolves $N$ in the search space and the maximum number of iterations $MaxIter$ are also determined. $N$ grey wolves are then randomly distributed in the given search field and the relevant control parameter $a_{0}$ is initialized.
Step 2: Expanding the grey wolf population using the four-dimensional information strategy not only increases the number of potential solutions but also allows for adaptive adjustment of the solution space with each iteration.
Step 3: Calculate the fitness of each grey wolf in the population and determine the current optimal solution $X_{α}$, suboptimal solution $X_{β}$, and third-best solution $X_{δ}$.
Step 4: The control parameter $a$ is adaptively adjusted based on the nonlinear factor convergence strategy, which leads to the calculation of coefficients $A$ and $C$.
Step 5: Determine the adaptive inertia weights and learn the optimal search direction of the population to obtain the learned solution $X (t + 1)$ based on the weighted combinatorial position update strategy.
Step 6: To jump out of the local optimum, two different mutation operators are applied according to the mutation operator introducing strategy.
Step 7: Check if the algorithm meets the termination condition. If it does, output the current optimal position and decode it to obtain the optimal heterogeneous multi-UAV tasking scheme under the uncertain environment described above. If not, return to Step 2 to continue the search.
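The overall loop structure of the steps above can be sketched in Python. This is a deliberately simplified skeleton and not the 4DI-GWO algorithm itself: it keeps only the classical leader-based update of Equation (19) together with the nonlinear control parameter of Equation (22), and it omits the four-dimensional encoding, the weighted update, and the mutation operators. The function names, the sphere test function, the bound clipping, and the assumption that lower fitness is better are all illustrative choices.

```python
import math
import random


def sphere(x):
    """Hypothetical test objective: sum of squares (lower is better)."""
    return sum(v * v for v in x)


def gwo_minimize(fitness, dim, lower, upper, n_wolves=30, max_iter=100,
                 a0=1.0, seed=0):
    """Simplified GWO loop with the nonlinear control parameter of Eq. (22)."""
    rng = random.Random(seed)
    # Initialization: random wolves inside the search bounds.
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)]
              for _ in range(n_wolves)]
    best_x, best_f = None, math.inf

    for t in range(max_iter):
        # Rank the pack and pick the alpha, beta, and delta wolves.
        ranked = sorted(wolves, key=fitness)
        leaders = [list(w) for w in ranked[:3]]
        f_alpha = fitness(leaders[0])
        if f_alpha < best_f:
            best_x, best_f = leaders[0], f_alpha

        # Nonlinear control parameter, Eq. (22).
        a = a0 * (2.0 - math.log(1.0 + (math.e ** 2 - 1.0) * t / max_iter))

        # Position update, Eq. (19), applied per coordinate.
        updated = []
        for w in wolves:
            position = []
            for d in range(dim):
                total = 0.0
                for lead in leaders:
                    a_m = 2.0 * a * rng.random() - a
                    c_m = 2.0 * rng.random()
                    total += lead[d] - a_m * abs(c_m * lead[d] - w[d])
                # Assumption: clip to the search bounds.
                position.append(min(upper, max(lower, total / 3.0)))
            updated.append(position)
        wolves = updated

    # Termination: evaluate the final pack once more.
    final = min(wolves, key=fitness)
    if fitness(final) < best_f:
        best_x, best_f = list(final), fitness(final)
    return best_x, best_f


best_x, best_f = gwo_minimize(sphere, dim=3, lower=-10.0, upper=10.0)
```

The function returns the best position found and its fitness. Because the best-so-far solution is tracked across iterations, the returned fitness is never worse than that of the best wolf in the initial population.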

4. Discussion

In this section, the effectiveness and superiority of the proposed dynamic task allocation strategy in an uncertain environment are verified through numerical simulation. Experiment 1 confirms that the proposed approach works well in an uncertain environment. Experiment 2 uses several sets of simulations with varying loads to confirm the strategy’s robustness. Experiment 3 confirms the superior algorithmic performance of 4DI-GWO by comparing the simulations of several algorithms under challenging circumstances. Experiment 4 uses multiple tests to confirm the strategy’s dynamic allocation performance. Experiment 5 validates the efficacy of the strategy in environments with varying levels of uncertainty. Experiment 6 examines the maximum number of targets that the allocation strategy can handle with a fixed number of UAVs.

The parameters of the algorithm are set as follows: MaxIter = 500; population size PS = 30. All algorithms and test programs are simulated using MATLAB 2021a in this study, and the hardware information is Intel (R) Core (TM) i5-10300H CPU@ 2.50 GHz (Intel, Santa Clara, CA, USA), RTX 2060 14 GB, DDR4 16 GB, 512 GB SSD. Symbols and simulation parameters are summarized in Appendix A.

Experiment 1

To test the 4DI-GWO algorithm’s ability to solve the dynamic task allocation scheme for heterogeneous multi-UAVs with uncertain information, such as mission execution time, relative distances of UAV targets, and the survival probability of the UAVs, this paper assumes four UAVs conduct reconnaissance and strike missions against 10 enemy targets in a 10 km × 10 km mission area, namely M = 4, N = 10. The dynamic time slice is set to w = 1. The UAV and mission attribute tables are shown in Table 1 and Table 2, respectively.

In Table 1, RS in UAV type stands for the Reconnaissance–Strike UAV. The information on the UAVs is defined using real numbers considering the existing UAV technology with interaircraft networking and communication capabilities.

In uncertain environments, parameters related to the target are expressed as binary interval numbers. These parameters include the probability of a ground target striking the UAV, the geographic location of the target, and the time required to perform different tasks. The fitness change curve for the 4DI-GWO algorithm-solving process is depicted in Figure 6. Table 3 displays the results of task allocation for heterogeneous multi-UAVs.

As shown in Figure 6, the algorithm converges quickly owing to the four-dimensional information strategy, the nonlinear factor convergence strategy, and the weighted combinatorial position update strategy. A feasible solution is obtained at iteration t = 26. Additionally, by incorporating the mutation operator strategy, the 4DI-GWO algorithm escapes local optima several times during the later iterations (t = 150 and t = 180), resulting in improved global optimization capability.

Table 3 shows the mission sequence and flight information for each UAV. The ground targets were all assigned and executed in the order of reconnaissance followed by strike. Additionally, the highest-value target was assigned to two Reconnaissance–Strike UAVs with better capability, which is desirable.

In summary, this experiment shows that the 4DI-GWO algorithm is feasible for solving the heterogeneous multi-UAVs’ task assignment problem under conditions of uncertain target location, uncertain target strike probability, and uncertain mission execution time.

Experiment 2

To evaluate the impact of changes in the number of UAVs and targets on the performance of the 4DI-GWO algorithm, this paper conducts tests by manipulating the number of UAVs and targets separately. Specifically, the number of UAVs is increased while keeping the number of targets constant, and vice versa. Experiment 2 allows us to assess the performance of the 4DI-GWO algorithm under different conditions. Figure 7a,b shows the curve of the fitness value during the solution process.

Figure 7a shows that as the number of UAVs increases, the fitness value also increases at the beginning of the iteration. However, as more UAVs are added, more possibilities arise, causing the blue solid line to converge to a lower fitness value through more rounds of iterative optimization, namely more UAVs could result in a better allocation scheme. Furthermore, discounting the late jump out of the local optimum, the convergence times of all three states are approximately equal. This indicates that the 4DI-GWO algorithm is not only effective but also highly robust in terms of changes in UAVs.

Figure 7b shows that the convergence speed of the 4DI-GWO algorithm decreases gradually as the number of targets increases. More targets result in more flight attrition and unknown strike risks. In addition, the average number of tasks assigned to each UAV also increases, increasing both the task list completion time and the flight cost of the UAV. Therefore, the final fitness value of the proposed algorithm also increases with the number of targets.

In general, Figure 7a,b demonstrates that the 4DI-GWO algorithm can effectively solve task allocation problems in uncertain environments with favorable convergence speed, regardless of the number of targets or UAVs. The proposed 4DI-GWO algorithm can obtain feasible solutions under all the above settings, and the algorithm’s generalization performance is confirmed to a certain extent.

Experiment 3

To determine the superiority of the proposed algorithm, the most complex and challenging task setup is chosen in this experiment, consisting of 10 UAVs and 16 ground targets. The proposed 4DI-GWO algorithm and other classical optimization algorithms, namely PSO [35], GWO [32], IGWO [33], GA [36], and ACO [37], are independently run 100 times under the above task loads.

The average values of the final mission assignment results are statistically recorded in Table 4. The best results are shown in bold, where BST, AVG, WST, STD, and AVGIter denote the optimal, average, worst, standard deviation of the fitness value, and the number of iterations to obtain the first feasible solution obtained by the algorithm in 100 randomized runs. AVGDis and AVGComTime are the average of all UAV travel distances and the average of mission list completion times in 100 randomized runs, respectively.
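The summary statistics named above can be computed from the final fitness values of repeated runs with a few lines of Python. This sketch is illustrative only: the function name `summarize_runs` and the sample values are hypothetical and are not results from the paper. It assumes that a lower fitness value is better and that STD is the sample standard deviation, neither of which is stated explicitly for Table 4.

```python
import statistics


def summarize_runs(final_fitness):
    """Summarize the final fitness values of independent runs."""
    return {
        "BST": min(final_fitness),               # best run (lowest fitness)
        "AVG": statistics.mean(final_fitness),   # average over all runs
        "WST": max(final_fitness),               # worst run (highest fitness)
        "STD": statistics.stdev(final_fitness),  # sample standard deviation
    }


# Hypothetical fitness values from three runs, for illustration only.
summary = summarize_runs([1.0, 2.0, 3.0])
```

For the three illustrative values the summary is BST = 1, AVG = 2, WST = 3, and STD = 1.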

Table 4 shows that the proposed 4DI-GWO algorithm outperforms the other five algorithms under difficult task load settings in terms of the best fitness value (BST), average fitness value (AVG), and worst fitness value (WST), demonstrating its superior optimization capability. Although the number of iterations at the moment of convergence is slightly higher for the 4DI-GWO algorithm than for GWO and IGWO, this is because 4DI-GWO searches over a wider range and has more initial solution vectors. In other words, only a few more iterations are needed to carry out a better global search over a wider range, which is an acceptable trade-off. Notably, the excessively large worst fitness values indicate that, in some runs, the algorithms other than IGWO and 4DI-GWO fail to find a feasible solution. The standard deviation (STD) of the 4DI-GWO algorithm’s fitness is also lower than that of the IGWO algorithm, which demonstrates the stability of 4DI-GWO. Although the 4DI-GWO algorithm does not need the smallest number of iterations before stabilization, the results validate the effectiveness of its strategy for escaping local optima and its robust global convergence capability. The allocation schemes produced by the proposed algorithm also outperform those of the other algorithms in terms of both flight range and flight time.

Figure 8 shows the variation curves of the average fitness values of the six algorithms under the above task load setting. The PSO algorithm converges to a local optimum early and has a higher fitness value than the other algorithms because it fails to find a feasible solution. Similarly, the ACO algorithm also faces challenges with its global search ability. Although the GA algorithm is less likely to fall into a local optimum, its optimization speed is slower. Compared with other intelligent algorithms, the classical GWO algorithm has strong optimization ability on its own. The IGWO algorithm, which adds the DLH search strategy, has a faster convergence speed and achieves a lower final fitness value. The proposed 4DI-GWO algorithm not only achieves a convergence speed similar to that of the IGWO algorithm in the early stages through the combination of various improvement strategies but also exhibits a strong global search capability. Additionally, its final average fitness value is lower than that of the other algorithms.

These results demonstrate that the 4DI-GWO algorithm, which incorporates multiple improvement strategies, outperforms the other five algorithms in solving the heterogeneous multi-UAV task assignment problem in uncertain environments.

Next, this paper evaluates the time and space resources required by the 4DI-GWO algorithm to process the problem by extrapolating the computational complexity, a crucial indicator of algorithm efficiency.

In terms of time complexity, generating the initial population in Steps 1–2 takes O(N). Assuming that the time complexity of evaluating the performance metric f is T(f), the time complexity of evaluating the individuals in Step 3 is O(N*T(f) + N). After that, the time complexity of iteratively updating the optimal solution from Step 4 onward is O(MaxIter*N*T(f)). So, the time complexity of 4DI-GWO is O(N) + O(N*T(f) + N) + O(MaxIter*N*T(f)) = O(MaxIter*N*T(f)), where MaxIter is the maximum number of iterations and N is the population size. The time complexity of the proposed algorithm is equivalent to that of the classical GWO algorithm, PSO algorithm, and GA algorithm and superior to that of the ACO algorithm because of the ACO’s additional node information updating.

The spatial complexity of the 4DI-GWO algorithm refers to the storage space required during algorithm operation. Specifically, the 4DI-GWO algorithm typically only needs to store the current best solution, candidate solutions, and some data structures related to the search and optimization process. In implementation, the spatial complexity of the 4DI-GWO algorithm is approximately O(3N), where 3N signifies the candidate solution’s expansion to three dimensions as a consequence of the four-dimensional grey wolf information strategy. While the transformation does not significantly alter the solution time, it does affect the storage space. Although the space complexity of the proposed algorithm is larger than that of the classical GWO algorithm, it remains within an acceptable range.

Experiment 4

To verify the effect of the number of dynamic time slices on the effectiveness of the task assignment scheme and the flight condition, this experiment chooses to change the number of dynamic time slices and performs the simulation verification based on the eight abovementioned types of task loads. Figure 9a–h shows the three fitness values and task completion times for 10 random runs of the 4DI-GWO algorithm with different numbers of time slices and different task loads.

Figure 9a–h shows that as the number of time slices increases, the time cost decreases under the eight task load settings, resulting in a general downward trend in the fitness value of the allocation scheme. This is because the dynamic allocation algorithm analyzes the observation allocation results, target threat level, UCAV destruction, and the statistical analysis of the force information of each party more frequently, and the 4DI-GWO algorithm is well suited to optimizing such complex situations. As the number of time slices increases, mission decision making in multi-UAV uncertain environments becomes more refined.

Comparing Figure 9a,e–g shows that the dynamic time-slice setting can significantly enhance the optimization effect and reduce the overall task elapsed time for a given number of UAVs. Furthermore, Figure 9h shows that increasing the number of time slices yields a significantly lower fitness value and reduces the elapsed time of the complex and computationally intensive task, despite the large number of targets involved. The dynamic time slice setting results in a more efficient task allocation scheme.

Experiment 5

To ascertain the impact of varying degrees of uncertainty in environmental information on the efficacy of the proposed algorithm, Experiment 5 employs a simulation validation approach, utilizing environmental information with distinct degrees of uncertainty. These degrees of uncertainty encompass deterministic, slightly uncertain, moderately uncertain, and extremely uncertain environments.

In this experiment, the boundaries of the intervals $λ = [λ^{L}, λ^{U}]$ describing the critical parameter information in the uncertain environment are borrowed from the extreme values $λ_{1}, λ_{2}$ of the ternary fuzzy number $\tilde{λ} = (λ_{1}, λ_{c}, λ_{2})$, namely $λ^{L} = λ_{1}, λ^{U} = λ_{2}$. In this way, the uncertain parameter information $λ$ can be expressed in terms of the degree of uncertainty $χ$, i.e., $λ^{L} = (1 - χ) λ_{c}, λ^{U} = (1 + χ) λ_{c}$, where $λ_{c}$ is the intermediate value of the ternary fuzzy number, held constant in the experiment as a control variable, and $χ \in [0, 1)$. To ascertain the impact of the degree of uncertainty associated with the uncertain information on the efficacy of the task assignment scheme, Figure 10 presents the fitness value and the average task completion time of the task assignment scheme of the 4DI-GWO algorithm for 10 random runs in the $M = 4$, $N = 10$ case with varying degrees of uncertainty, specifically $χ$ equal to 0, 0.3, 0.6, and 0.9.
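The mapping from the degree of uncertainty to the interval bounds is simple enough to show directly. The following Python sketch is illustrative; the function name `interval_from_uncertainty` and the central value of 10 are hypothetical, while the formula and the four uncertainty levels are those given in the text.

```python
def interval_from_uncertainty(center, chi):
    """Return (lower, upper) = ((1 - chi) * center, (1 + chi) * center)."""
    if not 0.0 <= chi < 1.0:
        raise ValueError("degree of uncertainty must lie in [0, 1)")
    return (1.0 - chi) * center, (1.0 + chi) * center


# Intervals for the four uncertainty levels used in Experiment 5,
# built around a hypothetical central value of 10.
intervals = {chi: interval_from_uncertainty(10.0, chi)
             for chi in (0.0, 0.3, 0.6, 0.9)}
```

At $χ = 0$ the interval collapses to the single central value, which corresponds to the deterministic environment, and the interval widens symmetrically as $χ$ grows.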

As illustrated in Figure 10, when the UAV swarm is situated in a deterministic environment, namely $χ = 0$, the task allocation is relatively straightforward and readily converges to the optimal solution with the minimized fitness values and task completion time. As the degree of uncertainty increases, the problem becomes more challenging to solve, resulting in a delay in the start of the task, which in turn leads to an increase in the average task completion time of the UAVs and an increase in the fitness value of the task allocation scheme. A comparison of deterministic and extremely uncertain environments reveals a significant difficulty gap. These results indicate that the proposed algorithm can perform task assignments in the presence of diverse levels of uncertainty and is resilient to fluctuations in the uncertainty of key parameters.

Experiment 6

To further verify the maximum number of targets that the proposed allocation strategy can handle, Experiment 6 increases the number of targets beyond the existing simulations while keeping the number of UAVs unchanged. The simulation results give the upper limit of the number of targets that the UAV swarm can handle in the current configuration, M = 4.

Figure 11 shows that when the number of targets increases from 11 to 19, the proposed strategy still finds well-optimized allocations, and the fitness values converge stably below 120. However, when the number of targets is increased to 20, the optimized fitness value remains at approximately 180, indicating that the search is trapped in a local optimum. The current UAV swarm is therefore unable to handle 20 targets: the allocation strategy sacrifices the fitness value to satisfy the multiple constraints in the performance metrics. This also demonstrates that the number of UAVs should be selected to approximately align with the number of targets and that the number of targets handled by the current UAV swarm should remain below 20.

5. Conclusions

In this paper, a dynamic task allocation strategy based on the 4DI-GWO algorithm is designed to solve the tasking problem of multiple heterogeneous UAVs against ground targets in uncertain environments. To simulate the real application environment, we construct the fuzzy chance-constrained planning model with the fuzzy processing of key parameters related to the blue side, including target position information, mission execution time, and target strike probability. The proposed tasking strategy is generated based on the 4DI-GWO algorithm, which is improved by the four-dimensional information strategy, nonlinear factor convergence strategy, weighted combinatorial position update strategy, and mutation operator introducing strategy. Compared with the classical GWO algorithm, the 4DI-GWO algorithm exhibits stronger global search capability and faster convergence speed, making it better suited for complex uncertain task allocation problems. In addition, the allocation algorithm is further upgraded into a dynamic allocation policy by setting dynamic time slices, improving the strategy performance, and handling more complex situations. Finally, the numerical simulation validates that the strategy can generate optimal heterogeneous multi-UAV dynamic tasking schemes under complex and uncertain environments. Additionally, the comparative analysis demonstrates that the proposed method has strong generalization and fast solving speed.

Future research on the task allocation problem may be conducted under denied environments, i.e., completely unknown environments, which are more difficult than the uncertain environments investigated here and impose more constraints. Further study can also be conducted at the level of the underlying algorithms and methods, for example by combining the approach with the Alternating Direction Method of Multipliers algorithm [38] to enhance performance through the integration of diverse approaches. It is equally challenging and rewarding to conduct research on task allocation for large-scale UAV swarms, which can handle more targets.

Author Contributions

Conceptualization, Z.J. and H.H.; methodology, Z.J.; software, Z.J.; validation, Z.J., H.H. and Y.B.; formal analysis, Y.B.; investigation, Z.J. and Y.B.; resources, Z.J.; data curation, Y.B.; writing—original draft preparation, Z.J.; writing—review and editing, Z.J., H.H. and T.Y.; visualization, Z.J.; supervision, H.H. and T.Y.; project administration, H.H.; funding acquisition, H.H. All authors have read and agreed to the published version of the manuscript.

Funding

The authors acknowledge funding received from the following science foundations: National Natural Science Foundation of China (No. 62176214, 61973253, 62101590), Natural Science Foundation of the Shaanxi Province, China (2021JQ-368).

Data Availability Statement

The data are available from the first or corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

The appendix contains tables: Table A1, which provides a comprehensive overview of all symbols and descriptions, and Table A2, which displays the simulation parameters and associated values.

Table A1. Overview of all symbols and descriptions.

Symbol	Description	Symbol	Description	Symbol	Description
$U$	UAV	$T$	Target	$D$	Relative distance
$U^{R}$	Reconnaissance UAV	$Tid$	Target’s code	$S p$	Probability of UAV survival
$U^{S}$	Strike UAV	$Ttype$	Target’s type	$C_{1}$	UAV’s cost of the threat performing missions
$U^{R S}$	Reconnaissance–Strike UAV	$Tval$	Target’s value	$C_{2}$	UAV’s cost of range to complete mission sets
$Uid$	UAV’s code	$Tsp$	Probability of target striking the UAV	$C_{3}$	UAV’s benefit of destruction to target
$Utype$	UAV’s type	$Tpos$	Target’s position	$ω_{1}$	Weight of the threat cost
$Uval$	UAV’s value	${Ct}^{R}$	Duration of the reconnaissance mission	$ω_{2}$	Weight of flight cost
$Usp$	Probability of UAV striking the target	${Ct}^{S}$	Duration of the strike mission	$ω_{3}$	Weight of gain of destroying the target
$Upos$	UAV’s position	$N$	Number of targets	$x$	Decision variable
$Uv$	UAV’s speed	$MaxIter$	Maximum number of iterations	$h$	Mission variable
$M$	Number of UAVs	$PS$	Population size	$w$	Number of time slices

Table A2. Overview of simulation parameters and associated values.

Simulation Parameter	Value
$MaxIter$	500
$PS$	30
$M$	4
$N$	10
$w$	1
$ω_{1}$	0.4
$ω_{2}$	0.3
$ω_{3}$	0.3

The parameters presented in Table A2 represent the base parameters, and the individual values vary in accordance with the specific experimental conditions.

References

Javaid, S.; Saeed, N.; Qadir, Z.; Fahim, H.; He, B.; Song, H.; Bilal, M. Communication and Control in Collaborative UAVs: Recent Advances and Future Trends. IEEE Trans. Intell. Transport. Syst. 2023, 24, 5719–5739.
Yu, X.; Gao, X.; Wang, L.; Wang, X.; Ding, Y.; Lu, C.; Zhang, S. Cooperative Multi-UAV Task Assignment in Cross-Regional Joint Operations Considering Ammunition Inventory. Drones 2022, 6, 77.
Li, W.; Lyu, Y.; Dai, S.; Chen, H.; Shi, J.; Li, Y. A Multi-Target Consensus-Based Auction Algorithm for Distributed Target Assignment in Cooperative Beyond-Visual-Range Air Combat. Aerospace 2022, 9, 486.
Wu, X.; Zhang, M.; Wang, X.; Zheng, Y.; Yu, H. Hierarchical Task Assignment for Multi-UAV System in Large-Scale Group-to-Group Interception Scenarios. Drones 2023, 7, 560.
Shahid, S.; Zhen, Z.; Javaid, U.; Wen, L. Offense-Defense Distributed Decision Making for Swarm vs. Swarm Confrontation While Attacking the Aircraft Carriers. Drones 2022, 6, 271.
Liu, S.; Liu, W.; Huang, F.; Yin, Y.; Yan, B.; Zhang, T. Multitarget Allocation Strategy Based on Adaptive SA-PSO Algorithm. Aeronaut. J. 2022, 126, 1069–1081.
Yin, Y.; Guo, Y.; Su, Q.; Wang, Z. Task Allocation of Multiple Unmanned Aerial Vehicles Based on Deep Transfer Reinforcement Learning. Drones 2022, 6, 215.
Wang, W.; Lv, M.; Ru, L.; Lu, B.; Hu, S.; Chang, X. Multi-UAV Unbalanced Targets Coordinated Dynamic Task Allocation in Phases. Aerospace 2022, 9, 491.
Deng, H.; Huang, J.; Liu, Q.; Zhao, T.; Zhou, C.; Gao, J. A Distributed Collaborative Allocation Method of Reconnaissance and Strike Tasks for Heterogeneous UAVs. Drones 2023, 7, 138.
Cui, J.; Liu, Y.; Nallanathan, A. Multi-Agent Reinforcement Learning-Based Resource Allocation for UAV Networks. IEEE Trans. Wirel. Commun. 2020, 19, 729–743.
Multi-Criterion Multi-UAV Task Allocation under Dynamic Conditions. J. King Saud Univ. Comput. Inf. Sci. 2023, 35, 101734.
Han, H.; Bai, X.; Hou, Y.; Qiao, J. Multitask Particle Swarm Optimization with Dynamic On-Demand Allocation. IEEE Trans. Evol. Computat. 2023, 27, 1015–1026.
Liu, P.; Xu, S.L.; Zhang, D. Multi-missile Dynamic Weapon Target Assignment Algorithm Based on Particle Swarm Optimization. J. Nanjing Univ. Aeronaut. Astronaut. 2023, 55, 108–115.
Zhao, M.; Li, D. Collaborative Task Allocation of Heterogeneous Multi-Unmanned Platform Based on a Hybrid Improved Contract Net Algorithm. IEEE Access 2021, 9, 78936–78946.
Qiu, S.M.; Liu, L.C.; Du, X.L. Weapon Target Allocation Based on Multi-objective Whale Optimization Algorithm. Comput. Appl. Softw. 2023, 40, 254–276.
Qiu, S.M.; Bai, C.C.; Lv, Y.N. Dynamic Weapon Target Allocation Based on the Multi-objective Whale Optimization Algorithm. J. Ordnance Equip. Eng. 2023, 44, 153–159.
Qiu, S.M.; Wang, X.K.; Du, X.L. Weapon Target Assignment Based on Improved Multi-objective Simplified Swarm Optimization. Comput. Appl. Softw. 2023, 40, 242–249.
Zheng, F.Y.; Liu, L.W.; Cheng, Y.H. Multi-objective control allocation strategy of compound rotorcraft. Acta Aeronaut. Astronaut. Sin. 2019, 40, 246–261.
Zheng, F.Y.; Wang, F.; Zhen, Z.Y. Control allocation of multi-objective adaptive probabilistic guidance for advanced layout unmanned aerial vehicle. Control Theory Appl. 2022, 39, 2366–2376.
De Curtò, J.; De Zarzà, I.; Roig, G.; Cano, J.C.; Manzoni, P.; Calafate, C.T. LLM-Informed Multi-Armed Bandit Strategies for Non-Stationary Environments. Electronics 2023, 12, 2814.
He, Y.; Zhang, C.Y.; Li, S.S. Unmanned aerial vehicle carriers scheduling problem based on two-stage robust optimization. J. Syst. Eng. 2020, 35, 838–848+864.
Zhao, Y.L.; Song, Y.X.; Zhao, J.C. Multi-UAV cooperative reconnaissance mission planning based on robust optimization. J. Nav. Univ. Eng. 2021, 33, 48–54.
Whitbrook, A.; Meng, Q.; Chung, P.W.H. Addressing Robustness in Time-Critical, Distributed, Task Allocation Algorithms. Appl. Intell. 2019, 49, 1–15.
Li, C.Y.; Cao, K.H.; Feng, S.X. Resource scheduling with uncertain execution time in cloud computing. J. Harbin Univ. Sci. Technol. 2019, 24, 85–91.
Fan, H.M.; Wu, J.X.; Geng, J. Hybrid genetic algorithm for solving fuzzy demand and time windows vehicle routing problem. J. Syst. Manag. 2020, 29, 107–118.
Zhao, Y.L.; Song, Y.X.; Zhang, J.J. Fuzzy game decision-making of unmanned aerial vehicles air-to-ground attack based on the particle swarm optimization integrating multiply strategies. Control Theory Appl. 2019, 36, 1644–1652.
Nguyen, L.V.; Le, T.H.; Ha, Q.P. Grey Wolf Optimization-Based Path Planning for Unmanned Aerial Vehicles in Bridge Inspection. In Proceedings of the 2024 IEEE/SICE International Symposium on System Integration (SII), Ha Long, Vietnam, 8 January 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 810–815.
Yu, X.; Jiang, N.; Wang, X.; Li, M. A Hybrid Algorithm Based on Grey Wolf Optimizer and Differential Evolution for UAV Path Planning. Expert Syst. Appl. 2023, 215, 119327.
Feng, J.; Sun, C.; Zhang, J.; Du, Y.; Liu, Z.; Ding, Y. A UAV Path Planning Method in Three-Dimensional Space Based on a Hybrid Gray Wolf Optimization Algorithm. Electronics 2023, 13, 68.
Liu, X.; Li, G.; Yang, H.; Zhang, N.; Wang, L.; Shao, P. Agricultural UAV Trajectory Planning by Incorporating Multi-Mechanism Improved Grey Wolf Optimization Algorithm. Expert Syst. Appl. 2023, 233, 120946.
Kumar, R.; Singh, L.; Tiwari, R. Novel Reinforcement Learning Guided Enhanced Variable Weight Grey Wolf Optimization Algorithm for Multi-UAV Path Planning. Wirel. Pers. Commun. 2023, 131, 2093–2123.
Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey Wolf Optimizer. Adv. Eng. Softw. 2014, 69, 46–61.
Huang, H.; Jiang, Z.; Dong, Y.; Weng, W.; Bi, T.; Shen, Y. UCAVs Collaborative Target Assignment Based on Improved GWO Algorithm. In Proceedings of the 2023 35th Chinese Control and Decision Conference (CCDC), Yichang, China, 20 May 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 4934–4940.
Zhang, A.; Yang, M.; Bi, W.H.; Zhang, B.C. Task allocation of heterogeneous multi-UAVs in uncertain environment based on multi-strategy integrated GWO. Acta Aeronaut. Astronaut. Sin. 2023, 44, 148–164.
Kennedy, J. Particle Swarm Optimization. In Encyclopedia of Machine Learning; Sammut, C., Webb, G.I., Eds.; Springer: Boston, MA, USA, 2011; pp. 760–766. ISBN 978-0-387-30768-8.
Katoch, S.; Chauhan, S.S.; Kumar, V. A Review on Genetic Algorithm: Past, Present, and Future. Multimed. Tools Appl. 2021, 80, 8091–8126.
Dorigo, M.; Birattari, M.; Stutzle, T. Ant Colony Optimization. IEEE Comput. Intell. Mag. 2006, 1, 28–39.
Tang, T.; Toh, K.-C. Self-Adaptive ADMM for Semi-Strongly Convex Problems. Math. Prog. Comp. 2024, 16, 113–150.

Figure 1. Schematic diagram of the Red and Blue attack and defense confrontation.

Figure 2. Block diagram of dynamic task allocation for uncertain environment.

Figure 3. Flowchart of dynamic task allocation.

Figure 4. Four-dimensional grey wolf information strategy diagram.

Figure 5. Flowchart of 4DI-GWO algorithm.

Figure 6. Fitness curve of 4DI-GWO.

Figure 7. Fitness curves for different task loads: (a) fitness curves of 4DI-GWO with the increasing number of UAVs; (b) fitness curves of 4DI-GWO with the increasing number of targets.

Figure 8. Fitness curve of different algorithms.

Figure 9. Optimization effectiveness over dynamic time slices: (a) M = 4, N = 10; (b) M = 4, N = 12; (c) M = 4, N = 14; (d) M = 4, N = 16; (e) M = 6, N = 10; (f) M = 8, N = 10; (g) M = 10, N = 10; (h) M = 10, N = 16.

Figure 10. Optimization effectiveness over different degrees of uncertainty.

Figure 11. Maximum number of targets simulation.

Table 1. Information on UAVs.

Number	Type	Value	Probability of Striking Target	Position (m)	Velocity (m/s)
U₁	Reconnaissance	55	0	(450, 700)	45
U₂	R-S	80	0.9	(750, 630)	40
U₃	R-S	80	0.9	(980, 1400)	40
U₄	Strike	60	0.99	(700, 1250)	35

Table 2. Information on target.

Number	Type	Value	Probability of Striking UAV	X-Coordinate	Y-Coordinate	Duration of Reconnaissance Mission	Duration of Strike Mission
T₁	AAGun	65	[0.3, 0.55]	[5247, 5587]	[8542, 8625]	[71, 90]	[227, 301]
T₂	Radar	74	0	[6023, 6245]	[8378, 8546]	[75, 92]	[230, 300]
T₃	Radar	69	0	[6501, 6654]	[4987, 5382]	[81, 103]	[212, 278]
T₄	AAGun	53	[0.32, 0.56]	[9754, 10547]	[8351, 8747]	[78, 105]	[210, 281]
T₅	Missile	86	[0.74, 0.93]	[7049, 7691]	[8247, 8678]	[72, 97]	[211, 282]
T₆	Missile	91	[0.78, 0.98]	[7148, 7596]	[5874, 6231]	[71, 95]	[219, 290]
T₇	Radar	78	0	[5892, 6214]	[5674, 6007]	[70, 92]	[218, 291]
T₈	Missile	86	[0.69, 0.91]	[5639, 5987]	[8402, 8643]	[76, 102]	[213, 285]
T₉	AAGun	55	[0.29, 0.51]	[8005, 8276]	[6998, 7264]	[77, 99]	[220, 294]
T₁₀	High-value target	100	0	[7849, 8021]	[5624, 5913]	[68, 87]	[230, 287]
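
To make the interval-valued entries of Table 2 concrete, the sketch below bounds the straight-line distance and flight time from U₁ (Table 1) to T₁ when the target's coordinates are only known to lie in the listed ranges. It is a plain interval-arithmetic illustration, not the fuzzy-variable handling of Section 2.2; the function names are hypothetical, and it assumes the target coordinates are in metres like the UAV positions and that the UAV flies a straight line at constant speed.

```python
import math


def axis_gap_bounds(p, lo, hi):
    """Smallest and largest |p - x| over x in [lo, hi]."""
    nearest = max(lo - p, 0.0, p - hi)        # 0 if p lies inside the range
    farthest = max(abs(p - lo), abs(p - hi))  # attained at one of the endpoints
    return nearest, farthest


def distance_interval(pos, x_range, y_range):
    """Bounds on the Euclidean distance from a point to an axis-aligned box."""
    dx_min, dx_max = axis_gap_bounds(pos[0], *x_range)
    dy_min, dy_max = axis_gap_bounds(pos[1], *y_range)
    return math.hypot(dx_min, dy_min), math.hypot(dx_max, dy_max)


# U1 from Table 1 and T1 from Table 2.
u1_pos, u1_speed = (450.0, 700.0), 45.0
t1_x, t1_y = (5247.0, 5587.0), (8542.0, 8625.0)

d_lo, d_hi = distance_interval(u1_pos, t1_x, t1_y)
t_lo, t_hi = d_lo / u1_speed, d_hi / u1_speed
print(f"distance in [{d_lo:.0f}, {d_hi:.0f}] m")
print(f"flight time in [{t_lo:.1f}, {t_hi:.1f}] s")
```

Because the squared distance separates into independent x and y terms, the per-axis bounds can be combined directly to obtain the nearest and farthest points of the box.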

Table 3. Result of task allocation.

UAV Number	Sequence of Tasks	Flight Distance	Flight Duration
U₁	T₉→T₅→T₈→T₂	14,852	875
U₂	T₁→T₆→T₇→T₄→T₁₀→T₈	30,124	1496
U₃	T₆→T₄→T₃→T₇→T₅→T₁₀	18,478	1701
U₄	T₃→T₁→T₂→T₉	16,253	1647

Table 4. Performance of different algorithms.

Algorithm	BST	AVG	WST	STD	AVGIter	AVGDis	AVGComTime
PSO	116.1463	141.862	165.1256	39.4124	349	54,787	3627
GA	118.7854	125.1259	160.7956	12.1124	304	49,587	2687
ACO	119.6329	126.7852	151.2549	22.5655	397	42,784	2215
GWO	114.6985	116.2549	141.3654	7.7854	170	36,989	1878
IGWO	113.8548	114.2368	127.2596	2.1451	175	31,217	1689
4DI-GWO	105.2485	108.5417	115.8741	0.9897	179	27,899	1622
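
The fitness columns of Table 4 can be turned into relative improvements with a few lines of arithmetic. The sketch below computes the percentage reduction in BST, AVG, and WST achieved by 4DI-GWO against each comparison algorithm. It assumes that a lower fitness value is better, which is inferred from 4DI-GWO having the lowest values in the table, and the helper name `reduction_pct` is hypothetical.

```python
# (BST, AVG, WST) fitness values copied from Table 4.
results = {
    "PSO": (116.1463, 141.862, 165.1256),
    "GA": (118.7854, 125.1259, 160.7956),
    "ACO": (119.6329, 126.7852, 151.2549),
    "GWO": (114.6985, 116.2549, 141.3654),
    "IGWO": (113.8548, 114.2368, 127.2596),
    "4DI-GWO": (105.2485, 108.5417, 115.8741),
}
metrics = ("BST", "AVG", "WST")


def reduction_pct(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


proposed = results["4DI-GWO"]
improvement = {
    name: {m: reduction_pct(b, p) for m, b, p in zip(metrics, values, proposed)}
    for name, values in results.items()
    if name != "4DI-GWO"
}

for name, row in improvement.items():
    cells = ", ".join(f"{m} {v:.2f}%" for m, v in row.items())
    print(f"{name:>5}: {cells}")
```

Under this reading, the reductions run from roughly 5% (AVG against IGWO) to roughly 30% (WST against PSO), which is in line with the 5~30% range quoted in the abstract; whether that range was derived this way is an assumption.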

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

