1. Introduction
In recent years, with the rapid development of artificial intelligence and biomimetic technology, multi-agent systems (MASs), such as drone swarms, robot swarms, multiple unmanned boats, and sensor networks, have been widely used in collaborative search and rescue, target tracking, fault detection, and other tasks because they are distributed, flexible, scalable, and low-cost. Because these systems rely on communication among agents to achieve the expected overall behavior, decision making and behavior analysis for controlling large-scale multi-agent systems have moved to the forefront of control research. Through the joint efforts of numerous researchers, significant progress has been made on MAS consensus [1,2], clustering [3], flocking/swarming [4,5], and hunting control. However, as the range of applications expands, new challenges arise in practical engineering problems. How to control MASs efficiently and reliably so that they perform various tasks and realize their potential application value has gradually become a research focus. Among these difficulties, the MAS target-hunting problem is particularly challenging: it requires concise and flexible control instructions that guide agents, in coordination with neighboring individuals, to complete target hunting with limited information. Research in this area matters in many fields, such as the military, transportation, and the environment, and has application prospects in search, surveillance, rescue, environmental monitoring, marine scientific research, and border law enforcement. In this context, this paper proposes a new coordination technique motivated by practical applications to solve the dynamic multi-target hunting control problem.
The target-hunting problem in MASs is technically related to consensus theory and distributed control. Olfati-Saber [6] was the first to consider the flocking problem of MASs, laying a foundation for later research on the hunting problem while revealing issues of optimality and theoretical achievability. Furthermore, Talebi et al. [7] proposed a distributed framework for controlling state-space processes over agent networks. In the existing literature, many researchers, such as Kou et al. [8,9,10], have focused on hunting a single target, paying particular attention to the structural relationships between agents and to hunting formations. Xie et al. [11] studied formation control during hunting and applied a gray wolf tracking strategy to the single-target hunting problem; however, only centralized hunting was implemented. Guo et al. [12,13] proposed a local-information control law that uses only the relative positions of the target and of neighbors to achieve dynamic target hunting, and their results showed that multi-robot systems can coordinate and estimate the target's motion speed. Meanwhile, Huang et al. [14] considered a more complex hunting scenario with autonomous underwater vehicles (AUVs) and proposed a multi-AUV collaborative hunting algorithm based on bionic neural networks: the underwater environment was modeled first, and an efficient capture path was then planned for each AUV to surround the target. Zengin et al. [15] studied collaborative target hunting in adversarial environments and developed a collaborative strategy for UAVs to track and attack targets. Fan et al. [16] proposed a collaborative capture solution for three-dimensional environments that achieves both capture and navigation among obstacles; it combines a 3-D synchronous encirclement strategy, a path planning algorithm, obstacle avoidance, and a cascaded proportional-integral (PI) controller, and simulation results showed that it can search for and capture static or dynamic targets in obstacle environments. Chen et al. [17] went beyond homogeneous agents and considered collaborative hunting by heterogeneous underwater robots, proposing a time-competitive mechanism to build efficient dynamic hunting coalitions. To improve the efficiency of multi-AUV target hunting, a hunting algorithm based on dynamic prediction of the moving target's trajectory was proposed in [18]: a negotiation method allocated an appropriate hunting point to each underwater vehicle, and a deep reinforcement learning (DRL) algorithm then drove the vehicles quickly to these points to hunt the moving targets. Yu et al. [19] further investigated smooth switching from tracking control to encirclement control. All of the above studies address single-target hunting. Multi-target hunting is more complex. Du et al. [20] studied cooperative tracking strategies and proposed a method based on multi-agent reinforcement learning (MARL); by introducing a parameter-sharing scheme, the method achieved higher hunting rates in less time. However, it lacks an in-depth discussion of tracking-task allocation, which may limit maneuverability during multi-target tracking. For collaborative multi-target pursuit by unmanned surface vehicles (USVs), Xia et al. [21] modeled the hunting problem of USV fleets as a decentralized partially observable Markov decision process and proposed a distributed, partially observable multi-target hunting proximal policy optimization algorithm for USVs. Their experiments showed that the self-organizing ability of the fleet remains advantageous even when some USVs are damaged. The multi-target hunting methods above are all based on machine learning. Reinforcement learning has the advantage of being model-free: it does not require a dynamic model of the system. Its drawback is that good results require continual learning, which consumes substantial time and computing power. Moreover, reinforcement learning methods mainly address MASs at the cooperative level, and their information processing is usually centralized, so they cannot fully reflect the distributed swarm advantage among agents. Given these shortcomings and the strong collaborative ability demonstrated by swarm behavior, this paper builds on the above literature to propose a novel hunting strategy.
To hunt multiple dynamic targets that scatter and escape, it is necessary not only to design an appropriate hunting strategy but also to decide how to assign the tasks; good task assignment plays a key role in the efficiency of multi-target hunting. Trigui et al. [22] studied distributed allocation for multiple robots and proposed two algorithms: a distributed market algorithm and an improved distributed market algorithm. Compared with the centralized Hungarian algorithm, both obtained near-optimal task allocations and effectively reduced the cost of task allocation. Liang et al. [23] further investigated interaction topologies and protocols for task allocation in MASs, proposing an extended contract net protocol based on a point-to-point topology that improved the efficiency and quality of task allocation. Jin et al. [24,25] investigated coordination strategies for task assignment using a competitive and cooperative approach; their strategy enables distributed task assignment and ensures fairness among tasks. Liu et al. [26] proposed a multi-task allocation algorithm based on a self-organizing map (SOM) for unmanned surface vessels (USVs) performing complex ocean operations. For the dynamic assignment problem of multiple robots, a multi-objective optimization method was used in [27] to achieve optimal task allocation. Shi et al. [28,29] investigated heuristic algorithms for task assignment, and the results showed that heuristics are efficient, stable, and computationally affordable. These studies address different task allocation problems from the perspectives of their respective applications. However, for multi-target hunting with scattered escape, self-organizing task allocation based solely on limited speed and displacement information may leave too few agents, or none, around a target, so that capture fails. Therefore, this paper also accounts for the limited detection and communication capabilities of each agent, forming a multi-level, distributed task allocation method that ultimately hunts all targets successfully.
Motivated by the issues outlined above and building on the original work, this paper introduces a novel self-organizing task allocation and hunting control strategy that integrates fuzzy logic and heuristic optimization, with a view to practical engineering applications of multi-target hunting in multi-agent systems. By establishing appropriate target models and collaborative strategies, the multi-agent system achieves coordinated hunting of the targets. The main contributions of this article are as follows:
In this paper, we first focus on self-organizing task allocation for multi-agent, multi-target hunting and propose a self-organizing multi-agent multi-target task allocation algorithm for dynamic environments. The algorithm combines fuzzy logic with a heuristic optimization algorithm: it determines the task allocation evaluation factors and then performs globally distributed task assignment with an improved particle swarm optimization algorithm, subject to the constraints of the practical application scenario, to optimize overall system performance. For the hunting control strategy, an attraction/repulsion force model based on a potential field function is designed to realize hunting control that predicts the target's motion trajectory. Without knowing the target's motion state, the agents use only the relative position and velocity between themselves and the target to hunt it. This ensures that the agents collaborate to reach the target area, hunt the target, and ultimately form an encirclement with the prescribed hunting radius.
The organizational structure of this article is as follows:
Section 2 lays some preliminary groundwork;
Section 3 establishes a dynamic, multi-target, self-organizing task allocation model;
Section 4 introduces the design of the hunting strategy and the proof of stability;
Section 5 verifies the effectiveness of the theoretical results through simulation experiments; and finally,
Section 6 provides a brief summary.
2. Preliminary and Problem Formulation
Assuming that all agents are homogeneous, that is, each agent has the same dynamic model and functional attributes, the communication topology of the multi-agent system can be represented by a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, A)$. Here, $\mathcal{V} = \{1, 2, \ldots, N\}$ is the set of $N$ agent nodes in $\mathcal{G}$, where node $i$ represents agent $i$; $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the edge set, representing the node connections in $\mathcal{G}$; and $A = [a_{ij}] \in \mathbb{R}^{N \times N}$ is the adjacency matrix, where $a_{ij}$ is the connection weight between agent $i$ and agent $j$. To ensure connectivity, $\mathcal{G}$ is assumed to be a connected graph.
Because any real sensor has a limited detection distance, the detection range is defined as the maximum distance at which an agent can detect other agents and respond to a target. The detection radius of each agent is set to $r$, and the connection weights are

$$a_{ij} = \begin{cases} 1, & d_{ij} \le r,\ j \neq i, \\ 0, & \text{otherwise}. \end{cases} \tag{1}$$

Here, $d_{ij} = \lVert p_i - p_j \rVert$ is the distance between agent $i$ and agent $j$, where $p_i$ denotes the position of agent $i$. Equation (1) states that if two agents cannot detect each other, no connection is established between them, which is consistent with reality. Agents thus establish mutual connections based on the detection distance, and, in line with the actual situation, this paper considers only the case in which the topology of the multi-agent system is connected.
In addition, $d_s$ denotes the safe distance, that is, the minimum distance between any two agents that prevents collisions.
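To make the connection rule in Equation (1) concrete, here is a minimal Python sketch that builds the binary adjacency matrix from planar agent positions and checks connectivity with a breadth-first search. It assumes binary weights and 2-D positions; the helper names and the example radii are illustrative only, not values from the paper.

```python
import math
from collections import deque

def adjacency_matrix(positions, r):
    """Binary adjacency per Eq. (1): a_ij = 1 if i != j and d_ij <= r, else 0."""
    n = len(positions)
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(positions[i], positions[j]) <= r:
                a[i][j] = 1
    return a

def is_connected(a):
    """Breadth-first search from node 0; connected if every node is reached."""
    n = len(a)
    if n == 0:
        return True
    seen = {0}
    queue = deque([0])
    while queue:
        i = queue.popleft()
        for j in range(n):
            if a[i][j] and j not in seen:
                seen.add(j)
                queue.append(j)
    return len(seen) == n

def min_pairwise_distance(positions):
    """Smallest inter-agent distance, to compare against the safe distance."""
    n = len(positions)
    return min(math.dist(positions[i], positions[j])
               for i in range(n) for j in range(i + 1, n))

if __name__ == "__main__":
    agents = [(0, 0), (300, 0), (600, 0)]
    print(adjacency_matrix(agents, 400), is_connected(adjacency_matrix(agents, 400)))
```

In this chain of three agents, a radius of 400 links only neighbors yet the graph is still connected, whereas a radius of 200 isolates every agent.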
3. Establishment of a Task Allocation Model
Assume that there are $M$ targets in the system. The goal of task allocation is to obtain decision inputs that maximize the overall performance index under given constraints. Each target must be hunted by at least three agents. However, task allocation based solely on relative position and velocity may assign fewer than three agents to some target. Therefore, this paper adopts a two-step task allocation method. The first step is initial allocation: each agent obtains target information through its own detection and through communication with its neighbors, a fuzzy logic system computes the task allocation evaluation factors from this information, and the target with the largest evaluation factor is taken as the agent's initial assignment. The second step is allocation optimization. Under the hunting constraints, each agent can choose only one target, and each target requires at least three agents. Agents with the same target automatically form a subgroup, and the agents are optimally assigned to targets by an improved distributed self-organizing particle swarm algorithm. The task assignment evaluation factors of the agents within each target subgroup should be maximized, and the optimal task allocation result is obtained after optimization, as described below.
3.1. Modeling of Task Allocation Evaluation Factors
The factors influencing the task assignment assessment include the positions and velocities of the agent and the target, as well as the regulating ability of the agent. Based on these factors, the task allocation evaluation factor of agent $i$ for target $j$ is established as a function of two component factors:

$$E_{ij} = f\left(E^{v}_{ij},\, E^{p}_{ij}\right),$$

where $E^{v}_{ij}$ is the relative speed evaluation factor between the agent and the target, indicating the strength of the agent's regulating ability, and $E^{p}_{ij}$ is the relative position evaluation factor, representing the distance between the agent and the target. These factors depend on $v_i$, the initial velocity of agent $i$; $v_{T_j}$, the velocity of target $j$; and $p_{ij}$, the relative position between agent $i$ and target $j$:

$$p_{ij} = p_i - p_{T_j}.$$

Here, $p_i$ is the position vector of agent $i$, and $p_{T_j}$ is the position vector of target $j$. We obtain $E^{v}_{ij}$ and $E^{p}_{ij}$ in Section 3.2.1 and Section 3.2.2, respectively.
3.2. Solving the Task Allocation Evaluation Factor Model
Analyzing the task allocation evaluation equations shows that Equations (4) and (5) involve several variables whose functional relationship is difficult to determine, so no accurate mathematical model is available to solve them. Fuzzy logic reasoning can solve such a problem without an exact functional relationship. Fuzzy reasoning is an important artificial intelligence technique for representing and processing vague and uncertain information, enabling more intelligent decision making and computation. It supports qualitative analysis, expressing the regularities of real systems in a standardized and concise form through linguistic rules, and it considers a problem from multiple perspectives, relates the factors involved, and reaches approximate but flexible conclusions. The fuzzy logic reasoning method therefore handles this problem flexibly, with strong adaptability and robustness. The speed evaluation factor and the position evaluation factor are solved separately, and the results are then used as inputs to solve the task allocation evaluation factor.
3.2.1. Solution of the Speed Evaluation Factor
At a certain moment, agent $i$ and target $j$ are in the positions shown in Figure 1, where the black triangle represents agent $i$ and the red circle represents target $j$. The velocity of agent $i$ is $v_i$, the velocity of target $j$ is $v_{T_j}$, and the angle between the moving directions of agent $i$ and target $j$ is $\theta_{ij}$. When agent $i$ selects target $j$, it will ultimately adopt a velocity consistent with the direction of target $j$. Therefore, the speed change that agent $i$ requires, denoted $\Delta v_{ij}$, is as follows:
We design a fuzzy logic inference system with $\Delta v_{ij}$ and $\theta_{ij}$ as the input variables and the speed evaluation factor $E^{v}_{ij}$ as the output variable. The domain of each variable is set as follows:
We define linguistic variables and select an appropriate set of fuzzy terms for each variable, as shown in Table 1. The fuzzy linguistic values of $\Delta v_{ij}$ are z0, ps, pm, and pb; those of $\theta_{ij}$ are nb, nm, ns, z0, ps, pm, and pb; and those of $E^{v}_{ij}$ are nb, nm, ns, z0, ps, pm, and pb.
Next, we determine the fuzzy inference rules. They are designed according to the constraints of the agents and other practical conditions, combined with existing experience, and are listed in Table 2.
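The following sketch illustrates a Mamdani-type fuzzy inference for the speed evaluation factor. Several pieces are assumptions rather than the paper's design: the speed change is taken as the magnitude of the velocity difference computed with the law of cosines, the domains ([0, 9] m/s for the speed change, [-π, π] for the angle, [-1, 1] for the output) and the evenly spaced triangular membership functions are hypothetical, and because Table 2 is not reproduced here, `rule` is a hypothetical rule base in which the factor decreases as the required speed change and the heading difference grow.

```python
import math

DV_MAX = 9.0  # hypothetical upper bound of the speed-change domain (m/s)

def speed_change(v_agent, v_target, theta):
    """Assumed |v_T - v_i| from the two speeds and the angle between them (law of cosines)."""
    sq = v_agent ** 2 + v_target ** 2 - 2.0 * v_agent * v_target * math.cos(theta)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative round-off

def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def even_sets(lo, hi, k):
    """k evenly spaced triangular fuzzy sets whose peaks span [lo, hi]."""
    step = (hi - lo) / (k - 1)
    return [(lo + (i - 1) * step, lo + i * step, lo + (i + 1) * step) for i in range(k)]

def rule(dv_idx, th_idx):
    """Hypothetical rule base: output index 0..6 (nb..pb); theta index 3 is z0."""
    return max(0, 6 - 2 * dv_idx - abs(th_idx - 3))

def speed_factor(dv, theta):
    """Mamdani inference: min for AND, max aggregation, centroid defuzzification."""
    dv = min(max(dv, 0.0), DV_MAX)
    theta = min(max(theta, -math.pi), math.pi)
    dv_sets = even_sets(0.0, DV_MAX, 4)          # z0, ps, pm, pb
    th_sets = even_sets(-math.pi, math.pi, 7)    # nb, nm, ns, z0, ps, pm, pb
    out_sets = even_sets(-1.0, 1.0, 7)           # nb, nm, ns, z0, ps, pm, pb
    strength = [0.0] * 7
    for i, s in enumerate(dv_sets):
        for k, t in enumerate(th_sets):
            w = min(tri(dv, *s), tri(theta, *t))  # firing strength of rule (i, k)
            o = rule(i, k)
            strength[o] = max(strength[o], w)
    num = den = 0.0
    for n in range(201):  # discretize the output domain [-1, 1]
        y = -1.0 + 0.01 * n
        mu = max(min(strength[o], tri(y, *out_sets[o])) for o in range(7))
        num += mu * y
        den += mu
    return num / den if den > 0.0 else 0.0

if __name__ == "__main__":
    dv = speed_change(5.0, 3.0, math.pi / 4)
    print(round(dv, 3), round(speed_factor(dv, math.pi / 4), 3))
```

With these assumed rules, an agent already moving like its target (no speed change, zero angle) scores near the top of the output range, and one that must reverse at full speed scores near the bottom.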
3.2.2. Solving the Position Evaluation Factor
The agents need to communicate with neighboring agents during the hunting process. If the agents cluster toward the system center, communication loss is reduced and communication stability among the agents improves, which helps hunting efficiency. The design of the position evaluation factor must also take the movement state of the agent into account: the position evaluation factor is positive when agent $i$ moves toward the center and negative when it moves away from the center.
The position of agent $i$ relative to target $j$ during hunting falls into two cases, distinguished by comparing $d^{c}_{i}$, the distance from agent $i$ to the center of the MAS, with $d^{c}_{T_j}$, the distance from target $j$ to the center of the MAS. If $d^{c}_{i} > d^{c}_{T_j}$, then pursuing target $j$ moves agent $i$ toward the center of the task area, and the position evaluation factor is positive. If $d^{c}_{i} < d^{c}_{T_j}$, then pursuing target $j$ moves agent $i$ away from the center of the task area, and the position evaluation factor is negative.
Based on two input variables, the position evaluation factor $E^{p}_{ij}$ is solved as the output using the fuzzy logic reasoning method. The domain of each variable is set as follows:
We set the linguistic variables of each variable separately. In the order in which the variables are introduced, the fuzzy linguistic values of the first input are nb, nm, ns, z0, ps, pm, and pb; those of the second input are nb, nm, ns, z0, ps, pm, and pb; and those of the output are z0, ps, pm, pb, pbb, and pbbb.
Then, we determine the fuzzy inference rules from the two inputs; they are shown in Table 3.
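As a crisp illustration of the two cases above, the sketch below computes the distances of an agent and a target to the MAS center and the resulting sign of the position evaluation factor. It assumes the center is the centroid of the agent positions, which the text does not define, and it stops at the sign; the fuzzy stage driven by the Table 3 rules is not reproduced.

```python
import math

def mas_center(agent_positions):
    """Hypothetical choice: the MAS center is the centroid of all agent positions."""
    n = len(agent_positions)
    return (sum(p[0] for p in agent_positions) / n,
            sum(p[1] for p in agent_positions) / n)

def center_distance_gap(agent_positions, i, target_position):
    """Return (d_i, d_t, d_i - d_t): distances of agent i and the target to the center."""
    c = mas_center(agent_positions)
    d_i = math.dist(agent_positions[i], c)
    d_t = math.dist(target_position, c)
    return d_i, d_t, d_i - d_t

def position_factor_sign(agent_positions, i, target_position):
    """+1 if the agent is farther from the center than the target (pursuit moves it
    inward), -1 in the opposite case, 0 if the distances are equal."""
    gap = center_distance_gap(agent_positions, i, target_position)[2]
    return (gap > 0) - (gap < 0)

if __name__ == "__main__":
    agents = [(0, 0), (10, 0), (0, 10), (10, 10)]
    print(position_factor_sign(agents, 0, (5, 6)), position_factor_sign(agents, 0, (20, 20)))
```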
3.3. Self-Organizing Distributed Collaborative Task Allocation Optimization Model
First, each agent obtains the task allocation evaluation factor of each target through its own detection and through communication with neighboring agents; this factor quantifies the likelihood of task completion. Then, the agents self-organize into task assignment subgroups according to the evaluation factors, and each subgroup must make the final assignment of its members optimal while satisfying the constraints. The distributed task allocation method adopts an improved particle swarm optimization (PSO) algorithm, which has good search ability and strong robustness, to achieve self-organized task allocation optimization. PSO is a swarm intelligence optimization algorithm with strong global search capability: through group collaboration and information sharing, it can efficiently search for the global optimum. It also converges quickly and, compared with some traditional optimization algorithms such as the genetic algorithm or the ant colony algorithm, can often find an approximately optimal solution faster. In addition, it is relatively insensitive to initial values, achieving similar results from different initializations, which makes it more robust. The proposed method is flexible and can be applied to various scenarios with different constraints.
Here, $n_j$ is the number of neighboring agents of target $j$, and $x_{ij} \in \{0, 1\}$ is the hunting factor, where $x_{ij} = 1$ indicates that agent $i$ hunts target $j$.
The constraints of the multi-agent multi-target hunting task allocation model are as follows:
$\sum_{j} x_{ij} = 1$ indicates that each agent hunts only one target.
$\sum_{i} x_{ij} \ge 3$ indicates that each target requires at least three agents for hunting.
Agent cannot hunt target with .
With the above model, the multi-agent system performs globally distributed task allocation and obtains the globally optimal allocation result while satisfying the constraints.
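A minimal sketch of how an assignment could be scored against the constraints above. It assumes an assignment is encoded as a list `assign` where `assign[i]` is the target of agent i (which satisfies the one-target-per-agent constraint by construction), that the objective is the sum of the evaluation factors of the chosen agent-target pairs, and that violations are handled with a penalty. The `allowed` mask is a hypothetical stand-in for the third constraint, whose exact condition is not given here.

```python
from collections import Counter

MIN_HUNTERS = 3  # each target needs at least three agents

def violations(assign, n_targets, allowed=None):
    """Count constraint violations of an assignment (assign[i] = target of agent i)."""
    counts = Counter(assign)
    shortfall = sum(max(0, MIN_HUNTERS - counts.get(j, 0)) for j in range(n_targets))
    # Hypothetical mask: allowed[i][j] is False when agent i may not hunt target j.
    forbidden = 0 if allowed is None else sum(
        not allowed[i][j] for i, j in enumerate(assign))
    return shortfall + forbidden

def fitness(assign, E, penalty=10.0, allowed=None):
    """Assumed objective: total evaluation factor of chosen pairs minus a penalty."""
    total = sum(E[i][j] for i, j in enumerate(assign))
    return total - penalty * violations(assign, len(E[0]), allowed)

if __name__ == "__main__":
    E = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.6], [0.1, 0.9], [0.4, 0.5]]
    print(fitness([0, 0, 0, 1, 1, 1], E), fitness([0, 0, 0, 0, 1, 1], E))
```

A penalty, rather than outright rejection, keeps infeasible candidates comparable during a search; the penalty weight only needs to exceed the largest gain a violation could buy.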
4. Hunting Strategy
The continuous-time motion model of the system is as follows:

$$\dot{p}_i = v_i, \qquad \dot{v}_i = u_i, \qquad \dot{p}_{T_j} = v_{T_j},$$

where $p_i$ denotes the position of agent $i$, $v_i$ denotes the velocity of agent $i$, and $u_i$ denotes the control input of agent $i$; $p_{T_j}$ denotes the position of target $j$, and $v_{T_j}$ denotes the velocity of target $j$.
In the control law, $k_v$ is the weight of the velocity error between the agents, $k_p$ is the weight of the position error between agent $i$ and target $j$, and $k_t$ is the weight of the velocity error between agent $i$ and target $j$.
Here, $a$ and $b$ are constants. Equation (13) gives the control strategy for the multi-agent system. The first summation in Equation (13) is a control term that ensures the convergence of all agents in the system; the second summation maintains velocity consensus among the agents; the third term drives each agent toward its target; and the fourth term makes each agent reach velocity consensus with its target. Thus, under Equation (13), each subgroup of the multi-agent system is expected to eventually hunt its target.
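Since Equation (13) itself is not reproduced in this text, the following is a hedged sketch of a control law with the same four-part structure described above, not the paper's law. Assumptions: all agents are neighbors ($a_{ij} = 1$); the inter-agent potential-field term is modeled only as a Gaussian short-range repulsion `rep_gain * exp(-|p_i - p_j|^2 / rep_range) * (p_i - p_j)`, with attraction supplied by target tracking; the gains `k_v`, `k_p`, `k_t` are named after the three weights above with illustrative values; and the double integrator is integrated with explicit Euler steps.

```python
import math

def simulate(p, v, p_t, v_t, k_v=0.5, k_p=0.5, k_t=1.5,
             rep_gain=1.0, rep_range=100.0, dt=0.05, steps=3000):
    """Double-integrator agents hunting one constant-velocity target (hypothetical law):
      u_i = sum_j rep(p_i - p_j)          # short-range repulsion between agents
          + k_v * sum_j (v_j - v_i)       # velocity consensus among agents
          - k_p * (p_i - p_t)             # approach the target
          - k_t * (v_i - v_t)             # match the target's velocity
    """
    p = [list(q) for q in p]
    v = [list(q) for q in v]
    p_t = list(p_t)
    n = len(p)
    for _ in range(steps):
        u = []
        for i in range(n):
            ux = -k_p * (p[i][0] - p_t[0]) - k_t * (v[i][0] - v_t[0])
            uy = -k_p * (p[i][1] - p_t[1]) - k_t * (v[i][1] - v_t[1])
            for j in range(n):
                if j == i:
                    continue
                dx, dy = p[i][0] - p[j][0], p[i][1] - p[j][1]
                rep = rep_gain * math.exp(-(dx * dx + dy * dy) / rep_range)
                ux += rep * dx + k_v * (v[j][0] - v[i][0])
                uy += rep * dy + k_v * (v[j][1] - v[i][1])
            u.append((ux, uy))
        for i in range(n):  # explicit Euler step for every agent
            p[i][0] += dt * v[i][0]
            p[i][1] += dt * v[i][1]
            v[i][0] += dt * u[i][0]
            v[i][1] += dt * u[i][1]
        p_t[0] += dt * v_t[0]  # the target keeps a constant velocity
        p_t[1] += dt * v_t[1]
    return p, v, p_t

if __name__ == "__main__":
    p, v, p_t = simulate([(-300.0, -200.0), (250.0, -150.0), (100.0, 300.0)],
                         [(5.0, 0.0), (0.0, 5.0), (-5.0, 0.0)],
                         (0.0, 0.0), (3.0, 0.0))
    print([round(math.dist(q, p_t), 2) for q in p])
```

Because the inter-agent forces cancel in pairs, the subgroup's centroid obeys a damped linear equation and converges to the target, while the repulsion spreads the three agents into a ring around it, mirroring the encirclement described in the text.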
Under the control input $u_i$, the system is stable, which we prove using the Lyapunov function method.
First, define the position error of agent $i$ as $\tilde{p}_i = p_i - p_{T_j}$ and its velocity error as $\tilde{v}_i = v_i - v_{T_j}$, where $j$ is the target assigned to agent $i$.
Then, the Lyapunov function is chosen as:
From Wu et al. [30], we know that the positive semidefinite matrix
and
exist for the derivation of the Lyapunov function,
because is a positive semidefinite matrix. Then, we may obtain , . Thus, . This inequality shows that each agent can catch up with its target. Meanwhile, according to the LaSalle invariance principle, for each hunting subgroup, .
The above hunting control model shows that the entire hunting process relies only on the position and velocity errors between each agent and its target. That is, throughout the hunting process, the agent does not need to know the target's motion state in order to hunt it.
5. Simulation Analysis
This section presents the multi-target hunting simulation. In the simulation, the targets scatter in all directions, and the graph formed by all agents is initially connected but not complete. More importantly, fewer than three agents are near some targets, so task allocation must be completed before hunting begins. To satisfy these conditions, three scattered escaping targets and 12 agents are randomly generated in the area X = [−1000 m, 1000 m], Y = [−1000 m, 1000 m]. The detection distance is m, and the collision avoidance distance is m.
A schematic diagram of the 12 agents and three targets at the initial moment is shown in
Figure 2.
In Figure 2, the red dots represent the agents, the green pentagrams represent the targets, the arrow directions represent the velocity directions, and the arrow lengths represent the speeds. Each target moves at a constant speed of 3 m/s along its initial velocity direction. The initial speed of each agent is 5 m/s, and the agents must adjust their velocities in real time according to their assigned tasks to hunt the targets quickly and efficiently.
The adjacency matrix of the 12 agents is as follows:
where 0 indicates no connection between the corresponding agents and 1 indicates a connection. The connections between the agents are shown in Figure 3, which shows that the initial topology graph of the multi-agent system is connected.
Based on the information detected by the agent and the information obtained through communication with neighboring agents, the task assignment evaluation factor matrix is obtained through a fuzzy logic system:
The parameters of the improved particle swarm optimization algorithm used in the distributed task allocation method are set as follows: 50 particles, 1000 iterations, an inertia weight of 0.8, and learning factors $c_1 = c_2 = 2.0$.
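The plain global-best PSO below, using the parameter values listed above (50 particles, inertia weight 0.8, $c_1 = c_2 = 2.0$) as defaults, sketches how a search over assignments might look. The paper's specific improvements are not described in this section, so the random-key encoding, the velocity clamp `vmax`, the penalty objective, and all function names are assumptions.

```python
import random
from collections import Counter

def decode(keys):
    """Random-key decoding: agent i takes the target with the largest key."""
    return [max(range(len(row)), key=row.__getitem__) for row in keys]

def fitness(assign, E, min_hunters=3, penalty=10.0):
    """Sum of chosen evaluation factors minus a penalty for under-staffed targets."""
    counts = Counter(assign)
    shortfall = sum(max(0, min_hunters - counts.get(j, 0)) for j in range(len(E[0])))
    return sum(E[i][j] for i, j in enumerate(assign)) - penalty * shortfall

def pso_assign(E, n_particles=50, iterations=1000, w=0.8, c1=2.0, c2=2.0,
               vmax=0.5, seed=0):
    """Global-best PSO where each particle is an N x M matrix of random keys."""
    rng = random.Random(seed)
    n, m = len(E), len(E[0])
    pos = [[[rng.random() for _ in range(m)] for _ in range(n)]
           for _ in range(n_particles)]
    vel = [[[0.0] * m for _ in range(n)] for _ in range(n_particles)]
    pbest = [[row[:] for row in x] for x in pos]
    pbest_f = [fitness(decode(x), E) for x in pos]
    g = max(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = [row[:] for row in pbest[g]], pbest_f[g]
    history = [gbest_f]
    for _ in range(iterations):
        for k in range(n_particles):
            for i in range(n):
                for j in range(m):
                    r1, r2 = rng.random(), rng.random()
                    v = (w * vel[k][i][j]
                         + c1 * r1 * (pbest[k][i][j] - pos[k][i][j])
                         + c2 * r2 * (gbest[i][j] - pos[k][i][j]))
                    vel[k][i][j] = max(-vmax, min(vmax, v))  # clamp velocity
                    pos[k][i][j] += vel[k][i][j]
            f = fitness(decode(pos[k]), E)
            if f > pbest_f[k]:  # update personal best
                pbest[k], pbest_f[k] = [row[:] for row in pos[k]], f
                if f > gbest_f:  # update global best
                    gbest, gbest_f = [row[:] for row in pos[k]], f
        history.append(gbest_f)
    return decode(gbest), gbest_f, history

if __name__ == "__main__":
    E = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.6], [0.1, 0.9], [0.4, 0.5]]
    print(pso_assign(E, iterations=100)[:2])
```

With $w = 0.8$ and $c_1 + c_2 = 4$, unclamped PSO velocities can grow, which is one reason the sketch clamps them. The recorded history is non-decreasing because it tracks the global best.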
The optimal task allocation results are as follows:
where the first row indicates each of the 12 agents, and the second row indicates the serial number of the target assigned to each agent.
The optimal fitness, that is, the evaluation factor of the optimal task allocation, is 7.87.
Figure 4 shows how the optimal fitness (the optimal task assignment evaluation factor) evolves with the number of iterations.
The relevant parameters for the hunting phase are set as follows: the step period is 0.25 s, and the number of iterations is 2000; meanwhile, , , and .
Figure 5 and Figure 6 show how the positions of the agents change. To make the entire hunting process easy to follow, Figure 5 shows 2-D plots of the positions of the agents and targets, together with the real-time coordinates of the targets, which further helps in observing the motion. Figure 6 adds a time dimension and displays the positions of the agents and targets in 3-D.
The position plots show the following. In Figure 5(1) and Figure 6(1), after 35 iteration steps, each agent has moved from its initial position toward the target assigned to it. In Figure 5(2) and Figure 6(2), after 112 iteration steps, agents have already reached the vicinity of targets 1 and 3. In Figure 5(3) and Figure 6(3), after 180 iteration steps, targets 1 and 3 are no longer visible, which indicates that the agents have reached positions close to targets 1 and 3. Figure 5(4) and Figure 6(4) show that after 253 iteration steps, target 1 has been hunted by three agents. In Figure 5(5) and Figure 6(5), after 451 iteration steps, none of the three targets (green pentagrams) can be seen. In Figure 5(7) and Figure 6(7), after 1528 iteration steps, the agents have formed encirclement clusters that hunt the targets. In Figure 5(8) and Figure 6(8), from step 1528 to step 2000, the agents maintain the encirclement and move forward with the targets without losing or leaving them. Note that the agents do not know the motion state of the targets. Nevertheless, under the hunting control strategy, each agent adjusts its position, velocity, and acceleration step by step through its position, velocity, and acceleration errors with respect to the target, always encircling the target at a safe distance.
Figure 7 shows the complete trajectories of the multi-agent encirclement of the dynamic targets; all three targets move at uniform speed in their respective directions. Figure 7(1) shows the hunting process of the multiple dynamic targets in 2-D, and Figure 7(2) adds a time axis to the same view.
Figure 8, Figure 9, and Figure 10 show the process and results of the subgroups surrounding targets 1, 2, and 3, respectively. In the trajectory plots in Figure 7, Figure 8, Figure 9, and Figure 10, the black lines show the trajectories of the agents, and the green lines show the trajectories of the targets.
Figure 11 shows how the position of each agent relative to its target varies over time, indicating that each agent approaches and gathers around the target it has chosen.
Figure 8, Figure 9, and Figure 10 show that after targets 1 and 3 are successfully hunted, the positions of the agents relative to their targets remain unchanged. In contrast, after target 2 is hunted, its agents rotate around it. The reason is that targets 1 and 3 are each hunted by three agents, an odd number. For any such agent, the other two agents are distributed symmetrically on either side of the line connecting it to the target, so their forces cancel and nothing drives rotation. Target 2 is different: it is hunted by six agents, an even number, so the forces do not cancel, and the agents rotate around the target.
Figure 12 shows how the pairwise distances among the six agents around target 2 vary. Figure 12(1) clearly shows that the six agents hunting target 2 gradually approach each other over time and eventually remain within a close range. The enlarged view in Figure 12(2) shows that the distances among the six agents are ultimately kept within 20 m.
These simulation experiments show that the hunting task can be accomplished successfully. Note that the simulations consider a small number of agents: even when only three agents hunt a target, the encirclement is achieved. With more agents, hunting is expected to be easier and more effective. This further illustrates the advantages of the designed hunting system.