Article

Intelligent Distributed Swarm Control for Large-Scale Multi-UAV Systems: A Hierarchical Learning Approach

Department of Electrical and Biomedical Engineering, University of Nevada, Reno, NV 89557, USA
*
Author to whom correspondence should be addressed.
Electronics 2023, 12(1), 89; https://doi.org/10.3390/electronics12010089
Submission received: 11 November 2022 / Revised: 14 December 2022 / Accepted: 19 December 2022 / Published: 26 December 2022

Abstract

In this paper, a distributed swarm control problem is studied for large-scale multi-agent systems (LS-MASs). Unlike classical multi-agent systems, an LS-MAS brings new challenges to control design due to its large number of agents, making it considerably more difficult to develop appropriate controls for complicated missions such as collective swarming. To address these challenges, a novel mixed game theory is developed with a hierarchical learning algorithm. In the mixed game, the LS-MAS is represented as a multi-group, large-scale leader–follower system. Then, a cooperative game is used to formulate the distributed swarm control for the multi-group leaders, and a Stackelberg game is utilized to couple the leaders and their large-scale followers effectively. Using the interaction between leaders and followers, a mean field game is adopted to extend the collective swarm behavior from the leaders to the followers smoothly without raising the computational complexity or communication traffic. Moreover, a hierarchical learning algorithm is designed to learn the intelligent optimal distributed swarm control for the multi-group leader–follower system. Specifically, a multi-agent actor–critic algorithm is first developed for obtaining the distributed optimal swarm control for the multi-group leaders. Furthermore, an actor–critic–mass method is designed to find the decentralized swarm control for the large-scale followers. Finally, a Lyapunov stability proof of the closed-loop system and a series of numerical simulations are provided to demonstrate the performance of the developed scheme.

1. Introduction

The concept of swarming in multi-agent systems (MASs) has been adopted from biological swarming behavior in nature, ranging from bacteria to more advanced mammals [1]. Examples of swarming behavior include flocks of birds [2], schools of fish [3], and the cooperation of ants [4], which form groups to achieve certain tasks such as deterring predators, foraging for food, and energy-efficient flight during migration. Over the past few decades, biological swarm systems have been widely studied by researchers [5,6,7,8,9,10,11,12,13]. A survey of recent developments in the control and optimization of swarm systems was presented in [14]. Aside from that, the cooperative control problem for swarming systems was studied in [15]. Skobelev et al. [16] proposed a prototype system using a swarm of unmanned aerial vehicles (UAVs), which includes coordinated flight plans with reconfiguration in the presence of disruptive events. In [17], a new formation control scheme with smooth distributed consensus control for multi-UAV systems was developed. The authors in [18] proposed intelligent flight decision techniques for UAVs for cooperative target tracking using an end-to-end, cooperative, multi-agent reinforcement learning scheme. Zhao et al. [19] developed a flocking obstacle avoidance algorithm with shared obstacle information. Despite all of these recent efforts, the traditional cooperative control of large-scale multi-agent systems (LS-MASs) suffers from the well-known “Curse of Dimensionality” [20] and communication reliability [21] problems.
To address these challenges, a mixed game theory is proposed and utilized to formulate the optimal swarming problem for large-scale multi-agent systems. The overall structure of the developed mixed game theoretical swarming is shown in Figure 1. First, the LS-MAS is reformulated as a multi-group, large-scale leader–follower system by dividing a large number of agents into several subgroups, each with a leader and a large number of followers. The objective of each group is to achieve optimal distributed swarming behavior. To achieve the desired swarming, a mixed game theory is developed by seamlessly integrating (1) the cooperative game [22,23] to ensure the collective swarming behavior among the multi-group leaders, where the leaders of each group cooperate with the other leaders to maintain the overall multi-group swarming while avoiding inter-group collisions, (2) the Stackelberg game [24,25] to connect each leader with its corresponding followers, where a set of coupling functions is introduced by the Stackelberg game in order to maintain leader–follower cohesion in each group, which guarantees that the followers of each group successfully achieve the swarming behavior of their respective leaders, and (3) the mean field game (MFG) [26,27,28] among non-cooperative followers, used to propagate the collective swarming behavior from the leaders to the large-scale followers. The MFG addresses the “Curse of Dimensionality” and the communication complexity of a traditional cooperative game. A probability density function (PDF) (i.e., mass) is employed in the MFG to replace the large number of agents’ state information. Since the mass function has the same dimension as the state space, it eliminates the correlation between rising computational complexity and the agents’ increasing population by reducing the dimension of the cost function.
To obtain the mass function, a partial differential equation (PDE), namely the Fokker–Planck–Kolmogorov (FPK) [29] equation, is adopted. Then, the optimal decentralized swarming control for the large-scale followers in each group can be obtained by solving the coupled Hamilton–Jacobi–Bellman (HJB) and FPK equations [30]. Meanwhile, the distributed optimal swarming control for the multi-group leaders can be obtained by solving the multi-player cooperative HJB equation. However, it is very difficult and even impossible to solve those PDEs in real time due to their complexity and coupling. Hence, adaptive dynamic programming [31] and reinforcement learning [32] algorithms are adopted. Specifically, a hierarchical learning structure is developed to obtain the optimal swarming control for the multi-group leader–follower system. This includes (1) multi-agent actor–critic-based distributed optimal swarm control for the multi-group leaders and (2) actor–critic–mass (ACM)-based decentralized swarm control for the large-scale followers. In the ACM learning-based control, the actor neural network (NN) is used to approximate the optimal decentralized control, the critic NN is used to approximate the optimal evaluation function, and the mass NN approximates the FPK solution. The main contributions of this article are as follows:
  • A novel mixed game theory is developed with cooperative leaders and non-cooperative followers in order to achieve multi-group optimal swarming control, which addresses the challenges of the curse of dimensionality and unrealistic communication requirements.
  • A hierarchical learning structure, with actor–critic-based distributed swarming for the leaders and actor–critic–mass-based decentralized swarming for the large-scale followers, is implemented in real time to learn the solution of the overall intelligent optimal swarming control.
The structure of this paper is as follows. Section 2 provides the significance of the developed algorithm, and Section 3 includes the problem formulation. In Section 4, the novel mixed game, hierarchical learning-based intelligent optimal swarming control scheme is developed. Then, the numerical simulations are presented in Section 5 to demonstrate the effectiveness of the proposed design, and Section 6 concludes the paper.

2. Significance of Mixed Game Theory-Based Intelligent Distributed Swarm Control

The traditional cooperative control [33,34,35] of multi-agent systems requires communication among all agents to achieve optimal control, which leads to significant computational complexity and requires low-latency communication in real time. In particular, when a swarm contains a massive population, known as an LS-MAS, it suffers from the following challenges: (1) a high-quality communication network is required for exchanging information in an LS-MAS to achieve conventional cooperative swarming behavior, although, in practice, maintaining such communication networks is unrealistic, and (2) each agent must be aware of the states of the other agents to accomplish the desired multi-group swarming behavior. As the number of agents increases, the computational complexity grows substantially, which brings about the well-known “Curse of Dimensionality” [20] problem. Aside from that, most existing studies focus on the swarming behavior of single-group MASs with a limited number of agents while avoiding multi-group, large-scale MASs, examples of which are ubiquitous in practical environments. For instance, dividing an MAS into multiple groups can help it handle multi-task missions that a single group cannot, such as predator formations during multi-prey hunting or cooperative searching for multiple objectives. Recently, several studies on MASs have addressed this issue to some extent. Zhang et al. [36] developed a type of multi-group formation tracking control, where the agents are divided into several subgroups to form different desired sub-formations. A distributed impulsive control method with and without input delay for multi-group formation tracking control was presented in [37]. Moreover, collision avoidance among multi-group UAVs was studied in [38]. In this paper, the developed mixed game theory-based distributed control addresses the challenges of the existing single-group traditional multi-agent cooperative control. First, the large-scale multi-agent system is divided into multiple groups. Then, each group is reformulated with a leader and a large number of followers. The leaders of all groups play a cooperative game to achieve the optimal swarming behavior for all agents and to ensure collision avoidance between the groups. Each leader only needs to know the PDF of its followers’ states instead of the state of every individual follower, which reduces the dimension of the cost function significantly. The leaders still need to exchange state information with the other group leaders; however, the cost of this exchange remains low compared with communication among a large number of agents, since the agents are divided into a few subgroups. At the same time, this upper-level communication between the leaders ensures the optimal swarming behavior and collision avoidance of the groups. In addition, the followers in each group are guided by their respective leader and play a non-cooperative game inside the group, which reduces the computational complexity and communication burden significantly. To validate the effectiveness of the developed algorithm, its performance is further compared against the existing traditional cooperative control method in the simulations (Section 5.2).

3. Problem Formulation

Consider an LS-MAS reformulated as a multi-group, large-scale leader–follower system, where each group has one leader and a large number of followers. Assume there are $M$ groups with $M$ leaders, and let $N_F^i$ denote the number of followers in the $i$th group. The dynamics of the leader and of follower $q$ in group $i$ are defined as follows.
Leader:
$dx_L^i = \left[F_a(x_L^i) + G_a(x_L^i)\,u_L^i\right]dt + \sqrt{2\nu^i}\,d\omega_L^i$ (1)
where $x_L^i \in \mathbb{R}^m$ denotes the system state and $u_L^i \in \mathbb{R}^n$ denotes the control input. Moreover, $\omega_L^i \in \mathbb{R}^m$ denotes an independent Wiener process representing the environmental noise, and $\nu^i$ is a non-negative parameter. The functions $F_a(\cdot)$ and $G_a(\cdot)$ represent the intrinsic dynamics of the leaders.
Follower:
$dx_{F,q}^i = \left[F_s(x_{F,q}^i) + G_s(x_{F,q}^i)\,u_{F,q}^i\right]dt + \sqrt{2\nu^i}\,d\omega_{F,q}^i$ (2)
where $x_{F,q}^i \in \mathbb{R}^m$ is the state and $u_{F,q}^i \in \mathbb{R}^n$ is the control input of follower $q$. Moreover, $\omega_{F,q}^i \in \mathbb{R}^m$ denotes a Wiener process. The functions $F_s(\cdot)$ and $G_s(\cdot)$ represent the intrinsic dynamics of the followers.
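To make the stochastic dynamics in Equations (1) and (2) concrete, the following is a minimal Python sketch (not part of the paper) that integrates such a system with one Euler–Maruyama step. The specific drift F, input map G, time step, and control value below are placeholder choices for illustration only.

```python
import numpy as np

def euler_maruyama_step(x, u, F, G, nu, dt, rng):
    """One step of dx = [F(x) + G(x) u] dt + sqrt(2*nu) dW, as in Equations (1)-(2)."""
    drift = F(x) + G(x) @ u
    dW = rng.normal(0.0, np.sqrt(dt), size=x.shape)   # Wiener increment
    return x + drift * dt + np.sqrt(2.0 * nu) * dW

rng = np.random.default_rng(0)
F = lambda x: -0.1 * x            # placeholder intrinsic drift (not the paper's dynamics)
G = lambda x: np.eye(2)           # placeholder input map
x = np.array([2.5, 6.6])          # e.g., leader 1 initial position from Section 5.1
for _ in range(100):
    x = euler_maruyama_step(x, np.zeros(2), F, G, nu=0.02, dt=0.01, rng=rng)
```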

3.1. Multi-Group Optimal Swarming Control Formulation

In this section, the mixed game theory is designed to formulate the optimal swarming control for multi-group large-scale leader–follower systems. Next, the details of the developed scheme are given as follows:
Collective Swarming among Multi-Group Leaders: To achieve the collective swarming behavior for all groups in the system, a predefined reference trajectory $x_d(t) \in \mathbb{R}^m$ is given to all leaders ahead of the mission. Next, the desired formation vector of the leader of group $i$ with respect to the reference trajectory is denoted as $\lambda_L^i \in \mathbb{R}^m$. Then, the desired trajectory for the leader of group $i$ is defined as follows:
$x_{L,d}^i(t) = x_d(t) + \lambda_L^i$ (3)
Then, the tracking error of leader $i$ is defined as $e_L^i(t) = x_L^i(t) - x_{L,d}^i(t)$, with the tracking error dynamics as follows:
$de_L^i = dx_L^i - dx_{L,d}^i = \left[F_{ar}(e_L^i) + G_{ar}(e_L^i)\,u_L^i\right]dt + \sqrt{2\nu^i}\,d\omega_L^i$ (4)
where $F_{ar}(e_L^i) = F_a(e_L^i + x_{L,d}^i) - \left(dx_{L,d}^i/dt\right)$ and $G_{ar}(e_L^i) = G_a(e_L^i + x_{L,d}^i)$. Additionally, the tracking error function for the leader in group $i$ is given as
$\Phi_{SE,L}(x_L^i) = \left\|x_L^i - x_{L,d}^i\right\|^2_{Q_{SE,L}}$ (5)
where $Q_{SE,L}$ is a weighting matrix. To achieve the common goal for all groups (i.e., the collective swarming behavior), each leader needs to communicate with the leaders of neighboring groups to avoid inter-group collisions. Let $\mathcal{G} = \{\mathcal{A}, \mathcal{V}, \mathcal{E}\}$ be a graph that describes the connections among the leaders of the $M$ groups, where $\mathcal{A} = [a_{ij}] \in \mathbb{R}^{M \times M}$ is a symmetric adjacency matrix (i.e., $\mathcal{A}^T = \mathcal{A}$). In addition, $\mathcal{V}(\mathcal{G}) = \{1, 2, \ldots, M\}$ denotes the set of leader vertices, and $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the set of edges. The element $a_{ij}$ of the adjacency matrix is defined as $a_{ij} = a_{ji} = 1$ if $(i, j) \in \mathcal{E}$ (i.e., the leaders of groups $i$ and $j$ are connected); otherwise, $a_{ij} = a_{ji} = 0$. Moreover, we assume that $a_{ii} = 0$. To define the neighborhood of each leader, a sensing and communication distance is needed. Let $h > 0$ denote the communication distance. Then, the neighborhood set $\mathcal{M}_L^i$ of the leader of group $i$ is defined as follows:
$\mathcal{M}_L^i(t) = \left\{\, j \in \mathcal{V} : \left\|x_L^i(t) - x_L^j(t)\right\|^2_{Q_1} < h,\ j \neq i \,\right\}$ (6)
where $Q_1$ is a positive definite matrix. Communication between the leaders of groups $i$ and $j$ is possible if $j \in \mathcal{M}_L^i(t)$. Furthermore, collision avoidance between the leaders (and thus between their respective follower groups) needs to be addressed in multi-group leader–follower swarming control in order to ensure safe path planning and to guide each group along its desired swarming movement. The cost function for collisions among leaders is
$\Phi_{CA,L}(x_L^i, x_L^{-i}) = w_{CA,L} \sum_{j \in \mathcal{M}_L^i} a_{ij}\left[\exp\left\{\left\|x_L^i - x_L^j\right\|^2_{Q_{CA,L}} - d_L^{i,j}\right\} - 1\right]^{-1}$ (7)
where $Q_{CA,L}$ is a weighting matrix, $w_{CA,L}$ is a weighting parameter, and $x_L^{-i} = \{x_L^j\}_{j \in \mathcal{M}_L^i}$. In addition, $d_L^{i,j}$ is the separating distance between leaders $i$ and $j$, which is chosen so that it always ensures collision avoidance between the two groups (i.e., between any two leaders and their corresponding followers). Moreover, to achieve the collective swarming behavior while avoiding inter-group collisions, cohesion between each leader and its large-scale followers needs to be maintained. In this regard, the leader–follower coupling functions are introduced as follows.
Leader–Follower Coupling Functions for Swarming: To achieve a group swarming behavior, a set of coupling functions is defined through the Stackelberg game [24]. The coupling function that forces the leader to keep cohesion with its corresponding followers is defined as follows:
$\Phi_{CP,L}(x_L^i, \rho^i) = w_{CP,L}\left\|x_L^i - \mathbb{E}\{x_F^i\}\right\|^2_{Q_{CP,L}}$ (8)
where $\rho^i(x_F^i, t)$ is the probability density function (PDF) of the large-scale followers’ states in the $i$th group, $\mathbb{E}\{x_F^i\}$ is the corresponding expected value, and $w_{CP,L}$ and $Q_{CP,L}$ are the weighting parameter and weighting matrix, respectively. The follower-side leader–follower swarming coupling function is
$\Phi_{CP,F}(x_{F,q}^i, x_L^i) = w_{CP,F}\left[\exp\left\{r^i - \left\|x_{F,q}^i - x_L^i\right\|^2_{Q_{CP,F}}\right\} - 1\right]^{-1}$ (9)
where $Q_{CP,F}$ is a weighting matrix, $w_{CP,F}$ is a weighting parameter, and $r^i$ is the maximum safe distance for the $i$th group’s followers. The distance $r^i$ supports inter-group collision avoidance by keeping the followers within this safe distance limit. A short numerical sketch of the leader-level neighborhood and collision-avoidance terms is given below.
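The following is an illustrative sketch (not from the paper) of how the communication neighborhood in Equation (6) and the inter-leader collision-avoidance cost in Equation (7) could be evaluated. The identity weighting matrices, the separation distance d_sep, and the leader positions reused from Section 5.1 are assumed values for the example.

```python
import numpy as np

def neighborhood(x_leaders, i, h, Q1):
    """Indices j != i with ||x_i - x_j||_{Q1}^2 < h (Equation (6))."""
    xi, nbrs = x_leaders[i], []
    for j, xj in enumerate(x_leaders):
        d = xi - xj
        if j != i and d @ Q1 @ d < h:
            nbrs.append(j)
    return nbrs

def phi_ca_leader(x_leaders, i, h, Q1, Q_ca, w_ca, d_sep):
    """Inter-leader collision-avoidance cost (Equation (7)), with a_ij = 1 for neighbors.
    The barrier term assumes the weighted squared distance stays above d_sep."""
    xi, cost = x_leaders[i], 0.0
    for j in neighborhood(x_leaders, i, h, Q1):
        d = xi - x_leaders[j]
        cost += w_ca / (np.exp(d @ Q_ca @ d - d_sep) - 1.0)
    return cost

x_leaders = np.array([[2.5, 6.6], [2.0, 6.2], [2.5, 5.8], [2.2, 5.4]])
print(neighborhood(x_leaders, 0, h=1.5, Q1=np.eye(2)))                          # -> [1, 2]
print(phi_ca_leader(x_leaders, 0, h=1.5, Q1=np.eye(2),
                    Q_ca=np.eye(2), w_ca=5.0, d_sep=0.1))
```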
Follower Tracking and Swarming Functions: The followers in each group track their respective leaders in order to achieve the group swarming behavior. The tracking error of follower $q$ in group $i$ is defined as $e_{F,q}^i(t) = x_{F,q}^i(t) - x_L^i(t)$, with the tracking error dynamics
$de_{F,q}^i = \left[F_{sr}(e_{F,q}^i) + G_{sr}(e_{F,q}^i)\,u_{F,q}^i\right]dt + \sqrt{2\nu^i}\,d\omega_{F,q}^i$ (10)
where
$F_{sr}(e_{F,q}^i) = F_s(e_{F,q}^i + x_L^i) - F_a(x_L^i), \qquad G_{sr}(e_{F,q}^i) = G_s(e_{F,q}^i + x_L^i) - G_a(x_L^i)\,\dfrac{u_L^i}{u_{F,q}^i}$
Now, the followers’ tracking error cost function can be derived as follows:
$\Phi_{SE,F}(x_{F,q}^i) = \left\|x_{F,q}^i - x_L^i\right\|^2_{Q_{SE,F}}$ (11)
where $Q_{SE,F}$ is a positive definite matrix. Then, the intra-group, large-scale follower collision avoidance function is as follows:
$\Phi_{CA,F}(x_{F,q}^i, \rho^i) = w_{CA,F}\displaystyle\int_{x_F^i} \rho^i(x_F^i, t)\left[\varepsilon^2 + \left\|x_{F,q}^i - \mathbb{E}\{x_F^i\}\right\|^2_{Q_{CA,F}}\right]^{-\beta} dx_F^i$ (12)
where $Q_{CA,F}$ and $w_{CA,F}$ are the weighting matrix and weighting parameter, respectively. Moreover, $\varepsilon$ and $\beta$ are positive constants. Furthermore, the cohesion function of each follower with respect to its group center can be derived as follows:
$\Phi_{C,F}(x_{F,q}^i, \rho^i) = w_{C,F}\left\|x_{F,q}^i - \mathbb{E}\{x_F^i\}\right\|^2_{Q_{C,F}}$ (13)
with the weighting matrix $Q_{C,F}$ and weighting parameter $w_{C,F}$. This function helps each follower of a group stay close to the other members of that group.
Overall Optimal Swarming Control for Multi-Group LS-MAS: The goal of the leaders is to achieve overall swarming control by minimizing the following cost function.
Leader:
$V_L(x_L, \rho) = \sum_{i=1}^{M} V_L^i(x_L^i, x_L^{-i}, \rho^i) = \sum_{i=1}^{M} \mathbb{E}\left\{\int_0^{\infty}\left[\Phi_L(x_L^i, x_L^{-i}, u_L^i) + \Phi_{CP,L}(x_L^i, \rho^i)\right]dt\right\}$ (14)
where $\Phi_L(x_L^i, x_L^{-i}, u_L^i) = \Phi_{SE,L}(x_L^i) + \Phi_{CA,L}(x_L^i, x_L^{-i}) + \|u_L^i\|^2_{R_L^i}$, $V_L^i$ is the cost function of the $i$th group leader, and $R_L^i$ is a positive weighting matrix. The functions $\Phi_{SE,L}(x_L^i)$, $\Phi_{CA,L}(x_L^i, x_L^{-i})$, and $\Phi_{CP,L}(x_L^i, \rho^i)$ given in Equations (5), (7), and (8) are the tracking error, collision avoidance, and coupling functions of the leader, respectively. Moreover, the objective of follower $q$ in group $i$ is to minimize the following cost function to achieve the group swarming behavior.
Follower:
$V_{F,q}^i(x_{F,q}^i, x_L^i, \rho^i) = \mathbb{E}\left\{\int_0^{\infty}\left[\Phi_F(x_{F,q}^i, x_L^i, u_{F,q}^i) + \Phi_{CA,F}(x_{F,q}^i, \rho^i) + \Phi_{C,F}(x_{F,q}^i, \rho^i)\right]dt\right\}, \quad \text{s.t.}\ \left\|x_{F,q}^{i,o} - x_L^i\right\| < r^i$ (15)
where $\Phi_F(x_{F,q}^i, x_L^i, u_{F,q}^i) = \Phi_{SE,F}(x_{F,q}^i) + \Phi_{CP,F}(x_{F,q}^i, x_L^i) + \|u_{F,q}^i\|^2_{R_{F,q}^i}$, $R_{F,q}^i$ is a positive weighting matrix, and $x_{F,q}^{i,o}$ is the initial state of follower $q$. The functions $\Phi_{SE,F}(x_{F,q}^i)$, $\Phi_{CP,F}(x_{F,q}^i, x_L^i)$, $\Phi_{C,F}(x_{F,q}^i, \rho^i)$, and $\Phi_{CA,F}(x_{F,q}^i, \rho^i)$ given in Equations (11), (9), (13), and (12) are the follower tracking error, coupling, cohesion, and collision avoidance functions, respectively. Here, the initial states of the followers in every group are constrained in order to keep the followers inside a specific region defined by the distance $r^i$. This distance is the radius of an enclosing circle which acts as a collision-avoiding region for the respective group.
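As an illustration of the key dimension-reduction idea, the population enters the leader coupling term (Equation (8)) and the follower cohesion term (Equation (13)) only through $\mathbb{E}\{x_F^i\}$, which a leader can estimate from the followers' density rather than from each individual state. The sketch below (not from the paper) estimates these two terms using the sample mean of the follower states; the identity weighting matrices are assumed, while the weight 5 and the Group 1 initialization mirror Section 5.1.

```python
import numpy as np

def leader_coupling_cost(x_leader, X_followers, Q_cp, w_cp):
    """Leader-follower coupling (Equation (8)); the sample mean stands in for E{x_F^i}."""
    d = x_leader - X_followers.mean(axis=0)
    return w_cp * d @ Q_cp @ d

def follower_cohesion_cost(x_q, X_followers, Q_c, w_c):
    """Cohesion of follower q with its group center (Equation (13))."""
    d = x_q - X_followers.mean(axis=0)
    return w_c * d @ Q_c @ d

# 500 Group 1 followers sampled around the leader, as in Section 5.1
X = np.random.default_rng(1).normal([2.5, 6.6], 0.4, size=(500, 2))
print(leader_coupling_cost(np.array([2.5, 6.6]), X, np.eye(2), 5.0))
print(follower_cohesion_cost(X[0], X, np.eye(2), 5.0))
```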

3.2. Mixed Game Theory-Based Multi-Group, Large-Scale Leader–Follower-Distributed Optimal Swarming Control

To achieve the overall swarming behavior for all groups, the leaders’ overall cost function (Equation (14)) and the followers’ individual cost functions (Equation (15)) need to be minimized. The leader–follower Hamiltonians can be obtained using Bellman’s principle of optimality [39] and optimal control theory [20]. The Hamiltonian of all the leaders is as follows:
$H_L\left[x_L, \nabla_{x_L} V_L(x_L, \rho)\right] = \sum_{i=1}^{M} H_L^i\left[x_L^i, \nabla_{x_L^i} V_L^i(x_L^i, x_L^{-i}, \rho^i)\right]$ (16)
with the individual leader’s distributed Hamiltonian given by
$H_L^i\left[x_L^i, \nabla_{x_L^i} V_L^i(x_L^i, x_L^{-i}, \rho^i)\right] = \mathbb{E}\left\{\Phi_L(x_L^i, x_L^{-i}, u_L^i) + \Phi_{CP,L}(x_L^i, \rho^i) + \left(\nabla_{x_L^i} V_L^i(x_L^i, x_L^{-i}, \rho^i)\right)^T\left[F_a(x_L^i) + G_a(x_L^i)u_L^i\right]\right\}$ (17)
The Hamiltonian for the follower q from the ith group is derived as follows:
$H_{F,q}^i\left[x_{F,q}^i, \nabla_{x_{F,q}^i} V_{F,q}^i(x_{F,q}^i, x_L^i, \rho^i)\right] = \mathbb{E}\left\{\Phi_F(x_{F,q}^i, x_L^i, u_{F,q}^i) + \Phi_{CA,F}(x_{F,q}^i, \rho^i) + \Phi_{C,F}(x_{F,q}^i, \rho^i) + \left(\nabla_{x_{F,q}^i} V_{F,q}^i(x_{F,q}^i, x_L^i, \rho^i)\right)^T\left[F_s(x_{F,q}^i) + G_s(x_{F,q}^i)u_{F,q}^i\right]\right\}$ (18)
Now, the multi-group leaders’ Hamilton–Jacobi–Bellman (HJB) equation derived from the cooperative game is as follows:
$\mathbb{E}\left\{\Phi_{CP,L}(x_L^i, \rho^i)\right\} = \mathbb{E}\left\{-\partial_t V_L^i(x_L^i, x_L^{-i}, \rho^i) - 2\nu^i\Delta V_L^i(x_L^i, x_L^{-i}, \rho^i) + H_L^i\left[x_L^i, \nabla_{x_L^i} V_L^i(x_L^i, x_L^{-i}, \rho^i)\right]\right\}$ (19)
Furthermore, coupled HJB and Fokker–Planck–Kolmogorov (FPK) equations for the large number of followers can be obtained using the MFG as follows.
Follower (HJB):
$\mathbb{E}\left\{\Phi_{CA,F}(x_{F,q}^i, \rho^i) + \Phi_{C,F}(x_{F,q}^i, \rho^i)\right\} = \mathbb{E}\left\{-\partial_t V_{F,q}^i(x_{F,q}^i, x_L^i, \rho^i) - 2\nu^i\Delta V_{F,q}^i(x_{F,q}^i, x_L^i, \rho^i) + H_{F,q}^i\left[x_{F,q}^i, \nabla_{x_{F,q}^i} V_{F,q}^i(x_{F,q}^i, x_L^i, \rho^i)\right]\right\}$ (20)
Follower (FPK):
$\mathbb{E}\left\{\partial_t\rho^i(x_{F,q}^i, t) - 2\nu^i\Delta\rho^i(x_{F,q}^i, t) - \mathrm{div}\left(\rho^i\, D_p H_{F,q}^i\left[x_{F,q}^i, \nabla_{x_{F,q}^i} V_{F,q}^i(x_{F,q}^i, x_L^i, \rho^i)\right]\right)\right\} = 0$ (21)
Then, according to optimal control theory, the mixed game-based distributed optimal swarming control for a multi-group leader–follower system can be attained as follows:
Leader:
$u_L^i(x_L^i) = -\frac{1}{2}\,\mathbb{E}\left\{\left(R_L^i\right)^{-1} G_a^T(x_L^i)\,\nabla_{x_L^i} V_L^i(x_L^i, x_L^{-i}, \rho^i)\right\}$ (22)
Follower:
$u_{F,q}^i(x_{F,q}^i) = -\frac{1}{2}\,\mathbb{E}\left\{\left(R_{F,q}^i\right)^{-1} G_s^T(x_{F,q}^i)\,\nabla_{x_{F,q}^i} V_{F,q}^i(x_{F,q}^i, x_L^i, \rho^i)\right\}$ (23)
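Given a value-function gradient, Equations (22) and (23) are a direct matrix computation. The sketch below (not from the paper) shows this evaluation; the input map, the control weight R = 5 taken from Section 5, and the gradient value are placeholder numbers.

```python
import numpy as np

def optimal_control(grad_V, G_x, R):
    """u = -1/2 * R^{-1} G(x)^T grad_x V, the form of Equations (22)-(23)."""
    return -0.5 * np.linalg.solve(R, G_x.T @ grad_V)

G_x = np.array([[0.0], [1.0]])        # input map shaped like the leaders' G_a in Section 5
R = np.array([[5.0]])                 # control weight R_L^i = 5 from Section 5
grad_V = np.array([1.2, -0.4])        # hypothetical critic gradient at the current state
u = optimal_control(grad_V, G_x, R)   # -> array([0.04])
```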
Remark 1.
The coupled HJB–FPK equations of the followers and the HJB equation of the leaders must be solved in real time in order to determine the optimal control policies. The backward HJB and forward FPK equations, however, are coupled multidimensional nonlinear PDEs that are difficult to solve. Therefore, in this paper, a hierarchical learning-based, multi-actor–critic–mass NN scheme is developed to learn the optimal control online.

4. Hierarchical Learning-Based Intelligent Optimal Distributed Swarming Control

4.1. Hierarchical Learning-Based Control for Multi-Group Leader–Follower Systems

In this section, a hierarchical learning-based actor–critic–mass algorithm (see Figure 2) is developed and explained in detail. It includes (1) multi-agent actor–critic neural networks (NNs) to obtain the cooperative game-based distributed optimal swarm control for the multi-group leaders and (2) actor–critic–mass-based decentralized control for the large-scale followers within each group to obtain the mean field game-based swarming. Additionally, the Stackelberg game integrates the distributed swarming control for the leaders and the decentralized control for the followers into a unified framework and further obtains the overall distributed intelligent swarming control for multi-group leader–follower systems.
Ideal Actor–Critic Set-up for Leaders:
Critic: $V_L^{i*}(x_L^i, x_L^{-i}, \rho^i) = \mathbb{E}\left\{W_{V,L}^{i\,T}\phi_{V,L}^i + \varepsilon_{HJB}^{L,i}\right\}$
Actor: $u_L^{i*}(x_L^i, x_L^{-i}, \rho^i) = \mathbb{E}\left\{W_{u,L}^{i\,T}\phi_{u,L}^i + \varepsilon_{u,L}^i\right\}$ (24)
Ideal Actor–Critic–Mass Set-up for Followers:
Critic: $V_{F,q}^{i*}(x_{F,q}^i, x_L^i, \rho^i) = \mathbb{E}\left\{W_{V,F,q}^{i\,T}\phi_{V,F,q}^i + \varepsilon_{HJB}^{F,q,i}\right\}$
Actor: $u_{F,q}^{i*}(x_{F,q}^i, x_L^i, \rho^i) = \mathbb{E}\left\{W_{u,F,q}^{i\,T}\phi_{u,F,q}^i + \varepsilon_{u,F,q}^i\right\}$
Mass: $\rho^i(x_{F,q}^i, t) = \mathbb{E}\left\{W_{\rho,F,q}^{i\,T}\phi_{\rho,F,q}^i + \varepsilon_{FPK}^{F,q,i}\right\}$ (25)
where $W_{V,L}^i$, $W_{u,L}^i$, $W_{V,F,q}^i$, $W_{u,F,q}^i$, and $W_{\rho,F,q}^i$ are the ideal weights of the leader and follower critic, actor, and mass neural networks (NNs), respectively. Additionally, $\phi_{(\cdot)}$ denotes the corresponding activation functions, and $\varepsilon$ denotes the reconstruction errors of the respective neural networks. Then, the estimated cost and control functions of the leaders are as follows.
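The approximators in Equations (24)–(25) are linear in the weights with fixed nonlinear activations. The sketch below (not from the paper) shows such a parameterization; the choice of Gaussian radial-basis activations, the number of features, and the centers are assumptions, since the paper does not specify its activation functions.

```python
import numpy as np

class LinearInWeightsNN:
    """V_hat(x) = W^T phi(x) with fixed nonlinear features phi, as in Equations (24)-(26)."""
    def __init__(self, centers, width, rng):
        self.centers, self.width = centers, width
        self.W = 0.01 * rng.standard_normal(len(centers))    # trainable weight vector W_hat

    def phi(self, x):
        # Gaussian radial-basis activations (an assumed choice of phi)
        return np.exp(-np.sum((self.centers - x) ** 2, axis=1) / (2.0 * self.width ** 2))

    def value(self, x):
        return self.W @ self.phi(x)

rng = np.random.default_rng(0)
critic = LinearInWeightsNN(centers=rng.uniform(-1, 1, size=(25, 2)), width=0.5, rng=rng)
print(critic.value(np.array([0.1, -0.2])))
```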
Estimated Actor-Critic for Multi-Group Leaders:
Critic: $\hat V_L^i(x_L^i, x_L^{-i}, \hat{\bar{\rho}}^i) = \mathbb{E}\left\{\hat W_{V,L}^{i\,T}\hat\phi_{V,L}^i\right\}$
Actor: $\hat u_L^i(x_L^i, x_L^{-i}, \hat{\bar{\rho}}^i) = \mathbb{E}\left\{\hat W_{u,L}^{i\,T}\hat\phi_{u,L}^i\right\}$ (26)
Note that the leader of each group collects the estimated PDF from each decentralized follower in the same group. Then, the leader evaluates the statistical average of the received PDFs (i.e., $\hat{\bar{\rho}}^i = \frac{1}{N_F^i}\sum_{q=1}^{N_F^i}\hat{\rho}_{F,q}^i$). Next, the residual errors that result from substituting the approximations in Equation (26) into Equation (19) can be utilized to adjust the weights of the leader actor and critic neural networks; these residual errors are given by the following equations:
$\mathbb{E}\left\{e_{HJB}^{L,i}\right\} = \mathbb{E}\left\{\Phi_{CP,L}(x_L^i, \hat{\bar{\rho}}^i) + \hat W_{V,L}^{i\,T}\left[\partial_t\hat\phi_{V,L}^i + 2\nu^i\Delta\hat\phi_{V,L}^i - \hat H_{L,W}^i\right]\right\}$ (27)
$\mathbb{E}\left\{e_{u,L}^i\right\} = \mathbb{E}\left\{\hat W_{u,L}^{i\,T}\hat\phi_{u,L}^i + \frac{1}{2}\left(R_L^i\right)^{-1}G_a^T(x_L^i)\,\nabla_{x_L^i}\hat\phi_{V,L}^i\,\hat W_{V,L}^i\right\}$ (28)
where the estimated Hamiltonian of the leader H ^ L i is defined as
$\hat H_L^i = H_L^i\left[x_L^i, \nabla_{x_L^i}\hat\phi_{V,L}^i\right] = \hat W_{V,L}^{i\,T}\hat H_{L,W}^i$
Now, let
$\mathbb{E}\left\{\Psi_{V,L}^i(x_L^i, x_L^{-i}, \hat{\bar{\rho}}^i)\right\} = \mathbb{E}\left\{\partial_t\hat\phi_{V,L}^i + 2\nu^i\Delta\hat\phi_{V,L}^i - \hat H_{L,W}^i\right\}$
and
$\mathbb{E}\left\{\Phi_{CP,L}(x_L^i, \hat{\bar{\rho}}^i)\right\} = \mathbb{E}\left\{\Phi_{CP,L}(x_L^i, \tilde{\bar{\rho}}^i) + \Phi_{CP,L}(x_L^i, \rho^i)\right\}$
Then, the estimation error in Equation (27) can therefore be rewritten as follows:
$\mathbb{E}\left\{e_{HJB}^{L,i}\right\} = \mathbb{E}\left\{\Phi_{CP,L}(x_L^i, \rho^i) + \Phi_{CP,L}(x_L^i, \tilde{\bar{\rho}}^i) + \hat W_{V,L}^{i\,T}\Psi_{V,L}^i(x_L^i, x_L^{-i}, \hat{\bar{\rho}}^i)\right\}$ (29)
Next, the effect of the reconstruction error is considered by substituting the optimal functions from Equation (24) into the HJB Equation (19):
$\mathbb{E}\left\{\Phi_{CP,L}(x_L^i, \rho^i) + W_{V,L}^{i\,T}\Psi_{V,L}^i(x_L^i, x_L^{-i}, \bar{\rho}^i) + \varepsilon_{HJB}^{L,i}\right\} = 0$ (30)
By substituting Equation (30) into Equation (29), the following HJB error equation with reconstruction error can be obtained:
$\mathbb{E}\left\{e_{HJB}^{L,i}\right\} = \mathbb{E}\left\{\Phi_{CP,L}(x_L^i, \tilde{\bar{\rho}}^i) - \tilde W_{V,L}^{i\,T}\hat\Psi_{V,L}^i(x_L^i, x_L^{-i}, \hat{\bar{\rho}}^i) - W_{V,L}^{i\,T}\tilde\Psi_{V,L}^i(x_L^i, x_L^{-i}, \tilde{\bar{\rho}}^i) - \varepsilon_{HJB}^{L,i}\right\}$ (31)
Again, the leader actor NN error is as follows:
$\mathbb{E}\left\{e_{u,L}^i\right\} = \mathbb{E}\left\{-\tilde W_{u,L}^{i\,T}\hat\phi_{u,L}^i(x_L^i, x_L^{-i}, \hat{\bar{\rho}}^i) - W_{u,L}^{i\,T}\tilde\phi_{u,L}^i(x_L^i, x_L^{-i}, \tilde{\bar{\rho}}^i) - \frac{1}{2}\left(R_L^i\right)^{-1}G_a^T(x_L^i)\,\nabla_{x_L^i}\tilde V_L^i - \varepsilon_{u,L}^i\right\}$ (32)
where $\tilde W_{V,L}^i = W_{V,L}^i - \hat W_{V,L}^i$ and $\tilde W_{u,L}^i = W_{u,L}^i - \hat W_{u,L}^i$. In addition, the approximation error terms can be represented as follows:
$\mathbb{E}\left\{\tilde\Psi_{V,L}^i(x_L^i, x_L^{-i}, \tilde{\bar{\rho}}^i)\right\} = \mathbb{E}\left\{\Psi_{V,L}^i(x_L^i, x_L^{-i}, \rho^i) - \hat\Psi_{V,L}^i(x_L^i, x_L^{-i}, \hat{\bar{\rho}}^i)\right\}$
$\mathbb{E}\left\{\tilde\phi_{u,L}^i(x_L^i, x_L^{-i}, \tilde{\bar{\rho}}^i)\right\} = \mathbb{E}\left\{\phi_{u,L}^i(x_L^i, x_L^{-i}, \rho^i) - \hat\phi_{u,L}^i(x_L^i, x_L^{-i}, \hat{\bar{\rho}}^i)\right\}$
Next, the multi-agent actor–critic NNs for the leader are updated by using residual errors (Equations (31) and (32)) with the gradient descent method as follows.
Actor–Critic Update Laws for Multi-Group Leaders:
$\mathbb{E}\left\{\dot{\hat W}_{V,L}^i\right\} = -\,\mathbb{E}\left\{\alpha_{V,L}^i\,\frac{\hat\Psi_{V,L}^i(x_L^i, x_L^{-i}, \hat{\bar{\rho}}^i)\, e_{HJB}^{L,i\,T}}{1 + \left\|\hat\Psi_{V,L}^i(x_L^i, x_L^{-i}, \hat{\bar{\rho}}^i)\right\|^2}\right\}$ (33)
$\mathbb{E}\left\{\dot{\hat W}_{u,L}^i\right\} = -\,\mathbb{E}\left\{\alpha_{u,L}^i\,\frac{\hat\phi_{u,L}^i(x_L^i, x_L^{-i}, \hat{\bar{\rho}}^i)\, e_{u,L}^{i\,T}}{1 + \left\|\hat\phi_{u,L}^i(x_L^i, x_L^{-i}, \hat{\bar{\rho}}^i)\right\|^2}\right\}$ (34)
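In implementation, the continuous-time update laws (33)–(34) would typically be discretized. The following is a minimal forward-Euler sketch (an assumption, not the paper's implementation), where the residuals are presumed to have been computed from Equations (27)–(28); the feature values, residuals, and step size are placeholder numbers, and the learning rates reuse the values from Section 5.

```python
import numpy as np

def critic_weight_step(W_V, Psi, e_hjb, lr, dt):
    """Forward-Euler step of Eq. (33): W_dot = -lr * Psi * e_HJB / (1 + ||Psi||^2)."""
    return W_V - dt * lr * Psi * e_hjb / (1.0 + Psi @ Psi)

def actor_weight_step(W_u, phi_u, e_u, lr, dt):
    """Forward-Euler step of Eq. (34): W_dot = -lr * phi_u * e_u^T / (1 + ||phi_u||^2)."""
    return W_u - dt * lr * np.outer(phi_u, e_u) / (1.0 + phi_u @ phi_u)

# Tiny example with 5 critic/actor features and a scalar control input.
rng = np.random.default_rng(0)
W_V, W_u = rng.standard_normal(5), rng.standard_normal((5, 1))
W_V = critic_weight_step(W_V, Psi=rng.standard_normal(5), e_hjb=0.3, lr=2e-6, dt=0.01)
W_u = actor_weight_step(W_u, phi_u=rng.standard_normal(5), e_u=np.array([0.1]), lr=2e-4, dt=0.01)
```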
Next, the decentralized optimal swarming control for large-scale followers is as follows.
Estimated Actor–Critic Mass for Followers in the Same Group:
Critic: $\hat V_{F,q}^i(x_{F,q}^i, x_L^i, \hat\rho_{F,q}^i) = \mathbb{E}\left\{\hat W_{V,F,q}^{i\,T}\hat\phi_{V,F,q}^i\right\}$
Actor: $\hat u_{F,q}^i(x_{F,q}^i, x_L^i, \hat\rho_{F,q}^i) = \mathbb{E}\left\{\hat W_{u,F,q}^{i\,T}\hat\phi_{u,F,q}^i\right\}$
Mass: $\hat\rho_{F,q}^i(x_{F,q}^i, t) = \mathbb{E}\left\{\hat W_{\rho,F,q}^{i\,T}\hat\phi_{\rho,F,q}^i\right\}$ (35)
The residual errors after substituting Equation (35) into the HJB and FPK equations (Equations (20) and (21)) are
$\mathbb{E}\left\{e_{HJB}^{F,q,i}\right\} = \mathbb{E}\left\{\Phi_{CA,F}(x_{F,q}^i, \hat\rho_{F,q}^i) + \Phi_{C,F}(x_{F,q}^i, \hat\rho_{F,q}^i) + \hat W_{V,F,q}^{i\,T}\left[\partial_t\hat\phi_{V,F,q}^i + 2\nu^i\Delta\hat\phi_{V,F,q}^i - \hat H_{F,q,W}^i\right]\right\}$ (36)
$\mathbb{E}\left\{e_{u,F,q}^i\right\} = \mathbb{E}\left\{\hat W_{u,F,q}^{i\,T}\hat\phi_{u,F,q}^i + \frac{1}{2}\left(R_{F,q}^i\right)^{-1}G_s^T(x_{F,q}^i)\,\nabla_{x_{F,q}^i}\hat\phi_{V,F,q}^i\,\hat W_{V,F,q}^i\right\}$ (37)
$\mathbb{E}\left\{e_{FPK}^{F,q,i}\right\} = \mathbb{E}\left\{\hat W_{\rho,F,q}^{i\,T}\left[\partial_t\hat\phi_{\rho,F,q}^i - 2\nu^i\Delta\hat\phi_{\rho,F,q}^i - \mathrm{div}\left(\hat\phi_{\rho,F,q}^i\right)D_p\hat H_{F,q}^i\right]\right\}$ (38)
where
$\hat H_{F,q}^i = H_{F,q}^i\left[x_{F,q}^i, \nabla_{x_{F,q}^i}\hat\phi_{V,F,q}^i\right] = \hat W_{V,F,q}^{i\,T}\hat H_{F,q,W}^i$
Again, let
$\mathbb{E}\left\{\Psi_{V,F,q}^i(x_{F,q}^i, x_L^i, \hat\rho_{F,q}^i)\right\} = \mathbb{E}\left\{\partial_t\hat\phi_{V,F,q}^i + 2\nu^i\Delta\hat\phi_{V,F,q}^i - \hat H_{F,q,W}^i\right\}$
$\mathbb{E}\left\{\Psi_{\rho,F,q}^i(x_{F,q}^i, \hat V_{F,q}^i, t)\right\} = \mathbb{E}\left\{\partial_t\hat\phi_{\rho,F,q}^i - 2\nu^i\Delta\hat\phi_{\rho,F,q}^i - \mathrm{div}\left(\hat\phi_{\rho,F,q}^i\right)D_p\hat H_{F,q}^i\right\}$
$\mathbb{E}\left\{\Phi_{CA,F}(x_{F,q}^i, \tilde\rho_{F,q}^i)\right\} = \mathbb{E}\left\{\Phi_{CA,F}(x_{F,q}^i, \rho_{F,q}^i) - \Phi_{CA,F}(x_{F,q}^i, \hat\rho_{F,q}^i)\right\}$
$\mathbb{E}\left\{\Phi_{C,F}(x_{F,q}^i, \tilde\rho_{F,q}^i)\right\} = \mathbb{E}\left\{\Phi_{C,F}(x_{F,q}^i, \rho_{F,q}^i) - \Phi_{C,F}(x_{F,q}^i, \hat\rho_{F,q}^i)\right\}$
The estimation errors in Equations (36) and (38) can be simplified as follows:
$\mathbb{E}\left\{e_{HJB}^{F,q,i}\right\} = \mathbb{E}\left\{\Phi_{CA,F}(x_{F,q}^i, \rho_{F,q}^i) + \Phi_{CA,F}(x_{F,q}^i, \tilde\rho_{F,q}^i) + \Phi_{C,F}(x_{F,q}^i, \rho_{F,q}^i) + \Phi_{C,F}(x_{F,q}^i, \tilde\rho_{F,q}^i) + \hat W_{V,F,q}^{i\,T}\Psi_{V,F,q}^i(x_{F,q}^i, x_L^i, \hat\rho_{F,q}^i)\right\}$ (39)
E e FPK F , q i = E W ^ ρ , F , q i T Ψ ρ , F , q i ( x F , q i , V ^ F , q i , t )
By substituting the optimal functions from Equation (25) into Equations (20) and (21), we obtain
$\mathbb{E}\left\{\Phi_{CA,F}(x_{F,q}^i, \rho_{F,q}^i) + \Phi_{C,F}(x_{F,q}^i, \rho_{F,q}^i) + W_{V,F,q}^{i\,T}\Psi_{V,F,q}^i(x_{F,q}^i, x_L^i, \rho_{F,q}^i) + \varepsilon_{HJB}^{F,q,i}\right\} = 0$ (41)
$\mathbb{E}\left\{W_{\rho,F,q}^{i\,T}\Psi_{\rho,F,q}^i(x_{F,q}^i, V_{F,q}^i, t) + \varepsilon_{FPK}^{F,q,i}\right\} = 0$ (42)
By substituting Equations (41) and (42) into Equations (39) and (40), respectively, we obtain
$\mathbb{E}\left\{e_{HJB}^{F,q,i}\right\} = \mathbb{E}\left\{\Phi_{CA,F}(x_{F,q}^i, \tilde\rho_{F,q}^i) + \Phi_{C,F}(x_{F,q}^i, \tilde\rho_{F,q}^i) - \tilde W_{V,F,q}^{i\,T}\hat\Psi_{V,F,q}^i(x_{F,q}^i, x_L^i, \hat\rho_{F,q}^i) - W_{V,F,q}^{i\,T}\tilde\Psi_{V,F,q}^i(x_{F,q}^i, x_L^i, \tilde\rho_{F,q}^i) - \varepsilon_{HJB}^{F,q,i}\right\}$ (43)
$\mathbb{E}\left\{e_{FPK}^{F,q,i}\right\} = \mathbb{E}\left\{-\tilde W_{\rho,F,q}^{i\,T}\hat\Psi_{\rho,F,q}^i(x_{F,q}^i, \hat V_{F,q}^i, t) - W_{\rho,F,q}^{i\,T}\tilde\Psi_{\rho,F,q}^i(x_{F,q}^i, \tilde V_{F,q}^i, t) - \varepsilon_{FPK}^{F,q,i}\right\}$ (44)
Similarly, we can obtain
$\mathbb{E}\left\{e_{u,F,q}^i\right\} = \mathbb{E}\left\{-\tilde W_{u,F,q}^{i\,T}\hat\phi_{u,F,q}^i(x_{F,q}^i, x_L^i, \hat\rho_{F,q}^i) - W_{u,F,q}^{i\,T}\tilde\phi_{u,F,q}^i(x_{F,q}^i, x_L^i, \tilde\rho_{F,q}^i) - \frac{1}{2}\left(R_{F,q}^i\right)^{-1}G_s^T(x_{F,q}^i)\,\nabla_{x_{F,q}^i}\tilde V_{F,q}^i - \varepsilon_{u,F,q}^i\right\}$ (45)
where $\tilde W_{V,F,q}^i = W_{V,F,q}^i - \hat W_{V,F,q}^i$, $\tilde W_{u,F,q}^i = W_{u,F,q}^i - \hat W_{u,F,q}^i$, and $\tilde W_{\rho,F,q}^i = W_{\rho,F,q}^i - \hat W_{\rho,F,q}^i$. In addition, we have
$\mathbb{E}\left\{\tilde\Psi_{V,F,q}^i(x_{F,q}^i, x_L^i, \tilde\rho_{F,q}^i)\right\} = \mathbb{E}\left\{\Psi_{V,F,q}^i(x_{F,q}^i, x_L^i, \rho_{F,q}^i) - \hat\Psi_{V,F,q}^i(x_{F,q}^i, x_L^i, \hat\rho_{F,q}^i)\right\}$
$\mathbb{E}\left\{\tilde\phi_{u,F,q}^i(x_{F,q}^i, x_L^i, \tilde\rho_{F,q}^i)\right\} = \mathbb{E}\left\{\phi_{u,F,q}^i(x_{F,q}^i, x_L^i, \rho_{F,q}^i) - \hat\phi_{u,F,q}^i(x_{F,q}^i, x_L^i, \hat\rho_{F,q}^i)\right\}$
$\mathbb{E}\left\{\tilde\Psi_{\rho,F,q}^i(x_{F,q}^i, \tilde V_{F,q}^i, t)\right\} = \mathbb{E}\left\{\Psi_{\rho,F,q}^i(x_{F,q}^i, V_{F,q}^i, t) - \hat\Psi_{\rho,F,q}^i(x_{F,q}^i, \hat V_{F,q}^i, t)\right\}$
Using the residual errors in Equations (43)–(45), the actor–critic–mass NNs for the followers are tuned using the gradient descent method as follows.
Actor–Critic–Mass Update Laws for Followers:
$\mathbb{E}\left\{\dot{\hat W}_{V,F,q}^i\right\} = -\,\mathbb{E}\left\{\alpha_{V,F,q}^i\,\frac{\hat\Psi_{V,F,q}^i(x_{F,q}^i, x_L^i, \hat\rho_{F,q}^i)\, e_{HJB}^{F,q,i\,T}}{1 + \left\|\hat\Psi_{V,F,q}^i(x_{F,q}^i, x_L^i, \hat\rho_{F,q}^i)\right\|^2}\right\}$ (46)
$\mathbb{E}\left\{\dot{\hat W}_{u,F,q}^i\right\} = -\,\mathbb{E}\left\{\alpha_{u,F,q}^i\,\frac{\hat\phi_{u,F,q}^i(x_{F,q}^i, x_L^i, \hat\rho_{F,q}^i)\, e_{u,F,q}^{i\,T}}{1 + \left\|\hat\phi_{u,F,q}^i(x_{F,q}^i, x_L^i, \hat\rho_{F,q}^i)\right\|^2}\right\}$ (47)
$\mathbb{E}\left\{\dot{\hat W}_{\rho,F,q}^i\right\} = -\,\mathbb{E}\left\{\alpha_{\rho,F,q}^i\,\frac{\hat\Psi_{\rho,F,q}^i(x_{F,q}^i, \hat V_{F,q}^i, t)\, e_{FPK}^{F,q,i\,T}}{1 + \left\|\hat\Psi_{\rho,F,q}^i(x_{F,q}^i, \hat V_{F,q}^i, t)\right\|^2}\right\}$ (48)
where $\alpha_{V,L}^i$, $\alpha_{u,L}^i$, $\alpha_{V,F,q}^i$, $\alpha_{u,F,q}^i$, and $\alpha_{\rho,F,q}^i$ are the learning rates.
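Two follower-side operations complete the hierarchy: each follower updates its own mass NN with the FPK residual (Equation (48)), and the leader only averages the PDF estimates reported by its followers (the quantity used in Equations (26)–(27)). The sketch below (not from the paper) illustrates both; the feature values, residual, step size, and reported PDF values are placeholder numbers, and the forward-Euler discretization is an assumption.

```python
import numpy as np

def mass_weight_step(W_rho, Psi_rho, e_fpk, lr, dt):
    """Forward-Euler step of Eq. (48): W_dot = -lr * Psi_rho * e_FPK / (1 + ||Psi_rho||^2)."""
    return W_rho - dt * lr * Psi_rho * e_fpk / (1.0 + Psi_rho @ Psi_rho)

def leader_pdf_estimate(follower_pdf_values):
    """Statistical average of the followers' reported PDF estimates (rho_bar_hat)."""
    return np.mean(follower_pdf_values, axis=0)

rng = np.random.default_rng(1)
W_rho = rng.standard_normal(5)
W_rho = mass_weight_step(W_rho, Psi_rho=rng.standard_normal(5), e_fpk=0.05, lr=1e-3, dt=0.01)
reported = rng.uniform(0.0, 1.0, size=500)       # hypothetical per-follower PDF values
rho_bar = leader_pdf_estimate(reported)
```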
Remark 2.
The functions $\hat\Psi_{V,L}^i(x_L^i, x_L^{-i}, \hat{\bar{\rho}}^i)$, $\hat\phi_{u,L}^i(x_L^i, x_L^{-i}, \hat{\bar{\rho}}^i)$, $\hat\Psi_{V,F,q}^i(x_{F,q}^i, x_L^i, \hat\rho_{F,q}^i)$, $\hat\phi_{u,F,q}^i(x_{F,q}^i, x_L^i, \hat\rho_{F,q}^i)$, and $\hat\Psi_{\rho,F,q}^i(x_{F,q}^i, \hat V_{F,q}^i, t)$ must satisfy the persistent excitation (PE) condition [20] in order for the weights to converge.

4.2. Optimal Swarming Control Performance Analysis

The performance of all the NNs as well as the stability of the closed-loop swarming system are analyzed in this section.
Theorem 1.
Let $\mathbb{E}\{\hat W_{V,L}^i\}$ and $\mathbb{E}\{\hat W_{V,F,q}^i\}$ be updated as in Equations (33) and (46) with learning rates $\alpha_{V,L}^i > 0$ and $\alpha_{V,F,q}^i > 0$, respectively. Then, the errors between the actual and approximated critic NN weights, $\mathbb{E}\{\tilde W_{V,L}^i\}$ and $\mathbb{E}\{\tilde W_{V,F,q}^i\}$, as well as the optimal evaluation function approximation errors (i.e., $\mathbb{E}\{\tilde V_L^i\} = \mathbb{E}\{V_L^i - \hat V_L^i\}$ and $\mathbb{E}\{\tilde V_{F,q}^i\} = \mathbb{E}\{V_{F,q}^i - \hat V_{F,q}^i\}$) are uniformly ultimately bounded (UUB). Moreover, $\mathbb{E}\{\tilde W_{V,L}^i\}$, $\mathbb{E}\{\tilde W_{V,F,q}^i\}$, $\mathbb{E}\{\tilde V_L^i\}$, and $\mathbb{E}\{\tilde V_{F,q}^i\}$ are asymptotically stable when the reconstruction errors are sufficiently small. The bounds for the evaluation function approximation errors $\tilde V_L^i$ and $\tilde V_{F,q}^i$ are as follows:
$\mathbb{E}\left\{\left\|\tilde V_L^i(t)\right\|\right\} = \mathbb{E}\left\{\left\|\tilde W_{V,L}^{i\,T}\hat\phi_{V,L}^i + W_{V,L}^{i\,T}\tilde\phi_{V,L}^i + \varepsilon_{HJB}^{L,i}\right\|\right\} \le \mathbb{E}\left\{\left\|\tilde W_{V,L}^{i\,T}\hat\phi_{V,L}^i\right\|\right\} + l_{\phi_{V,L}}^i\,\mathbb{E}\left\{\left\|W_{V,L}^i\right\|\left\|\tilde{\bar{\rho}}^i\right\|\right\} + \mathbb{E}\left\{\left\|\varepsilon_{HJB}^{L,i}\right\|\right\} \le b_{W,V,L}^i\,\mathbb{E}\left\{\left\|\hat\phi_{V,L}^i\right\|\right\} + l_{\phi_{V,L}}^i\,\mathbb{E}\left\{\left\|W_{V,L}^i\right\|\right\}\bar b_{\rho,F,q}^i + \mathbb{E}\left\{\left\|\varepsilon_{HJB}^{L,i}\right\|\right\} \triangleq b_{V,L}^i(t)$
Similarly, we have
$\mathbb{E}\left\{\left\|\tilde V_{F,q}^i(t)\right\|\right\} = \mathbb{E}\left\{\left\|\tilde W_{V,F,q}^{i\,T}\hat\phi_{V,F,q}^i + W_{V,F,q}^{i\,T}\tilde\phi_{V,F,q}^i + \varepsilon_{HJB}^{F,q,i}\right\|\right\} \le \mathbb{E}\left\{\left\|\tilde W_{V,F,q}^{i\,T}\hat\phi_{V,F,q}^i\right\|\right\} + l_{\phi_{V,F,q}}^i\,\mathbb{E}\left\{\left\|W_{V,F,q}^i\right\|\left\|\tilde\rho_{F,q}^i\right\|\right\} + \mathbb{E}\left\{\left\|\varepsilon_{HJB}^{F,q,i}\right\|\right\} \le b_{W,V,F,q}^i\,\mathbb{E}\left\{\left\|\hat\phi_{V,F,q}^i\right\|\right\} + l_{\phi_{V,F,q}}^i\,\mathbb{E}\left\{\left\|W_{V,F,q}^i\right\|\right\}b_{\rho,F,q}^i + \mathbb{E}\left\{\left\|\varepsilon_{HJB}^{F,q,i}\right\|\right\} \triangleq b_{V,F,q}^i(t)$
where $l_{\phi_{V,L}}^i$ and $l_{\phi_{V,F,q}}^i$ are the Lipschitz constants of the critic activation functions $\phi_{V,L}^i$ and $\phi_{V,F,q}^i$, respectively. Additionally, $\bar b_{\rho,F,q}^i$ can be calculated by averaging the mass bounds $b_{\rho,F,q}^i$ of the individual followers.
Proof. 
See Appendix A. □
Theorem 2.
Let $\mathbb{E}\{\hat W_{\rho,F,q}^i\}$ be updated as in Equation (48) with learning rate $\alpha_{\rho,F,q}^i > 0$. Then, the error between the actual and approximated mass NN weights, $\mathbb{E}\{\tilde W_{\rho,F,q}^i\}$, as well as the mass function approximation error (i.e., $\mathbb{E}\{\tilde\rho_{F,q}^i\} = \mathbb{E}\{\rho_{F,q}^i - \hat\rho_{F,q}^i\}$) are uniformly ultimately bounded (UUB). Moreover, $\mathbb{E}\{\tilde W_{\rho,F,q}^i\}$ and $\mathbb{E}\{\tilde\rho_{F,q}^i\}$ are asymptotically stable when the reconstruction errors are sufficiently small. The bound for the mass approximation error $\tilde\rho_{F,q}^i$ is as follows:
$\mathbb{E}\left\{\left\|\tilde\rho_{F,q}^i(t)\right\|\right\} = \mathbb{E}\left\{\left\|\tilde W_{\rho,F,q}^{i\,T}\hat\phi_{\rho,F,q}^i + \varepsilon_{FPK}^{F,q,i}\right\|\right\} \le \mathbb{E}\left\{\left\|\tilde W_{\rho,F,q}^{i\,T}\hat\phi_{\rho,F,q}^i\right\|\right\} + \mathbb{E}\left\{\left\|\varepsilon_{FPK}^{F,q,i}\right\|\right\} \le b_{W,\rho,F,q}^i\,\mathbb{E}\left\{\left\|\hat\phi_{\rho,F,q}^i\right\|\right\} + \mathbb{E}\left\{\left\|\varepsilon_{FPK}^{F,q,i}\right\|\right\} \triangleq b_{\rho,F,q}^i(t)$
Proof. 
See Appendix B. □
Theorem 3.
Let $\mathbb{E}\{\hat W_{u,L}^i\}$ and $\mathbb{E}\{\hat W_{u,F,q}^i\}$ be updated as in Equations (34) and (47) with learning rates $\alpha_{u,L}^i > 0$ and $\alpha_{u,F,q}^i > 0$, respectively. Then, the errors between the actual and approximated actor NN weights, $\mathbb{E}\{\tilde W_{u,L}^i\}$ and $\mathbb{E}\{\tilde W_{u,F,q}^i\}$, as well as the optimal control approximation errors (i.e., $\mathbb{E}\{\tilde u_L^i\} = \mathbb{E}\{u_L^i - \hat u_L^i\}$ and $\mathbb{E}\{\tilde u_{F,q}^i\} = \mathbb{E}\{u_{F,q}^i - \hat u_{F,q}^i\}$) are uniformly ultimately bounded (UUB). Moreover, $\mathbb{E}\{\tilde W_{u,L}^i\}$, $\mathbb{E}\{\tilde W_{u,F,q}^i\}$, $\mathbb{E}\{\tilde u_L^i\}$, and $\mathbb{E}\{\tilde u_{F,q}^i\}$ are asymptotically stable when the reconstruction errors are sufficiently small. Moreover, the bounds for the actor approximation errors $\tilde u_L^i$ and $\tilde u_{F,q}^i$ are as follows:
$\mathbb{E}\left\{\left\|\tilde u_L^i(t)\right\|\right\} = \mathbb{E}\left\{\left\|\tilde W_{u,L}^{i\,T}(t)\hat\phi_{u,L}^i + W_{u,L}^{i\,T}\tilde\phi_{u,L}^i + \varepsilon_{u,L}^i\right\|\right\} \le \mathbb{E}\left\{\left\|\tilde W_{u,L}^{i\,T}\hat\phi_{u,L}^i\right\|\right\} + l_{\phi_{u,L}}^i\,\mathbb{E}\left\{\left\|W_{u,L}^i\right\|\left\|\tilde{\bar{\rho}}^i\right\|\right\} + \mathbb{E}\left\{\left\|\varepsilon_{u,L}^i\right\|\right\} \le b_{W,u,L}^i\,\mathbb{E}\left\{\left\|\hat\phi_{u,L}^i\right\|\right\} + l_{\phi_{u,L}}^i\,\mathbb{E}\left\{\left\|W_{u,L}^i\right\|\right\}\bar b_{\rho,F,q}^i + \mathbb{E}\left\{\left\|\varepsilon_{u,L}^i\right\|\right\} \triangleq b_{u,L}^i(t)$
In addition, we have
$\mathbb{E}\left\{\left\|\tilde u_{F,q}^i(t)\right\|\right\} = \mathbb{E}\left\{\left\|\tilde W_{u,F,q}^{i\,T}\hat\phi_{u,F,q}^i + W_{u,F,q}^{i\,T}\tilde\phi_{u,F,q}^i + \varepsilon_{u,F,q}^i\right\|\right\} \le \mathbb{E}\left\{\left\|\tilde W_{u,F,q}^{i\,T}\hat\phi_{u,F,q}^i\right\|\right\} + l_{\phi_{u,F,q}}^i\,\mathbb{E}\left\{\left\|W_{u,F,q}^i\right\|\left\|\tilde\rho_{F,q}^i\right\|\right\} + \mathbb{E}\left\{\left\|\varepsilon_{u,F,q}^i\right\|\right\} \le b_{W,u,F,q}^i\,\mathbb{E}\left\{\left\|\hat\phi_{u,F,q}^i\right\|\right\} + l_{\phi_{u,F,q}}^i\,\mathbb{E}\left\{\left\|W_{u,F,q}^i\right\|\right\}b_{\rho,F,q}^i + \mathbb{E}\left\{\left\|\varepsilon_{u,F,q}^i\right\|\right\} \triangleq b_{u,F,q}^i(t)$
where $l_{\phi_{u,L}}^i$ and $l_{\phi_{u,F,q}}^i$ are the Lipschitz constants of the actor activation functions $\phi_{u,L}^i$ and $\phi_{u,F,q}^i$, respectively.
Proof. 
See Appendix C. □
Next, the closed-loop stability of the multi-group, large-scale leader–follower swarming system with the developed hierarchical learning control is analyzed.
Lemma 1.
With the given stochastic error dynamics in Equations (4) and (10), there exist optimal control policies $u_L^i$ and $u_{F,q}^i$ for the leaders and followers which satisfy
$\mathbb{E}\left\{e_L^{i\,T}\left[F_{ar}(e_L^i) + G_{ar}(e_L^i)\,u_L^i + \sqrt{2\nu^i}\,\frac{d\omega_L^i}{dt}\right]\right\} \le -\gamma_1\,\mathbb{E}\left\{\left\|e_L^i\right\|^2\right\}$
$\mathbb{E}\left\{e_{F,q}^{i\,T}\left[F_{sr}(e_{F,q}^i) + G_{sr}(e_{F,q}^i)\,u_{F,q}^i + \sqrt{2\nu^i}\,\frac{d\omega_{F,q}^i}{dt}\right]\right\} \le -\gamma_2\,\mathbb{E}\left\{\left\|e_{F,q}^i\right\|^2\right\}$
Theorem 4
(Closed-Loop Stability). Let the leaders’ and followers’ critic, mass, and actor NN weights be updated as in Equations (33)–(48), and let the learning rates $\alpha_{V,L}^i$, $\alpha_{V,F,q}^i$, $\alpha_{u,L}^i$, $\alpha_{u,F,q}^i$, and $\alpha_{\rho,F,q}^i$ be greater than zero. Then, $\mathbb{E}\{\tilde W_{V,L}^i\}$, $\mathbb{E}\{\tilde W_{V,F,q}^i\}$, $\mathbb{E}\{\tilde W_{u,L}^i\}$, $\mathbb{E}\{\tilde W_{u,F,q}^i\}$, $\mathbb{E}\{\tilde W_{\rho,F,q}^i\}$, $\mathbb{E}\{\tilde V_L^i\}$, $\mathbb{E}\{\tilde V_{F,q}^i\}$, $\mathbb{E}\{\tilde u_L^i\}$, $\mathbb{E}\{\tilde u_{F,q}^i\}$, $\mathbb{E}\{\tilde\rho_{F,q}^i\}$, $\mathbb{E}\{e_L^i\}$, and $\mathbb{E}\{e_{F,q}^i\}$ are all UUB; moreover, they are asymptotically stable when the reconstruction errors are sufficiently small.
Proof. 
See Appendix D. □

5. Simulation Results

The developed mixed game theory- and hierarchical learning-based intelligent distributed swarming control algorithm is implemented on a very large-scale unmanned aerial vehicle (UAV) system. The experiment aims to validate the effectiveness of the swarming behavior of multiple UAV groups using the developed techniques.

5.1. Performance Evaluation of Mixed Game Theory-Based Intelligent Distributed Swarm Control

Consider four groups of UAVs operating in an area scaled to 20 × 10. Each group had 1 leader and 500 followers. Each leader was given a predefined time-varying trajectory and, to produce the swarming behavior, each leader was tracked by its respective followers. Note that the leader’s location is known to its corresponding followers.
Next, the initial positions of the leaders were selected as $x_L^1 = [2.5\ \ 6.6]^T$, $x_L^2 = [2\ \ 6.2]^T$, $x_L^3 = [2.5\ \ 5.8]^T$, and $x_L^4 = [2.2\ \ 5.4]^T$. Moreover, the initial states of the followers in each group were generated using the following normal distributions. Group 1: $\mathcal{N}(\mu = [2.5, 6.6]^T, \sigma = 0.4\times I_2)$; Group 2: $\mathcal{N}(\mu = [2, 6.2]^T, \sigma = 0.4\times I_2)$; Group 3: $\mathcal{N}(\mu = [2.5, 5.8]^T, \sigma = 0.4\times I_2)$; and Group 4: $\mathcal{N}(\mu = [2.2, 5.4]^T, \sigma = 0.4\times I_2)$.
Next, the time-varying reference trajectory was defined as follows:
$x_d(t) = \left[\,2.9t + 5\quad 0.01\sin(2.5t) + 6\,\right]^T$
Additionally, the intrinsic dynamics of the leaders were selected as follows:
$F_a(x_L^i) = \begin{bmatrix} x_{L,1}^i + \frac{1}{2}\left(x_{L,2}^i\right)^2 \\[2pt] 0.4\left(x_{L,2}^i\right)^2 \end{bmatrix}, \qquad G_a(x_L^i) = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$
with $x_L^i = \left[\,x_{L,1}^i\ \ x_{L,2}^i\,\right]^T$.
Similarly, the followers’ intrinsic dynamics were selected as follows:
$F_s(x_{F,q}^i) = \begin{bmatrix} x_{F,q,1}^i + \frac{1}{2}\left(x_{F,q,2}^i\right)^2 \\[2pt] 0.2\left(x_{F,q,2}^i\right)^2 \end{bmatrix}, \qquad G_s(x_{F,q}^i) = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$
with $x_{F,q}^i = \left[\,x_{F,q,1}^i\ \ x_{F,q,2}^i\,\right]^T$.
Furthermore, the parameters for evaluating the cost functions (Equations (14) and (15)) were defined as $\nu^i = 0.02$, $h = 1.5$, $r^i = r^j = r = 0.5$, $R_L^i = R_{F,q}^i = 5$, $w_{CA,L} = 5$, $w_{CP,L} = 5$, $w_{CP,F} = 5$, $w_{CA,F} = 5$, and $w_{C,F} = 5$. The total simulation time for this experiment was 15 s.
Next, hierarchical learning-based multi-agent actor–critic neural networks for the leaders and actor–critic–mass neural networks for the followers were constructed. The neural network learning rates were selected as $\alpha_{V,L}^i = 2\times 10^{-6}$, $\alpha_{V,F,q}^i = 2\times 10^{-5}$, $\alpha_{u,L}^i = 2\times 10^{-4}$, $\alpha_{u,F,q}^i = 2\times 10^{-3}$, and $\alpha_{\rho,F,q}^i = 1\times 10^{-3}$.
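For readers who wish to reproduce the setup of this subsection, the following sketch (not from the paper) collects the initialization described above: leader initial positions, follower samples, the reference trajectory, and the learning rates. The integration loop, NN structures, and controller updates are omitted, and the 0.01 s step size is an assumed value.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N_F = 4, 500                                    # number of groups and followers per group
leaders0 = np.array([[2.5, 6.6], [2.0, 6.2], [2.5, 5.8], [2.2, 5.4]])   # leader initial states
# Followers of each group drawn from N(mu = leader position, sigma = 0.4 * I_2)
followers0 = np.stack([rng.normal(mu, 0.4, size=(N_F, 2)) for mu in leaders0])

def x_d(t):
    """Time-varying reference trajectory given to all leaders."""
    return np.array([2.9 * t + 5.0, 0.01 * np.sin(2.5 * t) + 6.0])

lr = {"V_L": 2e-6, "V_F": 2e-5, "u_L": 2e-4, "u_F": 2e-3, "rho_F": 1e-3}  # learning rates
T, dt = 15.0, 0.01                                 # 15 s horizon; dt is an assumed step size
```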
The trajectories of the multi-group, large-scale UAVs are plotted in Figure 3 for times t = 0 s, t = 5 s, t = 10 s, and t = 15 s. The red curve represents the reference trajectory for all groups. Moreover, the leader trajectories are shown with green curves in the figure. For the follower trajectories, different colors have been used. Figure 3a shows the initial positions of all the leaders and followers and the reference trajectory. Then, the leaders and followers from all groups begin their motion from the left region and move toward the right region. The trajectories of the UAVs with swarming behavior are shown in Figure 3b–d. These figures clearly show that all the leaders tracked the reference trajectory and the followers tracked their respective leaders while avoiding collisions to achieve the collective swarming behavior.
The tracking performance of the leaders and followers from all the groups is verified in Figure 4 and Figure 5. Figure 4 shows the tracking errors of the leaders of all four groups. From this figure, it is clear that the errors converged to zero after a certain time period, which implies that the leaders attained the desired swarming behavior as time progressed. We also needed to ensure that each follower could track its respective leader to achieve the group swarming behavior. In this regard, the coupling functions $\Phi_{CP,L}$ and $\Phi_{CP,F}$ were used in the leader and follower cost functions to ensure that the followers stay close to their respective leaders. To verify the performance of the developed method, the tracking errors of the followers with respect to their corresponding leaders are plotted in Figure 5. The average distance of all the followers with respect to their leader in each group was calculated and plotted in Figure 5a. This figure shows that the tracking errors of the followers from all groups converged to zero over time. To demonstrate the performance more clearly, the PDF of the follower tracking error is shown in Figure 5b, where the yellow region from 5 s to 15 s indicates the tracking error values with the highest probability.
The neural network performance was evaluated by examining the HJB equation error of the leader and the HJB and FPK equation errors of the followers. The HJB errors of the leader and of follower 1 in group 1 are shown for this simulation. In Figure 6, the HJB equation error of the leader of group 1 is plotted; it is clear that the error converged to zero with time. A zoomed-in window for the interval 14.5–15 s is also included in this figure to show the performance in detail. This implies the optimality of the leader in achieving the desired swarming behavior. Next, the HJB equation error of follower 1 from group 1 is shown in Figure 7, where the HJB error of the follower converges to 0 after 2.5 s. These two figures confirm the optimality of the learned controls for all groups. To confirm the convergence of the mean-field swarming error, the FPK equation error for follower 1 of group 1 is presented in Figure 8. The FPK error validates the solution of the follower mean field equation.

5.2. Performance Comparison of Mixed Game Theory against Traditional Cooperative Control

Finally, the performance of the developed mixed game theory-based distributed control was compared with the traditional cooperative centralized control [33,34,35] to demonstrate the significance of our method. In this comparison, each group in the cooperative swarm control scheme had 1 leader and 100 followers. The other parameter values used in the cooperative swarm control scheme (e.g., the initial positions) were identical to those used for the mixed game theory-based distributed swarm control scheme. Next, the running cost including the communication costs was evaluated to provide a comparison of the performances. The running cost of the leader for this simulation was defined as
$J_L^i = \mathbb{E}\left\{\int_0^{\infty}\left[\Phi_L(x_L^i, x_L^{-i}, u_L^i) + \Phi_{CP,L}(x_L^i, \bar{x}^i) + w_c N_F^i + w_{c,L}(M-1)\right]dt\right\}$
where $\Phi_L(x_L^i, x_L^{-i}, u_L^i) = \Phi_{SE,L}(x_L^i) + \Phi_{CA,L}(x_L^i, x_L^{-i}) + \|u_L^i\|^2_{R_L^i}$, and $w_c N_F^i$ is a new term which represents the communication cost of the leader with its followers in group $i$. Additionally, $w_c$ is the communication cost weight and $N_F^i$ is the total number of followers in group $i$, while $w_{c,L}(M-1)$ represents the communication cost of the leader with the other group leaders. Here, $w_{c,L}$ is the corresponding communication cost weight, and $M$ is the total number of leaders.
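The comparison cost above is simply the stage cost augmented with communication penalties. The sketch below (not from the paper) evaluates this integrand for one time instant; the weights w_c and w_{c,L} and the task-cost values are placeholder numbers, since the paper does not list them.

```python
def leader_stage_cost_with_comm(phi_L, phi_cp, w_c, N_F, w_cL, M):
    """Integrand of the comparison cost J_L^i: task terms plus communication penalties
    w_c * N_F^i (leader-follower links) and w_{c,L} * (M - 1) (leader-leader links)."""
    return phi_L + phi_cp + w_c * N_F + w_cL * (M - 1)

# Example: 100 followers per group and 4 groups, as in the comparison of Section 5.2.
print(leader_stage_cost_with_comm(phi_L=1.0, phi_cp=0.5, w_c=0.1, N_F=100, w_cL=0.1, M=4))
```

The fixed per-step term w_c * N_F^i grows linearly with the follower population, which is what penalizes the fully cooperative centralized scheme as the swarm size increases.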
Figure 9 shows the performance of the developed algorithm in terms of the running cost for the leader in group 1. From Figure 9, it is clear that our approach outperformed the cooperative multi-agent centralized approach after a certain amount of time. In the beginning, the cost of the cooperative game-based optimal solution was lower; however, the developed algorithm slowly outperformed the cooperative game-based algorithm. The main reason for this is that the cost function of the cooperative centralized swarm control is penalized for the high amount of communication between the leader and the followers in the same group. Similarly, the performance of the developed mixed game theory-based distributed swarm control algorithm for the followers in the same group is shown next.
The running cost of the follower is defined as
$J_{F,q}^i = \mathbb{E}\left\{\int_0^{\infty}\left[\Phi_F(x_{F,q}^i, x_L^i, u_{F,q}^i) + \Phi_{CA,F}(x_{F,q}^i, \bar{x}^i) + \Phi_{C,F}(x_{F,q}^i, \bar{x}^i) + w_{c,F} N_F^i\right]dt\right\}, \quad \text{s.t.}\ \left\|x_{F,q}^{i,o} - x_L^i\right\| < r^i$
with $\Phi_F(x_{F,q}^i, x_L^i, u_{F,q}^i) = \Phi_{SE,F}(x_{F,q}^i) + \Phi_{CP,F}(x_{F,q}^i, x_L^i) + \|u_{F,q}^i\|^2_{R_{F,q}^i}$, where $w_{c,F} N_F^i$ is a new term representing the communication cost of follower $q$ in group $i$. In addition, $w_{c,F}$ is the communication cost weight, and $N_F^i$ is the total number of followers in group $i$. Similar to Figure 9, the performance of the developed distributed swarm algorithm in terms of the running cost for the followers in group 1 is demonstrated in Figure 10. The cost of the cooperative centralized approach is penalized here because of the communication among a large number of followers in the same group. From Figure 9 and Figure 10, it is clear that the developed algorithm outperformed the traditional cooperative centralized approach.

6. Conclusions

This paper developed a mixed game-based distributed intelligent swarming control scheme along with a hierarchical learning algorithm for multi-group, large-scale leader–follower systems. To attain the collective swarming behavior for a large number of agents, a mixed game framework was designed which includes a cooperative game to ensure collective swarming behavior among the multi-group leaders, a Stackelberg game to couple each group leader with its large-scale followers, and a mean field game to propagate the collective swarming behavior to all the followers without raising the computational complexity, thereby breaking the “Curse of Dimensionality”. Moreover, a hierarchical learning-based actor–critic algorithm was designed to obtain the solution of the intelligent optimal swarming control. This structure includes multi-agent actor–critic neural networks to learn the distributed swarm control for the multi-group leaders and actor–critic–mass neural networks to learn the decentralized swarm control for the large-scale followers. The developed mixed game-based intelligent swarm control optimizes the collective swarming behavior and also adapts to uncertainties in the dynamic environment. Finally, the effectiveness of the developed techniques was validated through Lyapunov analysis and numerical simulations.

Author Contributions

Conceptualization, S.D. and H.X.; methodology, S.D. and H.X.; software, S.D.; validation, S.D. and H.X.; formal analysis, S.D. and H.X.; investigation, H.X.; resources, H.X.; data curation, S.D.; writing—original draft preparation, S.D.; writing—review and editing, H.X.; visualization, S.D.; supervision, H.X.; project administration, H.X.; funding acquisition, H.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Science Foundation grant number 2144646.

Data Availability Statement

Data sharing not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Theorem 1

Leader Critic NN: Consider the following Lyapunov function candidate:
$L_{V,L}^i(t) = \frac{1}{2}\,\mathrm{tr}\left(\mathbb{E}\left\{\tilde W_{V,L}^{i\,T}\tilde W_{V,L}^i\right\}\right)$ (A1)
In addition, the first derivative of the leader–critic NN weight estimation error can be obtained from Equation (33) as follows:
$\mathbb{E}\left\{\dot{\tilde W}_{V,L}^i\right\} = -\,\mathbb{E}\left\{\dot{\hat W}_{V,L}^i\right\} = \mathbb{E}\left\{\alpha_{V,L}^i\,\frac{\hat\Psi_{V,L}^i(x_L^i, x_L^{-i}, \hat{\bar{\rho}}^i)\, e_{HJB}^{L,i\,T}}{1 + \left\|\hat\Psi_{V,L}^i(x_L^i, x_L^{-i}, \hat{\bar{\rho}}^i)\right\|^2}\right\}$ (A2)
According to the Lyapunov stability analysis, we take the first derivative of Equation (A1) and substitute the leader–critic NN weight estimation error dynamic from Equation (A2):
$\dot L_{V,L}^i(t) = \mathrm{tr}\left(\mathbb{E}\left\{\tilde W_{V,L}^{i\,T}\dot{\tilde W}_{V,L}^i\right\}\right) = \alpha_{V,L}^i\,\mathrm{tr}\left(\mathbb{E}\left\{\tilde W_{V,L}^{i\,T}\,\frac{\hat\Psi_{V,L}^i(x_L^i, x_L^{-i}, \hat{\bar{\rho}}^i)\, e_{HJB}^{L,i\,T}}{1 + \left\|\hat\Psi_{V,L}^i(x_L^i, x_L^{-i}, \hat{\bar{\rho}}^i)\right\|^2}\right\}\right)$ (A3)
Then, we let
$\hat\Psi_{V,L}^i = \hat\Psi_{V,L}^i(x_L^i, x_L^{-i}, \hat{\bar{\rho}}^i); \qquad \tilde\Psi_{V,L}^i = \tilde\Psi_{V,L}^i(x_L^i, x_L^{-i}, \tilde{\bar{\rho}}^i)$
Now, by substituting Equation (31) into Equation (A3), we can obtain
$\dot L_{V,L}^i(t) \le \alpha_{V,L}^i\,\mathrm{tr}\!\left(\mathbb{E}\!\left\{\frac{\tilde W_{V,L}^{i\,T}\hat\Psi_{V,L}^i}{1+\|\hat\Psi_{V,L}^i\|^2}\left[\tilde\Phi_{CP,L} - \tilde W_{V,L}^{i\,T}\hat\Psi_{V,L}^i - W_{V,L}^{i\,T}\tilde\Psi_{V,L}^i - \varepsilon_{HJB}^{L,i}\right]^T\right\}\right) \le \alpha_{V,L}^i\,\mathrm{tr}\!\left(\mathbb{E}\!\left\{\frac{\tilde W_{V,L}^{i\,T}\hat\Psi_{V,L}^i\,\tilde\Phi_{CP,L}^T}{1+\|\hat\Psi_{V,L}^i\|^2}\right\}\right) - \alpha_{V,L}^i\,\mathrm{tr}\!\left(\mathbb{E}\!\left\{\frac{\tilde W_{V,L}^{i\,T}\hat\Psi_{V,L}^i\hat\Psi_{V,L}^{i\,T}\tilde W_{V,L}^i}{1+\|\hat\Psi_{V,L}^i\|^2}\right\}\right) - \alpha_{V,L}^i\,\mathrm{tr}\!\left(\mathbb{E}\!\left\{\frac{\tilde W_{V,L}^{i\,T}\hat\Psi_{V,L}^i\,\tilde\Psi_{V,L}^{i\,T}W_{V,L}^i}{1+\|\hat\Psi_{V,L}^i\|^2}\right\}\right) - \alpha_{V,L}^i\,\mathrm{tr}\!\left(\mathbb{E}\!\left\{\frac{\tilde W_{V,L}^{i\,T}\hat\Psi_{V,L}^i\,\varepsilon_{HJB}^{L,i\,T}}{1+\|\hat\Psi_{V,L}^i\|^2}\right\}\right)$ (A4)
where $\tilde\Phi_{CP,L} = \Phi_{CP,L}(x_L^i, \tilde{\bar{\rho}}^i)$.
Next, the triangle inequality properties are applied to Equation (A4):
$\dot L_{V,L}^i(t) \le -\frac{1}{4}\alpha_{V,L}^i\,\mathbb{E}\!\left\{\frac{\|\hat\Psi_{V,L}^i\|^2\|\tilde W_{V,L}^i\|^2}{1+\|\hat\Psi_{V,L}^i\|^2}\right\} - \left[\frac{1}{4}\alpha_{V,L}^i\,\mathbb{E}\!\left\{\frac{\|\hat\Psi_{V,L}^i\|^2\|\tilde W_{V,L}^i\|^2}{1+\|\hat\Psi_{V,L}^i\|^2}\right\} - \alpha_{V,L}^i\,\mathrm{tr}\!\left(\mathbb{E}\!\left\{\frac{\tilde W_{V,L}^{i\,T}\hat\Psi_{V,L}^i\,\tilde\Phi_{CP,L}^T}{1+\|\hat\Psi_{V,L}^i\|^2}\right\}\right) + \alpha_{V,L}^i\,\mathbb{E}\!\left\{\frac{\|\tilde\Phi_{CP,L}\|^2}{1+\|\hat\Psi_{V,L}^i\|^2}\right\}\right] + \alpha_{V,L}^i\,\mathbb{E}\!\left\{\frac{\|\tilde\Phi_{CP,L}\|^2}{1+\|\hat\Psi_{V,L}^i\|^2}\right\} - \left[\frac{1}{4}\alpha_{V,L}^i\,\mathbb{E}\!\left\{\frac{\|\hat\Psi_{V,L}^i\|^2\|\tilde W_{V,L}^i\|^2}{1+\|\hat\Psi_{V,L}^i\|^2}\right\} + \alpha_{V,L}^i\,\mathrm{tr}\!\left(\mathbb{E}\!\left\{\frac{\tilde W_{V,L}^{i\,T}\hat\Psi_{V,L}^i\,\tilde\Psi_{V,L}^{i\,T}W_{V,L}^i}{1+\|\hat\Psi_{V,L}^i\|^2}\right\}\right) + \alpha_{V,L}^i\,\mathbb{E}\!\left\{\frac{\|W_{V,L}^{i\,T}\tilde\Psi_{V,L}^i\|^2}{1+\|\hat\Psi_{V,L}^i\|^2}\right\}\right] + \alpha_{V,L}^i\,\mathbb{E}\!\left\{\frac{\|W_{V,L}^{i\,T}\tilde\Psi_{V,L}^i\|^2}{1+\|\hat\Psi_{V,L}^i\|^2}\right\} - \left[\frac{1}{4}\alpha_{V,L}^i\,\mathbb{E}\!\left\{\frac{\|\hat\Psi_{V,L}^i\|^2\|\tilde W_{V,L}^i\|^2}{1+\|\hat\Psi_{V,L}^i\|^2}\right\} + \alpha_{V,L}^i\,\mathrm{tr}\!\left(\mathbb{E}\!\left\{\frac{\tilde W_{V,L}^{i\,T}\hat\Psi_{V,L}^i\,\varepsilon_{HJB}^{L,i\,T}}{1+\|\hat\Psi_{V,L}^i\|^2}\right\}\right) + \alpha_{V,L}^i\,\mathbb{E}\!\left\{\frac{\|\varepsilon_{HJB}^{L,i}\|^2}{1+\|\hat\Psi_{V,L}^i\|^2}\right\}\right] + \alpha_{V,L}^i\,\mathbb{E}\!\left\{\frac{\|\varepsilon_{HJB}^{L,i}\|^2}{1+\|\hat\Psi_{V,L}^i\|^2}\right\}$ (A5)
Now, Equation (A5) can be simplified using $\frac{1}{4}a^2 \pm ab + b^2 = \left(\frac{1}{2}a \pm b\right)^2$:
$\dot L_{V,L}^i(t) \le -\frac{1}{4}\alpha_{V,L}^i\,\mathbb{E}\!\left\{\frac{\|\hat\Psi_{V,L}^i\|^2\|\tilde W_{V,L}^i\|^2}{1+\|\hat\Psi_{V,L}^i\|^2}\right\} - \alpha_{V,L}^i\,\mathbb{E}\!\left\{\frac{\left\|\frac{1}{2}\hat\Psi_{V,L}^{i\,T}\tilde W_{V,L}^i - \tilde\Phi_{CP,L}\right\|^2}{1+\|\hat\Psi_{V,L}^i\|^2}\right\} - \alpha_{V,L}^i\,\mathbb{E}\!\left\{\frac{\left\|\frac{1}{2}\hat\Psi_{V,L}^{i\,T}\tilde W_{V,L}^i + \tilde\Psi_{V,L}^{i\,T}W_{V,L}^i\right\|^2}{1+\|\hat\Psi_{V,L}^i\|^2}\right\} - \alpha_{V,L}^i\,\mathbb{E}\!\left\{\frac{\left\|\frac{1}{2}\hat\Psi_{V,L}^{i\,T}\tilde W_{V,L}^i + \varepsilon_{HJB}^{L,i}\right\|^2}{1+\|\hat\Psi_{V,L}^i\|^2}\right\} + \alpha_{V,L}^i\,\mathbb{E}\!\left\{\frac{\|\tilde\Phi_{CP,L}\|^2}{1+\|\hat\Psi_{V,L}^i\|^2}\right\} + \alpha_{V,L}^i\,\mathbb{E}\!\left\{\frac{\|W_{V,L}^{i\,T}\tilde\Psi_{V,L}^i\|^2}{1+\|\hat\Psi_{V,L}^i\|^2}\right\} + \alpha_{V,L}^i\,\mathbb{E}\!\left\{\frac{\|\varepsilon_{HJB}^{L,i}\|^2}{1+\|\hat\Psi_{V,L}^i\|^2}\right\}$ (A6)
By dropping several negative terms, Equation (A6) can be simplified as follows:
$\dot L_{V,L}^i(t) \le -\frac{1}{4}\alpha_{V,L}^i\,\mathbb{E}\left\{\frac{\|\hat\Psi_{V,L}^i\|^2\|\tilde W_{V,L}^i\|^2}{1+\|\hat\Psi_{V,L}^i\|^2}\right\} + B_{W,V,L}^i(t)$ (A7)
where
$B_{W,V,L}^i(t) = \frac{\alpha_{V,L}^i\left[l_{\Phi_{CP,L}}^i + l_{\Psi_{V,L}}^i\,\mathbb{E}\left\{\|W_{V,L}^i\|^2\right\}\right]\mathbb{E}\left\{\|\tilde{\bar{\rho}}^i\|^2\right\}}{1+\mathbb{E}\left\{\|\hat\Psi_{V,L}^i\|^2\right\}} + \alpha_{V,L}^i\,\mathbb{E}\left\{\frac{\|\varepsilon_{HJB}^{L,i}\|^2}{1+\|\hat\Psi_{V,L}^i\|^2}\right\}$ (A8)
Here, $l_{\Phi_{CP,L}}^i$ and $l_{\Psi_{V,L}}^i$ are the Lipschitz constants, and $\tilde{\bar{\rho}}^i$ is the (bounded) mass estimation error.
The leader–critic NN weight estimation error will be UUB, and the bound is
$\mathbb{E}\left\{\|\tilde W_{V,L}^i\|^2\right\} \le \frac{\mathbb{E}\left\{1+\|\hat\Psi_{V,L}^i\|^2\right\}}{\alpha_{V,L}^i\,\mathbb{E}\left\{\|\hat\Psi_{V,L}^i\|^2\right\}}\,B_{W,V,L}^i \triangleq b_{W,V,L}^i$ (A9)
This completes the proof.
Follower Critic NN: Consider the following Lyapunov function candidate:
$L_{V,F,q}^i(t) = \frac{1}{2}\,\mathrm{tr}\left(\mathbb{E}\left\{\tilde W_{V,F,q}^{i\,T}\tilde W_{V,F,q}^i\right\}\right)$ (A10)
Additionally, the first derivative of the follower–critic NN weight estimation error can be obtained from Equation (46) as follows:
$\mathbb{E}\left\{\dot{\tilde W}_{V,F,q}^i\right\} = -\,\mathbb{E}\left\{\dot{\hat W}_{V,F,q}^i\right\} = \mathbb{E}\left\{\alpha_{V,F,q}^i\,\frac{\hat\Psi_{V,F,q}^i(x_{F,q}^i, x_L^i, \hat\rho_{F,q}^i)\, e_{HJB}^{F,q,i\,T}}{1 + \left\|\hat\Psi_{V,F,q}^i(x_{F,q}^i, x_L^i, \hat\rho_{F,q}^i)\right\|^2}\right\}$ (A11)
According to the Lyapunov stability analysis, we take the first derivative of Equation (A10) and substitute in the follower–critic NN weight estimation error dynamics from Equation (A11):
$\dot L_{V,F,q}^i(t) = \mathrm{tr}\left(\mathbb{E}\left\{\tilde W_{V,F,q}^{i\,T}\dot{\tilde W}_{V,F,q}^i\right\}\right) = \alpha_{V,F,q}^i\,\mathrm{tr}\left(\mathbb{E}\left\{\tilde W_{V,F,q}^{i\,T}\,\frac{\hat\Psi_{V,F,q}^i(x_{F,q}^i, x_L^i, \hat\rho_{F,q}^i)\, e_{HJB}^{F,q,i\,T}}{1 + \left\|\hat\Psi_{V,F,q}^i(x_{F,q}^i, x_L^i, \hat\rho_{F,q}^i)\right\|^2}\right\}\right)$ (A12)
Then, we let
$\hat\Psi_{V,F,q}^i = \hat\Psi_{V,F,q}^i(x_{F,q}^i, x_L^i, \hat\rho_{F,q}^i); \qquad \tilde\Psi_{V,F,q}^i = \tilde\Psi_{V,F,q}^i(x_{F,q}^i, x_L^i, \tilde\rho_{F,q}^i)$
Now, by substituting Equation (43) into Equation (A12), we can obtain
$\dot L_{V,F,q}^i(t) \le \alpha_{V,F,q}^i\,\mathrm{tr}\!\left(\mathbb{E}\!\left\{\frac{\tilde W_{V,F,q}^{i\,T}\hat\Psi_{V,F,q}^i}{1+\|\hat\Psi_{V,F,q}^i\|^2}\left[\tilde\Phi_{CA,F} + \tilde\Phi_{C,F} - \tilde W_{V,F,q}^{i\,T}\hat\Psi_{V,F,q}^i - W_{V,F,q}^{i\,T}\tilde\Psi_{V,F,q}^i - \varepsilon_{HJB}^{F,q,i}\right]^T\right\}\right) \le \alpha_{V,F,q}^i\,\mathrm{tr}\!\left(\mathbb{E}\!\left\{\frac{\tilde W_{V,F,q}^{i\,T}\hat\Psi_{V,F,q}^i\,\tilde\Phi_{CA,F}^T}{1+\|\hat\Psi_{V,F,q}^i\|^2}\right\}\right) + \alpha_{V,F,q}^i\,\mathrm{tr}\!\left(\mathbb{E}\!\left\{\frac{\tilde W_{V,F,q}^{i\,T}\hat\Psi_{V,F,q}^i\,\tilde\Phi_{C,F}^T}{1+\|\hat\Psi_{V,F,q}^i\|^2}\right\}\right) - \alpha_{V,F,q}^i\,\mathrm{tr}\!\left(\mathbb{E}\!\left\{\frac{\tilde W_{V,F,q}^{i\,T}\hat\Psi_{V,F,q}^i\hat\Psi_{V,F,q}^{i\,T}\tilde W_{V,F,q}^i}{1+\|\hat\Psi_{V,F,q}^i\|^2}\right\}\right) - \alpha_{V,F,q}^i\,\mathrm{tr}\!\left(\mathbb{E}\!\left\{\frac{\tilde W_{V,F,q}^{i\,T}\hat\Psi_{V,F,q}^i\,\tilde\Psi_{V,F,q}^{i\,T}W_{V,F,q}^i}{1+\|\hat\Psi_{V,F,q}^i\|^2}\right\}\right) - \alpha_{V,F,q}^i\,\mathrm{tr}\!\left(\mathbb{E}\!\left\{\frac{\tilde W_{V,F,q}^{i\,T}\hat\Psi_{V,F,q}^i\,\varepsilon_{HJB}^{F,q,i\,T}}{1+\|\hat\Psi_{V,F,q}^i\|^2}\right\}\right)$ (A13)
where $\tilde\Phi_{CA,F} = \Phi_{CA,F}(x_{F,q}^i, \tilde\rho_{F,q}^i)$ and $\tilde\Phi_{C,F} = \Phi_{C,F}(x_{F,q}^i, \tilde\rho_{F,q}^i)$.
Next, the triangle inequality properties are applied to Equation (A13):
$\dot L_{V,F,q}^i(t) \le -\left[\frac{1}{4}\alpha_{V,F,q}^i\,\mathbb{E}\!\left\{\frac{\|\hat\Psi_{V,F,q}^i\|^2\|\tilde W_{V,F,q}^i\|^2}{1+\|\hat\Psi_{V,F,q}^i\|^2}\right\} - \alpha_{V,F,q}^i\,\mathrm{tr}\!\left(\mathbb{E}\!\left\{\frac{\tilde W_{V,F,q}^{i\,T}\hat\Psi_{V,F,q}^i\,\tilde\Phi_{CA,F}^T}{1+\|\hat\Psi_{V,F,q}^i\|^2}\right\}\right) + \alpha_{V,F,q}^i\,\mathbb{E}\!\left\{\frac{\|\tilde\Phi_{CA,F}\|^2}{1+\|\hat\Psi_{V,F,q}^i\|^2}\right\}\right] - \left[\frac{1}{4}\alpha_{V,F,q}^i\,\mathbb{E}\!\left\{\frac{\|\hat\Psi_{V,F,q}^i\|^2\|\tilde W_{V,F,q}^i\|^2}{1+\|\hat\Psi_{V,F,q}^i\|^2}\right\} - \alpha_{V,F,q}^i\,\mathrm{tr}\!\left(\mathbb{E}\!\left\{\frac{\tilde W_{V,F,q}^{i\,T}\hat\Psi_{V,F,q}^i\,\tilde\Phi_{C,F}^T}{1+\|\hat\Psi_{V,F,q}^i\|^2}\right\}\right) + \alpha_{V,F,q}^i\,\mathbb{E}\!\left\{\frac{\|\tilde\Phi_{C,F}\|^2}{1+\|\hat\Psi_{V,F,q}^i\|^2}\right\}\right] - \left[\frac{1}{4}\alpha_{V,F,q}^i\,\mathbb{E}\!\left\{\frac{\|\hat\Psi_{V,F,q}^i\|^2\|\tilde W_{V,F,q}^i\|^2}{1+\|\hat\Psi_{V,F,q}^i\|^2}\right\} + \alpha_{V,F,q}^i\,\mathrm{tr}\!\left(\mathbb{E}\!\left\{\frac{\tilde W_{V,F,q}^{i\,T}\hat\Psi_{V,F,q}^i\,\tilde\Psi_{V,F,q}^{i\,T}W_{V,F,q}^i}{1+\|\hat\Psi_{V,F,q}^i\|^2}\right\}\right) + \alpha_{V,F,q}^i\,\mathbb{E}\!\left\{\frac{\|W_{V,F,q}^{i\,T}\tilde\Psi_{V,F,q}^i\|^2}{1+\|\hat\Psi_{V,F,q}^i\|^2}\right\}\right] - \left[\frac{1}{4}\alpha_{V,F,q}^i\,\mathbb{E}\!\left\{\frac{\|\hat\Psi_{V,F,q}^i\|^2\|\tilde W_{V,F,q}^i\|^2}{1+\|\hat\Psi_{V,F,q}^i\|^2}\right\} + \alpha_{V,F,q}^i\,\mathrm{tr}\!\left(\mathbb{E}\!\left\{\frac{\tilde W_{V,F,q}^{i\,T}\hat\Psi_{V,F,q}^i\,\varepsilon_{HJB}^{F,q,i\,T}}{1+\|\hat\Psi_{V,F,q}^i\|^2}\right\}\right) + \alpha_{V,F,q}^i\,\mathbb{E}\!\left\{\frac{\|\varepsilon_{HJB}^{F,q,i}\|^2}{1+\|\hat\Psi_{V,F,q}^i\|^2}\right\}\right] + \alpha_{V,F,q}^i\,\mathbb{E}\!\left\{\frac{\|\tilde\Phi_{CA,F}\|^2 + \|\tilde\Phi_{C,F}\|^2 + \|W_{V,F,q}^{i\,T}\tilde\Psi_{V,F,q}^i\|^2 + \|\varepsilon_{HJB}^{F,q,i}\|^2}{1+\|\hat\Psi_{V,F,q}^i\|^2}\right\}$ (A14)
Now, Equation (A14) can be simplified as follows:
$\dot L_{V,F,q}^i(t) \le -\frac{3}{4}\alpha_{V,F,q}^i\,\mathbb{E}\!\left\{\frac{\|\hat\Psi_{V,F,q}^i\|^2\|\tilde W_{V,F,q}^i\|^2}{1+\|\hat\Psi_{V,F,q}^i\|^2}\right\} - \frac{3}{2}\alpha_{V,F,q}^i\,\mathbb{E}\!\left\{\frac{\|W_{V,F,q}^{i\,T}\|^2\|\tilde\Psi_{V,F,q}^i\|^2}{1+\|\hat\Psi_{V,F,q}^i\|^2}\right\} - \alpha_{V,F,q}^i\,\mathbb{E}\!\left\{\frac{\left\|\frac{1}{2}\hat\Psi_{V,F,q}^{i\,T}\tilde W_{V,F,q}^i - \tilde\Phi_{CA,F}\right\|^2 + \left\|\frac{1}{2}\hat\Psi_{V,F,q}^{i\,T}\tilde W_{V,F,q}^i - \tilde\Phi_{C,F}\right\|^2}{1+\|\hat\Psi_{V,F,q}^i\|^2}\right\} - \alpha_{V,F,q}^i\,\mathbb{E}\!\left\{\frac{\left\|\hat\Psi_{V,F,q}^{i\,T}\tilde W_{V,F,q}^i + \varepsilon_{HJB}^{F,q,i}\right\|^2}{1+\|\hat\Psi_{V,F,q}^i\|^2}\right\} + \alpha_{V,F,q}^i\,\mathbb{E}\!\left\{\frac{\|\tilde\Phi_{CA,F}\|^2 + \|\tilde\Phi_{C,F}\|^2 + \|W_{V,F,q}^{i\,T}\tilde\Psi_{V,F,q}^i\|^2 + \|\varepsilon_{HJB}^{F,q,i}\|^2}{1+\|\hat\Psi_{V,F,q}^i\|^2}\right\}$ (A15)
By dropping several negative terms, the bound can be further simplified to
$$
\dot{L}_{V,F,q}^{i}(t)\le-\frac{3}{4}\alpha_{V,F,q}^{i}E\left\{\frac{\|\hat{\Psi}_{V,F,q}^{i}\|^{2}\|\tilde{W}_{V,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{V,F,q}^{i}\|^{2}}\right\}+B_{W,V,F,q}^{i}(t)
$$
where
$$
B_{W,V,F,q}^{i}(t)=\alpha_{V,F,q}^{i}\left[l_{\Phi_{CA},F}^{i}+l_{\Phi_{C},F}^{i}+l_{\Psi_{V},F,q}^{i}E\{\|W_{V,F,q}^{i}\|^{2}\}\right]E\{\|\tilde{\rho}_{F,q}^{i}\|^{2}\}\left[1+E\{\|\hat{\Psi}_{V,F,q}^{i}\|^{2}\}\right]^{-1}+\alpha_{V,F,q}^{i}E\left\{\frac{\|\varepsilon_{HJB,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{V,F,q}^{i}\|^{2}}\right\}
$$
Here, $l_{\Phi_{CA},F}^{i}$, $l_{\Phi_{C},F}^{i}$, and $l_{\Psi_{V},F,q}^{i}$ are the Lipschitz constants, and $\tilde{\rho}_{F,q}^{i}$ is the mass estimation bound.
The follower–critic NN weight estimation error will be UUB, and the bound is
$$
E\{\|\tilde{W}_{V,F,q}^{i}\|^{2}\}\le\frac{3E\{1+\|\hat{\Psi}_{V,F,q}^{i}\|^{2}\}}{\alpha_{V,F,q}^{i}\|\hat{\Psi}_{V,F,q}^{i}\|^{2}}B_{W,V,F,q}^{i}\equiv b_{W,V,F,q}^{i}
$$
This completes the proof.
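To connect the analysis above with an implementation, the following is a minimal numerical sketch of a normalized-gradient critic-weight update of the form analyzed in this appendix. The basis vector, HJB residual, learning rate, step size, and dimensions are illustrative placeholders rather than the exact quantities defined in the paper.

```python
import numpy as np

def follower_critic_step(W_hat, psi_hat, e_hjb, alpha=0.05, dt=0.01):
    """One Euler step of a normalized-gradient critic-weight update.

    W_hat   : (n,) estimated critic NN weight vector (placeholder shape)
    psi_hat : (n,) critic basis evaluated at the current follower state
    e_hjb   : scalar HJB residual computed with the current estimates
    """
    denom = 1.0 + np.sum(psi_hat ** 2)          # 1 + ||psi_hat||^2 normalization
    W_dot = -alpha * psi_hat * e_hjb / denom    # normalized gradient direction
    return W_hat + dt * W_dot

# Illustrative call with random placeholder data.
rng = np.random.default_rng(0)
W = follower_critic_step(rng.standard_normal(8), rng.standard_normal(8), e_hjb=0.3)
```

The (1 + ||psi||^2) normalization is the design choice that keeps every term of the Lyapunov derivative expressed as a bounded fraction, which is what allows the UUB argument above to go through.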

Appendix B. Proof of Theorem 2

Consider the Lyapunov function candidate
$$
L_{\rho,F,q}^{i}(t)=\frac{1}{2}\operatorname{tr}E\{\tilde{W}_{\rho,F,q}^{iT}\tilde{W}_{\rho,F,q}^{i}\}
$$
Additionally, using the mass NN weight tuning law in Equation (48), the mass NN weight estimation error dynamics can be obtained as follows:
$$
E\{\dot{\tilde{W}}_{\rho,F,q}^{i}\}=E\{\dot{\hat{W}}_{\rho,F,q}^{i}\}=-E\left\{\alpha_{\rho,F,q}^{i}\frac{\hat{\Psi}_{\rho,F,q}^{i}(x_{F,q}^{i},\hat{V}_{F,q}^{i},t)\,e_{FPK,F,q}^{iT}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}(x_{F,q}^{i},\hat{V}_{F,q}^{i},t)\|^{2}}\right\}
$$
According to Lyapunov stability analysis, we take the first derivative of Equation (A19) and substitute the mass NN weight estimation error dynamics from Equation (A20):
$$
\dot{L}_{\rho,F,q}^{i}(t)=\operatorname{tr}E\{\tilde{W}_{\rho,F,q}^{iT}\dot{\tilde{W}}_{\rho,F,q}^{i}\}=-\alpha_{\rho,F,q}^{i}\operatorname{tr}E\left\{\frac{\tilde{W}_{\rho,F,q}^{iT}\hat{\Psi}_{\rho,F,q}^{i}(x_{F,q}^{i},\hat{V}_{F,q}^{i},t)\,e_{FPK,F,q}^{iT}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}(x_{F,q}^{i},\hat{V}_{F,q}^{i},t)\|^{2}}\right\}
$$
Let
$$
\hat{\Psi}_{\rho,F,q}^{i}=\Psi_{\rho,F,q}^{i}(x_{F,q}^{i},\hat{V}_{F,q}^{i},t);\qquad\tilde{\Psi}_{\rho,F,q}^{i}=\Psi_{\rho,F,q}^{i}(x_{F,q}^{i},\tilde{V}_{F,q}^{i},t)
$$
Now, by substituting Equation (44) into Equation (A21), we can obtain
$$
\begin{aligned}
\dot{L}_{\rho,F,q}^{i}(t)\le{}&-\alpha_{\rho,F,q}^{i}\operatorname{tr}\left(E\left\{\frac{\tilde{W}_{\rho,F,q}^{iT}\hat{\Psi}_{\rho,F,q}^{i}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\left[\tilde{W}_{\rho,F,q}^{iT}\hat{\Psi}_{\rho,F,q}^{i}+W_{\rho,F,q}^{iT}\tilde{\Psi}_{\rho,F,q}^{i}+\varepsilon_{FPK,F,q}^{i}\right]^{T}\right\}\right)\\
={}&-\alpha_{\rho,F,q}^{i}\operatorname{tr}E\left\{\frac{\tilde{W}_{\rho,F,q}^{iT}\hat{\Psi}_{\rho,F,q}^{i}\hat{\Psi}_{\rho,F,q}^{iT}\tilde{W}_{\rho,F,q}^{i}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}-\alpha_{\rho,F,q}^{i}\operatorname{tr}E\left\{\frac{\tilde{W}_{\rho,F,q}^{iT}\hat{\Psi}_{\rho,F,q}^{i}\tilde{\Psi}_{\rho,F,q}^{iT}W_{\rho,F,q}^{i}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}-\alpha_{\rho,F,q}^{i}\operatorname{tr}E\left\{\frac{\tilde{W}_{\rho,F,q}^{iT}\hat{\Psi}_{\rho,F,q}^{i}\varepsilon_{FPK,F,q}^{iT}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}
\end{aligned}
$$
Now, the triangle inequality is applied to Equation (A22):
$$
\begin{aligned}
\dot{L}_{\rho,F,q}^{i}(t)\le{}&-\frac{1}{2}\alpha_{\rho,F,q}^{i}E\left\{\frac{\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}\|\tilde{W}_{\rho,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}-\frac{1}{4}\alpha_{\rho,F,q}^{i}E\left\{\frac{\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}\|\tilde{W}_{\rho,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}-\alpha_{\rho,F,q}^{i}\operatorname{tr}E\left\{\frac{\tilde{W}_{\rho,F,q}^{iT}\hat{\Psi}_{\rho,F,q}^{i}\tilde{\Psi}_{\rho,F,q}^{iT}W_{\rho,F,q}^{i}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}-\alpha_{\rho,F,q}^{i}E\left\{\frac{\|W_{\rho,F,q}^{iT}\tilde{\Psi}_{\rho,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}\\
&-\frac{1}{4}\alpha_{\rho,F,q}^{i}E\left\{\frac{\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}\|\tilde{W}_{\rho,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}-\alpha_{\rho,F,q}^{i}\operatorname{tr}E\left\{\frac{\tilde{W}_{\rho,F,q}^{iT}\hat{\Psi}_{\rho,F,q}^{i}\varepsilon_{FPK,F,q}^{iT}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}-\alpha_{\rho,F,q}^{i}E\left\{\frac{\|\varepsilon_{FPK,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}\\
&+\alpha_{\rho,F,q}^{i}E\left\{\frac{\|W_{\rho,F,q}^{iT}\tilde{\Psi}_{\rho,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}+\alpha_{\rho,F,q}^{i}E\left\{\frac{\|\varepsilon_{FPK,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}
\end{aligned}
$$
Next, Equation (A23) can be simplified as follows:
$$
\begin{aligned}
\dot{L}_{\rho,F,q}^{i}(t)\le{}&-\frac{1}{2}\alpha_{\rho,F,q}^{i}E\left\{\frac{\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}\|\tilde{W}_{\rho,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}-\alpha_{\rho,F,q}^{i}E\left\{\frac{1}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\left[\frac{1}{4}\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}\|\tilde{W}_{\rho,F,q}^{i}\|^{2}+\operatorname{tr}\{\tilde{W}_{\rho,F,q}^{iT}\hat{\Psi}_{\rho,F,q}^{i}\tilde{\Psi}_{\rho,F,q}^{iT}W_{\rho,F,q}^{i}\}+\|W_{\rho,F,q}^{iT}\tilde{\Psi}_{\rho,F,q}^{i}\|^{2}\right]\right\}\\
&-\alpha_{\rho,F,q}^{i}E\left\{\frac{1}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\left[\frac{1}{4}\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}\|\tilde{W}_{\rho,F,q}^{i}\|^{2}+\tilde{W}_{\rho,F,q}^{iT}\hat{\Psi}_{\rho,F,q}^{i}\varepsilon_{FPK,F,q}^{iT}+\|\varepsilon_{FPK,F,q}^{i}\|^{2}\right]\right\}+\alpha_{\rho,F,q}^{i}E\left\{\frac{\|W_{\rho,F,q}^{iT}\tilde{\Psi}_{\rho,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}+\alpha_{\rho,F,q}^{i}E\left\{\frac{\|\varepsilon_{FPK,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}\\
\le{}&-\frac{1}{2}\alpha_{\rho,F,q}^{i}E\left\{\frac{\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}\|\tilde{W}_{\rho,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}-\alpha_{\rho,F,q}^{i}E\left\{\frac{\|\tilde{W}_{\rho,F,q}^{iT}\hat{\Psi}_{\rho,F,q}^{i}\|^{2}+\|W_{\rho,F,q}^{iT}\tilde{\Psi}_{\rho,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}-\alpha_{\rho,F,q}^{i}E\left\{\frac{\|\tilde{W}_{\rho,F,q}^{iT}\hat{\Psi}_{\rho,F,q}^{i}\|^{2}+\|\varepsilon_{FPK,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}\\
&+\alpha_{\rho,F,q}^{i}E\left\{\frac{\|W_{\rho,F,q}^{iT}\tilde{\Psi}_{\rho,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}+\alpha_{\rho,F,q}^{i}E\left\{\frac{\|\varepsilon_{FPK,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}
\end{aligned}
$$
After simplification, Equation (A24) can be written as follows:
$$
\dot{L}_{\rho,F,q}^{i}(t)\le-\frac{1}{2}\alpha_{\rho,F,q}^{i}E\left\{\frac{\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}\|\tilde{W}_{\rho,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}+B_{W,\rho,F,q}^{i}(t)
$$
where
$$
B_{W,\rho,F,q}^{i}=\frac{\alpha_{\rho,F,q}^{i}\,l_{\Psi_{\rho},F,q}^{i}E\{\|W_{\rho,F,q}^{i}\|^{2}\}E\{\|\tilde{V}_{F,q}^{i}\|^{2}\}}{1+E\{\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}\}}+\alpha_{\rho,F,q}^{i}E\left\{\frac{\|\varepsilon_{FPK,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}
$$
Here, $l_{\Psi_{\rho},F,q}^{i}$ is the Lipschitz constant, and $\tilde{V}_{F,q}^{i}$ is the critic estimation error bound. The mass NN weight estimation error will be UUB, and the bound is
$$
E\{\|\tilde{W}_{\rho,F,q}^{i}\|^{2}\}\le\frac{E\{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}\}}{\alpha_{\rho,F,q}^{i}\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}B_{W,\rho,F,q}^{i}\equiv b_{W,\rho,F,q}^{i}
$$
This completes the proof.
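The mass (density) NN analyzed in this appendix follows the same normalized-gradient structure. Below is a minimal sketch assuming the update form in Equation (A20) and the ultimate bound in Equation (A27); the FPK residual, basis, sign convention, learning rate, and shapes are placeholder assumptions rather than the paper's exact implementation.

```python
import numpy as np

def mass_nn_step(W_rho, psi_rho, e_fpk, alpha_rho=0.05, dt=0.01):
    """Euler step of the mass (density) NN weight update driven by the FPK residual,
    mirroring the normalized form analyzed in Equation (A20)."""
    denom = 1.0 + np.sum(psi_rho ** 2)          # 1 + ||psi_rho(x, V_hat, t)||^2
    return W_rho - dt * alpha_rho * psi_rho * e_fpk / denom

def uub_radius(alpha_rho, psi_norm_sq, B_w):
    """Ultimate bound on E{||W_tilde||^2} as in Equation (A27):
    (1 + ||psi||^2) * B_W / (alpha * ||psi||^2)."""
    return (1.0 + psi_norm_sq) * B_w / (alpha_rho * psi_norm_sq)
```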

Appendix C. Proof of Theorem 3

Consider the Lyapunov function candidate for the leader
$$
L_{u,L}^{i}(t)=\frac{1}{2}\operatorname{tr}E\{\tilde{W}_{u,L}^{iT}\tilde{W}_{u,L}^{i}\}
$$
Using the leader–actor NN weight tuning law in Equation (34), the first derivative of the leader–actor NN weight estimation error can be obtained:
$$
E\{\dot{\tilde{W}}_{u,L}^{i}\}=E\{\dot{\hat{W}}_{u,L}^{i}\}=-E\left\{\alpha_{u,L}^{i}\frac{\hat{\phi}_{u,L}^{i}(x_{L}^{i},x_{L}^{-i},\hat{\bar{\rho}}^{i})\,e_{u,L}^{iT}}{1+\|\hat{\phi}_{u,L}^{i}(x_{L}^{i},x_{L}^{-i},\hat{\bar{\rho}}^{i})\|^{2}}\right\}
$$
According to the Lyapunov stability analysis, we take the first derivative of Equation (A28) and substitute Equation (A29):
$$
\dot{L}_{u,L}^{i}(t)=\operatorname{tr}E\{\tilde{W}_{u,L}^{iT}\dot{\tilde{W}}_{u,L}^{i}\}=-\alpha_{u,L}^{i}\operatorname{tr}E\left\{\frac{\tilde{W}_{u,L}^{iT}\hat{\phi}_{u,L}^{i}(x_{L}^{i},x_{L}^{-i},\hat{\bar{\rho}}^{i})\,e_{u,L}^{iT}}{1+\|\hat{\phi}_{u,L}^{i}(x_{L}^{i},x_{L}^{-i},\hat{\bar{\rho}}^{i})\|^{2}}\right\}
$$
Let
$$
\hat{\phi}_{u,L}^{i}=\phi_{u,L}^{i}(x_{L}^{i},x_{L}^{-i},\hat{\bar{\rho}}^{i});\qquad\tilde{\phi}_{u,L}^{i}=\phi_{u,L}^{i}(x_{L}^{i},x_{L}^{-i},\tilde{\bar{\rho}}^{i});\qquad\tilde{V}_{L}^{i}=\tilde{V}^{i}(x_{L}^{i},x_{L}^{-i},\hat{\bar{\rho}}^{i})
$$
Now, by substituting Equation (32) into Equation (A30), the following equation is obtained:
$$
\begin{aligned}
\dot{L}_{u,L}^{i}(t)\le{}&-\alpha_{u,L}^{i}\operatorname{tr}\left(E\left\{\frac{\tilde{W}_{u,L}^{iT}\hat{\phi}_{u,L}^{i}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\left\{\tilde{W}_{u,L}^{iT}\hat{\phi}_{u,L}^{i}+W_{u,L}^{iT}\tilde{\phi}_{u,L}^{i}+\frac{1}{2}(R_{L}^{i})^{-1}G_{a}^{T}(x_{L}^{i})\frac{\partial\tilde{V}_{L}^{i}}{\partial x_{L}^{i}}+\varepsilon_{u,L}^{i}\right\}^{T}\right\}\right)\\
={}&-\alpha_{u,L}^{i}\operatorname{tr}E\left\{\frac{\tilde{W}_{u,L}^{iT}\hat{\phi}_{u,L}^{i}\hat{\phi}_{u,L}^{iT}\tilde{W}_{u,L}^{i}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}-\alpha_{u,L}^{i}\operatorname{tr}E\left\{\frac{\tilde{W}_{u,L}^{iT}\hat{\phi}_{u,L}^{i}\tilde{\phi}_{u,L}^{iT}W_{u,L}^{i}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}-\alpha_{u,L}^{i}\operatorname{tr}E\left\{\frac{\tilde{W}_{u,L}^{iT}\hat{\phi}_{u,L}^{i}\left[\frac{1}{2}(R_{L}^{i})^{-1}G_{a}^{T}(x_{L}^{i})\frac{\partial\tilde{V}_{L}^{i}}{\partial x_{L}^{i}}\right]^{T}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}-\alpha_{u,L}^{i}\operatorname{tr}E\left\{\frac{\tilde{W}_{u,L}^{iT}\hat{\phi}_{u,L}^{i}\varepsilon_{u,L}^{iT}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}
\end{aligned}
$$
After applying the triangle inequality, Equation (A31) can be written as follows:
$$
\begin{aligned}
\dot{L}_{u,L}^{i}(t)\le{}&-\frac{1}{4}\alpha_{u,L}^{i}E\left\{\frac{\|\hat{\phi}_{u,L}^{i}\|^{2}\|\tilde{W}_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}-\frac{1}{4}\alpha_{u,L}^{i}E\left\{\frac{\|\hat{\phi}_{u,L}^{i}\|^{2}\|\tilde{W}_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}-\alpha_{u,L}^{i}\operatorname{tr}E\left\{\frac{\tilde{W}_{u,L}^{iT}\hat{\phi}_{u,L}^{i}\tilde{\phi}_{u,L}^{iT}W_{u,L}^{i}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}-\alpha_{u,L}^{i}E\left\{\frac{\|W_{u,L}^{iT}\tilde{\phi}_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}+\alpha_{u,L}^{i}E\left\{\frac{\|W_{u,L}^{iT}\tilde{\phi}_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}\\
&-\frac{1}{4}\alpha_{u,L}^{i}E\left\{\frac{\|\hat{\phi}_{u,L}^{i}\|^{2}\|\tilde{W}_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}-\alpha_{u,L}^{i}\operatorname{tr}E\left\{\frac{\tilde{W}_{u,L}^{iT}\hat{\phi}_{u,L}^{i}\left[\frac{1}{2}(R_{L}^{i})^{-1}G_{a}^{T}(x_{L}^{i})\frac{\partial\tilde{V}_{L}^{i}}{\partial x_{L}^{i}}\right]^{T}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}-\alpha_{u,L}^{i}E\left\{\frac{\left\|\frac{1}{2}(R_{L}^{i})^{-1}G_{a}^{T}(x_{L}^{i})\frac{\partial\tilde{V}_{L}^{i}}{\partial x_{L}^{i}}\right\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}+\alpha_{u,L}^{i}E\left\{\frac{\left\|\frac{1}{2}(R_{L}^{i})^{-1}G_{a}^{T}(x_{L}^{i})\frac{\partial\tilde{V}_{L}^{i}}{\partial x_{L}^{i}}\right\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}\\
&-\frac{1}{4}\alpha_{u,L}^{i}E\left\{\frac{\|\hat{\phi}_{u,L}^{i}\|^{2}\|\tilde{W}_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}-\alpha_{u,L}^{i}\operatorname{tr}E\left\{\frac{\tilde{W}_{u,L}^{iT}\hat{\phi}_{u,L}^{i}\varepsilon_{u,L}^{iT}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}-\alpha_{u,L}^{i}E\left\{\frac{\|\varepsilon_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}+\alpha_{u,L}^{i}E\left\{\frac{\|\varepsilon_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}\\
\le{}&-\frac{1}{4}\alpha_{u,L}^{i}E\left\{\frac{\|\hat{\phi}_{u,L}^{i}\|^{2}\|\tilde{W}_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}-\alpha_{u,L}^{i}E\left\{\frac{1}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\left[\frac{1}{4}\|\hat{\phi}_{u,L}^{i}\|^{2}\|\tilde{W}_{u,L}^{i}\|^{2}+\operatorname{tr}\{\tilde{W}_{u,L}^{iT}\hat{\phi}_{u,L}^{i}\tilde{\phi}_{u,L}^{iT}W_{u,L}^{i}\}+\|W_{u,L}^{iT}\tilde{\phi}_{u,L}^{i}\|^{2}\right]\right\}\\
&-\alpha_{u,L}^{i}E\left\{\frac{1}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\left[\frac{1}{4}\|\hat{\phi}_{u,L}^{i}\|^{2}\|\tilde{W}_{u,L}^{i}\|^{2}+\operatorname{tr}\left\{\tilde{W}_{u,L}^{iT}\hat{\phi}_{u,L}^{i}\left[\frac{1}{2}(R_{L}^{i})^{-1}G_{a}^{T}(x_{L}^{i})\frac{\partial\tilde{V}_{L}^{i}}{\partial x_{L}^{i}}\right]\right\}+\left\|\frac{1}{2}(R_{L}^{i})^{-1}G_{a}^{T}(x_{L}^{i})\frac{\partial\tilde{V}_{L}^{i}}{\partial x_{L}^{i}}\right\|^{2}\right]\right\}\\
&-\alpha_{u,L}^{i}E\left\{\frac{1}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\left[\frac{1}{4}\|\hat{\phi}_{u,L}^{i}\|^{2}\|\tilde{W}_{u,L}^{i}\|^{2}+\tilde{W}_{u,L}^{iT}\hat{\phi}_{u,L}^{i}\varepsilon_{u,L}^{iT}+\|\varepsilon_{u,L}^{i}\|^{2}\right]\right\}+\alpha_{u,L}^{i}E\left\{\frac{\|W_{u,L}^{iT}\tilde{\phi}_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}+\alpha_{u,L}^{i}E\left\{\frac{\left\|(R_{L}^{i})^{-1}G_{a}^{T}(x_{L}^{i})\frac{\partial\tilde{V}_{L}^{i}}{\partial x_{L}^{i}}\right\|^{2}}{4\left(1+\|\hat{\phi}_{u,L}^{i}\|^{2}\right)}\right\}+\alpha_{u,L}^{i}E\left\{\frac{\|\varepsilon_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}\\
\le{}&-\frac{1}{4}\alpha_{u,L}^{i}E\left\{\frac{\|\hat{\phi}_{u,L}^{i}\|^{2}\|\tilde{W}_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}-\alpha_{u,L}^{i}E\left\{\frac{\|\tilde{W}_{u,L}^{iT}\hat{\phi}_{u,L}^{i}\|^{2}+\|W_{u,L}^{iT}\tilde{\phi}_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}-\alpha_{u,L}^{i}E\left\{\frac{\|\tilde{W}_{u,L}^{iT}\hat{\phi}_{u,L}^{i}\|^{2}+\left\|(R_{L}^{i})^{-1}G_{a}^{T}(x_{L}^{i})\frac{\partial\tilde{V}_{L}^{i}}{\partial x_{L}^{i}}\right\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}-\alpha_{u,L}^{i}E\left\{\frac{\|\tilde{W}_{u,L}^{iT}\hat{\phi}_{u,L}^{i}\|^{2}+\|\varepsilon_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}\\
&+\alpha_{u,L}^{i}E\left\{\frac{\|W_{u,L}^{iT}\tilde{\phi}_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}+\alpha_{u,L}^{i}E\left\{\frac{\left\|(R_{L}^{i})^{-1}G_{a}^{T}(x_{L}^{i})\frac{\partial\tilde{V}_{L}^{i}}{\partial x_{L}^{i}}\right\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}+\alpha_{u,L}^{i}E\left\{\frac{\|\varepsilon_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}
\end{aligned}
$$
Dropping several negative terms and simplifying Equation (A32) yields
$$
\begin{aligned}
\dot{L}_{u,L}^{i}(t)\le{}&-\frac{1}{4}\alpha_{u,L}^{i}E\left\{\frac{\|\hat{\phi}_{u,L}^{i}\|^{2}\|\tilde{W}_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}+\alpha_{u,L}^{i}E\left\{\frac{\|W_{u,L}^{iT}\tilde{\phi}_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}+\alpha_{u,L}^{i}E\left\{\frac{\left\|(R_{L}^{i})^{-1}G_{a}^{T}(x_{L}^{i})\frac{\partial\tilde{V}_{L}^{i}}{\partial x_{L}^{i}}\right\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}+\alpha_{u,L}^{i}E\left\{\frac{\|\varepsilon_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}\\
\le{}&-\frac{1}{4}\alpha_{u,L}^{i}E\left\{\frac{\|\hat{\phi}_{u,L}^{i}\|^{2}\|\tilde{W}_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}+B_{W,u,L}^{i}
\end{aligned}
$$
where
$$
B_{W,u,L}^{i}(t)=\frac{\alpha_{u,L}^{i}}{1+E\{\|\hat{\phi}_{u,L}^{i}\|^{2}\}}\left\{l_{\phi_{u},L}^{i}E\{\|W_{u,L}^{i}\|^{2}\|\tilde{\bar{\rho}}^{i}\|^{2}\}+E\{\|(R_{L}^{i})^{-1}G_{a}^{T}(x_{L}^{i})\|^{2}\|\tilde{V}_{L}^{i}\|^{2}\}\right\}+\frac{\alpha_{u,L}^{i}E\{\|\varepsilon_{u,L}^{i}\|^{2}\}}{1+E\{\|\hat{\phi}_{u,L}^{i}\|^{2}\}}
$$
Here, $\tilde{V}_{L}^{i}$ is the critic estimation error bound. The leader–actor NN weight estimation error will be UUB, and the bound is
$$
E\{\|\tilde{W}_{u,L}^{i}\|^{2}\}\le\frac{E\{1+\|\hat{\phi}_{u,L}^{i}\|^{2}\}}{\alpha_{u,L}^{i}\|\hat{\phi}_{u,L}^{i}\|^{2}}B_{W,u,L}^{i}\equiv b_{W,u,L}^{i}
$$
Similarly, we can derive the bound for the follower–actor NN:
$$
\dot{L}_{u,F,q}^{i}(t)\le-\frac{1}{4}\alpha_{u,F,q}^{i}E\left\{\frac{\|\hat{\phi}_{u,F,q}^{i}\|^{2}\|\tilde{W}_{u,F,q}^{i}\|^{2}}{1+\|\hat{\phi}_{u,F,q}^{i}\|^{2}}\right\}+B_{W,u,F,q}^{i}
$$
where
$$
B_{W,u,F,q}^{i}(t)=\frac{\alpha_{u,F,q}^{i}}{1+E\{\|\hat{\phi}_{u,F,q}^{i}\|^{2}\}}\left\{l_{\phi_{u},F,q}^{i}E\{\|W_{u,F,q}^{i}\|^{2}\|\tilde{\rho}_{F,q}^{i}\|^{2}\}+E\{\|(R_{F,q}^{i})^{-1}G_{s}^{T}(x_{F,q}^{i})\|^{2}\|\tilde{V}_{F,q}^{i}\|^{2}\}\right\}+\frac{\alpha_{u,F,q}^{i}E\{\|\varepsilon_{u,F,q}^{i}\|^{2}\}}{1+E\{\|\hat{\phi}_{u,F,q}^{i}\|^{2}\}}
$$
Here, $l_{\phi_{u},F,q}^{i}$ is the Lipschitz constant, $\tilde{\rho}_{F,q}^{i}$ is the mass estimation bound, and $\tilde{V}_{F,q}^{i}$ is the critic estimation error bound. The follower–actor NN weight estimation error will be UUB, and the bound is
$$
E\{\|\tilde{W}_{u,F,q}^{i}\|^{2}\}\le\frac{E\{1+\|\hat{\phi}_{u,F,q}^{i}\|^{2}\}}{\alpha_{u,F,q}^{i}\|\hat{\phi}_{u,F,q}^{i}\|^{2}}B_{W,u,F,q}^{i}\equiv b_{W,u,F,q}^{i}
$$
This completes the proof.
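For completeness, a sketch of the actor side is given below, assuming a normalized actor-weight update of the form appearing in Equation (A29). The actor basis, actor error, and dimensions are hypothetical placeholders; in the full scheme the target control used to form the actor error would come from the critic and mass estimates, which this sketch does not reproduce.

```python
import numpy as np

def actor_output(W_u, phi_u):
    """Control estimate u_hat = W_u^T phi_u from the actor NN."""
    return W_u.T @ phi_u

def actor_weight_step(W_u, phi_u, e_u, alpha_u=0.05, dt=0.01):
    """Normalized-gradient actor-weight update driven by the actor error e_u.

    W_u   : (n_basis, m) actor NN weight matrix (placeholder shapes)
    phi_u : (n_basis,)   actor basis evaluated at the current state(s)
    e_u   : (m,)         error between the estimated and target control
    """
    denom = 1.0 + np.sum(phi_u ** 2)
    W_dot = -alpha_u * np.outer(phi_u, e_u) / denom
    return W_u + dt * W_dot

# Illustrative call with random placeholder data.
rng = np.random.default_rng(1)
W_u = rng.standard_normal((8, 2))
phi = rng.standard_normal(8)
W_u = actor_weight_step(W_u, phi, e_u=actor_output(W_u, phi) - np.zeros(2))
```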

Appendix D. Proof of Theorem 4

Consider the Lyapunov function to be
$$
\begin{aligned}
L_{sys}^{i}(t)={}&\frac{\beta_{1}}{2}\operatorname{tr}E\{e_{L}^{iT}(t)e_{L}^{i}(t)\}+\frac{\beta_{2}}{2}\operatorname{tr}E\{\tilde{W}_{V,L}^{iT}(t)\tilde{W}_{V,L}^{i}(t)\}+\frac{\beta_{3}}{2}\operatorname{tr}E\{\tilde{W}_{u,L}^{iT}(t)\tilde{W}_{u,L}^{i}(t)\}+\frac{\beta_{4}}{2}\operatorname{tr}E\{e_{F,q}^{iT}(t)e_{F,q}^{i}(t)\}\\
&+\frac{\beta_{5}}{2}\operatorname{tr}E\{\tilde{W}_{V,F,q}^{iT}(t)\tilde{W}_{V,F,q}^{i}(t)\}+\frac{\beta_{6}}{2}\operatorname{tr}E\{\tilde{W}_{\rho,F,q}^{iT}(t)\tilde{W}_{\rho,F,q}^{i}(t)\}+\frac{\beta_{7}}{2}\operatorname{tr}E\{\tilde{W}_{u,F,q}^{iT}(t)\tilde{W}_{u,F,q}^{i}(t)\}
\end{aligned}
$$
By taking the first derivative and substituting Lemma 1 and Theorems 1–3 given in Equations (A7), (A16), (A25), (A33), and (A35), we have
$$
\begin{aligned}
\dot{L}_{sys}^{i}(t)={}&\beta_{1}\operatorname{tr}E\{e_{L}^{iT}(t)\dot{e}_{L}^{i}(t)\}+\beta_{2}\operatorname{tr}E\{\tilde{W}_{V,L}^{iT}(t)\dot{\tilde{W}}_{V,L}^{i}(t)\}+\beta_{3}\operatorname{tr}E\{\tilde{W}_{u,L}^{iT}(t)\dot{\tilde{W}}_{u,L}^{i}(t)\}+\beta_{4}\operatorname{tr}E\{e_{F,q}^{iT}(t)\dot{e}_{F,q}^{i}(t)\}\\
&+\beta_{5}\operatorname{tr}E\{\tilde{W}_{V,F,q}^{iT}(t)\dot{\tilde{W}}_{V,F,q}^{i}(t)\}+\beta_{6}\operatorname{tr}E\{\tilde{W}_{\rho,F,q}^{iT}(t)\dot{\tilde{W}}_{\rho,F,q}^{i}(t)\}+\beta_{7}\operatorname{tr}E\{\tilde{W}_{u,F,q}^{iT}(t)\dot{\tilde{W}}_{u,F,q}^{i}(t)\}\\
\le{}&\beta_{1}\operatorname{tr}E\left\{e_{L}^{iT}\left[F_{ar}(e_{L}^{i}(t))+G_{ar}(e_{L}^{i}(t))u_{L}^{i}(t)+2\nu^{i}\frac{d\omega_{L}^{i}}{dt}\right]\right\}-\beta_{1}\operatorname{tr}\left(E\{e_{L}^{iT}G_{ar}(e_{L}^{i})\tilde{u}_{L}^{i}\}\right)-\frac{2\beta_{1}}{\gamma_{1}}E\{\|G_{ar}(e_{L}^{i})\tilde{u}_{L}^{i}\|^{2}\}+\frac{2\beta_{1}}{\gamma_{1}}E\{\|G_{ar}(e_{L}^{i})\tilde{u}_{L}^{i}\|^{2}\}\\
&-\frac{\beta_{2}\alpha_{V,L}^{i}}{4}E\left\{\frac{\|\hat{\Psi}_{V,L}^{i}\|^{2}\|\tilde{W}_{V,L}^{i}\|^{2}}{1+\|\hat{\Psi}_{V,L}^{i}\|^{2}}\right\}+\beta_{2}B_{W,V,L}^{i}-\frac{\beta_{3}\alpha_{u,L}^{i}}{4}E\left\{\frac{\|\hat{\phi}_{u,L}^{i}\|^{2}\|\tilde{W}_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}+\beta_{3}B_{W,u,L}^{i}\\
&+\beta_{4}\operatorname{tr}E\left\{e_{F,q}^{iT}\left[F_{sr}(e_{F,q}^{i}(t))+G_{sr}(e_{F,q}^{i}(t))u_{F,q}^{i}(t)+2\nu^{i}\frac{d\omega_{F,q}^{i}}{dt}\right]\right\}-\beta_{4}\operatorname{tr}\left(E\{e_{F,q}^{iT}G_{sr}(e_{F,q}^{i})\tilde{u}_{F,q}^{i}\}\right)-\frac{2\beta_{4}}{\gamma_{2}}E\{\|G_{sr}(e_{F,q}^{i})\tilde{u}_{F,q}^{i}\|^{2}\}+\frac{2\beta_{4}}{\gamma_{2}}E\{\|G_{sr}(e_{F,q}^{i})\tilde{u}_{F,q}^{i}\|^{2}\}\\
&-\frac{3\beta_{5}\alpha_{V,F,q}^{i}}{4}E\left\{\frac{\|\hat{\Psi}_{V,F,q}^{i}\|^{2}\|\tilde{W}_{V,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{V,F,q}^{i}\|^{2}}\right\}+\beta_{5}B_{W,V,F,q}^{i}-\frac{\beta_{6}\alpha_{\rho,F,q}^{i}}{2}E\left\{\frac{\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}\|\tilde{W}_{\rho,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}+\beta_{6}B_{W,\rho,F,q}^{i}-\frac{\beta_{7}\alpha_{u,F,q}^{i}}{4}E\left\{\frac{\|\hat{\phi}_{u,F,q}^{i}\|^{2}\|\tilde{W}_{u,F,q}^{i}\|^{2}}{1+\|\hat{\phi}_{u,F,q}^{i}\|^{2}}\right\}+\beta_{7}B_{W,u,F,q}^{i}\\
\le{}&-\frac{\gamma_{1}\beta_{1}}{2}E\{\|e_{L}^{i}\|^{2}\}+\frac{2\beta_{1}g_{l1}^{2}}{\gamma_{1}}E\{\|\tilde{u}_{L}^{i}\|^{2}\}-\frac{\beta_{2}\alpha_{V,L}^{i}}{4}E\left\{\frac{\|\hat{\Psi}_{V,L}^{i}\|^{2}\|\tilde{W}_{V,L}^{i}\|^{2}}{1+\|\hat{\Psi}_{V,L}^{i}\|^{2}}\right\}+\beta_{2}B_{W,V,L}^{i}-\frac{\beta_{3}\alpha_{u,L}^{i}}{4}E\left\{\frac{\|\hat{\phi}_{u,L}^{i}\|^{2}\|\tilde{W}_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}+\beta_{3}B_{W,u,L}^{i}\\
&-\frac{\gamma_{2}\beta_{4}}{2}E\{\|e_{F,q}^{i}\|^{2}\}+\frac{2\beta_{4}g_{l2}^{2}}{\gamma_{2}}E\{\|\tilde{u}_{F,q}^{i}\|^{2}\}-\frac{3\beta_{5}\alpha_{V,F,q}^{i}}{4}E\left\{\frac{\|\hat{\Psi}_{V,F,q}^{i}\|^{2}\|\tilde{W}_{V,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{V,F,q}^{i}\|^{2}}\right\}+\beta_{5}B_{W,V,F,q}^{i}-\frac{\beta_{6}\alpha_{\rho,F,q}^{i}}{2}E\left\{\frac{\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}\|\tilde{W}_{\rho,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}+\beta_{6}B_{W,\rho,F,q}^{i}\\
&-\frac{\beta_{7}\alpha_{u,F,q}^{i}}{4}E\left\{\frac{\|\hat{\phi}_{u,F,q}^{i}\|^{2}\|\tilde{W}_{u,F,q}^{i}\|^{2}}{1+\|\hat{\phi}_{u,F,q}^{i}\|^{2}}\right\}+\beta_{7}B_{W,u,F,q}^{i}
\end{aligned}
$$
where $g_{l1}$ and $g_{l2}$ are the Lipschitz constants of the dynamic terms $G_{ar}(e_{L}^{i})$ and $G_{sr}(e_{F,q}^{i})$, respectively. Now, by substituting Equations (49)–(53) into Equation (A38), Equation (A38) can be rewritten and simplified as follows:
$$
\begin{aligned}
\dot{L}_{sys}^{i}(t)\le{}&b_{1}+b_{3}+3b_{4}E\{\|\tilde{W}_{V,L}^{iT}\|^{2}\|\hat{\Psi}_{V,L}^{i}\|^{2}\}+\left[3b_{4}l_{\Psi_{V},L}^{i\,2}+b_{2}\right]\left(\bar{b}_{\rho,F,q}^{i}\right)^{2}+3b_{4}\|\varepsilon_{HJB,L}^{i}\|^{2}+b_{5}+b_{7}+3b_{8}\|\tilde{W}_{V,F,q}^{iT}\|^{2}\|\hat{\Psi}_{V,F,q}^{i}\|^{2}\\
&+2\left[3b_{8}l_{\Psi_{V},F,q}^{i\,2}\|W_{V,F,q}^{i}\|+b_{6}\right]\|\tilde{W}_{\rho,F,q}^{iT}\|^{2}\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}+\left[3b_{8}l_{\Psi_{V},F,q}^{i\,2}\|W_{V,F,q}^{i}\|+b_{6}\right]\|\varepsilon_{FPK,F,q}^{i}\|^{2}+3b_{8}\|\varepsilon_{HJB,F,q}^{i}\|^{2}
\end{aligned}
$$
where
$$
\begin{aligned}
b_{1}&=-\frac{\gamma_{1}\beta_{1}}{2}E\{\|e_{L}^{i}\|^{2}\}+\frac{2\beta_{1}g_{l1}^{2}}{\gamma_{1}}E\{\|\tilde{u}_{L}^{i}\|^{2}\}-\frac{\beta_{2}\alpha_{V,L}^{i}}{4}E\left\{\frac{\|\hat{\Psi}_{V,L}^{i}\|^{2}\|\tilde{W}_{V,L}^{i}\|^{2}}{1+\|\hat{\Psi}_{V,L}^{i}\|^{2}}\right\}\\
b_{2}&=\frac{\beta_{2}\alpha_{V,L}^{i}\left[l_{\Phi_{CP},L}^{i}+l_{\Psi_{V},L}^{i}E\{\|W_{V,L}^{i}\|^{2}\}\right]}{1+E\{\|\hat{\Psi}_{V,L}^{i}\|^{2}\}}\\
b_{3}&=-\frac{\beta_{3}\alpha_{u,L}^{i}}{4}E\left\{\frac{\|\hat{\phi}_{u,L}^{i}\|^{2}\|\tilde{W}_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}+\beta_{2}\alpha_{V,L}^{i}E\left\{\frac{\|\varepsilon_{HJB,L}^{i}\|^{2}}{1+\|\hat{\Psi}_{V,L}^{i}\|^{2}}\right\}+\frac{\beta_{3}\alpha_{u,L}^{i}E\{\|\varepsilon_{u,L}^{i}\|^{2}\}}{1+E\{\|\hat{\phi}_{u,L}^{i}\|^{2}\}}\\
b_{4}&=\frac{\beta_{3}\alpha_{u,L}^{i}}{1+E\{\|\hat{\phi}_{u,L}^{i}\|^{2}\}}\left[l_{\phi_{u},L}^{i}E\{\|W_{u,L}^{i}\|^{2}\|\tilde{\bar{\rho}}^{i}\|^{2}\}+E\{\|(R_{L}^{i})^{-1}G_{a}^{T}(x_{L}^{i})\|^{2}\}\right]\\
b_{5}&=-\frac{\gamma_{2}\beta_{4}}{2}E\{\|e_{F,q}^{i}\|^{2}\}+\frac{2\beta_{4}g_{l2}^{2}}{\gamma_{2}}E\{\|\tilde{u}_{F,q}^{i}\|^{2}\}-\frac{3\beta_{5}\alpha_{V,F,q}^{i}}{4}E\left\{\frac{\|\hat{\Psi}_{V,F,q}^{i}\|^{2}\|\tilde{W}_{V,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{V,F,q}^{i}\|^{2}}\right\}\\
b_{6}&=\beta_{5}\alpha_{V,F,q}^{i}\left[l_{\Phi_{CA},F}^{i}+l_{\Phi_{C},F}^{i}+l_{\Psi_{V},F,q}^{i}E\{\|W_{V,F,q}^{i}\|^{2}\}\right]\left[1+E\{\|\hat{\Psi}_{V,F,q}^{i}\|^{2}\}\right]^{-1}\\
b_{7}&=-\frac{\beta_{6}\alpha_{\rho,F,q}^{i}}{2}E\left\{\frac{\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}\|\tilde{W}_{\rho,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}-\frac{\beta_{7}\alpha_{u,F,q}^{i}}{4}E\left\{\frac{\|\hat{\phi}_{u,F,q}^{i}\|^{2}\|\tilde{W}_{u,F,q}^{i}\|^{2}}{1+\|\hat{\phi}_{u,F,q}^{i}\|^{2}}\right\}+\beta_{5}\alpha_{V,F,q}^{i}E\left\{\frac{\|\varepsilon_{HJB,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{V,F,q}^{i}\|^{2}}\right\}+\beta_{6}\alpha_{\rho,F,q}^{i}E\left\{\frac{\|\varepsilon_{FPK,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}+\frac{\beta_{7}\alpha_{u,F,q}^{i}E\{\|\varepsilon_{u,F,q}^{i}\|^{2}\}}{1+E\{\|\hat{\phi}_{u,F,q}^{i}\|^{2}\}}\\
b_{8}&=\frac{\beta_{6}\alpha_{\rho,F,q}^{i}\,l_{\Psi_{\rho},F,q}^{i}E\{\|W_{\rho,F,q}^{i}\|^{2}\}}{1+E\{\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}\}}+\frac{\beta_{7}\alpha_{u,F,q}^{i}}{1+E\{\|\hat{\phi}_{u,F,q}^{i}\|^{2}\}}\left[l_{\phi_{u},F,q}^{i}E\{\|W_{u,F,q}^{i}\|^{2}\|\tilde{\rho}_{F,q}^{i}\|^{2}\}+E\{\|(R_{F,q}^{i})^{-1}G_{s}^{T}(x_{F,q}^{i})\|^{2}\}\right]
\end{aligned}
$$
In addition, let
$$
\begin{aligned}
\varepsilon_{NHJB,L}^{i}&=\alpha_{V,L}^{i}E\left\{\frac{\|\varepsilon_{HJB,L}^{i}\|^{2}}{1+\|\hat{\Psi}_{V,L}^{i}\|^{2}}\right\};\qquad\varepsilon_{Nu,L}^{i}=\frac{\alpha_{u,L}^{i}E\{\|\varepsilon_{u,L}^{i}\|^{2}\}}{1+E\{\|\hat{\phi}_{u,L}^{i}\|^{2}\}}\\
\varepsilon_{NHJB,F,q}^{i}&=\alpha_{V,F,q}^{i}E\left\{\frac{\|\varepsilon_{HJB,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{V,F,q}^{i}\|^{2}}\right\};\qquad\varepsilon_{Nu,F,q}^{i}=\frac{\alpha_{u,F,q}^{i}E\{\|\varepsilon_{u,F,q}^{i}\|^{2}\}}{1+E\{\|\hat{\phi}_{u,F,q}^{i}\|^{2}\}}\\
\varepsilon_{NFPK,F,q}^{i}&=\alpha_{\rho,F,q}^{i}E\left\{\frac{\|\varepsilon_{FPK,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}
\end{aligned}
$$
Again, Equation (A39) can be written as follows:
$$
\begin{aligned}
\dot{L}_{sys}^{i}(t)\le{}&-\frac{\gamma_{1}\beta_{1}}{2}E\{\|e_{L}^{i}\|^{2}\}-\left[\frac{\beta_{2}\alpha_{V,L}^{i}}{4}E\left\{\frac{\|\hat{\Psi}_{V,L}^{i}\|^{2}}{1+\|\hat{\Psi}_{V,L}^{i}\|^{2}}\right\}-3b_{4}\right]E\{\|\tilde{W}_{V,L}^{i}\|^{2}\}-\left[\frac{\beta_{3}\alpha_{u,L}^{i}}{4}E\left\{\frac{\|\hat{\phi}_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}-\frac{6\beta_{1}g_{l1}^{2}}{\gamma_{1}}E\{\|\hat{\phi}_{u,L}^{i}\|^{2}\}\right]E\{\|\tilde{W}_{u,L}^{i}\|^{2}\}\\
&-\frac{\gamma_{2}\beta_{4}}{2}E\{\|e_{F,q}^{i}\|^{2}\}-\left[\frac{3\beta_{5}\alpha_{V,F,q}^{i}}{4}E\left\{\frac{\|\hat{\Psi}_{V,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{V,F,q}^{i}\|^{2}}\right\}-3b_{8}E\{\|\hat{\Psi}_{V,F,q}^{i}\|^{2}\}\right]E\{\|\tilde{W}_{V,F,q}^{i}\|^{2}\}-\left[\frac{\beta_{7}\alpha_{u,F,q}^{i}}{4}E\left\{\frac{\|\hat{\phi}_{u,F,q}^{i}\|^{2}}{1+\|\hat{\phi}_{u,F,q}^{i}\|^{2}}\right\}-\frac{6\beta_{4}g_{l2}^{2}}{\gamma_{2}}E\{\|\hat{\phi}_{u,F,q}^{i}\|^{2}\}\right]E\{\|\tilde{W}_{u,F,q}^{i}\|^{2}\}\\
&-\left[\frac{\beta_{6}\alpha_{\rho,F,q}^{i}}{2}E\left\{\frac{\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}-\frac{12\beta_{4}g_{l2}^{2}}{\gamma_{2}}E\{\|W_{u,F,q}^{i}\|^{2}\}l_{\phi_{u},F,q}^{i\,2}E\{\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}\}-6b_{8}l_{\Psi_{V},F,q}^{i\,2}E\{\|W_{V,F,q}^{i}\|\,\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}\}-2b_{6}E\{\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}\}\right]E\{\|\tilde{W}_{\rho,F,q}^{i}\|^{2}\}\\
&+\frac{6\beta_{1}g_{l1}^{2}}{\gamma_{1}}E\{\|\varepsilon_{u,L}^{i}\|^{2}\}+\beta_{2}\varepsilon_{NHJB,L}^{i}+\beta_{3}\varepsilon_{Nu,L}^{i}+3b_{4}E\{\|\varepsilon_{HJB,L}^{i}\|^{2}\}-\frac{12\beta_{4}g_{l2}^{2}}{\gamma_{2}}E\{\|W_{u,F,q}^{i}\|^{2}\}l_{\phi_{u},F,q}^{i\,2}E\{\|\varepsilon_{FPK,F,q}^{i}\|^{2}\}+\frac{6\beta_{4}g_{l2}^{2}}{\gamma_{2}}E\{\|\varepsilon_{u,F,q}^{i}\|^{2}\}\\
&+\beta_{5}\varepsilon_{NHJB,F,q}^{i}+\beta_{6}\varepsilon_{NFPK,F,q}^{i}+6b_{8}l_{\Psi_{V},F,q}^{i\,2}E\{\|W_{V,F,q}^{i}\|\,\|\varepsilon_{FPK,F,q}^{i}\|^{2}\}+2b_{6}E\{\|\varepsilon_{FPK,F,q}^{i}\|^{2}\}+3b_{8}\|\varepsilon_{HJB,F,q}^{i}\|^{2}\\
\le{}&-\frac{\gamma_{1}\beta_{1}}{2}E\{\|e_{L}^{i}\|^{2}\}-\kappa_{V,L}^{i}E\{\|\tilde{W}_{V,L}^{i}\|^{2}\}-\kappa_{u,L}^{i}E\{\|\tilde{W}_{u,L}^{i}\|^{2}\}-\frac{\gamma_{2}\beta_{4}}{2}E\{\|e_{F,q}^{i}\|^{2}\}-\kappa_{V,F,q}^{i}E\{\|\tilde{W}_{V,F,q}^{i}\|^{2}\}-\kappa_{u,F,q}^{i}E\{\|\tilde{W}_{u,F,q}^{i}\|^{2}\}-\kappa_{\rho,F,q}^{i}E\{\|\tilde{W}_{\rho,F,q}^{i}\|^{2}\}+\varepsilon_{CS}
\end{aligned}
$$
with
$$
\begin{aligned}
\kappa_{V,L}^{i}&=\frac{\beta_{2}\alpha_{V,L}^{i}}{4}E\left\{\frac{\|\hat{\Psi}_{V,L}^{i}\|^{2}}{1+\|\hat{\Psi}_{V,L}^{i}\|^{2}}\right\}-3b_{4}\\
\kappa_{u,L}^{i}&=\frac{\beta_{3}\alpha_{u,L}^{i}}{4}E\left\{\frac{\|\hat{\phi}_{u,L}^{i}\|^{2}}{1+\|\hat{\phi}_{u,L}^{i}\|^{2}}\right\}-\frac{6\beta_{1}g_{l1}^{2}}{\gamma_{1}}E\{\|\hat{\phi}_{u,L}^{i}\|^{2}\}\\
\kappa_{V,F,q}^{i}&=\frac{3\beta_{5}\alpha_{V,F,q}^{i}}{4}E\left\{\frac{\|\hat{\Psi}_{V,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{V,F,q}^{i}\|^{2}}\right\}-3b_{8}E\{\|\hat{\Psi}_{V,F,q}^{i}\|^{2}\}\\
\kappa_{u,F,q}^{i}&=\frac{\beta_{7}\alpha_{u,F,q}^{i}}{4}E\left\{\frac{\|\hat{\phi}_{u,F,q}^{i}\|^{2}}{1+\|\hat{\phi}_{u,F,q}^{i}\|^{2}}\right\}-\frac{6\beta_{4}g_{l2}^{2}}{\gamma_{2}}E\{\|\hat{\phi}_{u,F,q}^{i}\|^{2}\}\\
\kappa_{\rho,F,q}^{i}&=\frac{\beta_{6}\alpha_{\rho,F,q}^{i}}{2}E\left\{\frac{\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}{1+\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}}\right\}-\frac{12\beta_{4}g_{l2}^{2}}{\gamma_{2}}E\{\|W_{u,F,q}^{i}\|^{2}\}l_{\phi_{u},F,q}^{i\,2}E\{\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}\}-6b_{8}l_{\Psi_{V},F,q}^{i\,2}E\{\|W_{V,F,q}^{i}\|\,\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}\}-2b_{6}E\{\|\hat{\Psi}_{\rho,F,q}^{i}\|^{2}\}\\
\varepsilon_{CS}&=\frac{6\beta_{1}g_{l1}^{2}}{\gamma_{1}}E\{\|\varepsilon_{u,L}^{i}\|^{2}\}+\beta_{2}\varepsilon_{NHJB,L}^{i}+\beta_{3}\varepsilon_{Nu,L}^{i}+3b_{4}\|\varepsilon_{HJB,L}^{i}\|^{2}-\frac{12\beta_{4}g_{l2}^{2}}{\gamma_{2}}E\{\|W_{u,F,q}^{i}\|^{2}\}l_{\phi_{u},F,q}^{i\,2}E\{\|\varepsilon_{FPK,F,q}^{i}\|^{2}\}+\frac{6\beta_{4}g_{l2}^{2}}{\gamma_{2}}E\{\|\varepsilon_{u,F,q}^{i}\|^{2}\}\\
&\quad+\beta_{5}\varepsilon_{NHJB,F,q}^{i}+\beta_{6}\varepsilon_{NFPK,F,q}^{i}+6b_{8}l_{\Psi_{V},F,q}^{i\,2}E\{\|W_{V,F,q}^{i}\|\,\|\varepsilon_{FPK,F,q}^{i}\|^{2}\}+2b_{6}E\{\|\varepsilon_{FPK,F,q}^{i}\|^{2}\}+3b_{8}E\{\|\varepsilon_{HJB,F,q}^{i}\|^{2}\}
\end{aligned}
$$
The derivative of the Lyapunov function $\dot{L}_{sys}^{i}(t)$ is less than zero outside a compact set; in other words, we have
$$
\begin{aligned}
&E\{\|e_{L}^{i}\|\}>\sqrt{\frac{2\,\varepsilon_{CS}}{\gamma_{1}\beta_{1}}};\qquad E\{\|\tilde{W}_{V,L}^{i}\|\}>\sqrt{\frac{\varepsilon_{CS}}{\kappa_{V,L}^{i}}};\qquad E\{\|\tilde{W}_{u,L}^{i}\|\}>\sqrt{\frac{\varepsilon_{CS}}{\kappa_{u,L}^{i}}}\\
&E\{\|e_{F,q}^{i}\|\}>\sqrt{\frac{2\,\varepsilon_{CS}}{\gamma_{2}\beta_{4}}};\qquad E\{\|\tilde{W}_{V,F,q}^{i}\|\}>\sqrt{\frac{\varepsilon_{CS}}{\kappa_{V,F,q}^{i}}};\qquad E\{\|\tilde{W}_{u,F,q}^{i}\|\}>\sqrt{\frac{\varepsilon_{CS}}{\kappa_{u,F,q}^{i}}}\\
&E\{\|\tilde{W}_{\rho,F,q}^{i}\|\}>\sqrt{\frac{\varepsilon_{CS}}{\kappa_{\rho,F,q}^{i}}}
\end{aligned}
$$
This completes the proof.
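In simulation, Theorem 4 can be sanity-checked by monitoring whether the tracking errors and NN weight estimation errors have entered the residual set implied by Equations (A44)-(A46). The sketch below assumes that the lumped residual term ε_CS, the κ gains, and the Lyapunov weights are available as numbers from the chosen design parameters, and that the thresholds take the square-root form written above; all names are placeholders rather than quantities defined in the paper.

```python
import numpy as np

def residual_set_radii(eps_cs, gamma1, beta1, gamma2, beta4, kappas):
    """Ultimate-bound radii implied by (A44)-(A46).

    kappas : dict mapping weight-error names to their kappa gains;
    eps_cs : lumped residual constant from the closed-loop analysis.
    """
    radii = {
        "e_L": np.sqrt(2.0 * eps_cs / (gamma1 * beta1)),
        "e_F": np.sqrt(2.0 * eps_cs / (gamma2 * beta4)),
    }
    radii.update({name: np.sqrt(eps_cs / k) for name, k in kappas.items()})
    return radii

def inside_residual_set(error_norms, radii):
    """True once every monitored error norm is within its ultimate bound."""
    return all(error_norms[name] <= radii[name] for name in radii)

# Illustrative check with placeholder numbers.
radii = residual_set_radii(0.02, gamma1=1.0, beta1=1.0, gamma2=1.0, beta4=1.0,
                           kappas={"W_V_L": 0.5, "W_u_L": 0.4})
print(inside_residual_set({"e_L": 0.1, "e_F": 0.15, "W_V_L": 0.1, "W_u_L": 0.2}, radii))
```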

Figure 1. An illustration of mixed game-based leader–follower multi-group swarming.
Figure 2. Mixed game-based hierarchical learning structure.
Figure 3. Large-scale leader–follower multi-group swarming. The reference trajectory is denoted by a red curve, the green curve represents the leader trajectory, and the followers are represented with multiple colors.
Figure 4. The tracking errors for the leaders.
Figure 5. The tracking errors and the tracking error PDF of the followers.
Figure 6. HJB equation error of the leader in group 1.
Figure 7. HJB equation error of follower 1 in group 1.
Figure 8. FPK equation error of follower 1 in group 1.
Figure 9. Leader running cost for developed mixed game theory-based distributed approach and the traditional cooperative game-based centralized approach.
Figure 10. Follower running cost for developed mixed game theory-based distributed approach and the traditional cooperative game-based centralized approach.