Article

Formation Control with Connectivity Assurance for Missile Swarms by a Natural Co-Evolutionary Strategy

1 School of Mechanical and Electrical Engineering, Guangzhou University, Guangzhou 510006, China
2 School of Aerospace Engineering, Engineering Campus, Universiti Sains Malaysia, Nibong Tebal 14300, Malaysia
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(22), 4244; https://doi.org/10.3390/math10224244
Submission received: 16 September 2022 / Revised: 21 October 2022 / Accepted: 10 November 2022 / Published: 13 November 2022
(This article belongs to the Special Issue Deep Learning and Adaptive Control)

Abstract: Formation control is one of the most intensively studied topics within the realm of swarm intelligence. This paper presents a metaheuristic approach that leverages a natural co-evolutionary strategy to solve the formation control problem for a swarm of missiles. The missile swarm is modeled by a second-order system with a heterogeneous reference target, and the exponential of the resultant error is accumulated to form the objective function, so that the swarm converges to optimal equilibrium states satisfying specific formation requirements. To address the issues of local optima and unstable evolution, we incorporate a novel model-based policy constraint and a population adaptation strategy that significantly alleviate the performance degradation of the existing natural co-evolutionary strategy, namely slow training and unstable convergence. Applying the Molloy–Reed criterion from the field of network communication, we develop an adaptive topology method that assures connectivity under node failure, and its effectiveness is validated theoretically and experimentally. The experimental results demonstrate that the formation flight accuracy achieved by this method is competitive with that of conventional control methods while being much more adaptable. More significantly, we show that it is feasible to treat the generic formation control problem as an optimal control problem, finding a Nash equilibrium strategy and solving it through iterative learning.

1. Introduction

Intelligent control of swarm systems has been widely used in tracking, rescue and delivery [1,2]. Through the interactive cooperation of multiple agents, the swarm can exhibit complex intelligent behaviors, such as cohesion, separation and alignment, known as the Reynolds rules [3]. Among the various research directions, the formation control problem has been widely studied on a variety of system models for its theoretical and practical values.
In this paper, we focus on the formation control of missile swarms. Similar to common formation control problems, formation control for a missile swarm covers formation initialization, contraction, expansion and reconfiguration [4], corresponding with formation producing and formation tracking problems [5]. Based on the sensing capability and interaction level, the current dominant research [6] classifies formation control methods into position-, displacement- and distance-based control. The main difference between these control methods is the ability to sense relative or absolute state information. Thus, we propose classifying angle-based control as a particular case of displacement-based control, in which the relative distance constraint from the relative position is removed. In this paper, we focus on the displacement-based control framework not only for its simplicity and stability but also for its vast realistic application value [7].
In practice, when dealing with large-scale swarm systems, the control methods that the system can adopt are usually subject to limited communication bandwidth, communication quality and other interference issues. Hence, more flexible communication schemes that tolerate communication failures are necessary for achieving robust information transmission. However, traditional communication methods based on a fixed communication topology have difficulty coping with unexpected situations and providing continuous and reliable communication. Two newly developed adaptive communication methods in the field of networked control systems are ad hoc-based networks and cluster-based networks [8]. A recent study utilized the received signal strength instead of localization facilities to measure individual distances [9]. Another approach under the same ad hoc framework [10] developed a more robust intelligent swarm communication algorithm based on the Molloy–Reed criterion and grey prediction, improving the robustness of the drone swarm network and the reliability of data transmission. Such an algorithm is further employed in the flying ad hoc network (FANET), a distributed and self-organizing communication framework. Considering that distance is usually closely related to communication quality, and that perceiving distance requires no additional communication bandwidth, it is highly feasible to treat distance as a major factor in configuring the communication topology. In this paper, we develop an adaptive topology communication method based on minimizing the communication distance, and we overcome the head-failure problem of the cluster-based communication method to ensure connection stability for formation control.
Formation control methods based on traditional control theory and dynamic systems can provide more robust control for single control objectives and motion patterns [11], but it is hard to achieve high-precision coordination through conventional control methods with multiple objectives or in environments that are too complex to be modeled. That is why intelligent control approaches have been introduced as more flexible alternatives for operating in unstructured or dynamic environments surrounded by multiple uncertainties.
A number of metaheuristic algorithms have been shown to cope well with multi-objective, complex constrained optimization problems and have been widely used in all aspects of formation control, including the formation of optimal configurations and motion planning. A comprehensive survey on the development of such algorithms for aircraft motion planning problems is presented in [12]. In [13], Liu investigated a particle swarm optimization (PSO)-based algorithm in the optimal design of missile formation configuration. In [14], Seung-Mok proposed a cooperative co-evolution PSO algorithm based on the traditional PSO-based model predictive control, which optimizes in a distributed way and improves the speed and performance of the original algorithm. Another population-based metaheuristic algorithm, known as the genetic algorithm (GA), was employed to evolve the positioning strategy of formation-controlled multi-robot systems [15]. A gradient descent-based reinforcement learning method utilizing an actor-critic framework was proposed for optimal consensus control of multi-agent systems. Existing research on consensus control provides another perspective on solving the formation problem, since the general formation control problem is a special kind of consensus problem which requires that certain errors are maintained between the states of neighboring robots rather than identical states. Although existing metaheuristic-based algorithms have been applied to formation control in searches for single-step optimal solutions in real time, these methods are usually slow when running and do not have the ability to migrate and learn. In contrast, neural network-based controllers can be trained to achieve the same performance through iterative learning and can be easily deployed to compact mobile units with low computational performance and cost, although the majority of the computational cost is spent in the training stage.
There is a dearth of existing research using neural network controllers, despite the fact that they are commonly used under the reinforcement learning (RL) paradigms [16] and trained to solve specific control or decision-making problems. In [17], Liu developed a self-organizing map (SOM)-based neural network approach for motion planning in the formation control problem of a multiple autonomous underwater vehicle (AUV) system, where the SOM is an unsupervised neural network serving specific purposes, such as competitive learning and task assignment [18]. This is an iterative algorithm used to search for the most energy-efficient motion path for the agent. Since it takes all individuals’ states as input and the target path as the output, it can be regarded as a centralized method which is not desirable for the practical deployment requirement.
The general idea of a heuristic algorithm using a neural network controller is similar to that of other algorithms, such as PSO and ant colony optimization (ACO), which are designed to find the optimal solution of a cost function under certain constraints by iterative searching. The essential difference is that the former optimizes the neural network weights, and the final optimized controller can be deployed directly, whereas the latter optimize directly in the solution space in a single step. Both approaches have their advantages and disadvantages. Because the latter need no pretraining, they rely heavily on single-step computation, making it difficult to guarantee the speed required for real-time operation. A neural network controller, in contrast, as an adaptive and learnable controller, is considered a promising means of intelligent control in the future [19]. Evolutionary strategies can effectively and stably optimize the neural network structure and weights [20,21], thus benefiting any application scenario that requires observation of large amounts of historical data.
Apart from acting as a controller to generate control commands, neural networks are widely used in control systems for sensing, decision making, trajectory planning and many other purposes. In [22], a modified Grossberg neural network (GNN) was used to generate the shortest path to avoid obstacles and reach a target point. Similar research on optimal path or trajectory design is also available in [23,24]. Lan [25] adopted the radial basis function to estimate the system disturbance, which led to enhanced robustness and adaptability of swarm formation control based on the artificial potential field method. Additional research that implements adaptive neural networks to estimate the uncertain and nonlinear dynamics of the system can be found in [26,27,28]. Furthermore, in [29], Lan used reinforcement learning theory to train a neural network controller that could be applied to the distributed control of swarm systems in an unknown dynamic environment, where the agents in the swarm can perform basic intelligent behaviors such as tracking and obstacle avoidance. Likewise, extensive research has shown that neural networks have certain robustness in many scenarios without being inferior to traditional methods and are relatively more flexible and applicable.
The foremost motivation of this paper is to develop a metaheuristic evolutionary computational approach to solve the formation control problem for multi-agent systems (MASs) while exploring the usage of neural network (NN) controllers. We use a recently proposed natural co-evolutionary strategy (NCES) algorithm [30] to realize such a vision and apply it to the control of a second-order multi-missile system. The contributions of this paper are manifold. First, we incorporate a policy constraint approach to enhance the stability of the algorithm in order to optimize the objective function and find the Nash equilibrium strategy. The proposed algorithm is more flexible for adapting to varying control objectives compared with conventional approaches [4,31,32,33]. Then, an adaptive topology scheme is designed to address the common node failure problem in formation control, and this method can achieve stable communication connections at a relatively low communication cost. Finally, a stable population adaptation method is proposed to further improve the performance of the algorithm and mitigate the local optimum issue. Emulation experiments show that the proposed formation control algorithm can handle tasks such as formation maintenance, reference trajectory tracking and formation switching with high accuracy in the presence of obstacles, and it is immune to disturbances such as randomized initial positions and node failure. The implementation code (https://github.com/GEYOUR/Swarm-Missile-Formation-Control accessed on 9 September 2022) of the parallelized algorithm as well as the experimental platform are available online to encourage further research on the constrained swarm system.
The remainder of this work is constructed as follows. In Section 2, the nonlinear multi-missile system is modeled, and the displacement-based formation control problem is formulated, including the specification of formation patterns. In Section 3, the distributed NCES-based formation control algorithm is proposed based on a neural network controller. In Section 4, numerical experiments are conducted for a variety of scenarios, and the results are presented. Finally, the conclusions and recommendations are presented in Section 5.

2. Preliminaries and Problem Formulation

2.1. System Modeling of a Swarm of Cruise Missiles

This paper focuses on formation control in a two-dimensional space (i.e., the $OXY$ planar space of the inertial coordinate frame). As shown in Figure 1, the swarm consists of multiple missiles, and each missile can be treated as a point mass. In the simplified dynamics model, we do not consider aerodynamic factors or the effect of the missile's own inertia tensor. Considering a total of N missiles, the ith missile is denoted by $M_i$, while $V_{m_i}$, $\alpha_{m_i}$, $x_{m_i}$ and $y_{m_i}$ are used to represent the speed, heading angle and coordinates of $M_i$ in the global coordinate frame. Let $x_i = [x_{m_i}, y_{m_i}, \alpha_{m_i}]^T$ represent the measurement vector of the missile, which is not necessarily required hereinafter, and $x_r = [x_r, y_r, \alpha_r]^T$ represent the measurement vector of the reference target. For brevity, let $\hat{x}_i = [x_i^T, V_{m_i}]^T$ be the state vector with the speed inserted. Here, $a_{v_i}$ and $a_{l_i}$ are defined as the acceleration commands along the direction of the velocity and perpendicular to the direction of the velocity, respectively. As shown in the partial enlargement of the actuation decomposition, the two acceleration commands are independent and perpendicular to each other. The second-order system dynamics of the ith missile can be expressed as
$$\dot{\hat{x}}_i = \Psi(\hat{x}_i) + \Phi(\hat{x}_i)\,u_i, \qquad x_i = C\,\hat{x}_i,$$
where $u_i = [a_{v_i}, a_{l_i}]^T$ is the input for $M_i$ and
$$\Psi(\hat{x}_i) = \begin{bmatrix} V_{m_i}\cos\alpha_{m_i} \\ V_{m_i}\sin\alpha_{m_i} \\ 0 \\ 0 \end{bmatrix}, \quad \Phi(\hat{x}_i) = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 1/V_{m_i} \\ 1 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} I_3 & 0_{3,1} \end{bmatrix},$$
where I n and 0 m , n denote the identity and zero matrices. Note that the system in Equation (1) is similar to the system discussed in [34], which is often called the unicycle-like vehicle. In this paper, the missile is a nonholonomic system with controllable speed that is capable of maneuvering with limited actuation and dynamic constraints. We assume that under a certain communication relationship, the relative state z j i = x j x i can be sensed by missile i, where j represents the neighboring missile. In addition, we make the following assumptions about the system:
Assumption 1.
We assume that the reference target or leader is a first-order model driven by an unknown changeable speed and angular velocity commands, which means that its vertical acceleration is not available, but its lateral acceleration is subject to the same constraints as the follower missiles. Hence, the heterogeneous system may bring in more complexity for mathematical analysis and stability issues.
Assumption 2.
It is assumed that the missile controller applies the cascade control [35] structure such that its altitude is automatically controlled by the inner control loop, which is a stable closed loop. We focus on the design for the missiles’ thrust and lateral acceleration of the outer control loop. This divide-and-conquer approach is widely used in many studies [32,36] to simplify the problem in order to verify the effectiveness of the control method, and it is also valid for three-dimensional dynamic models.
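The simplified dynamics in Equation (1) can be stepped numerically with a forward-Euler scheme. The following is a minimal sketch, not the paper's implementation; the function name, integration step and sample values are illustrative:

```python
import numpy as np

def missile_step(x_hat, u, tau=0.01):
    """One forward-Euler step of the second-order missile model in Equation (1).

    x_hat = [x, y, alpha, V]  (position, heading angle, speed)
    u     = [a_v, a_l]        (accelerations along and perpendicular to velocity)
    """
    x, y, alpha, V = x_hat
    a_v, a_l = u
    # Psi(x_hat): drift driven by the current speed and heading
    psi = np.array([V * np.cos(alpha), V * np.sin(alpha), 0.0, 0.0])
    # Phi(x_hat) @ u: lateral acceleration turns the heading (a_l / V);
    # tangential acceleration changes the speed
    phi_u = np.array([0.0, 0.0, a_l / V, a_v])
    return x_hat + tau * (psi + phi_u)

# straight flight: heading and speed unchanged, position advances along x
state = missile_step(np.array([0.0, 0.0, 0.0, 250.0]), np.array([0.0, 0.0]))
```

The output map $x_i = C\hat{x}_i$ simply drops the speed component, so the first three entries of the state are the measurement vector.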

2.2. Formation Control under Displacement-Based Framework

Let us denote the coordinate of the ith node of the formation by $\lambda_{p_i}$. The formation pattern is defined by $\lambda = \{\lambda_{p_i} = [\lambda_{x_i}, \lambda_{y_i}]^T : i = 1, 2, \ldots, N\}$, the set of Cartesian coordinates of the nodes expressed in a coordinate system whose origin is the formation center, which should satisfy
$$\sum_{i=1}^{N} \lambda_{x_i} = 0, \qquad \sum_{i=1}^{N} \lambda_{y_i} = 0.$$
In the displacement-based framework, each missile has to align its own coordinate system with the global coordinate system and be able to sense the relative positions and orientations of its neighbors, whereas its position in the global coordinate system is not necessarily required. Let N i denote the set of neighboring missiles of the ith missile. Suppose that the reference target should be kept in the formation center. Then, the tracking error of the ith missile is
$$e_i = \sum_{j \in N_i} \big[(P_i - P_j) - (x_i - x_j)\big] + \zeta_i\,(x_r + P_i - x_i),$$
where $e_i \in \mathbb{R}^3$ and $P_i = [\lambda_{x_i}, \lambda_{y_i}, 0]^T$. If the target state is available to the ith missile, then $\zeta_i = 1$; otherwise, $\zeta_i = 0$. It is worth noting that the first term represents the formation maintenance error, while the second term represents the target tracking error when the target information can be obtained. However, the error vector is defined in the global coordinate system. To standardize the effect of an individual missile's heading angle on the error vector in the relative coordinate system aligned with the missile's orientation, we define the rotated error vector as $e_{r_i} = R_3(\alpha_i)\,e_i$, where $R_3(\alpha) \in \mathbb{R}^{3 \times 3}$ is the three-dimensional rotation matrix defined as
$$R_3(\alpha_i) = \begin{bmatrix} \cos\alpha_i & \sin\alpha_i & 0 \\ -\sin\alpha_i & \cos\alpha_i & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
Additionally, there exists $e_{r_i} = [e_{i_x}, e_{i_y}, e_{i_\alpha}]^T$, where the subscript r represents the rotational transformation. Considering the system in Equation (1) and the discretization of the system control loop, we can obtain the following error dynamics:
$$\dot{e}_{r_i} = G_i u_i + \sum_{j \in N_i} F_{ij} u_j + D_i u_r + H_i,$$
where
$$G_i = \begin{bmatrix} -\tau(L_i + \xi_i) & e_{i_y}/v_i \\ 0 & -e_{i_x}/v_i \\ 0 & -(L_i + \xi_i)/v_i \end{bmatrix}, \quad F_{ij} = \begin{bmatrix} \tau\cos\alpha_{ji} & 0 \\ \tau\sin\alpha_{ji} & 0 \\ 0 & 1/v_j \end{bmatrix}, \quad D_i = \begin{bmatrix} \xi_i\cos\hat{\alpha}_i & 0 \\ \xi_i\sin\hat{\alpha}_i & 0 \\ 0 & \xi_i \end{bmatrix}, \quad H_i = \begin{bmatrix} -(L_i + \xi_i)\,v_i + \sum_{j \in N_i} v_j\cos\alpha_{ji} \\ \sum_{j \in N_i} v_j\sin\alpha_{ji} \\ 0 \end{bmatrix}.$$
Here, $L_i$ is the number of missiles belonging to the set of neighbors $N_i$, and $\tau$ is the simulation step size, where $\alpha_{ji} = \alpha_j - \alpha_i$ and $\hat{\alpha}_i = \alpha_r - \alpha_i$. Furthermore, $u_i = [a_{v_i}, a_{l_i}]^T$ and $u_r = [v_r, w_r]^T$ denote the vectors of the control commands of missile i and the reference target, respectively.
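The displacement-based tracking error and its rotated form defined above can be sketched as follows. This is a hedged illustration assuming the sign conventions given above; the data layout and function name are hypothetical:

```python
import numpy as np

def tracking_error(i, states, P, x_r, neighbors, zeta):
    """Displacement-based tracking error of missile i and its rotated form
    e_ri = R3(alpha_i) e_i, following the definitions above.

    states[i] = [x, y, alpha]; P[i] = [lambda_x, lambda_y, 0] is the formation
    offset of node i; zeta[i] = 1 if missile i can sense the reference target.
    """
    e = np.zeros(3)
    for j in neighbors[i]:
        # desired relative displacement minus the actual one
        e += (P[i] - P[j]) - (states[i] - states[j])
    # target tracking term, active only when the target state is available
    e += zeta[i] * (x_r + P[i] - states[i])
    a = states[i][2]
    # rotate the global-frame error into the missile's body frame
    R3 = np.array([[np.cos(a),  np.sin(a), 0.0],
                   [-np.sin(a), np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
    return R3 @ e
```

When every missile sits exactly at its formation slot around the target, both terms vanish and the rotated error is zero.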
Generally, the objective at time t can be defined by a function of the error vector:
$$J_i(t, e_{r_i}(t)) = \exp\!\big(-e_{r_i}^T(t)\, K_C\, e_{r_i}(t)\big),$$
where $K_C = \mathrm{diag}[k_1, k_2, k_3]$ is a symmetric positive definite matrix that balances the shape of the formation against its consistent direction of movement. The optimal design of the formation controller $u_i$ can be formulated as a nonlinear optimization problem:
$$u_i^* = \arg\max_{u_i} \int_0^T J_i(t, e_{r_i}(t))\,dt,$$
subject to
$$u_{\min} \le u_i \le u_{\max}, \qquad V_{\min} \le V_{m_i} \le V_{\max},$$
in which T denotes the total flight time and $u_{\min}$, $u_{\max}$, $V_{\min}$ and $V_{\max}$ represent the system constraints.
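The per-step objective of Equation (8) and the box constraints of Equation (10) translate directly into code. A small sketch, in which the gains and actuation bounds are placeholder values rather than the paper's tuned ones:

```python
import numpy as np

def step_fitness(e_r, K_C=np.diag([1.0, 1.0, 0.1])):
    """Per-step objective J = exp(-e_r^T K_C e_r): it peaks at 1 when the
    rotated error e_r is zero and decays as the error grows. The gains in
    K_C are placeholder values, not the paper's tuned ones."""
    return np.exp(-e_r @ K_C @ e_r)

def clip_command(u, u_min=np.array([-10.0, -30.0]), u_max=np.array([10.0, 30.0])):
    """Enforce the box actuation constraints (bounds are placeholders)."""
    return np.clip(u, u_min, u_max)
```

Because the exponential keeps each step's fitness in (0, 1], accumulating it over the flight rewards trajectories that drive the formation error to zero quickly and keep it there.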
In this paper, we define two formation patterns—the regular polygon and the straight line formation—which are shown in Figure 2 and denote each formation pattern by λ ( β , l f ) P and λ ( β , l f ) L , respectively. In this figure, each dot represents a node of the formation, and the nodes are connected to each other under the constraint relationship to form specific formation shapes. l f and β are the parameters that control the size and rotation of the formation, respectively, and we have
$$\lambda(\beta, l_f)_P = \Big\{ R_2(\beta)\big[\, l_f \sin\!\big(\tfrac{2\pi}{N}(i-1)\big),\; l_f \cos\!\big(\tfrac{2\pi}{N}(i-1)\big) \big]^T : i \in \{1, 2, \ldots, N\} \Big\},$$
$$\lambda(\beta, l_f)_L = \Big\{ R_2(\beta)\big[\, 0,\; l_f\,\tfrac{N - 2i + 1}{2} \big]^T : i \in \{1, 2, \ldots, N\} \Big\},$$
In addition, R 2 ( β ) is a two-dimensional rotation matrix similar to that in Equation (5), which is
$$R_2(\beta) = \begin{bmatrix} \cos\beta & -\sin\beta \\ \sin\beta & \cos\beta \end{bmatrix},$$
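The two patterns of Equation (11) can be generated and checked against the centering condition of Equation (3). A brief sketch with illustrative function names:

```python
import numpy as np

def R2(beta):
    """Two-dimensional rotation matrix."""
    return np.array([[np.cos(beta), -np.sin(beta)],
                     [np.sin(beta),  np.cos(beta)]])

def polygon_pattern(N, beta, l_f):
    """Regular polygon: node i sits at angle 2*pi*(i-1)/N on a circle of
    radius l_f, rotated by beta."""
    return [R2(beta) @ np.array([l_f * np.sin(2 * np.pi * (i - 1) / N),
                                 l_f * np.cos(2 * np.pi * (i - 1) / N)])
            for i in range(1, N + 1)]

def line_pattern(N, beta, l_f):
    """Straight line: nodes spaced l_f apart, centered on the origin."""
    return [R2(beta) @ np.array([0.0, l_f * (N - 2 * i + 1) / 2])
            for i in range(1, N + 1)]
```

For any N, rotation angle and scale, the node coordinates of both patterns sum to zero, which is the condition of Equation (3).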
Examples of a variety of formation patterns can be found in [1,37,38], but explicit geometric definitions are provided only in this paper.
From the above definition, it can be intuitively observed that the straight-line formation is symmetric about the origin of the coordinate system that defines it, so the sums of its x and y coordinates are zero and Equation (3) is satisfied. The same conclusion holds for a regular polygon with an even number of nodes. Therefore, we only need to prove that the prerequisite still holds when the number of nodes is odd, as follows.
Suppose that the regular polygon formation defined in Equation (11) consists of $N \in \{2n+1 : n \in \mathbb{Z}\}$ nodes. The Cartesian coordinates of each node in the formation are denoted by $P_i = [\lambda_{x_i}, \lambda_{y_i}]^T$ such that the sum of all coordinates is
$$\sum_{i=1}^{N} P_i = \begin{bmatrix} \xi_x \\ \xi_y \end{bmatrix}.$$
Suppose node 1 lies on the Y axis, so that $\xi_x = 0$. After rotating the formation by $\rho$, we can obtain the following sum of coordinates by multiplication with the rotation matrix:
$$R_2(\rho) \sum_{i=1}^{N} P_i = \begin{bmatrix} \xi_x \cos\rho - \xi_y \sin\rho \\ \xi_x \sin\rho + \xi_y \cos\rho \end{bmatrix}.$$
The angle between the vector from the coordinate origin to node k and the corresponding direction of the Y axis is 2 ( k 1 ) π / N . If node k is rotated to be on the y coordinate axis, since the formation is again symmetric about the Y axis, the summation of the x coordinates is
$$\xi_x \cos\big(2(k-1)\pi/N\big) - \xi_y \sin\big(2(k-1)\pi/N\big) = 0, \qquad k = 1, 2, \ldots, N.$$
Since N is an odd number, if k equals neither 1 nor $(N+1)/2$, then $2(k-1)\pi/N$ equals neither 0 nor $\pi$, so $\sin(2(k-1)\pi/N) \ne 0$ and therefore $\xi_y = 0$. Thus, Equation (14) is the zero vector for an arbitrary $\rho$ value, and Equation (3) holds. Many other formation patterns can be considered variants of the above two, such as row, column and square patterns [1]. Moreover, as discussed in the following section, this formation paradigm can also generate asymmetric patterns, such as wedge and crescent patterns [25], by deleting nodes appropriately.

3. Applying Natural Co-Evolutionary Strategy to Formation Control via Neural Networks

3.1. Natural Co-Evolutionary Strategy for MASs

With limited sensing capability, an MAS can be modeled as a multi-agent partially observable Markov decision process (POMDP), which is an extension based on the Markov decision process (MDP). Such a process can be expressed as
$$P(S' \mid S, A_u),$$
with S and S' representing the system state before and after the transition, respectively, and $A_u = [u_1, \ldots, u_n]$ being the action vector for all agents in the system. $P(\cdot)$ is the transition probability function. The problem is said to be deterministic if $P(\cdot)$ is either one or zero; otherwise, it is stochastic.
Considering the simplified system model neglecting uncertainties, the formation control problem in general can be solved using optimization algorithms under a multi-agent POMDP. Assuming that the fitness function measuring the performance of agent i is $f_i(\theta_i, \theta_{N_i})$, in which $\theta_i$ is its controller (policy) parameter and $\theta_{N_i} = \{\theta_j : j \in N_i\}$ is the set of parameters of its neighboring agents, the objective of optimal control is to find the optimal control strategy $\theta_i^*,\ i \in (1, \ldots, N)$, such that
$$F\big(\{f_i(\theta_i^*, \theta_{N_i}) : i \in (1, \ldots, N)\}\big) \ge F\big(\{f_i(\theta_i \ne \theta_i^*, \theta_{N_i}) : i \in (1, \ldots, N)\}\big),$$
where F ( · ) denotes the overall performance of all agents in the MAS. Such a solution is often referred to as the Nash equilibrium strategy, meaning that no further improvement can be made to individual solutions without deteriorating the performance. It has been found to be very difficult to find a Nash equilibrium strategy for the nonlinear continuous agent system whose cost function is coupled with neighboring states while satisfying the system constraints in Equation (10) [14,39]. For the formation control problem, which requires sufficient cooperation among neighboring agents, F ( · ) can be regarded as a simple summation of all individuals f i ( · ) , and the formation control problem can be viewed as a constrained dynamic optimization problem with the objective of minimizing the total cost of the formation error. Thus, the Nash equilibrium strategy can be obtained using a co-evolutionary algorithm that is designed for evolving simultaneously to reach the overall optimum fitness.
In a previous work [30], we improved the natural evolutionary strategy (NES) and proposed an NCES algorithm that seeks global optimality for the constrained multi-objective optimization problem in multi-agent systems. In brief, the NCES algorithm is a bio-inspired, population-based algorithm capable of optimizing high-dimensional parameters, such as neural network weights, toward the direction of higher fitness. The NCES algorithm usually proceeds as follows. First, the parameters $\theta_i$ for $i \in (1, \ldots, N)$ are initialized, and the optimization objective is determined for the specific control problem; the fitness function $f(\cdot)$, which is embedded within the system model, is thus obtained. In the second step, iterative optimization is performed: at the beginning of each iteration, perturbations $\epsilon_i$ obeying a probability distribution $p(\cdot)$ are sampled m times to obtain the perturbed populations $\theta_i' = \theta_i + \epsilon_i$ corresponding to each agent. Then, their fitness values are evaluated in the system in a distributed way, so that each population obtains the following corresponding gradient information:
$$g_{\theta_i} = \frac{1}{m\sigma^2} \sum_{k=1}^{m} f\big(\theta_i^{\prime k}, \theta_{N_i}^{\prime k}\big)\, \epsilon_i^k \prod_{c \in N_i} p(\epsilon_c^k),$$
where k indexes the m sampled perturbations.
The parameters are also updated in a gradient ascent manner:
$$\theta_i = \theta_i + \eta_\alpha\, g_{\theta_i}.$$
Finally, the second step is looped until convergence or the Nash equilibrium strategy is found. Compared with the conventional NES algorithm, the NCES algorithm provides a more accurate estimation of gradients in the presence of multiple interactive agents. More details of the algorithm can be found in [30].
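The sampling, fitness-weighted gradient estimate and ascent step described above can be illustrated with a single-agent simplification; the full co-evolutionary version additionally weights each sample by the probability of the neighbors' perturbations. Hyperparameter values here are arbitrary:

```python
import numpy as np

def nces_update(theta, fitness, m=2000, sigma=0.1, eta=0.05, rng=None):
    """One evolution-strategy update: sample m perturbations, weight them by
    the fitness of the perturbed parameters, and take a gradient ascent step.
    Single-agent simplification; hyperparameters are arbitrary."""
    if rng is None:
        rng = np.random.default_rng(0)
    grad = np.zeros_like(theta)
    for _ in range(m):
        eps = rng.normal(0.0, sigma, size=theta.shape)  # sampled perturbation
        grad += fitness(theta + eps) * eps              # fitness-weighted direction
    grad /= m * sigma**2                                # gradient-estimate scaling
    return theta + eta * grad                           # gradient ascent step

# maximizing f(theta) = -||theta||^2 should move theta toward the origin
theta = nces_update(np.array([1.0, -1.0]), lambda t: -np.sum(t**2))
```

No derivative of the fitness function is required: only black-box evaluations of the perturbed parameters, which is what allows the fitness to be obtained from interaction feedback within the system.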

3.2. Distributed Co-Evolutionary Strategy Optimizing a Neural Network Controller

To investigate the applicability of neural network controllers within the framework of formation control problems, this paper proposes adopting a multi-layer perceptron (MLP) neural network controller with a cooperative NES-based training approach. The MLP NN has a single hidden layer of 16 nodes, and its schematic is depicted in Figure 3.
The weighting matrices for each layer are represented by $W_1^i$ and $W_2^i$. $\phi(\cdot)$ and $\psi(\cdot)$ are the activation functions for the hidden layer and output layer, respectively. Specifically, $\phi(\cdot)$ is the sigmoid function, and $\psi(\cdot)$ is selected as the hyperbolic tangent function (tanh) in order to impose restrictions on the output and satisfy the system constraints. Another advantage of the neural network controller is that saturated control can be achieved by a reasonable choice of activation function without restricting the solution space. With $z = [\alpha_{m_i}, V_{m_i}, e_{r_i}]^T$ denoting the input, the output of the neural network controller is
$$u_i = \psi\big(W_2^i \cdot \phi(W_1^i \cdot z)\big).$$
Note that the dashed network input TS in Figure 3 represents the transform signal, which is included in the input only when needed, as described in Section 4. The same network configuration is used for all agents in the following experiments. To train this controller with the NCES algorithm, the controller parameters are first set to $\theta_i = [W_1^i, W_2^i]$. The fitness function is the objective function in Equation (8) (i.e., $f(\theta_i, \theta_{N_i}) = J_i$), which is nonlinearly coupled with the states of neighboring agents and can only be evaluated through the interaction feedback within the system. The plain NCES algorithm, however, cannot guarantee the global stability and convergence speed required for training the missile formation controller in this paper. To further improve the performance of the algorithm and tailor the optimization to the formation problem, we propose a series of supplementary techniques that effectively enhance the convergence speed and accuracy of the algorithm.
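A forward pass of this controller, with the tanh output scaled to actuation bounds, might look as follows. The bounds, input values and weight initialization are illustrative assumptions; only the 16-node hidden layer and the sigmoid/tanh activations come from the text:

```python
import numpy as np

def mlp_controller(z, W1, W2, u_max=np.array([10.0, 30.0])):
    """Forward pass of a single-hidden-layer MLP controller: sigmoid hidden
    activations, tanh output. Scaling by u_max (placeholder bounds) yields
    saturated control commands without constraining the search space."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ z)))   # phi: sigmoid hidden layer
    return u_max * np.tanh(W2 @ h)        # psi: tanh keeps |u| within u_max

rng = np.random.default_rng(1)
z = np.array([0.1, 250.0, 5.0, -2.0, 0.05])      # [alpha, V, e_r] input (illustrative)
u = mlp_controller(z, rng.normal(size=(16, 5)),  # 16-node hidden layer
                   rng.normal(size=(2, 16)))
```

Because tanh is bounded in (-1, 1), the commands respect the saturation limits for any weight values the evolutionary search may visit.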

3.3. Population Adaptation Technique

In [30], we pointed out the importance of the learning rate for the accuracy of the algorithm and proposed an algorithm for learning rate adaptation. However, adjusting the learning rate often consumes considerable time. To maintain the speed of the algorithm, a novel population size adaptation algorithm is used here to adjust the evolutionary process adaptively. From previous works [40,41], it is known that the trend of the gradient is related to the complexity of the objective function: in a more complex optimization region, such as a multi-modal or noisy function landscape, the estimated gradient is gentle, while in a flatter region, such as a spherical space, the gradient is relatively steep. To estimate the accuracy of the estimated natural gradient under the movement of the parameter distribution, the evolution path $\rho_\theta$ is introduced to detect the resistance in the evolutionary process. The population size is adapted based on the length of the evolution path, following the empirical observation that a larger population size leads to higher accuracy of the estimated gradient. The weight matrices $W_1^i, W_2^i$ of agent i constitute its parameter vector $\theta_i \in \mathbb{R}^s$, where s is the total number of weights. The evolution path in iteration t is calculated by accumulating the squared Mahalanobis distances of the parameter movements of all agents as
$$\rho_\theta(t) = \sum_{i=1}^{N} \big[\theta_i(t) - \theta_i(t-1)\big]^T\, \Sigma^{-1}\, \big[\theta_i(t) - \theta_i(t-1)\big].$$
Note that Σ is the covariance of the probability distribution from which the new populations are sampled, since the evolution path should not depend on the parameterization of the probability distribution. The population size η p ( t ) is then adjusted according to the evolutionary path as follows:
$$\eta_p(t) = \eta_p(t-1)\left(\beta + (1-\beta)\,\frac{\rho_\theta(t-1)}{\rho_\theta(t)}\right), \qquad \eta_p(t) = \mathrm{clip}\big(\max(\eta_p(t), \eta_p(t-1)),\; \eta_p^{\min},\; \eta_p^{\max}\big),$$
where $\beta$ is a constant factor that determines the growth rate of the population size, and $\eta_p^{\min}$ and $\eta_p^{\max}$ are the minimum and maximum population sizes passed to the clip function $\mathrm{clip}(\cdot)$ to guard against undesirable adapted values. Note that since the total optimization complexity tends to increase as evolution progresses, we adopt a non-decreasing strategy for the population size to ensure stability. The initial population size is set to $\eta_p(0) = 10 + 5\ln(s)$, following the default set-up in [42], and the boundaries are determined as
$$\eta_p^{\min} = \eta_p(0), \qquad \eta_p^{\max} = 4\,\eta_p(0).$$
Note that s is the number of parameters of one agent. The above configuration was found to be appropriate in our experiments. Empirically, we observed that increasing the population size does not necessarily improve the quality of evolution and sometimes even hinders convergence or drives the search into a local optimum. This is probably because a large population heavily averages the contributions of individuals and thus reduces the exploratory effect of each one, so it is wise to adjust the size appropriately rather than assume that larger is always better.
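As a concrete illustration, the evolution-path accumulation and the non-decreasing size update above can be sketched in a few lines of NumPy. The parameter dimension s = 64 and the sampled parameter movements are placeholders, and β = 0.84 is the value later listed in Table 1; this is a sketch under those assumptions, not the authors' implementation.

```python
import numpy as np

def evolution_path(thetas_new, thetas_old, cov):
    """Accumulated squared Mahalanobis distance of all agents'
    parameter movement, per the evolution-path definition above."""
    cov_inv = np.linalg.inv(cov)
    return sum(
        (d := new - old) @ cov_inv @ d
        for new, old in zip(thetas_new, thetas_old)
    )

def adapt_population_size(eta_prev, rho_prev, rho_curr, beta, eta_min, eta_max):
    """Population-size update driven by the ratio of consecutive
    evolution-path lengths; non-decreasing, then clipped."""
    eta = eta_prev * (beta + (1.0 - beta) * rho_prev / rho_curr)
    return float(np.clip(max(eta, eta_prev), eta_min, eta_max))

# Hypothetical set-up: s parameters per agent, default initial size
s = 64
eta0 = 10 + 5 * np.log(s)           # initial population size
eta_min, eta_max = eta0, 4 * eta0   # boundaries from the text

# A shrinking evolution path (rho_prev / rho_curr > 1) grows the population
eta1 = adapt_population_size(eta0, rho_prev=2.0, rho_curr=1.0,
                             beta=0.84, eta_min=eta_min, eta_max=eta_max)
```

A growing path (ratio below one) leaves the size unchanged, since the update is non-decreasing.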

3.4. Cluster-Based Adaptive Topology

Inter-agent connectivity in multi-agent systems is primarily modeled and characterized by means of graph theory [6,43], which we use to identify the observable neighbors of the missiles in the formation. In this subsection, we review some basics of graph theory. Suppose there are n agents in the MAS (agents can be represented by nodes), and a graph G is defined as (V, ϵ), where V = (v_1, v_2, …, v_n) represents the set of nodes and ϵ ⊆ V × V represents the set of edges composed of directed connections between distinct nodes. The neighbor set of node i is defined as N_i = {j ∈ V : (i, j) ∈ ϵ}. A node j ∈ V is said to be connected to i ∈ V in a directed way if (i, j) ∈ ϵ and (j, i) ∉ ϵ, and in an undirected way if both (i, j) and (j, i) belong to ϵ. A directed path of G is a series of adjacent edges of the form (v_1, v_2), (v_2, v_3), …, (v_i, v_j), (v_j, v_k). A graph is said to have a spanning tree if there exists a directed path from one node to all other nodes, and a graph is said to be connected in a directed (undirected) manner if there is a directed (undirected) path between any pair of distinct nodes. We use the adjacency matrix A = [a_ij] ∈ R^{|V|×|V|} to represent the above node connectivity, in which
$$a_{ij} = \begin{cases} 0, & \text{if } i = j \text{ or } (i,j) \notin \epsilon \\ 1, & \text{otherwise}. \end{cases}$$
The associated Laplacian matrix is L = D − A, where D = diag(a_i) is the diagonal matrix of vertex degrees, with a_i = Σ_{j=1}^{n} a_ij the degree of vertex i. For an undirected graph, the Laplacian matrix L is a symmetric positive semi-definite matrix.
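For concreteness, the adjacency and Laplacian construction reviewed above can be sketched as follows; the three-node undirected triangle graph is a toy example, not part of the paper's implementation:

```python
import numpy as np

def laplacian(adjacency):
    """L = D - A, with D = diag(a_i) and a_i = sum_j a_ij."""
    return np.diag(adjacency.sum(axis=1)) - adjacency

# Toy undirected triangle graph on three nodes
A = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
L = laplacian(A)
```

For this undirected example, L has zero row sums, is symmetric, and has no negative eigenvalues, matching the properties stated above.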
One of the foremost goals in designing a communication method is to ensure highly reliable, low-latency communication for all nodes in the swarm. The communication quality of the missile swarm is easily affected by various conditions, especially when operating in a hostile environment: the swarm is constantly subject to threats from the enemy's air defense system during flight, and some missiles may be intercepted and lose communication with the rest of the swarm. A dynamic communication approach therefore needs to ensure that, when some nodes fail, the swarm can still maintain effective communication to complete its mission. Based on previous work in network topology and swarm communication, we developed a novel cluster-based network which adaptively reconfigures the communication topology to achieve robust and fault-tolerant communication. It is guided by the well-known Molloy–Reed criterion [44],
$$\kappa = \frac{\langle k^2 \rangle}{\langle k \rangle} > 2,$$
where ⟨k⟩ denotes the average node degree, which can be read from the vertex degrees a_i. From the criterion, it can be inferred that each node in the communication network should be connected to at least two other nodes. For a displacement-based formation, the formation is persistent only if the communication topology contains at least one spanning tree. When establishing a communication connection, the formation error tends to grow as the communication distance between nodes increases, so we favor the connection method that minimizes the length of the communication chains [37]. Under the minimal constraint of satisfying the above conditions, we propose setting one node (usually node 1) of the network as the cluster head, while each other node chooses an additional communication node according to the inter-node distances derived from the formation definition. To achieve this, we define the inter-agent distance matrix as
$$D_g = [d_{ij}] \in \mathbb{R}^{n \times n}, \qquad d_{ij} = \left\| \lambda_p^i - \lambda_p^j \right\|,$$
where d_ij is the Euclidean distance between nodes i and j, which can be calculated from the definition of the formation in Section 2, and D_g is a symmetric matrix. Given that node h is selected as the cluster head, the elements of the adjacency matrix are determined as
$$a_{ij} = \begin{cases} 1, & \text{if } i \neq h \text{ and } j \in \{\gamma^*, h\}; \\ 0, & \text{otherwise}; \end{cases}$$
where γ* = arg min_γ d_iγ, with γ ∈ ℕ, 1 ≤ γ ≤ n and γ ∉ {i, h}. If multiple γ minimize d_iγ, the one with the highest index is taken. In the proposed adaptive topology, the cluster head undertakes the task of broadcasting its own state to the other nodes in the network; equivalently, the follower nodes follow the cluster head. In either view, the cluster head is not constrained by any node other than the reference target. It is also important to note that only the cluster head has access to the reference target information, which greatly reduces the communication and detection burden of the follower nodes.
Using the above communication configuration method for a network with five nodes as an example, the obtained communication topology is shown in Figure 4a, and its adjacency matrix is
$$A_1 = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \end{bmatrix}.$$
Node 1 is the cluster head by default. When node 1 fails, node 2 inherits the head position as shown in Figure 4b, and the adjacency matrix becomes
$$A_2 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \end{bmatrix}.$$
As a comparison, the traditional leader-follower topology in the case of leader node failure is shown in Figure 4c: owing to the lack of adjacent links, effective connections among the nodes cannot be re-formed. With the proposed adaptive communication method, network connectedness is ensured; indeed, graph G is uniformly connected since, for every t ≥ t_0, there exists a node h ∈ V that is the root of a spanning tree [45].
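The head-selection rule and the failover behavior described above can be sketched as follows. The node positions (and hence the distance matrix) are hypothetical, and indices are 0-based, so node 0 plays the role of node 1 in the text; this is an illustrative sketch, not the paper's implementation:

```python
import numpy as np

def build_adjacency(dist, head, alive):
    """Cluster-based topology per the rule above: every surviving
    non-head node i listens to the head and to its nearest surviving
    node gamma* (ties broken toward the higher index); the head row
    stays empty, as the head only follows the reference target."""
    n = dist.shape[0]
    A = np.zeros((n, n), dtype=int)
    for i in alive:
        if i == head:
            continue
        A[i, head] = 1
        candidates = [j for j in alive if j not in (i, head)]
        if candidates:
            # smallest distance wins; among ties, the highest index
            gamma = max(candidates, key=lambda j: (-dist[i, j], j))
            A[i, gamma] = 1
    return A

# Hypothetical formation geometry for five nodes
rng = np.random.default_rng(0)
pts = rng.random((5, 2))
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)

A1 = build_adjacency(dist, head=0, alive=[0, 1, 2, 3, 4])

# Node 0 fails: the lowest-indexed survivor inherits the head role
survivors = [1, 2, 3, 4]
A2 = build_adjacency(dist, head=min(survivors), alive=survivors)
```

Every surviving non-head node ends up with out-degree two (the head plus one nearest node), which is consistent with the Molloy–Reed requirement that each node connect to at least two others.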

3.5. Model-Based Constrained Policy

The NCES algorithm explores the optimal policy that maximizes fitness by continuously interacting with the environment. Like reinforcement learning algorithms, it can fail to achieve global convergence of the control policy, in many cases due to excessive cumulative time or inherent defects in the design of the objective function. In much of the previous literature, control policies were allowed to freely explore regions of the environment subject only to system constraints. In recent years, the idea of the constrained policy has emerged [46], which holds that control policies should be executed under constraints imposed either by human-crafted rules or by the feedback state of the system. Constrained policies, also known as safe exploration [47], have been increasingly applied to the simulation and training of realistic robotic controllers, whereby not only can personal safety be ensured during training, but global convergence can also be accelerated to some extent. Based on the above facts, we propose a model-based constrained policy. The nonlinear error dynamics of the second-order system obtained in (6) are a function of the system input. To apply this method, we assume that each agent can obtain the control input of its communication recipient. Thus, at time t, the predicted formation error at the next step can be calculated as follows:
$$\hat{e}_{ri}(t) = e_{ri}(t) + \dot{e}_{ri}(t)\,\tau,$$
where τ is the time step and ė_ri is the derivative of the resultant error as described in Equation (6). The error deviation Δe_ri(t) ∈ R³ is
$$\Delta e_{ri}(t) = \left|\hat{e}_{ri}(t)\right| - \left|e_{ri}(t)\right|,$$
which is derived from the absolute values of the two error vectors. The aggregate matrix is then defined as E(t) = [Δe_ri(t)] ∈ R^{3×n} for i ∈ {1, 2, …, n}. A termination indicator S_T is assigned to the system so that if it takes a nonzero value, the system terminates and the training algorithm restarts; it is defined as
$$S_T(t) = \begin{cases} 1, & \text{if } \min(E(t)) > \delta_s; \\ 0, & \text{otherwise}, \end{cases}$$
where δ_s is the threshold that measures the maximum error increment the algorithm can tolerate. It satisfies δ_s < (2l + 1)τV_max, where l is the number of neighbors of the agent with the most neighbors in the MAS. With a reasonable choice of δ_s, the algorithm excludes control strategies that deviate too far from the desired trajectory at an early stage. Experimental evidence shows that without such a constrained policy, both the convergence speed and the global optimality of the algorithm degrade. Applying the above adaptation techniques, the pseudo-code of the proposed algorithm used to train the missile formation controllers is shown in Algorithms 1 and 2. Since the network topology adaptation and constrained policy strategies are embedded in the fitness evaluation, they are not shown in the pseudo-code.
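A minimal sketch of the model-based termination check defined above; the error matrices, the number of agents and the threshold value are illustrative placeholders:

```python
import numpy as np

def termination_indicator(e, e_dot, tau, delta_s):
    """One-step prediction e_hat = e + e_dot * tau, elementwise
    deviation |e_hat| - |e|, and the indicator S_T from the text:
    terminate (1) when even the smallest deviation entry exceeds
    the tolerance delta_s."""
    e_hat = e + e_dot * tau
    deviation = np.abs(e_hat) - np.abs(e)
    return 1 if deviation.min() > delta_s else 0

tau, delta_s = 0.1, 0.5            # illustrative values

e = np.ones((3, 4))                # resultant errors of n = 4 agents
flag_bad = termination_indicator(e, 10.0 * np.ones((3, 4)), tau, delta_s)   # 1
flag_ok = termination_indicator(e, -10.0 * np.ones((3, 4)), tau, delta_s)   # 0
```

When every agent's error is predicted to grow by more than δ_s, the episode terminates and training restarts; shrinking errors leave training untouched.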
Algorithm 1 The distributed NCES-based formation control algorithm.
Input: agent number N ∈ ℕ⁺, population size η_p, standard deviation σ, population size adaptation factor β, evolution path ρ_θ, number of parameters m, iteration t
Initialize:
    for each agent i = 1, …, N do
        initialize parameter θ_i^init
        t ← 0
        θ_i(0) ← θ_i^init
        η_p(0) ← 10 + 5 ln(m)
        η_p^min ← η_p(0)
        η_p^max ← 4 η_p(0)
    end for
while stopping criterion not met do
    t ← t + 1
    for k = 1, …, m do
        for each agent i = 1, …, N do
            sample ε_i^k ∼ N(0, σ²I)
            θ_i^k ← θ_i + ε_i^k
        end for
        evaluate fitness f(θ_i^k, θ_{N_i}^k) for i = 1, …, N
    end for
    for each agent i = 1, …, N do
        calculate the natural gradient:
            g_{θ_i} ← (1/(mσ²)) Σ_{k=1}^{m} f(θ_i^k, θ_{N_i}^k) ε_i^k ∏_{c∈N_i} p(ε_c^k)
        θ_i(t) ← θ_i(t−1) + η_α · g_{θ_i}
    end for
    append the evolution path:
        ρ_θ(t) ← Σ_{i=1}^{N} [θ_i(t) − θ_i(t−1)]ᵀ Σ⁻¹ [θ_i(t) − θ_i(t−1)]
    AdaptPopulationSize(ρ_θ)
end while
Algorithm 2 AdaptPopulationSize(ρ_θ).
if length(ρ_θ) > 1 then
    η_p(t) ← η_p(t−1)(β + (1 − β) ρ_θ(t−1)/ρ_θ(t))
    η_p(t) ← clip(max(η_p(t), η_p(t−1)), η_p^min, η_p^max)
else
    η_p(t) ← η_p(0)
end if
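For intuition, a single-agent simplification of the sampling and gradient-update loop in Algorithm 1 can be sketched as follows. It drops the co-evolutionary weighting term ∏_{c∈N_i} p(ε_c^k), adds standard fitness shaping, and maximizes a toy quadratic fitness instead of the formation objective, so it is an assumption-laden sketch rather than the paper's algorithm:

```python
import numpy as np

def nes_step(theta, fitness, pop, sigma, lr, rng):
    """One natural-evolution-strategy update for a single agent:
    sample Gaussian perturbations, weight them by standardized
    fitness, and move theta along the estimated natural gradient."""
    eps = rng.normal(0.0, sigma, size=(pop, theta.size))
    f = np.array([fitness(theta + e) for e in eps])
    f = (f - f.mean()) / (f.std() + 1e-8)     # fitness shaping for stability
    grad = (f[:, None] * eps).sum(axis=0) / (pop * sigma ** 2)
    return theta + lr * grad

# Toy fitness to maximize: -||theta||^2, optimum at the origin
fitness = lambda th: -float(th @ th)
rng = np.random.default_rng(1)
theta = np.array([2.0, -1.5])
for _ in range(300):
    theta = nes_step(theta, fitness, pop=30, sigma=0.2, lr=0.05, rng=rng)
```

After a few hundred iterations the parameters contract toward the optimum, mirroring how each missile's controller weights are driven toward higher fitness.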

4. Simulation and Result Analysis

In this section, numerical experiments are conducted to demonstrate the effectiveness of the proposed formation control algorithm. In order to simulate the actual physical environment, the construction of the experimental scenario and the modeling of the multi-agent system were carried out in PyBullet [48]. We simulated the operation of the missile swarm in three-dimensional space at a fixed altitude, and for convenience, the trajectories are plotted as two-dimensional planar graphs. Some complex aerodynamic effects, such as air resistance, were neglected, while collision detection was preserved. The simulation time step τ was set to 0.1 s for the following experiments.

4.1. Basic Formation Control

First, basic formation control tasks were implemented to examine the validity of the algorithm in basic situations. A swarm of five missiles is required to track the reference target moving along a diagonal or spiral trajectory while keeping the formation geometry. The reference target, or virtual leader, is set at the center of the formation in order to achieve error-free formation control. The objective is to achieve zero tracking error as well as zero formation maintenance error for the whole time (i.e., ∫ |e_ri(t)| dt → 0_3 for i ∈ {1, …, N}). The hyperparameters of the algorithm are listed in Table 1; note that this parameter setting applies to all of the following experiments. Additionally, the system constraints are shown in Table 2. The missiles, as well as the reference target, were subject to saturated control inputs and limited states. For the linear trajectory, the target was driven by a constant control input, while for the spiral trajectory, the control inputs of the target were v_r = 0.65 − 0.01t (km/s) and ω_r = 0.1 + 0.01t (rad/s).
For convenience, we used | e r i | to represent the resultant error of each agent. The trajectories of the two situations are shown in Figure 5 and Figure 6, and the corresponding analytical results are presented in Figure 7 and Figure 8. It can be observed that the convergent resultant error was maintained in a small interval (within 0.05) in both cases, and the results were of comparable if not better accuracy than the comparison results in [33], which kept the position error within 50 m. In addition, good synchronization of the speed and heading angles among the missiles was achieved, although the neighboring speed information was not provided.

4.2. Moving into Formation

Moving into formation differs from cases where the formation is already in the ideal geometry in the initial state. The missiles were separated from the reference target, with their positions initialized randomly in an area 4 km wide and 3 km long. The missiles need to first move into the designated formation shape and then track the reference target with a consistent motion direction; in this case, the target moved along the y axis at a constant speed of 0.5 km/s. Figure 9 shows the motion trajectory of the swarm formation, and Figure 10 shows the resultant error during the flight. The results indicate that the formation was able to adjust to the desired shape rapidly (mostly within 10 s) and achieved high accuracy in tracking and maintaining the formation despite the random initial distribution.

4.3. Switching Formations

Furthermore, we discuss the case in which the missile swarm must switch among formation patterns in order to avoid obstacles and pass through narrow spaces. To achieve this transformation, an additional input node T_S was appended to the input layer of the policy network, as depicted in Figure 3. It was assumed that whenever an obstacle was detected by the leader missile, i.e., the cluster head, it sent a signal to all connected missiles to perform a formation switch λ_P → λ_L. Although the formation geometries are predefined by the controller, the shapes can be adjusted through the parameters l_f and β, which can be changed during flight.
In this scenario, two walls were placed in the trajectory of the formation flight to create narrow spaces, and the missiles needed to pass through the obstacles by changing the shape or size of the formation. We implemented an event-based formation-switching strategy, in which the nodes in the formation were able to detect obstacles within a certain range d c and send a formation-switching signal to all other nodes if an obstacle were to be detected. Similarly, after crossing the obstacle and reaching a safe distance, a formation recovery signal would be sent to restore the original formation.
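The event-based switching logic can be sketched as follows, under the assumption that the decision compares the distance to the nearest detected obstacle against a threshold proportional to the formation size; the pattern labels and tuple layout (label, β, l_f) are hypothetical, not the paper's data structures:

```python
import math

def switch_formation(d_obs, n, l_f):
    """Event-based pattern switch: keep the wide polygon-type pattern
    lambda_P while the nearest obstacle is far, otherwise fall back to
    the line-type pattern lambda_L to pass through a narrow space.
    d_obs, the distance to the nearest detected obstacle, is an
    illustrative assumption."""
    if d_obs >= n * l_f:
        return ("P", 0.0, 0.5)          # lambda(0, 0.5)_P
    return ("L", math.pi / 4, 0.5)      # lambda(pi/4, 0.5)_L

shape_far = switch_formation(d_obs=10.0, n=5, l_f=0.5)    # ("P", 0.0, 0.5)
shape_near = switch_formation(d_obs=1.0, n=5, l_f=0.5)    # line pattern
```

A formation-size switch works the same way, returning a smaller l_f instead of a different pattern label.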
To pass through the obstacles, we considered changing both the formation pattern and the formation size; the time-varying definitions of the formation in the two cases were
$$\lambda_1(t) = \begin{cases} \lambda(0,\,0.5)_P, & \text{if } d_c \geq n \cdot l_f \\ \lambda(\pi/4,\,0.5)_L, & \text{otherwise}, \end{cases} \qquad \lambda_2(t) = \begin{cases} \lambda(0,\,0.5)_P, & \text{if } d_c \geq n \cdot l_f \\ \lambda(0,\,0.2)_P, & \text{otherwise}. \end{cases}$$
The resulting trajectories are shown in Figure 11 and Figure 12. It can be observed that when obstacles were detected, the swarm could swiftly adjust by changing the formation pattern or size and then recover the formation quickly after passing through the narrow space.

4.4. Formation Control under Node Failure

An experiment was conducted in this scenario to verify the effectiveness of the proposed algorithm under node failure: a swarm with six nodes was tasked with pursuing the reference target in a regular polygon formation while the reference target moved in a sinusoidal fashion. At t = 20 s during the pursuit, cluster head node 1 and cluster member node 4 came under attack and disconnected from the other nodes in the swarm; the remaining nodes needed to maintain the original formation and complete the formation task.
From the results in Figure 13 and Figure 14, it can be observed that after the node failures the swarm selected node 2 as the cluster head, reorganized the communication topology and successfully maintained the original formation shape after a short fluctuation. Thus, the robustness of the proposed formation control algorithm against node failure was validated. To investigate the effect of policy constraints on the control performance, we compared the NCES-based formation control method with policy constraints against the one without. Without constraints, in some cases, such as node failure and formation switching, there was a certain chance that the algorithm would not converge. Moreover, in all cases, the convergence rate improved by more than 20% with the policy constraint. Therefore, the policy constraint is essential for training the NCES-based neural network controller.

5. Conclusions

This paper proposes a novel distributed NCES-based formation control algorithm for a second-order multi-missile system using neural networks. The algorithm minimizes the formation shape error and tracking error by training the optimal network controller through iterative learning, and it was combined with a policy constraint approach to enhance the stability of the algorithm. Additionally, we designed an adaptive topology scheme for the node failure situation which can achieve stable connections at a low communication cost. We also proposed a stable population adaptation method based on the evolution path, which further improved the performance of the algorithm and alleviated local optimum issues. The numerical experiments demonstrated that the proposed formation control algorithm is capable of accomplishing tasks, such as formation maintenance, tracking of reference trajectories, formation transformation and obstacle avoidance. This indicates that the proposed algorithm has a high level of accuracy and robustness to cope with situations such as random initial positions and node failures. This paper also discussed the parametric definition of the formation geometry as a background supplement to the field.
Due to the characteristics of constrained co-evolution, the algorithm is expected to be applicable to predicting the amount of available renewable energy from environmental indicators, such as gas emissions or wind, estimating the water absorbed by crops in controlled agriculture, and even providing new ways to solve quantum many-body problems [49,50,51]. Future works are recommended to investigate online evolutionary algorithms to handle unknown or stochastic system models and to apply the proposed algorithm in real-world experiments.

Author Contributions

Conceptualization, J.C. and X.L.; methodology, J.C.; software, J.C. and J.L.; formal analysis, J.C., X.L. and J.L.; writing—original draft preparation, J.C., X.L. and Y.Z.; writing—review and editing, J.C., X.L., Y.Z. and J.L.; supervision, X.L. and Y.Z.; funding acquisition, X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Guangzhou Science and Technology Project (Grant No. 202102010403).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lim, H.; Yeonsik, K.; Kim, J.; Kim, C. Formation Control of Leader Following Unmanned Ground Vehicles Using Nonlinear Model Predictive Control. In Proceedings of the 2009 IEEE/ASME International Conference on Advanced Intelligent Mechatronics, Singapore, 14–17 July 2009; pp. 945–950.
  2. Shi, P.; Yan, B. A Survey on Intelligent Control for Multiagent Systems. IEEE Trans. Syst. Man Cybern. Syst. 2021, 51, 161–175.
  3. Reynolds, C.W. Flocks, Herds, and Schools: A Distributed Behavioral Model. In SIGGRAPH ’87: Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques; Association for Computing Machinery: New York, NY, USA, 1987; p. 10.
  4. Cui, N.; Wei, C.; Guo, J.; Zhao, B. Research on Missile Formation Control System. In Proceedings of the 2009 International Conference on Mechatronics and Automation, Changchun, China, 9–12 August 2009; pp. 4197–4202.
  5. Ren, W.; Cao, Y. Distributed Coordination of Multi-Agent Networks; Communications and Control Engineering; Springer: London, UK, 2011.
  6. Oh, K.K.; Park, M.C.; Ahn, H.S. A Survey of Multi-Agent Formation Control. Automatica 2015, 53, 424–440.
  7. Marshall, J.; Broucke, M.; Francis, B. Formations of Vehicles in Cyclic Pursuit. IEEE Trans. Autom. Control 2004, 49, 1963–1974.
  8. Asaamoning, G.; Mendes, P.; Rosário, D.; Cerqueira, E. Drone Swarms as Networked Control Systems by Integration of Networking and Computing. Sensors 2021, 21, 2642.
  9. Shrit, O.; Martin, S.; Alagha, K.; Pujolle, G. A New Approach to Realize Drone Swarm Using Ad-Hoc Network. In Proceedings of the 2017 16th Annual Mediterranean Ad Hoc Networking Workshop (Med-Hoc-Net), Budva, Montenegro, 28–30 June 2017; pp. 1–5.
  10. Chen, W.; Liu, J.; Guo, H.; Kato, N. Toward Robust and Intelligent Drone Swarm: Challenges and Future Directions. IEEE Netw. 2020, 34, 278–283.
  11. Slotine, J.J.E.; Li, W. Applied Nonlinear Control; Prentice Hall: Englewood Cliffs, NJ, USA, 1991.
  12. Wu, Y. A Survey on Population-Based Meta-Heuristic Algorithms for Motion Planning of Aircraft. Swarm Evol. Comput. 2021, 62, 100844.
  13. Liu, S.; Huang, F.; Yan, B.; Zhang, T.; Liu, R.; Liu, W. Optimal Design of Multimissile Formation Based on an Adaptive SA-PSO Algorithm. Aerospace 2021, 9, 21.
  14. Lee, S.M.; Kim, H.; Myung, H.; Yao, X. Cooperative Coevolutionary Algorithm-Based Model Predictive Control Guaranteeing Stability of Multirobot Formation. IEEE Trans. Control Syst. Technol. 2015, 23, 37–51.
  15. Pessin, G.; Osório, F.; Hata, A.Y.; Wolf, D.F. Intelligent Control and Evolutionary Strategies Applied to Multirobotic Systems. In Proceedings of the 2010 IEEE International Conference on Industrial Technology, Via del Mar, Chile, 14–17 March 2010; pp. 1427–1432.
  16. Nguyen, T.T.; Nguyen, N.D.; Nahavandi, S. Deep Reinforcement Learning for Multiagent Systems: A Review of Challenges, Solutions, and Applications. IEEE Trans. Cybern. 2020, 50, 3826–3839.
  17. Li, X.; Zhu, D. An Adaptive SOM Neural Network Method for Distributed Formation Control of a Group of AUVs. IEEE Trans. Ind. Electron. 2018, 65, 8260–8270.
  18. Barreto, G.; Araujo, A. Identification and Control of Dynamical Systems Using the Self-Organizing Map. IEEE Trans. Neural Netw. 2004, 15, 1244–1259.
  19. Zhang, H.; Jiang, H.; Luo, Y.; Xiao, G. Data-Driven Optimal Consensus Control for Discrete-Time Multi-Agent Systems with Unknown Dynamics Using Reinforcement Learning Method. IEEE Trans. Ind. Electron. 2017, 64, 4091–4100.
  20. Tanaka, T.; Moriya, T.; Shinozaki, T.; Watanabe, S.; Hori, T.; Duh, K. Automated Structure Discovery and Parameter Tuning of Neural Network Language Model Based on Evolution Strategy. In Proceedings of the 2016 IEEE Spoken Language Technology Workshop (SLT), San Diego, CA, USA, 13–16 December 2016; pp. 665–671.
  21. Vidnerová, P.; Neruda, R. Evolution Strategies for Deep Neural Network Models Design. CEUR Workshop Proc. 2017, 1885, 159–166.
  22. Wang, X.; Yadav, V.; Balakrishnan, S.N. Cooperative UAV Formation Flying with Obstacle/Collision Avoidance. IEEE Trans. Control Syst. Technol. 2007, 15, 672–679.
  23. Vasile, M.; Minisci, E.; Locatelli, M. Analysis of Some Global Optimization Algorithms for Space Trajectory Design. J. Spacecr. Rocket. 2010, 47, 334–344.
  24. Hughes, E. Multi-Objective Evolutionary Guidance for Swarms. In Proceedings of the 2002 Congress on Evolutionary Computation, CEC’02, Honolulu, HI, USA, 12–17 May 2002; Volume 2, pp. 1127–1132.
  25. Lan, X.; Wu, Z.; Xu, W.; Liu, G. Adaptive-Neural-Network-Based Shape Control for a Swarm of Robots. Complexity 2018, 2018, 8382702.
  26. Fei, Y.; Shi, P.; Lim, C.C. Neural Network Adaptive Dynamic Sliding Mode Formation Control of Multi-Agent Systems. Int. J. Syst. Sci. 2020, 51, 2025–2040.
  27. Ni, J.; Shi, P. Adaptive Neural Network Fixed-Time Leader–Follower Consensus for Multiagent Systems with Constraints and Disturbances. IEEE Trans. Cybern. 2021, 51, 1835–1848.
  28. Yang, S.; Bai, W.; Li, T.; Shi, Q.; Yang, Y.; Wu, Y.; Chen, C.L.P. Neural-Network-Based Formation Control with Collision, Obstacle Avoidance and Connectivity Maintenance for a Class of Second-Order Nonlinear Multi-Agent Systems. Neurocomputing 2021, 439, 243–255.
  29. Lan, X.; Liu, Y.; Zhao, Z. Cooperative Control for Swarming Systems Based on Reinforcement Learning in Unknown Dynamic Environment. Neurocomputing 2020, 410, 410–418.
  30. Chen, J.; Lan, X.; Zhao, Z.; Zou, T. Cooperative Guidance of Multiple Missiles: A Hybrid Co-Evolutionary Approach. arXiv 2022, arXiv:2208.07156.
  31. Xingguang, X.; Zhenyan, W.; Zhang, R.; Shusheng, L. Time-Varying Fault-Tolerant Formation Tracking Based Cooperative Control and Guidance for Multiple Cruise Missile Systems under Actuator Failures and Directed Topologies. J. Syst. Eng. Electron. 2019, 30, 587–600.
  32. Wei, C.; Shen, Y.; Ma, X.; Guo, J.; Cui, N. Optimal Formation Keeping Control in Missile Cooperative Engagement. Aircr. Eng. Aerosp. Technol. 2012, 84, 376–389.
  33. Zhang, Z.; Zhang, K.; Han, Z. A Novel Cooperative Control System of Multi-Missile Formation under Uncontrollable Speed. IEEE Access 2021, 9, 9753–9770.
  34. Aicardi, M.; Casalino, G.; Bicchi, A.; Balestrino, A. Closed Loop Steering of Unicycle-like Vehicles via Lyapunov Techniques. IEEE Robot. Autom. Mag. 1995, 2, 27–35.
  35. Dinesh, K.; Vijaychandra, J.; SeshaSai, B.; Vedaprakash, K.; Srinivasa, R.K. A Review on Cascaded Linear Quadratic Regulator Control of Roll Autopilot Missile. 2021. Available online: https://doi.org/10.2139/ssrn.3768344 (accessed on 1 January 2020).
  36. Ren, W. Consensus Strategies for Cooperative Control of Vehicle Formations. IET Control Theory Appl. 2007, 1, 505–512.
  37. Das, A.; Fierro, R.; Kumar, V.; Ostrowski, J.; Spletzer, J.; Taylor, C. A Vision-Based Formation Control Framework. IEEE Trans. Robot. Autom. 2002, 18, 813–825.
  38. Lewis, M.A.; Tan, K.H. High Precision Formation Control of Mobile Robots Using Virtual Structures. Auton. Robot. 1997, 4, 387–403.
  39. Sefrioui, M.; Periaux, J. Nash Genetic Algorithms: Examples and Applications. In Proceedings of the 2000 Congress on Evolutionary Computation, CEC00, La Jolla, CA, USA, 16–19 July 2000; Volume 1, pp. 509–516.
  40. Nishida, K.; Akimoto, Y. PSA-CMA-ES: CMA-ES with Population Size Adaptation. In Proceedings of the Genetic and Evolutionary Computation Conference, Kyoto, Japan, 15–19 July 2018; pp. 865–872.
  41. Nomura, M.; Ono, I. Towards a Principled Learning Rate Adaptation for Natural Evolution Strategies. In Applications of Evolutionary Computation, Proceedings of the 25th European Conference, EvoApplications 2022, Held as Part of EvoStar 2022, Madrid, Spain, 20–22 April 2022; Springer: Cham, Switzerland, 2022.
  42. Glasmachers, T.; Schaul, T.; Yi, S.; Wierstra, D.; Schmidhuber, J. Exponential Natural Evolution Strategies. In Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation (GECCO ’10), Portland, OR, USA, 7–11 July 2010; p. 393.
  43. Desai, J.; Ostrowski, J.; Kumar, V. Controlling Formations of Multiple Mobile Robots. In Proceedings of the 1998 IEEE International Conference on Robotics and Automation, Leuven, Belgium, 20 May 1998; Volume 4, pp. 2864–2869.
  44. Barabási, A.L. Network Science, Chapter 8: Network Robustness. Available online: http://networksciencebook.com/chapter/8 (accessed on 1 January 2020).
  45. Lin, Z.; Francis, B.; Maggiore, M. State Agreement for Continuous-Time Coupled Nonlinear Systems. SIAM J. Control Optim. 2007, 46, 288–307.
  46. Achiam, J.; Held, D.; Tamar, A.; Abbeel, P. Constrained Policy Optimization. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 22–31.
  47. Ray, A.; Achiam, J.; Amodei, D. Benchmarking Safe Exploration in Deep Reinforcement Learning. arXiv 2019, arXiv:1910.01708.
  48. Coumans, E.; Bai, Y. PyBullet, a Python Module for Physics Simulation for Games, Robotics and Machine Learning. 2016–2019. Available online: http://pybullet.org (accessed on 1 January 2020).
  49. Mason, K.; Duggan, J.; Howley, E. Forecasting Energy Demand, Wind Generation and Carbon Dioxide Emissions in Ireland Using Evolutionary Neural Networks. Energy 2018, 155, 705–720.
  50. Concepcion II, R.; Lauguico, S.; Almero, V.J.; Dadios, E.; Bandala, A.; Sybingco, E. Lettuce Leaf Water Stress Estimation Based on Thermo-Visible Signatures Using Recurrent Neural Network Optimized by Evolutionary Strategy. In Proceedings of the 2020 IEEE 8th R10 Humanitarian Technology Conference (R10-HTC), Kuching, Malaysia, 1–3 December 2020; pp. 1–6.
  51. Chen, A.; Choo, K.; Astrakhantsev, N.; Neupert, T. Neural Network Evolution Strategy for Solving Quantum Sign Structures. Phys. Rev. Res. 2022, 4, L022026.
Figure 1. Engagement scenario.
Figure 2. Two types of formation patterns.
Figure 3. Neural network controller schematic.
Figure 4. Schematic diagram of topology network with five nodes: (a,b) the adaptive topology networks and (c) the traditional leader-follower topology.
Figure 5. Trajectories of basic formation control along linear trajectory.
Figure 6. Trajectories of basic formation control along spiral trajectory.
Figure 7. Analytical results of the linear trajectory case.
Figure 8. Analytical results of the spiral trajectory case.
Figure 9. Trajectories in case of moving into formation.
Figure 10. Resultant error curves for case of moving into formation.
Figure 11. Trajectories for case of switching formation type.
Figure 12. Trajectories for case of switching formation size.
Figure 13. Trajectories of the node failure case.
Figure 14. Resultant error curves of the node failure case.
Table 1. Hyperparameters of the proposed algorithm.

Symbol | Description | Value
η_α | Learning rate | 0.02
τ | Time step | 0.1
σ | Standard deviation | 0.2
β | Population size adaptation factor | 0.84
K_c | Cost weight matrix | [0.15, 0.15, 0.1]^T
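The hyperparameters in Table 1 can be read as configuring one update of a natural evolution strategy: perturb the policy parameters with Gaussian noise of standard deviation σ, evaluate the population, and step along the fitness-weighted noise directions with learning rate η_α. The sketch below is illustrative only, under assumed simplifications (the helper `nes_step`, the toy quadratic objective, and the fixed population size of 50 are not from the paper):

```python
import numpy as np

# Hyperparameters reported in Table 1 (illustrative use only).
ETA = 0.02    # learning rate eta_alpha
SIGMA = 0.2   # standard deviation of the exploration noise

def nes_step(theta, fitness, pop_size, rng):
    """One natural-evolution-strategy update: sample Gaussian
    perturbations, evaluate each candidate, and move theta along the
    fitness-weighted average of the noise directions."""
    eps = rng.standard_normal((pop_size, theta.size))          # perturbations
    scores = np.array([fitness(theta + SIGMA * e) for e in eps])
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)  # normalise fitness
    grad = eps.T @ scores / (pop_size * SIGMA)                 # gradient estimate
    return theta + ETA * grad

# Toy usage: maximising -||theta||^2 should drive theta toward zero.
rng = np.random.default_rng(0)
theta = np.ones(3)
for _ in range(200):
    theta = nes_step(theta, lambda t: -np.sum(t ** 2), pop_size=50, rng=rng)
```

The population size adaptation factor β from Table 1 would scale `pop_size` between updates in the full algorithm; it is omitted here to keep the sketch minimal.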
Table 2. System constraints.

Symbol | Description | Value
V_max | Maximum speed of both missile and reference target | 0.8 km/s
V_min | Minimum speed of both missile and reference target | 0.3 km/s
a_l,max | Maximum lateral acceleration | 40 g
a_v,max | Maximum speed acceleration | 30 g
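In simulation, the Table 2 constraints amount to saturating the commanded speed and accelerations before they are applied to the missile model. The helper below is a hypothetical sketch of that saturation (the function name and unit conventions are assumptions, not the paper's code):

```python
import numpy as np

G = 9.81e-3  # 1 g expressed in km/s^2, to match the km/s speed units

# Admissible ranges from Table 2.
V_MIN, V_MAX = 0.3, 0.8   # speed bounds, km/s
A_LAT_MAX = 40 * G        # maximum lateral acceleration, km/s^2
A_VEL_MAX = 30 * G        # maximum speed acceleration, km/s^2

def enforce_constraints(speed, a_lat, a_vel):
    """Clamp the commanded speed and accelerations to the ranges of Table 2."""
    speed = float(np.clip(speed, V_MIN, V_MAX))
    a_lat = float(np.clip(a_lat, -A_LAT_MAX, A_LAT_MAX))
    a_vel = float(np.clip(a_vel, -A_VEL_MAX, A_VEL_MAX))
    return speed, a_lat, a_vel
```

Applying the saturation after the controller output, rather than inside it, keeps the learned policy unconstrained while guaranteeing the physical limits are never exceeded.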
Chen, J.; Lan, X.; Zhou, Y.; Liang, J. Formation Control with Connectivity Assurance for Missile Swarms by a Natural Co-Evolutionary Strategy. Mathematics 2022, 10, 4244. https://doi.org/10.3390/math10224244
