**Intelligent Autonomous Decision-Making and Cooperative Control Technology of High-Speed Vehicle Swarms**

Editors

**Dong Zhang Wei Huang**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors* Dong Zhang Northwestern Polytechnical University China

Wei Huang National University of Defense Technology China

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Applied Sciences* (ISSN 2076-3417) (available at: https://www.mdpi.com/journal/applsci/special_issues/Cooperative_Control_Technology_High_Speed_Vehicle_Swarms).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-4231-7 (Hbk) ISBN 978-3-0365-4232-4 (PDF)**

© 2022 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

## **Contents**


## **About the Editors**

#### **Dong Zhang**

He is currently an associate professor and doctoral supervisor at Northwestern Polytechnical University, and a visiting professor at the Key Laboratory of Complex System Control and Intelligent Collaborative Technology. His current research interests include the dynamic modeling and control of aerospace vehicles, combat mission planning for high-speed aircraft swarms, and collaborative guidance and control. He has presided over more than 10 national projects, published more than 30 academic papers and 1 academic monograph, and won the second prize for military scientific and technological progress in 2020.

#### **Wei Huang**

He is currently a professor at the National University of Defense Technology. His current research interests include hypersonics, aerodynamics, vortex dynamics and vorticity, numerical simulation of flows, wind turbines, aircraft, and drag reduction.

## *Editorial* **Special Issue on "Intelligent Autonomous Decision-Making and Cooperative Control Technology of High-Speed Vehicle Swarms"**

**Dong Zhang 1,\* and Wei Huang <sup>2</sup>**


Swarm intelligence technology is an emerging technology that combines unmanned system technology, network information technology, and artificial intelligence technology, and it has become a research hotspot. The characteristics of high-speed, highly dynamic flight, a large-airspace flight environment, and multi-field-coupled battlefields under strong confrontation pose great challenges to the cooperative operation of high-speed vehicle swarms.

Due to these high-speed, highly dynamic flight characteristics and the large-airspace flight environment, the fluid–thermal–structural coupling effect makes it difficult to apply traditional UAV swarm technology directly to high-speed aircraft swarm systems, which greatly complicates their decision making and planning. It is therefore urgent to develop new theories and methods for the cooperative decision making and mission planning of high-speed aircraft swarm systems.

This Special Issue, "Intelligent autonomous decision-making and cooperative control technology of high-speed vehicle swarms", has been launched to provide an opportunity for researchers in the area of collaborative decision-making and mission-planning technology for high-speed aircraft swarms to highlight recent developments in their fields. Eight excellent papers were selected for publication in this Special Issue, covering characteristic analysis and parameter optimization methods for hypersonic vehicles, fully distributed control, multi-constraint cooperative guidance methods, and formation control technology [1–7]. These eight papers are summarized as follows:


**Citation:** Zhang, D.; Huang, W. Special Issue on "Intelligent Autonomous Decision-Making and Cooperative Control Technology of High-Speed Vehicle Swarms". *Appl. Sci.* **2022**, *12*, 4409. https://doi.org/ 10.3390/app12094409

Received: 14 April 2022 Accepted: 15 April 2022 Published: 27 April 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

scenarios, improve the adaptiveness of the swarm under the sudden circumstance, and realize the optimization of the task execution efficiency of the UAV swarm.


**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Studies on Multi-Constraints Cooperative Guidance Method Based on Distributed MPC for Multi-Missiles**

**Mingyu Cong 1,2,3, Xianghong Cheng 1,2,\*, Zhiquan Zhao <sup>3</sup> and Zhijun Li <sup>3</sup>**


**Abstract:** Cooperative terminal guidance with impact angle constraint is a key technology for achieving a saturation attack and improving combat effectiveness. The present study envisaged cooperative terminal guidance with impact angle constraint for multiple missiles. To this end, the three-dimensional cooperative terminal guidance law with multiple constraints was first studied. By introducing a virtual leader missile, an impact-time cooperative strategy between the virtual leader missile and the follower missiles was designed. Subsequently, based on distributed model prediction control combined with the particle swarm optimization algorithm, a cooperative terminal guidance algorithm was designed for multiple missiles with impact angle constraint that met the guidance accuracy. Finally, the effectiveness of the algorithm was verified using simulation experiments.

**Keywords:** cooperative guidance; model prediction control; multi-missile cooperative control; multi-constraint cooperative guidance

#### **1. Introduction**

Cooperative guidance is one of the key technologies for multiple missiles to attack and intercept large maneuvering targets with a high accuracy [1,2]. The two major types of cooperative guidance technology that have been extensively studied include independent cooperative guidance and distributed cooperative guidance.

Independent cooperative guidance determines the main guidance information by relying only on each missile's own information, with no interaction of information between the missiles. The attack tasks are completed independently according to the guidance law and impact time designated before launch. Wu S.T. et al. [3] were the first to propose the guidance problem with impact time. After more than ten years of development, many research results have emerged in this field. Based on the combination of optimal control and time adjustment, Ma G.X. et al. [4] designed a cooperative guidance law with impact time and angle constraints, and realized cooperative interception based on the impact angle constraints. Li G.Y. et al. [5] designed the cooperative guidance law for multiple missiles based on the terminal sliding mode control method and the principle of consistency. They obtained the line-of-sight (LOS) angle rate of all the cooperative missiles converging to zero in a finite time, and the LOS angle converged to the desired angle. Zhou J. et al. [6] proposed a cooperative guidance law based on a "leader-follower" architecture and sliding mode control to achieve a cooperative attack with LOS angle constraints on moving targets. Based on the impact angle, a guidance law with impact time based on sliding mode control was designed [7,8]. During the flight, multiple missiles exchange information with each other, "negotiate" together, and adjust their impact time according


**Citation:** Cong, M.; Cheng, X.; Zhao, Z.; Li, Z. Studies on Multi-Constraints Cooperative Guidance Method Based on Distributed MPC for Multi-Missiles. *Appl. Sci.* **2021**, *11*, 10857.

https://doi.org/10.3390/app112210857

Academic Editors: Wei Huang and Dong Zhang

Received: 18 October 2021 Accepted: 8 November 2021 Published: 17 November 2021


**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

to a certain strategy in order to achieve simultaneous arrival and a saturation attack. This is called distributed cooperative guidance. The various studies conducted on distributed cooperative guidance include the time cooperative guidance law based on bias proportional guidance, the time cooperative guidance law based on the leader-follower model, and the time cooperative guidance law based on switching logic. Using a weighted average consensus algorithm, a distributed time cooperative guidance law was designed [9,10]. Wang X. et al. [11] designed the guidance law with impact time based on bias proportional guidance. Assuming that each missile had the same speed, Tekin R. et al. [12] designed a time cooperative guidance law based on the 'leader-follower' model. Jeon J.S. et al. [13] considered the integration of time cooperative guidance and control and studied the tracking problem of the nominal projectile distance.

The guidance law with impact time constraints is a key technology for achieving a saturation attack, breaking through anti-missile defense systems, and accomplishing reliable strikes against important strategic targets. Lin D.F. et al. [14] designed a cooperative guidance law with impact time control and field-of-view constraints of the seeker for the cooperative guidance of multiple missiles attacking low-speed targets. Li G.Y. et al. [15] designed a cooperative guidance law for multiple missiles to simultaneously intercept highly maneuvering targets, based on the terminal sliding mode control method and the principle of consistency, which makes the LOS angle rates of the cooperative missiles converge to zero in a finite time. In view of the Field of View (FOV) constraints of the missiles, the terminal LOS angle constraints [16–18] were employed in the design of the cooperative guidance law, along with the impact time constraints.

Considering the research on the cooperative guidance algorithm, there are more studies on the guidance law with impact angle or time constraints [19–26], and relatively fewer studies on combining the two as terminal constraints [27–31]. Impact time constraints are the basis of the cooperative guidance algorithm and are indispensable. Impact angle constraints can make the missile hit the target with the best attitude in order to maximize the effectiveness of the warhead to achieve a maximum damage. Hence, considering both impact time and impact angle constraints in the cooperative terminal guidance algorithm is of great significance for improving the effective damage and combat capability of the cruise missile weapon system.

Model prediction control (MPC), also known as receding horizon control (RHC), has online rolling optimization at its core and is essentially an optimal control problem over a finite time domain. At each control moment, the current optimal control sequence over the prediction horizon is obtained, and only its first part is applied to the system as the optimal control. The optimal control sequence is thus obtained through repeated sampling. MPC is a model-based algorithm that has been widely used in industry, metallurgy, etc. [32]. In recent years, MPC has been continuously improved, and several improved MPC methods have been proposed [33–36], which achieved better application prospects and obtained better prediction results.
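The rolling-optimization loop described above can be sketched in a few lines. The following toy example (an illustration only, not the paper's algorithm: scalar dynamics $x_{k+1} = x_k + u_k$, a small discrete control set, and brute-force search over the horizon are all assumed) applies only the first control of each optimal sequence before re-optimizing from the new state:

```python
import itertools

def mpc_step(x, horizon=4, controls=(-1.0, 0.0, 1.0), lam=0.1):
    """One receding-horizon step: search all control sequences over the
    horizon and return the first element of the best sequence."""
    best_u0, best_cost = 0.0, float("inf")
    for seq in itertools.product(controls, repeat=horizon):
        xi, cost = x, 0.0
        for u in seq:
            xi = xi + u          # toy dynamics x_{k+1} = x_k + u_k
            cost += xi**2 + lam * u**2
        if cost < best_cost:
            best_cost, best_u0 = cost, seq[0]
    return best_u0

# Closed loop: only the first control of each optimal sequence is applied,
# then the optimization is repeated from the new state (rolling optimization).
x = 5.0
for _ in range(10):
    x += mpc_step(x)
print(abs(x) < 1e-6)
```

Shrinking the control interval or enlarging the horizon trades computation for performance, which is the trade-off distributed MPC later addresses by splitting the optimization across sub-systems.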

In the present study, we proposed a three-dimensional cooperative terminal guidance law that met the constraints of impact angle and impact time simultaneously, on the premise of meeting the requirements of miss distance. First, the nonlinear motion model of the missile and the target was established, the state quantity and the control quantity were normalized, the concept of leader-follower missiles was introduced, and the impact time cooperative strategy of leader and follower missiles was designed. Subsequently, based on the model predictive control method, a three-dimensional cooperative terminal guidance law was proposed that simultaneously controlled multiple missiles to perform attacks on the target with the constraints of impact time and impact angle on the premise of meeting the requirements of miss distance. Finally, by introducing one leader missile and four follower missiles to simulate the multi-missile cooperative attack against stationary targets, linear moving targets, and snake-shaped maneuvering targets, the effectiveness of the multi-constraint cooperative guidance law algorithm was verified.


#### **2. Description of Cooperative 3D Terminal Guidance for Multiple Missiles**

The relative motion relationship between the missile and the target is shown in Figure 1. Herein, M and T represent the missile and the target, respectively, and the ground inertial coordinate system is denoted *A-XYZ*. *r* is the relative distance between the missile and the target; *θL* and *ψL* are the line-of-sight elevation angle and the line-of-sight azimuth angle, respectively; *θm* and *ψm* represent the trajectory inclination angle and the trajectory deflection angle of the missile, respectively; and *θt* and *ψt* represent the trajectory inclination angle and the trajectory deflection angle of the target, respectively. Unless otherwise specified, all angles are positive counterclockwise.

**Figure 1.** Guidance geometric relationship.

According to the relative geometric relationship between the missile and the target in Figure 1, the relative motion can be obtained as:

$$\begin{cases}
\frac{\mathrm{d}}{\mathrm{d}t}(r\,\boldsymbol{i}_L) = V_T\,\boldsymbol{i}_T - V_M\,\boldsymbol{i}_M = \dot{r}\,\boldsymbol{i}_L + \boldsymbol{\Omega}_L \times r\,\boldsymbol{i}_L \\
\boldsymbol{A}_T = A_{yt}\,\boldsymbol{j}_T + A_{zt}\,\boldsymbol{k}_T = \boldsymbol{\Omega}_L \times \boldsymbol{V}_T + \boldsymbol{\Omega}_T \times \boldsymbol{V}_T \\
\boldsymbol{A}_M = A_{ym}\,\boldsymbol{j}_M + A_{zm}\,\boldsymbol{k}_M = \boldsymbol{\Omega}_L \times \boldsymbol{V}_M + \boldsymbol{\Omega}_M \times \boldsymbol{V}_M \\
\boldsymbol{\Omega}_L = \dot{\psi}_L \sin\theta_L\,\boldsymbol{i}_L - \dot{\theta}_L\,\boldsymbol{j}_L + \dot{\psi}_L \cos\theta_L\,\boldsymbol{k}_L = \dot{\lambda}_x\,\boldsymbol{i}_L + \dot{\lambda}_y\,\boldsymbol{j}_L + \dot{\lambda}_z\,\boldsymbol{k}_L \\
\boldsymbol{\Omega}_M = \dot{\psi}_m \sin\theta_m\,\boldsymbol{i}_M - \dot{\theta}_m\,\boldsymbol{j}_M + \dot{\psi}_m \cos\theta_m\,\boldsymbol{k}_M \\
\boldsymbol{\Omega}_T = \dot{\psi}_t \sin\theta_t\,\boldsymbol{i}_T - \dot{\theta}_t\,\boldsymbol{j}_T + \dot{\psi}_t \cos\theta_t\,\boldsymbol{k}_T
\end{cases} \tag{1}$$

where *V<sup>T</sup>* and *V<sup>M</sup>* are the velocity vectors of the target and the missile, respectively; *Azm* and *Aym* are the normal accelerations of the turning plane and the dive plane of the missile, respectively; *Azt* and *Ayt* are the normal accelerations of the target turning plane and the dive plane, respectively; **Ω***<sup>L</sup>* is the rotation angular velocity vector of the line-of-sight coordinate system; **Ω***<sup>T</sup>* is the rotation angular velocity vector of the target relative to the line of sight coordinate system; and **Ω***<sup>M</sup>* is the rotation angular velocity vector of the missile relative to the line of sight coordinate system.

From Equation (1), the following differential equations can be obtained:

$$\begin{cases} \dot{r} = V\_m \cdot (\rho \cos \theta\_l \cos \psi\_l - \cos \theta\_m \cos \psi\_m) \\ r \dot{\lambda}\_y = V\_m \cdot (\sin \theta\_m - \rho \sin \theta\_l) \\ r \dot{\lambda}\_z = V\_m \cdot (\rho \cos \theta\_l \sin \psi\_l - \cos \theta\_m \sin \psi\_m) \\ \dot{\theta}\_m = \frac{A\_m}{V\_m} + \frac{1}{r} V\_m \tan \lambda\_y \sin \psi\_m (\rho \cos \theta\_l \sin \psi\_l - \cos \theta\_m \sin \psi\_m) - \frac{1}{r} V\_m \cos \psi\_m (\rho \sin \theta\_l - \sin \theta\_m) \\ \dot{\psi}\_m = \frac{A\_m}{V\_m \cos \theta\_m} - \frac{1}{\cos \theta\_m} V\_m \sin \theta\_m \cos \psi\_m \times \tan \lambda\_y (\rho \cos \theta\_l \sin \psi\_l - \cos \theta\_m \sin \psi\_m) \dots \\ - \frac{1}{r \cos \theta\_m} V\_m \sin \theta\_m \sin \psi\_m (\rho \sin \theta\_l - \sin \theta\_m) - \frac{1}{r} V\_m (\rho \cos \theta\_l \sin \psi\_l - \cos \theta\_m \sin \psi\_m) \\ \dot{\theta}\_l = \frac{A\_m}{V\_l} + \frac{1}{r} V\_m \sin \psi\_l \tan \lambda\_y (\rho \cos \theta\_l \sin \psi\_l - \cos \theta\_m \sin \psi\_m) - \frac{1}{r} V\_m \cos \psi\_l (\rho \sin \theta\_l - \sin \theta\_m) \end{cases} (2)$$

where *ρ* = *Vt*/*Vm* is the speed ratio of the target relative to the missile; *A<sup>M</sup>* and *A<sup>T</sup>* are the acceleration vectors of the missile and the target, respectively; and *Azmd* and *Aymd* are the acceleration commands for the longitudinal and lateral channels of the missile, respectively.
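To make these kinematics concrete, the sketch below integrates a simplified planar analogue of Equations (1) and (2) (an illustrative assumption, not the paper's model: two dimensions only, constant speeds, proportional navigation $a_m = N V_m \dot{\lambda}$ standing in for the guidance law, and arbitrary engagement parameters):

```python
import math

def simulate_pn(r0=10000.0, lam0=0.0, vm=1000.0, vt=300.0,
                gm0=0.3, gt=math.pi, N=4.0, dt=0.001):
    """Planar analogue of the 3-D kinematics: range rate and LOS rate,
    with proportional navigation a_m = N*Vm*lam_dot on the missile heading."""
    r, lam, gm = r0, lam0, gm0
    while r > 5.0:
        r_dot = vt * math.cos(gt - lam) - vm * math.cos(gm - lam)
        lam_dot = (vt * math.sin(gt - lam) - vm * math.sin(gm - lam)) / r
        if r_dot >= 0.0:          # closing stopped: closest approach reached
            break
        gm += N * lam_dot * dt    # gamma_dot = a_m / Vm with a_m = N*Vm*lam_dot
        r += r_dot * dt
        lam += lam_dot * dt
    return r

print(simulate_pn() < 5.0)
```

With $N = 4$ and a modest initial heading error, the proportional navigation term drives the LOS rate toward zero and the range closes below the 5 m cutoff.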

#### **3. Strategy for Timed Cooperative Guidance of Multiple Missiles**

For cooperative guidance, one of the important features is that multiple missiles can hit the target simultaneously. To achieve this goal, the present study employed the concept of a virtual leader missile. According to the selection principle, one of the missiles was selected as the virtual leader missile and adopted a traditional guidance scheme to fly towards the target, while the remaining missiles were regarded as follower missiles. In the dive plane, the follower missiles used the same guidance algorithm as the virtual leader missile. In the turning plane, the follower missiles realized the time cooperation through lateral maneuvering (mainly achieved by controlling the relative distance between the missile and the target).

#### *3.1. Movement Law of the Virtual Leader Missile*

The relative motion relationship of the leader-follower missiles can be seen from Figure 2. Considering that ML is the virtual leader missile, Mi is the *i*th follower missile, and T is the target: the velocity of the leader missile is *Vm*0, its trajectory inclination and deflection angles are *θ*0 and *ψ*0, respectively, and its vertical and azimuth LOS angles are *θ*0L and *ψ*0L, respectively; the velocity of the *i*th follower missile is *Vmi*, its trajectory inclination and deflection angles are *θi* and *ψi*, respectively, and its vertical and azimuth LOS angles are *θ*0i and *ψ*0i, respectively.

**Figure 2.** Relative motion relationship of the leader-follower missiles.

In the later stage of the missile movement, the guidance of the missiles makes *θ*0 and *θi* tend to zero. At this time, the curvatures of the trajectory curves of the leader missile and the follower missiles tend to be consistent, that is, the relationships *r*0/*Vm*0 = *ri*/*Vmi* and *ψ*0 = *ψi* (or *r*0/*Vm*0 = *ri*/*Vmi* and *ψ*0 = −*ψi*) are satisfied, and the leader missile and the follower missiles reach and hit the target simultaneously. Importantly, this strategy gives the leader and follower missiles the same trajectory curvature relative to their velocities; under this condition, their flights consume the same time, so the leader-follower strategy also suits missiles with different speeds. The leader missile adopts an augmented proportional guidance method:

$$A_{z0} = K_{01}|\dot{r}_0|\dot{\theta}_{L0} / \cos\theta_0 \tag{3}$$

$$A_{y0} = -\frac{K_{01}|\dot{r}_0|\dot{\theta}_{L0}\sin\theta_0\sin\psi_0}{\cos\theta_0\cos\psi_0} - \frac{K_{02}|\dot{r}_0|\dot{\psi}_{L0}\cos\theta_{L0}}{\cos\psi_0} \tag{4}$$

In order to quickly make the attitude angle *θ*<sup>0</sup> of the leader missile tend towards zero, a larger value of *K*<sup>01</sup> needs to be chosen. The follower missiles also adopt an augmented proportional guidance method on the pitch channel:

$$A_{zi} = K_{i1}|\dot{r}_i|\dot{\theta}_{Li} / \cos\theta_i \tag{5}$$

In order to quickly make the attitude angle *θi* of the follower missiles tend towards zero, a larger value of *Ki*1 needs to be chosen. Next, according to the position relationship between the leader missile and the follower missiles, the lateral acceleration *Ayi* of the follower missiles is designed so that *ri*/*Vmi* → *r*0/*Vm*0 and *ψi* → ±*ψ*0 are satisfied during the guidance process.

#### *3.2. Cooperative Strategy of Follower Missiles*

The remaining time error between the leader missile and follower missiles is defined as follows:

$$
\xi^i = \frac{r_0}{V_{m0}} - \frac{r_i}{V_{mi}} \tag{6}
$$

By deriving the remaining time error with respect to time, we can get:


$$\dot{\xi}^i = \cos\theta\_i \cos\psi\_i - \cos\theta\_0 \cos\psi\_0 \tag{7}$$

According to Equation (7), *Ayi* indirectly controls *ξ* by controlling *ψi*. Therefore, the control system can be divided into two sub-control systems according to direct control *ψ<sup>i</sup>* and indirect control *ξ*, namely, a nonlinear slow sub-system and a nonlinear fast sub-system. Therefore, the dynamic design method of time-scale separation is used to design the control instructions *Ayi*. The desired dynamics of the slow sub-system is expressed as:

$$
\dot{\xi}_{des} = -K_R \xi \tag{8}
$$

where *KR* is the bandwidth of the slow sub-system.

If $\dot{\xi} = \dot{\xi}_{des}$ is obtained through the control, and the command value of *ψi* is *ψi<sup>c</sup>*, then:

$$\cos\psi_i^{c} = \frac{\cos\theta_0 \cos\psi_0 - K_R \xi}{\cos\theta_i} \tag{9}$$

Because the value range of *ψi* is (−*π*/2, *π*/2), *ψi<sup>c</sup>* lies within this range. Combining the above equation with this range, the value range of *KR* can be obtained:

$$\frac{\cos \theta\_0 \cos \psi\_0 - \cos \theta\_i}{\xi} \le K\_R \le \frac{\cos \theta\_0 \cos \psi\_0}{\xi} \tag{10}$$

If *KR* is chosen as:

$$K_R = \frac{\cos \theta_0 \cos \psi_0 - \cos \theta_i + c_1 \frac{\xi}{\xi + c_2} \cos \theta_i}{\xi} \tag{11}$$

where *c*1 and *c*2 are constants satisfying 0 < *c*1, *c*2 < 1.
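A quick numerical check that the choice of *KR* in Equation (11) respects the bounds of Equation (10) can be sketched as follows (assuming, as reconstructed here, that the fractional factor in Equation (11) is $c_1\xi/(\xi + c_2)$; the angle and error values are arbitrary illustrations):

```python
import math

def kr_bounds(theta0, psi0, theta_i, xi):
    """Equation (10): admissible interval for K_R (for xi > 0)."""
    lo = (math.cos(theta0) * math.cos(psi0) - math.cos(theta_i)) / xi
    hi = math.cos(theta0) * math.cos(psi0) / xi
    return lo, hi

def kr_choice(theta0, psi0, theta_i, xi, c1=0.5, c2=0.5):
    """Equation (11): the factor c1*xi/(xi+c2) lies in (0,1), so the added
    term stays strictly between 0 and cos(theta_i), keeping K_R in bounds."""
    num = (math.cos(theta0) * math.cos(psi0) - math.cos(theta_i)
           + c1 * xi / (xi + c2) * math.cos(theta_i))
    return num / xi

lo, hi = kr_bounds(0.2, 0.4, 0.1, 1.5)
kr = kr_choice(0.2, 0.4, 0.1, 1.5)
print(lo <= kr <= hi)
```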

Herein, *ψi<sup>c</sup>* can be expressed as:

$$\psi_i^{c} = f_{\psi} \times \arccos\left(\frac{\cos\theta_0 \cos\psi_0 - K_R \xi}{\cos\theta_i}\right) \tag{12}$$

where *fψ* = −1 for *ψi* ∈ [−*π*/2, 0), and *fψ* = 1 for *ψi* ∈ [0, *π*/2].

Taking the derivative of Equation (12) with respect to time, we have:

$$\dot{\psi}_i^{c} = f_{\dot{\psi}} \times \frac{K_R \cos \psi_i - \cos \psi_i^{c} \tan \theta_i \, \dot{\theta}_i - \dfrac{K_R \cos \theta_0 \cos \psi_0 - \dot{\theta}_0 \sin \theta_0 \cos \psi_0 - \dot{\psi}_0 \cos \theta_0 \sin \psi_0}{\cos \theta_i}}{\sin \psi_i^{c}} \tag{13}$$

Similarly, $f_{\dot{\psi}} = -1$ for $\psi_i \in [-\pi/2, 0)$, and $f_{\dot{\psi}} = 1$ for $\psi_i \in [0, \pi/2]$.

In order to make *ξ* satisfy the relation in Equation (8), *ψi* should quickly converge to near its command value *ψi<sup>c</sup>*. Therefore, the following fast dynamics sub-system is designed, and its specific expression is given as:

$$
\dot{\psi}_{i,des} - \dot{\psi}_i^{c} = -K_{\psi} (\psi_i - \psi_i^{c}) \tag{14}
$$

where *Kψ* is the bandwidth of the fast sub-system. Under the control $\dot{\psi}_i = \dot{\psi}_{i,des}$, united with Equation (4), the final designed maneuvering control command in the turning plane can be obtained as:

$$A_{yi} = \frac{V_{mi}^2}{r_i} \sin \psi_i (1 - \cos \theta_i \sin \theta_i \cos \psi_i \tan \theta_{Li}) - V_{mi} \cos \theta_i \, \dot{\psi}_i^{c} + K_{\psi} V_{mi} \cos \theta_i (\psi_i - \psi_i^{c}) \tag{15}$$

Equation (14) can be satisfied by the maneuvering control command in the lateral plane, so that *ψi* → *ψi<sup>c</sup>* is obtained; *ψi<sup>c</sup>* in turn makes the slow dynamics sub-system satisfy Equation (8). Under this condition, the time cooperative control of multiple missiles can be realized. The longitudinal control commands *Az*0 and *Azi* make *θ*0 → 0 and *θi* → 0 and, together with the control of the yaw channel, *ξ* → 0, so as to realize *ψi* → *ψ*0 (or *ψi* → −*ψ*0). The control flow chart for the cooperative guidance of multiple missiles can be seen from Figure 3.

**Figure 3.** Control flow chart for cooperative guidance of multiple missiles.
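The closed-loop consistency of this design can be spot-checked numerically: substituting the yaw command of Equation (12) into the error dynamics of Equation (7) must reproduce the desired slow dynamics of Equation (8). A minimal sketch (illustrative angle, error, and bandwidth values assumed):

```python
import math

def psi_command(theta0, psi0, theta_i, xi, kr):
    """Equation (12): yaw-angle command for follower i (f_psi = +1 branch)."""
    u = (math.cos(theta0) * math.cos(psi0) - kr * xi) / math.cos(theta_i)
    return math.acos(u)

def xi_dot(theta0, psi0, theta_i, psi_i):
    """Equation (7): derivative of the remaining-time error."""
    return math.cos(theta_i) * math.cos(psi_i) - math.cos(theta0) * math.cos(psi0)

theta0, psi0, theta_i, xi, kr = 0.05, 0.3, 0.08, 0.4, 0.5
psi_c = psi_command(theta0, psi0, theta_i, xi, kr)
# With psi_i held at its command value, Equation (7) reduces to the desired
# slow dynamics of Equation (8): xi_dot = -K_R * xi.
print(abs(xi_dot(theta0, psi0, theta_i, psi_c) - (-kr * xi)) < 1e-12)
```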

#### **4. Cooperative Prediction Guidance Based on Distributed Model Prediction Control (DMPC) for Multiple Missiles**

#### *4.1. Introduction to DMPC*

The application of the MPC method in a multi-agent cluster system has three structures: centralized, decentralized, and distributed MPC. As shown in Figure 4, the centralized MPC considers the overall performance of the cluster as an optimization index. In each sampling period, a central model predictor performs the rolling optimization on the overall control performance index of the cluster system to obtain the control sequence set with the best overall performance of the cluster system. Each sub-system of the decentralized and distributed MPC has an independent model predictor, which reduces the dimensionality of the system. It also reduces the amount of calculation for the online optimization of each sub-controller and increases the reliability of the system. The difference between the distributed and decentralized MPC is that there is no information interaction between the sub-controllers of the decentralized MPC. As a result, decentralized MPC cannot handle multi-agent systems with coupling between the sub-systems.

**Figure 4.** Classification of MPC: (**a**) Centralized MPC; (**b**) Decentralized MPC; (**c**) Distributed MPC.

From the above-mentioned missile motion model, the motion equation of each missile can be expressed as:

$$\dot{z}_i(t) = f_i(z_i(t), u_i(t)),\; t \ge t_0,\quad z_i(0) = z_0 \tag{16}$$

Therefore, the motion equations of its adjacent missiles can be defined as follows:

$$\dot{z}_{-i}(t) = f_{-i}(z_{-i}(t), u_{-i}(t)),\; t \ge t_0 \tag{17}$$

The prediction time domain is defined as $T_p \in (0, \infty)$, $s \in [t_c, t_c + T_p]$. The control moments of the receding horizon control are defined as $t_c = t_0 + \delta c$, $c \in \{0, 1, 2, \cdots\}$, where $\delta$ is the control interval and satisfies $\delta \in (0, T_p]$. $\hat{z}_i(s; t_c)$ and $\hat{u}_i(s; t_c)$ are the estimated state sequence and control sequence, respectively, while $z_i^p(s; t_c)$ and $u_i^p(s; t_c)$ are the predicted state sequence and control sequence, respectively. For any time $t_c$, the optimal control sequence of the *i*th missile can be expressed as:

$$u_i^{*}(s; t_c) = \operatorname*{argmin}_{u_i^p(s; t_c)} J_i\big(z_i^p(s; t_c), \hat{z}_{-i}(s; t_c), u_i^p(s; t_c)\big) \tag{18}$$

The cost function can be expressed as:

$$J_i = \int_{t_c}^{t_c + T_p} F_i\Big(z_i^p(s; t_c), \hat{z}_{-i}(s; t_c), u_i^p(s; t_c)\Big) ds + \Phi_i\Big(z_i^p(t_c + T_p; t_c)\Big) \tag{19}$$

where $F_i$ is the operation function and $\Phi_i$ is the final state function. The following constraints are satisfied:

$$\dot{z}_i^p(s; t_c) = f_i\big(z_i^p(s; t_c), u_i^p(s; t_c)\big) \tag{20}$$

$$\dot{\hat{z}}_{-i}(s; t_c) = f_{-i}\big(\hat{z}_{-i}(s; t_c), \hat{u}_{-i}(s; t_c)\big) \tag{21}$$

$$z_i^p(s; t_c),\; \hat{z}_{-i}(s; t_c) \in Z \tag{22}$$

$$u_i^p(s; t_c),\; \hat{u}_{-i}(s; t_c) \in U \tag{23}$$

According to the definition of the motion equations of the adjacent missiles, for $s \in [t_c, t_c + T_p]$ in the prediction time domain, the estimated control sequence $\hat{u}_{-i}(s; t_c)$ can be obtained by:

$$\hat{u}_{-i}(s; t_c) = \{\hat{u}_j(s; t_c)\},\; j \in N_i \tag{24}$$

In the prediction time domain $s \in [t_c, t_c + T_p]$, the estimated control sequence $\hat{u}_j(s; t_c)$ consists of two parts. Over $s \in [t_c, t_{c-1} + T_p)$, the first part is the optimal control sequence $u_j^{*}(s; t_{c-1})$ of the previous prediction time domain $s \in [t_{c-1}, t_{c-1} + T_p]$, while over $s \in [t_{c-1} + T_p, t_c + T_p]$, the second part is derived from $u_j^{*}(s; t_{c-1})$ at the time $s = t_{c-1} + T_p$. The expression of $\hat{u}_j(s; t_c)$ is as follows:

$$\hat{u}_j(s; t_c) = \begin{cases} u_j^{*}(s; t_{c-1}) & s \in \left[t_c, t_{c-1} + T_p\right) \\ u_j^{*}(t_{c-1} + T_p; t_{c-1}) & s \in \left[t_{c-1} + T_p, t_c + T_p\right] \end{cases} \tag{25}$$
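In a sampled-data implementation where each control sequence is a list of per-interval values, Equation (25) reduces to shifting the previous optimal sequence by one control interval and holding its last element over the uncovered tail. A minimal sketch (list representation assumed):

```python
def estimated_sequence(u_star_prev):
    """Equation (25): build the estimated control sequence for the new
    prediction window from last window's optimal sequence."""
    shifted = list(u_star_prev[1:])   # reuse of u*_j over [t_c, t_{c-1} + Tp)
    shifted.append(u_star_prev[-1])   # hold u*_j(t_{c-1} + Tp) over the tail
    return shifted

print(estimated_sequence([0.3, 0.1, -0.2, 0.4]))  # → [0.1, -0.2, 0.4, 0.4]
```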

#### *4.2. Algorithm for Cooperative Guidance Considering Impact Time and Angle Constraints*

The framework idea of the distributed MPC is as follows: at each control moment $t_c$, the current missile's control variable is initialized with the optimal control of the previous prediction period $[t_{c-1}, t_{c-1} + T_p]$. The missile then exchanges information with its adjacent missiles: it receives their estimated control variables from the moment $t_{c-1}$ and sends out the estimated value of its own control variable at the moment $t_{c-1}$. With these estimates, it predicts the states of the adjacent missiles over the prediction period $[t_c, t_c + T_p]$, evaluates the cost function $J_i$ over $[t_c, t_c + T_p]$, and uses the particle swarm optimization (PSO) algorithm to solve for the optimal control variable sequence over $[t_c, t_c + T_p]$. Based on the rolling optimization idea, the optimal control of the first update interval is taken from the optimal control sequence, the missile state over $[t_c, t_{c+1}]$ is updated, and the process proceeds to the next control moment $t_{c+1}$. The solution process of the estimated value is shown in Figure 5.

**Figure 5.** Solution process of the predicted state value.

The basic principle of the PSO-based distributed cooperative model prediction framework for multiple missiles is shown in Figure 6. The cost function of the cooperative control of the missiles includes two parts, namely, the operation function, $F$, and the final state function, $\Phi$:

$$J_i = \int_{t_c}^{t_c + T_p} F_i\left(z_i^p(s; t_c), \hat{z}_{-i}(s; t_c), u_i^p(s; t_c)\right) ds + \Phi_i\left(z_i^p(t_c + T_p; t_c)\right) \tag{26}$$

The operation function is defined as:

$$F_i\left(z_i^p(s; t_c), \hat{z}_{-i}(s; t_c), u_i^p(s; t_c)\right) = \alpha \sum_{j \in N_i} \left\| t_{go,i}^p(s; t_c) - \hat{t}_{go,j}(s; t_c) \right\|^2 + (1 - \alpha) \left\| u_i^p(s; t_c) \right\|^2 \tag{27}$$

The final state function is defined as:

$$\Phi_i\left(z_i^p(t_c + T_p; t_c)\right) = \beta \left\| r_i(t_c + T_p; t_c) \right\|^2 + \gamma \left( \left\| \theta(t_c + T_p; t_c) - \theta_{\text{expect}} \right\|^2 + \left\| \psi(t_c + T_p; t_c) - \psi_{\text{expect}} \right\|^2 \right) \tag{28}$$

where $\alpha$, $\beta$, $\gamma$ are the weight constants, $\|\cdot\|$ represents the modulus of a space vector, $N_i$ is the set of missiles other than missile $i$, $r_i$ is the distance between the missile and the target, $t_{go,i}^p$ is the predicted time-to-go, and $\hat{t}_{go,j}$ is the estimated time-to-go, which can be solved by the following equation:

$$\hat{t}_{go,j}(s; t_c) = \frac{\hat{r}(s; t_c)}{V_j}, \quad j \in N_i \tag{29}$$

where $\hat{r}(s; t_c)$ is the estimated missile-target distance predicted at instant $t_c$, $V_j$ is the speed of missile $j$, and $\theta_{\text{expect}}$ and $\psi_{\text{expect}}$ represent the desired pitch angle and trajectory deflection angle used to shape the missile attitude at the terminal impact point so as to achieve the impact angle constraints.
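Under the reconstructed Equations (27)–(29), the pieces of the cost function can be sketched as follows. The function names are ours, and the $\beta$/$\gamma$ weighting in the terminal term follows our reading of Equation (28):

```python
import numpy as np

def stage_cost(tgo_pred_i, tgo_est_neighbors, u_i, alpha):
    """Operation function F_i of Eq. (27): time-to-go consensus error
    plus control effort (weights and names are our assumptions)."""
    consensus = sum(np.sum((tgo_pred_i - t_hat) ** 2)
                    for t_hat in tgo_est_neighbors)
    return alpha * consensus + (1.0 - alpha) * np.sum(u_i ** 2)

def terminal_cost(r_term, theta_term, psi_term,
                  theta_exp, psi_exp, beta, gamma):
    """Terminal function Phi_i of Eq. (28): miss distance plus impact
    angle errors (reconstructed form)."""
    return (beta * r_term ** 2
            + gamma * ((theta_term - theta_exp) ** 2
                       + (psi_term - psi_exp) ** 2))

def tgo_estimate(r_hat, speed):
    """Eq. (29): estimated time-to-go as range over speed."""
    return r_hat / speed
```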

**Figure 6.** PSO-DMPC model prediction framework.

#### **5. Verification by Simulations**

Let M1, M2, M3, and M4 denote four missiles participating in cooperative operations. The target is in the horizontal plane, and its initial position is at the origin of the coordinates. Table 1 lists the initial parameters of the proposed virtual leader missile (denoted as M) and follower missiles.


**Table 1.** Initial parameters of the virtual leader and the follower missiles.

Let us consider a case where a stationary target, a straight moving target, and a snake-like maneuvering target are attacked (the specific movement of each target is shown in Table 2).

**Table 2.** Movement parameters of the targets.


The control coefficients of the augmented proportional guidance method adopted by the five missiles are *K*<sup>01</sup> = *Ki*<sup>1</sup> = 10, *K*<sup>02</sup> = 5; the parameters are *c*<sup>1</sup> = 0.7, *c*<sup>2</sup> = 0.9; the bandwidth of the fast dynamics sub-system is *K<sup>ψ</sup>* = 5; and the overload limit of the missile is 8 g. The impact time during the attack on a fixed target is shown in Table 3. According to the above design strategy, under the premise of adopting the augmented proportional guidance method, the impact time of the leader missile is greater than that of each follower missile. Therefore, the use of this scheme can provide a sufficient flight time adjustment margin for the follower missiles.

**Table 3.** Impact time of APN guidance.


In the adjustable range, in order to enhance the attack and damage capability against moving targets in the horizontal plane, the ideal impact angles $(\theta_m^*, \psi_{vm}^*)$ of M1, M2, M3, and M4 are designated as $(50^\circ, -20^\circ)$, $(44^\circ, -34^\circ)$, $(-30^\circ, 0^\circ)$, and $(35^\circ, -20^\circ)$, respectively. The ideal impact time $t^*$ is determined based on the flight time of the leader missile attacking the different targets. The simulation results of the cooperative attack by the missiles on targets 1, 2, and 3 are shown in Figures 7–11.

**Figure 7.** Flight trajectory curve of the cooperative attack on stationary target 1.

**Figure 8.** Flight trajectory curve of the cooperative attack on target 2.

**Figure 9.** Flight trajectory curve of the cooperative attack on target 3.

**Figure 10.** Projection of flight trajectory curve of cooperative attack on target 3 in the XOZ plane.

**Figure 11.** Projection of flight trajectory curve of cooperative attack on target 3 in the XOY plane.

As shown in Figures 7–9, the cooperative strategy based on tracking the missile-target distance constrained the missile-target distance well during the flight of the multiple missiles. As a result, the impact times of the follower missiles converged to that of the leader missile. The MPC-based cooperative guidance algorithm optimizes the trajectory to meet the terminal impact angle constraints, based on the impact time of the cooperative missile group. From Figures 10 and 11, it can be observed that the MPC-based cooperative guidance law enables the multiple missiles participating in the cooperative operation to hit the target at a specified time, each with its separately specified impact angle. The miss distance meets the requirements and is smaller than that of the time cooperative guidance, so the strike is more accurate. Although the adjustable time and angle of the missiles are limited under different position conditions, an omni-directional saturation attack by the multiple missiles can be realized through reasonable state estimation, mission planning, and assignment of the multiple terminal-guidance missiles. The simulation results verified the effectiveness of this method in attacking both stationary and maneuvering targets. From Figures 7–10, it can also be observed that the trajectory optimized by the MPC algorithm is smoother and easier to implement.

By tracking the distance between the missile and the target, each follower missile achieved an impact time cooperative with that of the virtual leader missile. When hitting targets 1, 2, and 3, the cooperative impact time of the four missiles equaled the impact time of the virtual leader missile, namely 41.66 s, 41.28 s, and 41.82 s, respectively. However, the impact angles did not yet meet the requirements, and iterative optimization of the guidance commands through the MPC algorithm achieved the terminal impact angle constraints. For the simulation example, the terminal impact angles achieved by each follower missile under the different guidance methods when attacking a moving target are given in Figures 12–14.

**Figure 12.** Velocity-inclination change curve of each follower missile during the cooperative attack on target 1 based on the MPC method: (**a**) Trajectory deflection curve; (**b**) Trajectory inclination curve.

**Figure 13.** Velocity-inclination change curve of each follower missile during the cooperative attack on target 2 based on the MPC method: (**a**) Trajectory deflection curve; (**b**) Trajectory inclination curve.

**Figure 14.** Velocity-inclination change curve of each follower missile during the cooperative attack on target 3 based on the MPC method: (**a**) Trajectory deflection curve; (**b**) Trajectory inclination curve.

Figures 12–14 show the change curves of the impact angle when attacking a stationary target, a uniformly moving target, and a maneuvering target with the MPC method. The MPC method not only realizes time cooperation but also satisfies the impact angle constraints. Both the time cooperative guidance and the MPC-based guidance can make the four missiles hit the target at the same time, but the former can only constrain the impact times of the four missiles, whereas the latter achieves a multi-directional cooperative attack on the target with the designated angles. This verifies the superior guidance performance of the MPC-based 3D cooperative guidance law with multiple constraints. Figure 15 shows the tracking curve of the missile-target distance in the time cooperative strategy with the designed parameters, and Figure 16 shows the time-to-go error curve obtained by the MPC method. Tracking the missile-target distance of the leader missile achieves a cooperative impact time between the follower missiles and the leader missile, but it does not meet the impact angle constraints; therefore, the MPC method was used to iteratively optimize the control commands to satisfy them.

**Figure 15.** Tracking of the missile-target distance.

**Figure 16.** Remaining time error of MPC.

The performance of the MPC-based 3D cooperative guidance law and the time cooperative guidance law are shown in Figures 17 and 18, respectively. It can be observed that, when the missile flew with the time cooperative guidance law, although the impact time was similar to the flight time of the leader missile, there was still a significant gap between the terminal attack attitude angle and the ideal impact angle. After iterative optimization using the MPC algorithm, both the trajectory inclination $\theta$ and the trajectory deflection angle $\psi$ achieved the impact angle constraints within the specified time. The MPC algorithm is based on the principle of optimization, with the performance index designed to minimize the total energy of the guidance process while meeting the terminal constraints. In every iteration of the MPC algorithm, the deviation at the trajectory terminal was evenly distributed over each step of the entire trajectory by a controlled adjustment. Therefore, the control command transitioned more smoothly than the initial control command and had a smaller amplitude, making it easier to implement in engineering.

**Figure 17.** Comparison of the trajectory inclination curves of each follower missile between the proposed guidance scheme and the time cooperative guidance scheme: (**a**) Time cooperative guidance law; (**b**) MPC-based cooperative guidance law.

**Figure 18.** Comparison of the trajectory deflection curves of each follower missile between the proposed guidance scheme and the time cooperative guidance scheme: (**a**) Time cooperative guidance law; (**b**) MPC-based cooperative guidance law.

From Figures 19 and 20, it can be observed that the time cooperative guidance strategy demanded a greater control capability, especially near the terminal. To achieve the consistency required by the time cooperative guidance, a larger overload is needed, exceeding what the missile can provide; this method therefore places very high requirements on missile maneuverability. In contrast, with the MPC-based cooperative strategy, the control effort can be kept within a certain range through receding-horizon optimization, so the requirements on missile maneuverability are modest. Furthermore, it achieves not only time cooperation but also the terminal impact angle requirements. It thus proved to be the better cooperative guidance strategy.

The results demonstrated that the cooperative strategy of tracking the missile-target distance based on the dynamic inverse design was very effective. The MPC algorithm added the impact angle constraints on top of the impact time constraints, thereby achieving good effects and better guidance performance, along with easily implementable commands.

**Figure 19.** Normal acceleration curves obtained by the time cooperative guidance algorithm: (**a**) Normal acceleration curve of the dive plane; (**b**) Normal acceleration curve of turning plane.

**Figure 20.** Normal acceleration curves obtained by the MPC guidance algorithm: (**a**) Normal acceleration curve of the dive plane; (**b**) Normal acceleration curve of turning plane.

#### **6. Conclusions**

In the present study, we proposed a three-dimensional cooperative terminal guidance strategy for cruise missiles under multiple constraints, introduced the concept of virtual leader missiles, and designed a cooperative strategy for the impact time of virtual leader and follower missiles. On this basis, using the model predictive control method, a three-dimensional cooperative guidance law was presented that can simultaneously control multiple missiles under impact time and impact angle constraints while meeting the miss distance requirements. The algorithm was implemented, and its effectiveness was verified in simulation studies. The robustness and the solution time of the algorithm both need further study in order to improve its operating efficiency and realize real-time computation.

**Author Contributions:** Conceptualization, M.C. and X.C.; methodology, M.C. and Z.Z.; validation, Z.Z. and Z.L.; formal analysis, X.C. and Z.Z.; data curation, Z.L.; writing—original draft preparation, Z.Z. and Z.L.; writing—review and editing, M.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Fully Distributed Control for a Class of Uncertain Multi-Agent Systems with a Directed Topology and Unknown State-Dependent Control Coefficients**

**Zongcheng Liu 1,\*, Hanqiao Huang 2,\*, Sheng Luo 2, Wenxing Fu 2 and Qiuni Li 1,\***


**Abstract:** To address the control of uncertain multi-agent systems (MAS) with completely unknown system nonlinearities and unknown control coefficients, a global consensus method is proposed by constructing novel filters and barrier function-based distributed controllers. The main contributions are as follows. Firstly, a novel two-order filter is designed for each agent to produce an estimate of the leader's information, such that the connectivity matrix is not used in the controller design, solving the difficulty caused by the time-varying control coefficients in a MAS with a directed graph. Secondly, combined with the novel filters, barrier functions are used to construct the distributed controller to deal with the completely unknown system nonlinearities, resulting in the global consensus of the MAS. Finally, it is rigorously proved that the consensus of the MAS is achieved while guaranteeing the prescribed tracking-error performance. Two examples are given to verify the effectiveness of the proposed method, and the simulation results support the claims.

**Keywords:** distributed control; MAS; flight control

#### **1. Introduction**

The control of uncertain nonlinear systems has been researched for several decades, and many remarkable results have been obtained on this topic [1–9]. However, most of them concern SISO or MIMO systems, and their methods and techniques cannot be directly applied to multi-agent systems, since the information of each agent or subsystem is available only to some of the others. According to the topology of the information transmission graph, graphs can be divided into undirected and directed ones. Generally, the consensus control of a MAS with a directed graph is more difficult than the undirected case, since methods for the directed case are always applicable to the undirected case, but not vice versa.

**Citation:** Liu, Z.; Huang, H.; Luo, S.; Fu, W.; Li, Q. Fully Distributed Control for a Class of Uncertain Multi-Agent Systems with a Directed Topology and Unknown State-Dependent Control Coefficients. *Appl. Sci.* **2021**, *11*, 11304. https://doi.org/10.3390/app112311304

Academic Editors: Wei Huang and Dong Zhang

Received: 9 October 2021; Accepted: 16 November 2021; Published: 29 November 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Recently, significant progress has been made in the control of a MAS [10–12]. For a linear MAS with undirected graphs, a fully distributed adaptive consensus controller is presented in [10]. Adaptive asymptotic consensus for uncertain MASs is achieved in [11,12]. However, these methods are only applicable to a MAS with an undirected graph and fail for a MAS with a directed graph. For a MAS with a directed graph and constant control coefficients, adaptive consensus with system nonlinearities satisfying matching conditions is researched in [13] to handle actuator faults; fully distributed adaptive consensus control is studied in [14] for a MAS with unknown control directions by using the Nussbaum gain technique; actuator faults in a MAS with integral-chain dynamics are considered in [15]; and prescribed performance consensus control for an uncertain MAS is investigated in [16]. Though much progress has been made [17–20], it should be noted that there are still some nonnegligible problems to be solved. Firstly, the existing methods require the control coefficients to be constant, or even known, for a MAS with a directed graph. The main difficulty is that the Laplacian matrix of a directed graph is asymmetric, so the selection of control parameters must always resort to adaptive methods, which run into trouble when the control coefficients are time-varying and unknown. Secondly, to the best of our knowledge, there is no global consensus control method for a MAS with a directed graph and completely unknown system functions, except for [21], wherein the unknown system nonlinearities are required to satisfy Lipschitz conditions and the control coefficients are one. Universal approximators such as neural networks (NN) or fuzzy logic systems (FLS) have been applied to the consensus control problem of a MAS with completely unknown system nonlinearities [22–24]; however, it is well known that these methods are semi-global in the sense that their stability depends on the initial conditions of the systems and a careful selection of the controller parameters. Therefore, NN- or FLS-based approaches cannot guarantee the global consensus of the MAS, even though they are well suited to the problem of a MAS with unknown nonlinearities.

As for the global control of systems with completely unknown nonlinearities, a pioneering work is [25], wherein a low-complexity controller is presented that can not only achieve global convergence of all the system signals but can also guarantee the prescribed performance of the tracking error and state errors. In view of the low complexity and strong robustness of this method, much research has been carried out on it for different nonlinear control problems [26–30]. By introducing a novel barrier function, a fault-tolerant controller is designed for a class of unknown nonlinear systems in [26]. With consideration of the constraints on system states, a barrier function-based adaptive control method is proposed in [27]. Addressing systems with unknown control direction and unknown dynamics, a Nussbaum function-based low-complexity control scheme is designed in [28]. As regards asymptotic tracking control for systems with unknown nonlinearities, a universal global low-complexity controller is proposed in [31]. Nevertheless, it is worth mentioning that the global control of a MAS with unknown nonlinearities is still an unsolved problem, since these methods rely on the desired output being known, a knowledge that cannot be obtained for some agents of a MAS. Moreover, considering that the control coefficients of each agent are time-varying functions, these traditional methods run into trouble when solving the consensus control of a MAS with unknown dynamics.

Motivated by the above discussion, we investigate the fully distributed control of a MAS with a directed graph, time-varying control coefficients and completely unknown system nonlinearities. The main contributions of this paper are summarized as follows.


#### **2. Problem Statement and Preliminaries**

Consider a class of uncertain MAS as follows

$$\begin{cases} \dot{\mathbf{x}}\_{i,m} = g\_{i,m}(\overline{\mathbf{x}}\_{i,m})\mathbf{x}\_{i,m+1} + f\_{i,m}(\overline{\mathbf{x}}\_{i,m}) + d\_{i,m}(t, \overline{\mathbf{x}}\_{i,m}), \; m = 1, 2, \dots, n-1\\ \dot{\mathbf{x}}\_{i,n} = g\_{i,n}(\overline{\mathbf{x}}\_{i,n})u\_i + f\_{i,n}(\overline{\mathbf{x}}\_{i,n}) + d\_{i,n}(t, \overline{\mathbf{x}}\_{i,n})\\ y\_i = \mathbf{x}\_{i,1}, \; for \; i = 1, 2, \dots, N \end{cases} \tag{1}$$

where $\overline{x}_{i,m} = [x_{i,1}, x_{i,2}, \ldots, x_{i,m}]^T \in \mathbb{R}^m$, $u_i \in \mathbb{R}$, and $y_i \in \mathbb{R}$ are the states, the control input, and the output of the $i$th subsystem, respectively. The system nonlinearities $f_{i,m}(\cdot)$ and $g_{i,m}(\cdot)$ are unknown continuous functions of $\overline{x}_{i,m}$, and $d_{i,m}(t, \overline{x}_{i,m}) : \mathbb{R}^+ \times \mathbb{R}^m \to \mathbb{R}$, $m = 1, 2, \ldots, n$, represent the system uncertainties and external disturbances.

The desired trajectory $y_d$ for the outputs of the subsystems is bounded and known only to some of the $N$ subsystems, with $\dot{y}_d$ being bounded and unknown to all subsystems.

Suppose that the information transmission among the group of $N$ subsystems can be represented by a directed graph $G \triangleq (V, E)$, where $V = \{1, \ldots, N\}$ denotes the set of indexes corresponding to each subsystem. The edge $(i, j) \in E$ indicates that subsystem $j$ can obtain information from subsystem $i$, but not necessarily vice versa; in this case, subsystem $i$ is called a neighbor of subsystem $j$. Denote the set of neighbors of subsystem $i$ as $N_i \triangleq \{j \in V : (j, i) \in E\}$. Self-edges $(i, i)$ are not allowed; thus $(i, i) \notin E$ and $i \notin N_i$. The connectivity matrix $A = [a_{ij}] \in \mathbb{R}^{N \times N}$ of $G$ is defined by $a_{ij} = 1$ if $(j, i) \in E$ and $a_{ij} = 0$ otherwise. An in-degree matrix $\Delta$ is introduced, such that $\Delta = \mathrm{diag}(\Delta_i) \in \mathbb{R}^{N \times N}$ with $\Delta_i = \sum_{j \in N_i} a_{ij}$ being the $i$th row sum of $A$. Then, the Laplacian matrix is defined as $L = \Delta - A$. Define $B = \mathrm{diag}\{\mu_1, \mu_2, \ldots, \mu_N\}$, where $\mu_i = 1$ means that $y_d$ is directly accessible to subsystem $i$, and otherwise $\mu_i = 0$. Throughout this paper, the following notations are used: $\|\cdot\|$ is the Euclidean norm of a vector; for two vectors $a, b \in \mathbb{R}^n$, the elementwise operator $.*$ is defined as $a .* b = [a(1)b(1), \ldots, a(n)b(n)]^T$; and for a matrix $Q$, $\lambda_{\min}(Q)$ denotes its minimum eigenvalue.

**Assumption 1.** *The directed graph $G$ contains a spanning tree, and the desired trajectory $y_d$ is accessible to at least one subsystem, i.e., $\sum_{i=1}^{N} \mu_i > 0$.*

**Assumption 2.** *There exist unknown locally Lipschitz functions $b_{i,m}(\overline{x}_{i,m})$ such that, for $i = 1, 2, \ldots, N$,*

$$|d\_{i,m}(t, \overline{x}\_{i,m})| \le b\_{i,m}(\overline{x}\_{i,m}), m = 1, 2, \dots, n \tag{2}$$

**Assumption 3.** *The unknown control coefficients $g_{i,m}(\overline{x}_{i,m})$ are either strictly positive or strictly negative. Without loss of generality, they are assumed to be strictly positive, namely, for $i = 1, 2, \ldots, N$,*

$$g\_{i,m}(\overline{x}\_{i,m}) > 0, m = 1, 2, \dots, n \tag{3}$$

**Lemma 1.** *(Ref.* [17]*) Based on Assumption 1, the matrix* (*L* + *B*) *is nonsingular. Defining*

$$\begin{aligned} \theta &= \begin{bmatrix} \theta\_1, \dots, \theta\_N \end{bmatrix}^T = \begin{pmatrix} L+B \end{pmatrix}^{-1} \begin{bmatrix} 1, \dots, 1 \end{bmatrix}^T\\ P &= \operatorname{diag} \{ P\_1, \dots, P\_N \} = \operatorname{diag} \left\{ \frac{1}{\theta\_1}, \dots, \frac{1}{\theta\_N} \right\} \\ Q &= P(L+B) + (L+B)^T P \end{aligned} \tag{4}$$

*then θ<sup>i</sup>* > 0 *for i* = 1, 2 , . . . , *N and Q is definitely positive.*

**Remark 1.** *In contrast to the methods in [13–16] for a MAS with a directed graph, the control coefficients $g_{i,m}(\overline{x}_{i,m})$ are time-varying and unknown continuous functions in this paper, which makes the control design much more difficult: the matrix $P$ in (4) is unknown and would normally have to be estimated adaptively, yet the unknown control coefficients make $P$ inestimable. To cope with this problem, a novel two-order filter will be given for each agent (shown later).*

**Remark 2.** *The system nonlinearities $f_{i,m}(\overline{x}_{i,m})$ and $g_{i,m}(\overline{x}_{i,m})$ are completely unknown functions, so there is little knowledge with which to construct the controller. To deal with this problem, neural networks and fuzzy logic systems have been used in [22–24] to approximate the unknown functions caused by the system nonlinearities $f_{i,m}(\overline{x}_{i,m})$ and $g_{i,m}(\overline{x}_{i,m})$; however, only semi-global results can be obtained with these approximators. Constructing a distributed controller with global consensus for a MAS with such unknown system nonlinearities is a challenging problem, which is solved by the skillful cooperation of the novel two-order filters and barrier functions in the following.*

#### **3. Design of Distributed Controller and Filters**

In this section, a distributed asymptotic tracking controller for the multi-agent system (1) is designed. To facilitate the control design in a distributed manner, a filter with states $(q_{i,1}, q_{i,2})$ is designed for each agent $i$, $i = 1, \ldots, N$.

*3.1. Filters Design*

Denote

$$z\_{i,j} = \sum\_{k=1}^{N} a\_{i,k}(q\_{i,j} - q\_{k,j}) + \mu\_i(q\_{i,j} - y\_d^{(j-1)}), j = 1, 2\tag{5}$$

Then, design the filters as

$$\begin{cases} \dot{q}\_{i,1} = q\_{i,2} \\ \dot{q}\_{i,2} = v\_i \end{cases} \tag{6}$$

with

$$v_i = -c_1 \underline{z}_i - c_0 q_{i,2} - c_0 \operatorname{sgn}(\underline{z}_i) \sum_{j=1}^{2} \hat{F}_{i,j} \tag{7}$$

$$\dot{\hat{F}}_{i,j} = \sum_{k=1}^{N} a_{i,k}\left(\hat{F}_{k,j} - \hat{F}_{i,j}\right) + \mu_i\left(F_j - y_d^{(j-1)}\right), \; j = 1, 2 \tag{8}$$

where $\underline{z}_i = c_0 z_{i,1} + z_{i,2}$, $y_d^{(0)} = y_d$ and $y_d^{(1)} = \dot{y}_d$, and $c_0$, $c_1$ are design parameters chosen such that $c_0 \ge 1$ and $c_1 > c_0 + 1$. We then have the following lemma.

**Lemma 2.** *Consider the closed-loop system consisting of the $N$ filters (6) satisfying Assumption 1 with the local controller (7). Asymptotic consensus tracking of all the filter outputs to $y_d(t)$ is achieved, i.e., $\lim_{t \to +\infty} |q_{i,1} - y_d(t)| = 0$. Moreover, $q_{i,1}$ and $q_{i,2}$ are bounded.*

**Proof (of Lemma 2).** Consider the following Lyapunov function

$$V_z = \frac{1}{2} \underline{z}^T P \underline{z} + \frac{1}{2\gamma} \sum_{j=1}^{2} \tilde{F}_j^T P \tilde{F}_j \tag{9}$$

where $\underline{z} = [\underline{z}_1, \underline{z}_2, \ldots, \underline{z}_N]^T$, $\tilde{F}_j = \hat{F}_j - F_j$, $\hat{F}_j = [\hat{F}_{1,j}, \hat{F}_{2,j}, \ldots, \hat{F}_{N,j}]^T$, $F_j = [F_{1,j}, F_{2,j}, \ldots, F_{N,j}]^T$, and $\gamma > 0$ is a constant satisfying $\gamma < \frac{2\lambda_{\min}^2(Q)}{\varphi^2}$ with $\varphi = \|P(L+B)\|$. Denote $z_j = [z_{1,j}, z_{2,j}, \ldots, z_{N,j}]^T$ and $q_j = [q_{1,j}, q_{2,j}, \ldots, q_{N,j}]^T$. Then, we have

$$\dot{\underline{z}} = (L+B)\left(c_0 q_2 - c_0 y_d^{(1)} + v - y_d^{(2)}\right) \tag{10}$$

Using (9) and (10), the time derivative of *Vz* is

$$\begin{split} \dot{V}_z &= \underline{z}^T P(L+B)\Big( -c_1 \underline{z} - c_0 \sum_{j=1}^{2} \operatorname{sgn}(\underline{z}) .* \hat{F}_j + c_0 \sum_{j=1}^{2} \operatorname{sgn}(\underline{z}) .* \tilde{F}_j - c_0 y_d^{(1)} - y_d^{(2)} \Big) \\ &\quad - \frac{1}{\gamma}\sum_{j=1}^{2} \tilde{F}_j^T P(L+B)\tilde{F}_j \\ &\le -c_1 \underline{z}^T Q \underline{z} - \frac{1}{\gamma}\sum_{j=1}^{2} \tilde{F}_j^T Q \tilde{F}_j - c_0 \sum_{j=1}^{2} \underline{z}^T P \Delta \operatorname{sgn}(\underline{z}) .* \hat{F}_j \\ &\quad + c_0 \sum_{j=1}^{2} \underline{z}^T P A \operatorname{sgn}(\underline{z}) .* \hat{F}_j - c_0 \sum_{j=1}^{2} \underline{z}^T P B \operatorname{sgn}(\underline{z}) .* \hat{F}_j \\ &\quad - \sum_{j=1}^{2} \underline{z}^T P(L+B)\left(c_0 y_d^{(1)} + y_d^{(2)}\right) + c_0 \sum_{j=1}^{2} \|\underline{z}\| \, \|P(L+B)\| \, \|\tilde{F}_j\| \end{split} \tag{11}$$

where $\operatorname{sgn}(\underline{z}) = [\operatorname{sgn}(\underline{z}_1), \ldots, \operatorname{sgn}(\underline{z}_N)]^T$. By noting

$$c_0 \sum_{j=1}^{2} \underline{z}^T P \Delta \operatorname{sgn}(\underline{z}) .* \hat{F}_j = c_0 \sum_{j=1}^{2} \sum_{i=1}^{N} \sum_{k=1}^{N} p_i a_{ik} \hat{F}_{i,j} |\underline{z}_i| \tag{12}$$

$$c_0 \sum_{j=1}^{2} \underline{z}^T P A \operatorname{sgn}(\underline{z}) .* \hat{F}_j \le c_0 \sum_{j=1}^{2} \sum_{i=1}^{N} \sum_{k=1}^{N} p_i a_{ik} \hat{F}_{k,j} |\underline{z}_i| \tag{13}$$

$$c_0 \sum_{j=1}^{2} \underline{z}^T P B \operatorname{sgn}(\underline{z}) .* \hat{F}_j = c_0 \sum_{j=1}^{2} \sum_{i=1}^{N} \mu_i p_i \hat{F}_{i,j} |\underline{z}_i| \tag{14}$$

$$\left| \sum_{j=1}^{2} \underline{z}^T P(L+B)\left(c_0 y_d^{(1)} + y_d^{(2)}\right) \right| \le c_0 \sum_{j=1}^{2} F_j \sum_{i=1}^{N} \mu_i p_i |\underline{z}_i| \tag{15}$$

$$\sum_{j=1}^{2} \|\underline{z}\| \, \|P(L+B)\| \, \|\tilde{F}_j\| \le \lambda_{\min}(Q) \|\underline{z}\|^2 + \sum_{j=1}^{2} \frac{\varphi^2}{2\lambda_{\min}(Q)} \|\tilde{F}_j\|^2 \tag{16}$$

we have

$$\dot{V}_z \le -c_2 \|\underline{z}\|^2 - \gamma^* \sum_{j=1}^{2} \|\tilde{F}_j\|^2 \tag{17}$$

where $c_2 = \lambda_{\min}(Q)(c_1 - c_0)$ and $\gamma^* = \lambda_{\min}(Q)\left(\frac{1}{\gamma} - \frac{\varphi^2}{2\lambda_{\min}^2(Q)}\right)$. It is easily verified that $c_2 > 0$ and $\gamma^* > 0$; therefore, it follows from (17) that $\lim_{t \to +\infty} \underline{z} = 0$ and hence $\lim_{t \to +\infty} |q_{i,1} - y_d(t)| = 0$. From the boundedness of $V_z$ and $\underline{z}$, the boundedness of $q_{i,1}$ and $q_{i,2}$ is easily obtained. This completes the proof.

**Remark 3.** *As can be seen, a second-order filter is designed to produce a signal $q\_{i,1}$ for each agent. In fact, $q\_{i,1}$ is an estimate of $y\_d$, as shown in Lemma 2, so the agents no longer need to estimate the matrix $P$. Incorporating these second-order filters makes traditional adaptive control techniques easy to apply to the MAS, and thus the unknown time-varying control coefficients of a MAS with a directed graph can be handled.*

#### *3.2. Design of the Distributed Controller*

In this section, the distributed adaptive controller is designed in combination with the filter (6). The following error variables and change of coordinates are introduced

$$e\_{i,1} = \frac{1}{k\_i(t)}\left(x\_{i,1} - q\_{i,1} - \sigma(t)\left(x\_{i,1}^0 - q\_{i,1}^0\right)\right) \tag{18}$$

$$e\_{i,m} = \frac{1}{k\_i(t)}\left(x\_{i,m} - \alpha\_{i,m-1} - \sigma(t)x\_{i,m}^0\right),\quad m = 2, \dots, n \tag{19}$$

with

$$\sigma(t) = \begin{cases} \dfrac{1}{t\_s^2}\left(t - t\_s\right)^2, & t < t\_s \\ 0, & t \ge t\_s \end{cases} \tag{20}$$

where $x\_{i,j}^0 = x\_{i,j}(0)$, $j = 1, \dots, n$, $q\_{i,1}^0 = q\_{i,1}(0)$, and $t\_s$ can be any positive constant. Let $t\_s = 1$ in this paper.
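As a quick illustration, the shaping function (20) with $t\_s = 1$ can be written in a few lines of Python (the function name is ours):

```python
def sigma(t, t_s=1.0):
    """Initial-condition attenuation function of Eq. (20): decays smoothly
    from 1 at t = 0 to 0 at t = t_s, and stays 0 afterwards."""
    if t < t_s:
        return (t - t_s) ** 2 / t_s ** 2
    return 0.0
```

Note that $\dot{\sigma}(t) = 2(t - t\_s)/t\_s^2$ for $t < t\_s$ equals zero at $t = t\_s$, so $\sigma$ is continuously differentiable across the switching instant, as used in Remark 4.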

Then, the intermediate control signals *αi*,*<sup>m</sup>* and the distributed controller *ui* are determined as follows

$$\alpha\_{i,m} = -\lambda\_{i,m} \frac{e\_{i,m}}{1 - e\_{i,m}^2}, \; m = 1, \dots, n - 1 \tag{21}$$

$$u\_i = -\lambda\_{i,n} \frac{e\_{i,n}}{1 - e\_{i,n}^2} \tag{22}$$

where $\lambda\_{i,m}$, $1 \le i \le N$, $1 \le m \le n$ are positive design parameters. It is easy to verify that $e\_{i,m}(0) = 0$ for all $1 \le i \le N$, $1 \le m \le n$, and that $e\_{i,1}(t) = \frac{1}{k\_i(t)}\left(x\_{i,1}(t) - q\_{i,1}(t)\right)$ for $t \ge t\_s$, $1 \le i \le N$. The $k\_i(t)$ are constrained functions chosen by the designer and used as prescribed performance functions, satisfying $0 < \underline{k} \le k\_i(t) \le \overline{k}$ and $|\dot{k}\_i(t)| \le \hat{k}$, with $\underline{k}$, $\overline{k}$ and $\hat{k}$ being positive constants.
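The laws (21) and (22) are barrier-type feedback in the normalized error; a minimal sketch (function name ours) is:

```python
def barrier_control(e, lam):
    """Barrier-type feedback of Eqs. (21)-(22): u = -lam * e / (1 - e^2).
    The gain grows without bound as |e| -> 1, which keeps the normalized
    error inside (-1, 1) and hence the raw error inside (-k_i(t), k_i(t))."""
    assert abs(e) < 1.0, "normalized error must stay in (-1, 1)"
    return -lam * e / (1.0 - e * e)
```

For example, `barrier_control(0.5, 2.0)` returns about −1.33, and the magnitude grows rapidly as the normalized error approaches ±1.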

**Remark 4.** *The function $\sigma(t)$ is constructed to attenuate the influence of the initial conditions, since it makes $e\_{i,m}(0) = 0$; stable results can therefore be achieved under all initial conditions using $\sigma(t)$ in the transformations (18) and (19). It should also be noted that $\sigma(t)$ of (20) is continuously differentiable and that $\ddot{\sigma}(t)$ is not needed in the further design of the controller, which means that the designed intermediate control signals and the actual controller are smooth.*

#### **4. Stability Analysis**

In this section, we will give the main results with the designed fully distributed controller and present the stability analysis. The main results of this article are as follows.

**Theorem 1.** *Consider the closed-loop system consisting of N uncertain agents as (1) satisfying Assumptions 1–3, the intermediate control signals (21) and the distributed controller (22). Then, we have the following properties:*


**Proof (of Theorem 1).** From (18), (19) and (21), we have

$$x\_{i,1} = k\_i e\_{i,1} + q\_{i,1} + \sigma(t)\left(x\_{i,1}^0 - q\_{i,1}^0\right) \tag{23}$$

$$x\_{i,m} = k\_i e\_{i,m} + \alpha\_{i,m-1}(e\_{i,m-1}) + \sigma(t)x\_{i,m}^0,\quad m = 2, \dots, n \tag{24}$$

It can be observed from (23) that $x\_{i,1}$ is a continuous function of $e\_{i,1}$, $q\_{i,1}$ and $\sigma(t)$, where $q\_{i,1}$ and $\sigma(t)$ are bounded time-varying functions. Thus, $x\_{i,1}$ can be rewritten as a continuous function of $e\_{i,1}$ and $t$. A similar analysis can be made for $x\_{i,m}$. Therefore, we obtain

$$\begin{split} \dot{e}\_{i,1} &= \frac{1}{k\_{i}}\left(f\_{i,1}(x\_{i,1}) + g\_{i,1}(x\_{i,1})x\_{i,2} - q\_{i,2} - \dot{\sigma}(t)\left(x\_{i,1}^{0} - q\_{i,1}^{0}\right) - \dot{k}\_{i}e\_{i,1} + d\_{i,1}(t, x\_{i,1})\right) \\ &= h\_{i,1}(t, e\_{i,1}, e\_{i,2}) \end{split} \tag{25}$$

$$\begin{split} \dot{e}\_{i,m} &= \frac{1}{k\_i}\Big(f\_{i,m}(\overline{x}\_{i,m}) + g\_{i,m}(\overline{x}\_{i,m})x\_{i,m+1} - \frac{\partial \alpha\_{i,m-1}}{\partial e\_{i,m-1}} h\_{i,m-1}(t, e\_{i,1}, \dots, e\_{i,m}) \\ &\quad - \dot{\sigma}(t)x\_{i,m}^0 - \dot{k}\_i e\_{i,m} + d\_{i,m}(t, \overline{x}\_{i,m})\Big) \\ &= h\_{i,m}(t, e\_{i,1}, \dots, e\_{i,m+1}) \end{split} \tag{26}$$

where $e\_{i,n+1} = 0$, and $h\_{i,m}(\cdot)$, $m = 1, 2, \dots, n$ are some continuous functions. Defining $e\_i = [e\_{i,1}, \dots, e\_{i,n}]^T$ and in view of (25) and (26), we obtain

$$\dot{e}\_{i} = h\_{i}(t, e\_{i}) = \begin{bmatrix} h\_{i,1}(t, e\_{i,1}, e\_{i,2}) \\ h\_{i,2}(t, e\_{i,1}, e\_{i,2}, e\_{i,3}) \\ \vdots \\ h\_{i,n}(t, e\_{i,1}, \dots, e\_{i,n}) \end{bmatrix} \tag{27}$$

Let us define the open set:

$$\Omega\_{e} = \underbrace{(-1, 1) \times \dots \times (-1, 1)}\_{n\ \text{times}} \tag{28}$$

It is easily observed that $e\_i(0) \in \Omega\_e$, $i = 1, 2, \dots, N$. Additionally, $h\_{i,m}(\cdot)$, $m = 1, 2, \dots, n$ are continuous with respect to all their variables, owing to the fact that $y\_d$, $\dot{y}\_d$, $\sigma(t)$, $q\_{i,1}$, $k\_i(t)$, $f\_{i,m}$, $g\_{i,m}$, $\alpha\_{i,m}$ are all continuously differentiable functions. Therefore, it follows from [32] that these conditions on $h\_{i,m}(\cdot)$ ensure the existence and uniqueness of a maximal solution $e\_i(t)$ on the time interval $[0, t\_{\max})$ with $e\_i(t) \in \Omega\_e$ for $t \in [0, t\_{\max})$, which implies

$$e\_{i,m}(t) \in (-1, 1),\quad \forall t \in [0, t\_{\max}) \tag{29}$$

for *i* = 1,..., *N*, and *m* = 1,..., *n*.

In the following, we will prove that $t\_{\max} = +\infty$ by seeking a contradiction. Suppose that $t\_{\max} < +\infty$; the related analysis then proceeds as follows, with everything below restricted to $t \in [0, t\_{\max})$.

Step 1: Consider the following positive definite functions

$$V\_{i,1} = \frac{1}{2} \log \frac{1}{1 - e\_{i,1}^2} \tag{30}$$

Let $\xi\_{i,1} = \frac{e\_{i,1}}{1 - e\_{i,1}^2}$. It follows from (21), (24), (25) and (30) that the time derivative of $V\_{i,1}$ is


$$\begin{split} \dot{V}\_{i,1} ={}& \frac{\xi\_{i,1}}{k\_{i}}\Big(f\_{i,1}(x\_{i,1}) + g\_{i,1}(x\_{i,1})\big(\alpha\_{i,1} + k\_{i}e\_{i,2} + \sigma(t)x\_{i,2}^{0}\big) \\ & - q\_{i,2} - \dot{\sigma}(t)\big(x\_{i,1}^{0} - q\_{i,1}^{0}\big) - \dot{k}\_{i}e\_{i,1} + d\_{i,1}(t, x\_{i,1})\Big) \\ \leq{}& \frac{1}{k\_{i}} g\_{i,1}(x\_{i,1})\alpha\_{i,1}\xi\_{i,1} + F\_{i,1}|\xi\_{i,1}| \\ \leq{}& -\lambda\_{i,1} E\_{i,1}\xi\_{i,1}^{2} + F\_{i,1}|\xi\_{i,1}| \end{split} \tag{31}$$

where $F\_{i,1} = \frac{1}{k\_i}\left(|f\_{i,1}(x\_{i,1})| + |g\_{i,1}(x\_{i,1})|\left|k\_i e\_{i,2} + \sigma(t)x\_{i,2}^0\right| + |q\_{i,2}| + \left|\dot{\sigma}(t)\left(x\_{i,1}^0 - q\_{i,1}^0\right)\right| + \left|\dot{k}\_i e\_{i,1}\right| + b\_{i,1}(x\_{i,1})\right)$ and $E\_{i,1} = \frac{g\_{i,1}(x\_{i,1})}{k\_i}$. Note that $x\_{i,1}$, $e\_{i,1}$ and $e\_{i,2}$ are bounded on $\Omega\_e$ because of (23) and (29), respectively. Utilizing the fact that $k\_i(t)$, $\dot{k}\_i(t)$, $\sigma(t)$, $q\_{i,1}$, $q\_{i,2}$ are bounded and employing the extreme value theorem, owing to the continuity of $f\_{i,1}(\cdot)$, $g\_{i,1}(\cdot)$ and $b\_{i,1}(\cdot)$, we arrive at

$$E\_{i,1} \ge c\_{1,1} > 0\tag{32}$$

$$c\_{3,1} \ge F\_{i,1} \ge c\_{2,1} \ge 0 \tag{33}$$

where *c*1,1, *c*2,1, and *c*3,1 are some unknown positive constants.

Then, substituting (32) and (33) into (31) yields

$$
\dot{V}\_{i,1} \le -\lambda\_{i,1} c\_{1,1} \xi\_{i,1}^2 + c\_{3,1} |\xi\_{i,1}| \tag{34}
$$

From (34), it follows that $\dot{V}\_{i,1}$ is negative when $|\xi\_{i,1}| > c\_{3,1}/(\lambda\_{i,1}c\_{1,1})$, and subsequently that

$$|\xi\_{i,1}| \le \xi\_{i,1}^\* = \frac{c\_{3,1}}{\lambda\_{i,1}c\_{1,1}} \tag{35}$$

which implies

$$|e\_{i,1}(t)| \le c\_{4,1} = \frac{\sqrt{1 + 4\xi\_{i,1}^{\*2}} - 1}{2\xi\_{i,1}^\*} < 1 \tag{36}$$

As a result, the control signal $\alpha\_{i,1}$ is bounded. Moreover, invoking (24), we can also conclude the boundedness of $x\_{i,2}$. Therefore, the time derivative of $\alpha\_{i,1}$ is

$$
\dot{\alpha}\_{i,1} = -\lambda\_{i,1} \dot{\xi}\_{i,1} \tag{37}
$$

where

$$\begin{split} \big|\dot{\xi}\_{i,1}\big| \le{}& \frac{1+e\_{i,1}^{2}}{k\_{i}\big(1-e\_{i,1}^{2}\big)^{2}}\Big(|f\_{i,1}(x\_{i,1})| + \big|g\_{i,1}(x\_{i,1})\big(k\_{i}e\_{i,2} + \sigma(t)x\_{i,2}^{0} + \alpha\_{i,1}\big)\big| \\ & + |q\_{i,2}| + \big|\dot{\sigma}(t)\big(x\_{i,1}^{0} - q\_{i,1}^{0}\big)\big| + \big|\dot{k}\_{i}e\_{i,1}\big| + b\_{i,1}(x\_{i,1})\Big) \end{split} \tag{38}$$

Noting (36) and using the same analysis as for (33), it is also easy to conclude the boundedness of $\dot{\xi}\_{i,1}$, and hence of $\dot{\alpha}\_{i,1}$.

Step *j* (2 ≤ *j* ≤ *n* − 1): Consider the following positive definite functions

$$V\_{i,j} = \frac{1}{2} \log \frac{1}{1 - e\_{i,j}^2} \tag{39}$$

Let $\xi\_{i,j} = \frac{e\_{i,j}}{1 - e\_{i,j}^2}$. In a similar fashion to the former step, by noting Assumption 1, it follows from (21), (24), (26) and (39) that the time derivative of $V\_{i,j}$ is

$$\begin{split} \dot{V}\_{i,j} ={}& \frac{\xi\_{i,j}}{k\_i}\Big(f\_{i,j}(\overline{x}\_{i,j}) + g\_{i,j}(\overline{x}\_{i,j})\big(\alpha\_{i,j} + k\_i e\_{i,j+1} + \sigma(t)x\_{i,j+1}^0\big) \\ & - \dot{\alpha}\_{i,j-1} - \dot{\sigma}(t)x\_{i,j}^0 - \dot{k}\_i e\_{i,j} + d\_{i,j}(t, \overline{x}\_{i,j})\Big) \\ \leq{}& \frac{1}{k\_i} g\_{i,j}(\overline{x}\_{i,j})\alpha\_{i,j}\xi\_{i,j} + F\_{i,j}|\xi\_{i,j}| \\ \leq{}& -\lambda\_{i,j} E\_{i,j}\xi\_{i,j}^2 + F\_{i,j}|\xi\_{i,j}| \end{split} \tag{40}$$

where $F\_{i,j} = \frac{1}{k\_i}\left(\left|f\_{i,j}(\overline{x}\_{i,j})\right| + \left|g\_{i,j}(\overline{x}\_{i,j})\right|\left|k\_i e\_{i,j+1} + \sigma(t)x\_{i,j+1}^0\right| + \left|\dot{\alpha}\_{i,j-1}\right| + \left|\dot{\sigma}(t)x\_{i,j}^0\right| + \left|\dot{k}\_i e\_{i,j}\right| + b\_{i,j}(\overline{x}\_{i,j})\right)$ and $E\_{i,j} = \frac{g\_{i,j}(\overline{x}\_{i,j})}{k\_i}$. Note that $x\_{i,m}$, $m = 1, 2, \dots, j$ are bounded on $\Omega\_e$ because of the boundedness of $\alpha\_{i,m-1}$, and that $e\_{i,j}$ and $e\_{i,j+1}$ are bounded on $\Omega\_e$ in view of (29). Utilizing the fact that $k\_i(t)$, $\dot{k}\_i(t)$ are bounded and employing the extreme value theorem, owing to the continuity of $f\_{i,j}(\cdot)$, $g\_{i,j}(\cdot)$ and $b\_{i,j}(\cdot)$, we arrive at

$$E\_{i,j} \ge c\_{1,j} > 0\tag{41}$$

$$c\_{3,j} \ge F\_{i,j} \ge c\_{2,j} \ge 0 \tag{42}$$

with *c*1,*j*, *c*2,*<sup>j</sup>* and *c*3,*<sup>j</sup>* being some unknown positive constants. Then, substituting (41) and (42) into (40) yields

$$\dot{V}\_{i,j} \le -\lambda\_{i,j}c\_{1,j}\xi\_{i,j}^2 + c\_{3,j}|\xi\_{i,j}| \tag{43}$$

From (43), it follows that $\dot{V}\_{i,j}$ is negative when $|\xi\_{i,j}| > c\_{3,j}/(\lambda\_{i,j}c\_{1,j})$, and subsequently that

$$\big|\xi\_{i,j}\big| \le \xi\_{i,j}^\* = \frac{c\_{3,j}}{\lambda\_{i,j}c\_{1,j}} \tag{44}$$

which implies

$$\big|e\_{i,j}(t)\big| \le c\_{4,j} = \frac{\sqrt{1 + 4\xi\_{i,j}^{\*2}} - 1}{2\xi\_{i,j}^\*} < 1 \tag{45}$$

As a result, the control signal $\alpha\_{i,j}$ is bounded. Moreover, we can also conclude the boundedness of $x\_{i,j+1}$ by noting (24). Finally, the time derivative of $\alpha\_{i,j}$ is

$$
\dot{\alpha}\_{i,j} = -\lambda\_{i,j} \dot{\xi}\_{i,j} \tag{46}
$$

where

$$\begin{split} \big|\dot{\xi}\_{i,j}\big| \le{}& \frac{1+e\_{i,j}^{2}}{k\_{i}\big(1-e\_{i,j}^{2}\big)^{2}}\Big(\big|f\_{i,j}(\overline{x}\_{i,j})\big| + \big|g\_{i,j}(\overline{x}\_{i,j})\big(k\_{i}e\_{i,j+1} + \sigma(t)x\_{i,j+1}^{0} + \alpha\_{i,j}\big)\big| \\ & + \big|\dot{\sigma}(t)x\_{i,j}^{0}\big| + \big|\dot{k}\_{i}e\_{i,j}\big| + b\_{i,j}(\overline{x}\_{i,j})\Big) \end{split} \tag{47}$$

Noting (45) and using the same analysis as for (42), it is also easy to conclude the boundedness of $\dot{\xi}\_{i,j}$, and hence of $\dot{\alpha}\_{i,j}$.

Step *n*: Consider the following Lyapunov functions

$$V\_{i,n} = \frac{1}{2} \log \frac{1}{1 - e\_{i,n}^2} \tag{48}$$

Let $\xi\_{i,n} = \frac{e\_{i,n}}{1 - e\_{i,n}^2}$. Similarly to the former steps, we can obtain

$$\dot{V}\_{i,n} \le -\lambda\_{i,n}c\_{1,n}\xi\_{i,n}^2 + c\_{3,n}|\xi\_{i,n}| \tag{49}$$

where $c\_{1,n}$ and $c\_{3,n}$ are some unknown positive constants. It follows from (49) that $\dot{V}\_{i,n}$ is negative when $|\xi\_{i,n}| > c\_{3,n}/(\lambda\_{i,n}c\_{1,n})$, and subsequently that

$$|\xi\_{i,n}| \le \xi\_{i,n}^\* = \frac{c\_{3,n}}{\lambda\_{i,n}c\_{1,n}} \tag{50}$$

which implies

$$|e\_{i,n}(t)| \le c\_{4,n} = \frac{\sqrt{1 + 4\xi\_{i,n}^{\*2}} - 1}{2\xi\_{i,n}^\*} < 1 \tag{51}$$

As a result, the distributed controller $u\_i$ is bounded. Notice that (36), (45) and (51) imply that $e\_i(t) \in \Omega\_e'$ for all $t \in [0, t\_{\max})$, $i = 1, 2, \dots, N$, where the set $\Omega\_e'$ is nonempty and compact, defined as

$$\Omega\_e' = [-c\_{4,1}, c\_{4,1}] \times [-c\_{4,2}, c\_{4,2}] \times \cdots \times [-c\_{4,n}, c\_{4,n}]$$

Owing to (36), (45) and (51), it is straightforward to verify that $\Omega\_e' \subset \Omega\_e$. Therefore, the assumption $t\_{\max} < +\infty$ dictates the existence of a time instant $t' \in [0, t\_{\max})$ such that $e\_i(t') \notin \Omega\_e'$, which is a clear contradiction. Therefore, $t\_{\max} = +\infty$. Hence, all closed-loop signals remain bounded, and moreover $e\_i(t) \in \Omega\_e' \subset \Omega\_e$ for all $t \ge 0$. Furthermore, from (36) we conclude that

$$|e\_{i,1}(t)| \le c\_{4,1} < 1\tag{52}$$

for all $t \ge 0$. Then, in view of Lemma 2 and (52), we have

$$\begin{split} \lim\_{t \to +\infty}|y\_i - y\_d| &= \lim\_{t \to +\infty}|q\_{i,1} - y\_d + y\_i - q\_{i,1}| \\ &\le \lim\_{t \to +\infty}|q\_{i,1} - y\_d| + \lim\_{t \to +\infty}|y\_i - q\_{i,1}| \\ &\le \lim\_{t \to +\infty}|k\_i e\_{i,1}(t)| \\ &\le \overline{k} \end{split} \tag{53}$$

This completes the proof.

#### **5. Simulation Study**

Two examples are given in this section to demonstrate the effectiveness of the proposed distributed adaptive controller.

**Example 1.** *Consider the following multi-agent systems*

$$\begin{cases}
\dot{x}\_{i,1} = g\_{i,1}(\overline{x}\_{i,1})x\_{i,2} + f\_{i,1}(t, \overline{x}\_{i,1}) \\
\dot{x}\_{i,2} = g\_{i,2}(\overline{x}\_{i,2})u\_i + f\_{i,2}(t, \overline{x}\_{i,2}) \\
y\_i = x\_{i,1}
\end{cases} \quad i = 1, 2, 3, 4$$

*with the system functions chosen as follows: $f\_{1,1} = x\_{1,1}^2$, $g\_{1,1} = 1 + x\_{1,1}^2$, $f\_{1,2} = x\_{1,1}x\_{1,2}$, $g\_{1,2} = 1$, $f\_{2,1} = x\_{2,1}^3 + 0.2\sin t$, $g\_{2,1} = 1 + 0.1\cos x\_{2,1}$, $f\_{2,2} = x\_{2,1}x\_{2,2}$, $g\_{2,2} = 1$, $f\_{3,1} = x\_{3,1}\sin x\_{3,1}$, $g\_{3,1} = 1$, $f\_{3,2} = x\_{3,1}x\_{3,2} + 0.1\sin t$, $g\_{3,2} = 1$, $f\_{4,1} = x\_{4,1} + 0.8 + 0.2\sin t$, $g\_{4,1} = 1$, $f\_{4,2} = x\_{4,1}x\_{4,2} + 0.2\cos t$, $g\_{4,2} = 1$. The communication topology for these subsystems is depicted in* Figure 1.

**Figure 1.** Communication topology for four subsystems.

The desired trajectory for the outputs of each subsystem is $y\_d = \sin t$. The initial conditions for each subsystem are set as $x\_{1,1}(0) = 0.5$, $x\_{2,1}(0) = -0.5$, $x\_{3,1}(0) = 0$, $x\_{4,1}(0) = 0.1$ and $x\_{1,2}(0) = x\_{2,2}(0) = x\_{3,2}(0) = x\_{4,2}(0) = 0$. Then, the intermediate control signals and the distributed controllers are designed as follows

$$\begin{aligned} \alpha\_{i,1} &= -\lambda\_{i,1} \frac{e\_{i,1}}{1 - e\_{i,1}^2}, \; i = 1,2,3,4 \\ u\_i &= -\lambda\_{i,2} \frac{e\_{i,2}}{1 - e\_{i,2}^2}, \; i = 1,2,3,4 \end{aligned}$$

where the control parameters and functions are selected as $\lambda\_{1,1} = 5$, $\lambda\_{2,1} = 5$, $\lambda\_{3,1} = 5$, $\lambda\_{4,1} = 4$, $\lambda\_{1,2} = 10$, $\lambda\_{2,2} = 20$, $\lambda\_{3,2} = 10$, $\lambda\_{4,2} = 10$, and $k\_i(t) = 3e^{-0.5t} + 0.01$ for $i = 1, 2, 3, 4$. For the filters, the parameters are chosen as $c\_0 = 2$ and $c\_1 = 6$. The simulation results are then reported in Figures 2–4.
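As a rough, self-contained illustration (not the authors' code), agent 1 of Example 1 can be simulated with the barrier controller alone, replacing the filter output $q\_{1,1}$ by $y\_d$ itself (a simplification, since the filter (6) is not reproduced here) and raising the floor of $k\_1(t)$ from 0.01 to 0.05 for numerical robustness of the explicit Euler scheme:

```python
import math

def simulate(T=10.0, dt=2e-4):
    """Euler simulation of agent 1 of Example 1 under the controller (21)-(22),
    with the filter output q_{1,1} replaced by y_d = sin t (a simplification)."""
    lam1, lam2 = 5.0, 10.0                 # lambda_{1,1}, lambda_{1,2}
    x1, x2 = 0.5, 0.0                      # initial conditions of Example 1
    x1_0, x2_0, yd_0 = x1, x2, 0.0
    t = 0.0
    for _ in range(int(T / dt)):
        k = 3.0 * math.exp(-0.5 * t) + 0.05      # performance function k_1(t), raised floor
        s = (t - 1.0) ** 2 if t < 1.0 else 0.0   # sigma(t) of Eq. (20), t_s = 1
        yd = math.sin(t)
        e1 = (x1 - yd - s * (x1_0 - yd_0)) / k   # Eq. (18), q_{1,1} -> y_d
        a1 = -lam1 * e1 / (1.0 - e1 * e1)        # Eq. (21)
        e2 = (x2 - a1 - s * x2_0) / k            # Eq. (19)
        u = -lam2 * e2 / (1.0 - e2 * e2)         # Eq. (22)
        # agent-1 dynamics: f11 = x1^2, g11 = 1 + x1^2, f12 = x1*x2, g12 = 1
        x1 += dt * (x1 * x1 + (1.0 + x1 * x1) * x2)
        x2 += dt * (x1 * x2 + u)
        t += dt
    return abs(x1 - math.sin(t))
```

The returned value is the terminal tracking error $|x\_{1,1}(T) - y\_d(T)|$; as long as the normalized errors stay inside $(-1, 1)$, it is bounded by $k\_1(T)$, mirroring the prescribed-performance guarantee.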

**Figure 2.** Tracking errors *xi*,1 − *yd* for 1 ≤ *i* ≤ 4.

**Figure 3.** Outputs *xi*,1 for 1 ≤ *i* ≤ 4 and *yd*.

**Figure 4.** Distributed control inputs *ui* for 1 ≤ *i* ≤ 4.

It can be observed from Figures 2–4 that, under the designed distributed controllers, the outputs of the subsystems track the desired trajectory very quickly, and the tracking performance is satisfactory.

**Example 2.** *Consider the consensus for four high-maneuver fighters, with communication topologies as in Figure 5 and their flight control systems as follows [33].*

$$\begin{cases} \dot{X}\_{i,1} = f\_1(X\_{i,1}, X\_{i,3}) + G\_1(X\_{i,1})X\_{i,2} \\ \dot{X}\_{i,2} = f\_2(X\_i) + G\_2u\_i \\ \dot{X}\_{i,3} = f\_3(X\_i) \end{cases} \tag{54}$$

*with*

$$f\_1(X\_{i,1}, X\_{i,3}) = \begin{bmatrix} q\_i \tan \theta\_i \sin \phi\_i + r\_i \tan \theta\_i \cos \phi\_i \\ p\_i \beta\_i + z\_0 \Delta a\_i + (g\_0/V\_i)(\cos \theta\_i \cos \phi\_i - \cos \theta\_0) \\ y\_\beta \beta\_i + p\_i(\sin \alpha\_0 + \Delta a\_i) + (g\_0/V\_i)\cos \theta\_i \sin \phi\_i \end{bmatrix}$$

$$G\_1(X\_{i,1}) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -\cos \alpha\_0 \end{bmatrix}$$

$$f\_2(X\_i) = \begin{bmatrix} l\_\beta \beta\_i + l\_p p\_i + l\_q q\_i + l\_r r\_i + (l\_{\delta\alpha} \beta\_i + l\_{nr} r\_i) \Delta a\_i - i\_1 q\_i r\_i \\ m\_\alpha \Delta a\_i + m\_q q\_i + i\_2 p\_i r\_i - m\_a (\zeta\_0 / V\_i) (\cos \theta\_i \cos \phi\_i - \cos \theta\_0) \\ n\_\beta \beta\_i + n\_r r\_i + n\_p p\_i + n\_{pa} p\_i \Delta a\_i - i\_3 p\_i q\_i + n\_q q\_i \end{bmatrix}^T$$

$$G\_2 = \begin{bmatrix} L, M, N \end{bmatrix}^T$$

$$L = \begin{bmatrix} l\_{\delta\_{el}}, l\_{\delta\_{er}}, l\_{\delta\_{al}}, l\_{\delta\_{ar}}, 0, 0, l\_{\delta\_r} \end{bmatrix}^T$$

$$M = \begin{bmatrix} m\_{\delta\_{el}}, m\_{\delta\_{er}}, m\_{\delta\_{al}}, m\_{\delta\_{ar}}, m\_{\delta\_{lef}}, m\_{\delta\_{tef}}, m\_{\delta\_r} \end{bmatrix}^T$$

$$N = \begin{bmatrix} n\_{\delta\_{el}}, n\_{\delta\_{er}}, n\_{\delta\_{al}}, n\_{\delta\_{ar}}, 0, 0, n\_{\delta\_r} \end{bmatrix}^T$$

$$f\_3(X\_i) = q\_i \cos \phi\_i - r\_i \sin \phi\_i$$

*where $X\_i = \left[X\_{i,1}^T, X\_{i,2}^T\right]^T$ with $X\_{i,1} = [\phi\_i, \alpha\_i, \beta\_i]^T$, $X\_{i,2} = [p\_i, q\_i, r\_i]^T$ and $X\_{i,3} = \theta\_i$; here $\phi\_i$, $\alpha\_i$, $\beta\_i$, $p\_i$, $q\_i$, $r\_i$ and $\theta\_i$ are the roll angle, attack angle, sideslip angle, roll angular velocity, pitching angular velocity, yaw angular velocity and pitch angle of fighter $i$, respectively, and $y\_i = X\_{i,1}$. The inputs $u\_i = [\delta\_{iel}, \delta\_{ier}, \delta\_{ial}, \delta\_{iar}, \delta\_{ilef}, \delta\_{itef}, \delta\_{ir}]^T$ are the deflections of the left and right elevators, left and right ailerons, front and rear flaps, and rudder, respectively. Detailed explanations of the parameters and variables of this model can be found in [26]. Suppose that all fighters fly at an altitude of 40,000 feet at a speed of Mach 0.6. The desired output for these fighters is $y\_d = [y\_{d,1}, y\_{d,2}, y\_{d,3}]^T = [20, 30, 0]^T$. The signal $y\_d$ is only available to fighter 1.*

**Figure 5.** Communication topology for four fighters.

According to Theorem 1, we design the distributed flight controller as follows

$$\begin{aligned} \alpha\_{i,1} &= G\_1^{-1}(X\_{i,1})\left[-\lambda\_{i,1}\frac{e\_{i,\phi}}{1 - e\_{i,\phi}^2},\; -\lambda\_{i,1}\frac{e\_{i,\alpha}}{1 - e\_{i,\alpha}^2},\; -\lambda\_{i,1}\frac{e\_{i,\beta}}{1 - e\_{i,\beta}^2}\right]^T \\ u\_i &= G\_2^{+}\left[-\lambda\_{i,2}\frac{e\_{i,p}}{1 - e\_{i,p}^2},\; -\lambda\_{i,2}\frac{e\_{i,q}}{1 - e\_{i,q}^2},\; -\lambda\_{i,2}\frac{e\_{i,r}}{1 - e\_{i,r}^2}\right]^T \end{aligned}$$

with
$$\begin{aligned} e\_{i,\phi} &= \phi\_i - q\_{d,1},\quad e\_{i,\alpha} = \alpha\_i - q\_{d,2},\quad e\_{i,\beta} = \beta\_i - q\_{d,3} \\ \left[e\_{i,p}, e\_{i,q}, e\_{i,r}\right]^T &= \left[p\_i, q\_i, r\_i\right]^T - \alpha\_{i,1} \end{aligned}$$

where $q\_{d,1}$, $q\_{d,2}$ and $q\_{d,3}$ are the signals produced by filter (6) with $y\_{d,i}$, $i = 1, 2, 3$ as the filter inputs, respectively; $\lambda\_{i,1} = 1$ and $\lambda\_{i,2} = 2$ for $i = 1, 2, 3, 4$; and $G\_2^{+}$ represents the pseudo-inverse of $G\_2$.
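Since $G\_2$ is a wide $3 \times 7$ matrix, $G\_2^{+} = G\_2^T(G\_2G\_2^T)^{-1}$ gives the minimum-norm allocation of the three moment commands across the seven surfaces. A dependency-free sketch for a full-row-rank $3 \times m$ matrix (function name ours) is:

```python
def pinv_apply(G, v):
    """Minimum-norm solution u = G^T (G G^T)^{-1} v of G u = v for a wide,
    full-row-rank 3 x m matrix G (the right pseudo-inverse G^+ applied to v)."""
    m = len(G[0])
    # M = G G^T (3 x 3 Gram matrix)
    M = [[sum(G[r][k] * G[c][k] for k in range(m)) for c in range(3)] for r in range(3)]
    # determinant and cofactor-based inverse of the 3 x 3 matrix M
    det = (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
           - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
           + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))
    inv = [[(M[(r + 1) % 3][(c + 1) % 3] * M[(r + 2) % 3][(c + 2) % 3]
             - M[(r + 1) % 3][(c + 2) % 3] * M[(r + 2) % 3][(c + 1) % 3]) / det
            for r in range(3)] for c in range(3)]
    w = [sum(inv[r][c] * v[c] for c in range(3)) for r in range(3)]   # (G G^T)^{-1} v
    return [sum(G[r][k] * w[r] for r in range(3)) for k in range(m)]  # G^T w
```

Among all deflection vectors producing the commanded moments, this solution has the smallest Euclidean norm, i.e., the least total control effort.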

For the purposes of comparison, we use the control method of [17] under the same conditions. Following [17], the distributed flight controller is designed as follows

$$\begin{aligned} \alpha\_{i,1} &= G\_1^{-1}(X\_{i,1})\left[-\lambda\_{i,1}e\_{i,\phi},\; -\lambda\_{i,1}e\_{i,\alpha},\; -\lambda\_{i,1}e\_{i,\beta}\right]^T \\ u\_i &= G\_2^{+}\left[-\lambda\_{i,2}e\_{i,p},\; -\lambda\_{i,2}e\_{i,q},\; -\lambda\_{i,2}e\_{i,r}\right]^T \end{aligned}$$

where the variables and controller parameters are the same as in our proposed method. The simulation results are then reported in Figures 6–10. In Figure 6, the dotted curves denote the outputs of the fighters under the control method of [17], while the solid curves denote the outputs of the fighters under the control method of this paper. It can be seen from Figure 6 that our control performance is better than that of [17], since our outputs track the desired values more accurately. Figures 7–10 show the actuator actions of the four fighters under our method. Figure 11 shows the controller performance of our method and that of [17]. In Figure 11, the blue curves denote the control efforts $E\_1$ of the fighters with our method, while the red curves denote the control efforts $E\_2$ of the fighters with the method of [17], where $E\_1$ and $E\_2$ are defined as

$$E\_k = \sqrt{\delta\_{iel}^2 + \delta\_{ier}^2 + \delta\_{ial}^2 + \delta\_{iar}^2 + \delta\_{ilef}^2 + \delta\_{itef}^2 + \delta\_{ir}^2},\quad k = 1, 2,\; i = 1, 2, 3, 4$$
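The effort metric $E\_k$ is simply the Euclidean norm of the seven surface deflections; a one-line sketch (function name ours) is:

```python
import math

def control_effort(deltas):
    """Control effort E_k: Euclidean norm of the seven actuator deflections
    [d_el, d_er, d_al, d_ar, d_lef, d_tef, d_r]."""
    assert len(deltas) == 7, "expects the seven surface deflections"
    return math.sqrt(sum(d * d for d in deltas))
```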

**Figure 6.** Output of four fighters.

**Figure 7.** Actuator actions of Fighter 1.

It can be seen from Figure 11 that the control efforts of our method are initially greater than those of [17], while eventually there is little difference between the two methods; in other words, our method achieves better control performance under similar control efforts.

It can be seen from these results that consensus among the four fighters is achieved, the tracking performance is very good, and fairly good control performance is obtained.

**Figure 8.** Actuator actions of Fighter 2.

**Figure 9.** Actuator actions of Fighter 3.

**Figure 10.** Actuator actions of Fighter 4.

**Figure 11.** Control efforts.

#### **6. Conclusions**

A novel distributed consensus method was presented for a MAS with completely unknown system nonlinearities and time-varying control coefficients under a directed graph. A second-order filter was constructed for each agent, providing the desired signals and thus avoiding the estimation of an unknown matrix related to the Laplacian matrix. Combined with these filters, a global consensus method was proposed for the first time for a MAS with completely unknown system nonlinearities under a directed graph. The proposed consensus method was applied to two examples. It was shown that four high-maneuver fighters achieved angular consensus with very good control performance using the proposed method. The two simulation results demonstrated the effectiveness of the proposed method.

**Author Contributions:** Funding acquisition, W.F.; Investigation, H.H.; Writing—original draft, Z.L.; Writing—review and editing, S.L. and Q.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China under grants 62106284 and 62176214, in part by the Natural Science Foundation of Shaanxi Province of China under grant 2021JQ-370, and in part by the Xi'an Youth Talent Promotion Plan under grant 095920201309.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Formation Control Technology of Fixed-Wing UAV Swarm Based on Distributed Ad Hoc Network**

**Wenbo Suo 1, Mengyang Wang 2, Dong Zhang 2,\*, Zhongjun Qu <sup>1</sup> and Lei Yu <sup>3</sup>**


**Abstract:** The formation control technology of the unmanned aerial vehicle (UAV) swarm is a current research hotspot, and formation switching and formation obstacle avoidance are vital technologies. Aiming at the formation control of fixed-wing UAVs in distributed ad hoc networks, this paper proposed a route-based formation switching and obstacle-avoidance method. First, the consistency theory was used to design the UAV swarm formation control protocol. According to the protocol, the self-organized UAV swarm could obtain the formation waypoints from the current position information, design fly-around waypoints following the corresponding rules, and arrive at the formation waypoints at the same time to achieve formation switching. Secondly, the formation obstacle-avoidance channel was obtained by combining a geometric method with an intelligent path-search algorithm; the UAV swarm was then divided into multiple smaller formations to achieve formation obstacle avoidance. Finally, abnormal conditions during the flight were handled. The simulation results showed that the formation control technology based on the distributed ad hoc network was simple and reliable, easy to implement, versatile and robust, and helpful in dealing with communication and flight anomalies under variable topology.

**Keywords:** fixed-wing UAV; UAV swarm formation; distributed ad hoc network; consistency theory; formation obstacle avoidance

#### **1. Introduction**

The fixed-wing UAV swarm has essential application prospects and has become a current research hotspot. It can complete combat missions such as coordinated reconnaissance, early warning, strike, and assessment in the military field, and carry out disaster-emergency, geological-survey, and pesticide-spraying tasks in the civilian field [1–4]. Formation control is one of the key issues in achieving UAV swarm flight [5,6]. Its primary contents include formation maintenance, formation switching, formation obstacle avoidance, and exception handling. The distributed wireless ad hoc network [7] is the core of realizing the clustered unmanned system [8]. Formation control based on the distributed ad hoc network can better reflect the distributed, networked, and centerless characteristics of the cluster system, which is the future development trend of cluster control.

The formation control technology for UAV swarm behavior has been extensively studied. Wang [9] proposed the leader–follower method, whose basic idea is that the other UAVs follow a leader UAV as followers. Luo et al. [10] and Gu et al. [11] also adopted the leader–follower method to design a control method for the leader and followers in the formation. Campa et al. [12], building on the leader–follower method, proposed a virtual-leader method. Its main idea was to treat a multi-UAV formation as a rigid virtual structure.

**Citation:** Suo, W.; Wang, M.; Zhang, D.; Qu, Z.; Yu, L. Formation Control Technology of Fixed-Wing UAV Swarm Based on Distributed Ad Hoc Network. *Appl. Sci.* **2022**, *12*, 535. https://doi.org/10.3390/ app12020535

Academic Editor: Seong-Ik Han

Received: 30 November 2021 Accepted: 30 December 2021 Published: 6 January 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

37

When the formation moved as a whole, the UAV only needed to track the movement of the fixed point corresponding to the rigid body. Li and Liu [13] designed a synchronized position tracking controller to improve the effectiveness of using a virtual structure method to maintain formation geometry. Yun and Albayrak [14] applied behavior-control methods to study the formation of multiplatform formations, such as linear formation and circular formation. Chen and Luh [15] applied behavior-control methods to achieve the purpose of object transportation. Ginlietti et al. [16] used behavior methods to define the concept of the formation geometric center. When flying in formation, each aircraft needs to maintain a prescribed distance from the geometric center, and be able to perceive the movements of other aircraft and reconstruct the formation, similar to the behavior of migratory birds in nature. Joongbo et al. [17] proposed a feedback linearization method based on consistency for multi-UAV systems to maintain a specific time-varying formation flight geometry. Glavas et al. [18] applied the consistency research strategy to study the situation when there were random communication noise and information packet loss constraints in the network, and used the polygon method based on information exchange to achieve formation control. Yasuhiro and Toru [19] studied the cooperative control problem of multi-UAV systems, and proposed a cooperative formation control strategy with collision-avoidance capability using decentralized model predictive control (MPC) and consensus-based control. Zhao et al. [20] studied the problem of formation control of multiaircraft formation with time-varying formation characteristics when there was a spanning tree in the network topology based on the consistency theory, and obtained the stability conditions of the system. Dong et al. 
[21] designed a distributed formation controller based on the consistency theory, proving that as long as the network topology was guaranteed to have directional strong connectivity, even if an aircraft was lost during the formation flight, the multiaircraft system could still achieve stable formation control. Seo et al. [22] designed a consistent control protocol for the situation in which the network topology had fixed connectivity, and studied the problem of cooperative formation control of multiaircraft systems forming geometric formations. Tang et al. [23] used evolutionary control theory to complete distributed collaborative control of UAV formations. He and Lu [24] proposed a decentralized design method based on a UAV distributed formation-maintenance controller, decomposing the UAV formation model into decoupled parts and associated parts using robust control methods, and designed a distributed controller for the associated subsystems to control the UAV formation flight. However, most of this research was based on formation controller designs coupled with the UAV's underlying control system, under the assumption that the UAV had a three-channel autopilot capable of instantaneous response [25]. These assumptions make it difficult to apply the formation algorithms to actual UAV swarm control. Furthermore, existing work paid insufficient attention to control algorithms for real flight environments involving network topology jumps, communication delays, and even weak communication.

To address the formation control problem of a fixed-wing UAV swarm in a distributed ad hoc network, this paper proposes a waypoint-planning-based method for formation switching and obstacle avoidance, built on the consistency theory. The organizational structure of the paper is as follows. The first part gives a general description of the problem of a fixed-wing UAV swarm formation in a distributed ad hoc network; the second part proposes a method for switching the formation of the UAV swarm based on the consistency theory; the third part designs a UAV swarm formation obstacle avoidance algorithm; the fourth part deals with the problems of flight abnormality and communication abnormality of the UAV swarm during the flight; the fifth part simulates and verifies the formation switching of the UAV swarm, the formation obstacle avoidance, and handling of anomalies during the flight; the sixth part analyzes and discusses the results; and finally, the seventh part summarizes the article.

#### **2. Problem Formulation**

This paper focuses on two critical problems of formation switching and formation obstacle avoidance in a distributed ad hoc network for UAV swarm formation control.

#### **Definition 1.** *Formation switch in distributed ad hoc network.*

For *n* UAVs, given an initial position **X***i*(0) of UAV*i*, the UAV swarm forms a communication topology in the ad hoc network (as shown in Figure 1). Plan the waypoints *Pi* = {*Pi*1, ··· , *Pik*, ··· } of UAV*i* under the dynamic constraints of the maximum turning angle *β*max and the minimum route length *ds*, so that the distance between UAV*i* and the other UAVs reaches the expected value Δ**X***jiref* within the time *T*:

$$\left| \mathbf{X}\_{j}(t) - \mathbf{X}\_{i}(t) - \Delta \mathbf{X}\_{jiref} \right| \to 0 \quad i = 1, 2, \cdots, n \tag{1}$$

**Figure 1.** UAV swarm formation problem.

#### **Definition 2.** *Formation obstacle avoidance in distributed ad hoc network.*

For *n* UAVs, given an initial position **X***i*(0) of UAV*i*, intelligently split the UAV swarm into *Nc* subformations, where the number of UAVs in the *p*th subformation is *Np*. Plan the waypoints *Pi* = {*Pi*1, ··· , *Pik*, ··· } of UAV*i* under the dynamic constraints of the maximum turning angle *β*max and the minimum direct flying distance *ds*, so that the UAVs pass through the rectangular obstacle avoidance area *S* containing the circular obstacles *O* without colliding, and the distance between UAV*i* and the other UAVs eventually reaches the expected value Δ**X***jiref*; namely, the swarm is reconstructed into the required formation.

where $n = \sum\_{p=1}^{N\_c} N\_p$; *O* is the collection of obstacles, *O* = {*o*1, *o*2, ··· , *oNo*}, where *No* is the number of obstacles. Each obstacle *om* can be described as a dyadic array < *Rom*, *Rm* >, where *Rom* is the center point of the *m*th circle and *Rm* is its radius. *S* is the rectangular obstacle avoidance area, described as a ternary array < *So*, *L*, *W* >, where *So* is the center point of the rectangular obstacle avoidance area, *L* is the length of the rectangle, and *W* is the width of the rectangle.

A reasonable formation-switching control method requires that the UAV can perform online formation switching based on neighboring UAV information, satisfy the UAV dynamic constraints, handle communication delays and flight anomalies, and eliminate track deviations. Formation obstacle avoidance requires that the UAV swarm can effectively avoid obstacles and form the desired formation after the avoidance is completed. The UAV studied in this paper was a highly dynamic fixed-wing UAV flying at a constant speed. The dynamic constraints were required to satisfy the minimum turning radius and minimum track length constraints. The research space of this paper was the two-dimensional Euclidean horizontal plane.

#### **3. Waypoint-Based Formation-Switching Method**

The UAV studied in this paper was a highly dynamic fixed-wing UAV. Due to its high speed, it was difficult for the control system to update flight parameters in real time, and the errors were relatively large. Therefore, this article mainly considered the waypoint-based formation switching, and the online design of a small number of waypoints to complete the flight process of the UAV swarm formation, without the need to participate and change the design of the aircraft control system.

#### *3.1. Consensus-Based Design for UAV Swarm Formation Control Protocol*

Algebraic graph theory was used to describe the UAV swarm system and its behavior. Assume that the ad hoc network UAV swarm system has *n* UAVs; each UAV is regarded as a node, and each communication relationship is seen as an edge. An undirected graph *G* = (*V*, *E*, **A**) represents the UAV swarm system, where *V* = {*s*1,*s*2, ··· ,*sn*} is the collection of nodes, *E* = {(*si*,*sj*) ∈ *V* × *V* | *i* ≠ *j*} is the collection of edges, and **A** = [*aij*]*n*×*n* is the weighted adjacency matrix. An edge of the graph is denoted by *eij* = (*si*,*sj*). For an undirected graph, UAV*i* and UAV*j* can receive the information sent by each other; namely, (*si*,*sj*) ∈ *E* ⇔ (*sj*,*si*) ∈ *E*. The adjacency matrix is defined by *aii* = 0 and *aij* = *aji* ≥ 0, with *aij* > 0 when *eij* ∈ *E*. The neighbor set of the node *si* is defined as *Ni* = {*sj* ∈ *V* | (*si*,*sj*) ∈ *E*}.

An undirected graph *Gn* was used to describe the communication topology relationship in the UAV swarm. At each moment *t*, the communication connection between the ad hoc network and UAV swarm forms a communication topology. For the vertex *i* of graph *G*, let *xi*(*t*) ∈ *<sup>R</sup><sup>q</sup>* and *ui*(*t*) ∈ *<sup>R</sup><sup>q</sup>* denote the state variable and state information input variable of the UAV swarm, respectively, at the time *t*. The classic first-order continuous-time consistency protocol [26] is:

$$
\dot{x}\_i(t) = u\_i(t) \quad i = 1, 2, \cdots, n \tag{2}
$$

$$u\_i(t) = -\sum\_{j=1}^{n} a\_{ij}(t) \left(\mathbf{x}\_i(t) - \mathbf{x}\_j(t)\right) \tag{3}$$

For any UAV*i* with initial state *xi*(0) ∈ *Rq*, if *xi*(*t*) − *xj*(*t*) → 0 as *t* → ∞, the UAV swarm system is said to reach consensus.
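As an illustration only, the classic first-order protocol of Equations (2) and (3) can be simulated with a forward-Euler discretization. The topology, unit edge weights, and step size below are illustrative assumptions, and the function name `runConsensus` is hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Forward-Euler integration of the classic first-order consensus
// protocol (Equations (2) and (3)): x_i' = -sum_j a_ij (x_i - x_j).
// 'a' is a symmetric adjacency matrix; 'dt' is an illustrative step size
// chosen small enough for stability (dt * lambda_max(L) < 2).
std::vector<double> runConsensus(std::vector<double> x,
                                 const std::vector<std::vector<double>>& a,
                                 double dt, int steps) {
    const std::size_t n = x.size();
    for (int s = 0; s < steps; ++s) {
        std::vector<double> u(n, 0.0);
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                u[i] -= a[i][j] * (x[i] - x[j]);   // Equation (3)
        for (std::size_t i = 0; i < n; ++i)
            x[i] += dt * u[i];                     // Equation (2), Euler step
    }
    return x;
}
```

For a connected undirected topology the states converge to a common value (the average of the initial states, since the graph Laplacian is symmetric).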

In this paper, the consensus algorithm needed to be applied to the formation switching of the UAV swarm. The distance between UAVs needed to reach the expected value eventually. Therefore, to improve the classic first-order consistency protocol, the first-order consistency protocol with reference location information was proposed, and the specific form was as follows:

$$\dot{x}\_i(t) = u\_i(t) = \frac{\sum\_{j=1}^{n} a\_{ij}(t) \left[ \left(\mathbf{X}\_j(t) - \mathbf{X}\_i(t)\right) - \Delta\mathbf{X}\_{jiref} \right]}{\sum\_{j=1}^{n} a\_{ij}(t)} \quad i = 1, 2, \dots, n \tag{4}$$

where Δ**X***jiref* is the expected relative formation distance between UAV*j* and UAV*i*.

#### *3.2. Consensus-Based Waypoints Planning*

#### 3.2.1. Formation Waypoints

The UAVs studied in this paper flew at a constant speed, and the UAV swarm formation was controlled based on the route. Therefore, the consistency control protocol in Equation (4) needed to be discretized. First, the communication time of the UAV swarm was discretized; then, the position status of each UAV was updated in real time according to the difference equation. The discrete consistency control protocol is given by:

$$\mathbf{X}\_{i}[k+1] = \mathbf{X}\_{i}[k] + \Delta \mathbf{X}\_{i}[k] + \mathbf{D} \tag{5}$$

$$\Delta \mathbf{X}\_{i}[k] = \frac{\sum\_{j=1}^{n} a\_{ij}[k] \left( \left(\mathbf{X}\_{j}[k] - \mathbf{X}\_{i}[k]\right) - \Delta\mathbf{X}\_{jiref} \right)}{\sum\_{j=1}^{n} a\_{ij}[k]} \tag{6}$$

where *k* represents a communication event, i.e., one formation process toward the UAV swarm formation waypoint; **D** is the distance required to switch the formation, selected according to actual needs; **X***j*[*k*] and **X***i*[*k*] are the position statuses of UAV*j* and UAV*i* at event *k*, also called formation waypoints; and **A**[*k*] = [*aij*[*k*]] ∈ *Rn*×*n* is the adjacency matrix of the communication topology, whose elements are defined as:

$$a\_{ij} = \begin{cases} 1 & \left(s\_i, s\_j\right) \in E \\ 0 & \left(s\_i, s\_j\right) \notin E \end{cases} \tag{7}$$

The undirected graph in this article did not allow self-loops, so *aii* = 0.

During the flight, UAV*<sup>i</sup>* established a communication connection with other UAVs in the ad hoc network, forming a formation waypoint **X***i*[*k*] within each communication event *k* according to Equations (5) and (6).
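A minimal C++ sketch of one communication event under Equations (5) and (6), assuming planar positions, a 0/1 adjacency matrix, and a common switching displacement **D**; the names `Vec2` and `formationStep` are illustrative, not from the paper:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec2 { double x, y; };

// One communication event of the discrete consensus protocol in
// Equations (5) and (6): each UAV averages the formation errors over its
// connected neighbours and advances by the switching distance D.
// 'ref[j][i]' is the expected relative offset X_j - X_i; 'a' is the 0/1
// adjacency matrix of the ad hoc network at event k.
std::vector<Vec2> formationStep(const std::vector<Vec2>& X,
                                const std::vector<std::vector<Vec2>>& ref,
                                const std::vector<std::vector<int>>& a,
                                Vec2 D) {
    const std::size_t n = X.size();
    std::vector<Vec2> next(n);
    for (std::size_t i = 0; i < n; ++i) {
        Vec2 dX{0.0, 0.0};
        double w = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            if (!a[i][j]) continue;  // only connected neighbours contribute
            dX.x += (X[j].x - X[i].x) - ref[j][i].x;
            dX.y += (X[j].y - X[i].y) - ref[j][i].y;
            w += 1.0;
        }
        if (w > 0.0) { dX.x /= w; dX.y /= w; }                   // Equation (6)
        next[i] = {X[i].x + dX.x + D.x, X[i].y + dX.y + D.y};    // Equation (5)
    }
    return next;
}
```

With a fully connected topology, each event shrinks the relative formation errors, so the swarm approaches the prescribed offsets.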

According to the consistency theory, it can be proved that for any UAV*i* with an initial value **X***i*[0] ∈ *Rq*, if the union of the time-varying communication topologies is fully connected during a formation transformation [27], namely $\sum\_{k=0}^{N\_k} a\_{ij}[k] > 0$ (*i* ≠ *j*), then when *k* → *Nk*, we have $\left|\Delta\mathbf{X}\_{ji}[k] - \Delta\mathbf{X}\_{jiref}\right| \to 0$; that is, the relative distances within the UAV swarm reached the desired values, and the formation switch was realized. If the initial communication topology *aij*[0] was fully connected, namely the communication was in the normal state, the expected formation could be achieved at *k* = 0.

#### 3.2.2. Flying to Formation Waypoint in Consistent Time

Once the formation waypoints were obtained, only local waypoints from the current position of the UAV*<sup>i</sup>* to its formation waypoint needed to be designed so that the flight time of each UAV was equal. In this section, a distributed UAV swarm flying method based on time consistency is proposed, and local waypoints are designed under dynamic constraints to achieve UAV swarm formation switching when flying at a constant speed.

#### (1) Dynamic constraints

Maximum turning angle constraint: when the UAV turns according to the waypoint, the turning angle *β* (as shown in Figure 2) needs to be lower than the maximum turning angle *β*max:

$$
\beta \le \beta\_{\max} \tag{8}
$$

**Figure 2.** Dynamic constraints.

Minimum route length constraint [28]: assuming that the UAV was flying at the available overload during the turn, the turning radius *R* was a fixed value. As shown in Figure 2, the turning angle of the UAV in this section of the route was *β*. For the UAV to make two consecutive turns, the length of this route section needed to be no shorter than the minimum route length *ds*:

$$d\_{s} = 2 \times R \times \tan \frac{\beta}{2} \tag{9}$$
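Equation (9) is a one-line computation; a small helper (the name `minRouteLength` is illustrative) makes the constraint concrete:

```cpp
#include <cassert>
#include <cmath>

// Minimum route length between two consecutive turns (Equation (9)):
// with a fixed turning radius R, a segment entered with turning angle
// beta must be at least d_s = 2 * R * tan(beta / 2) long, otherwise the
// two turning arcs would overlap.
double minRouteLength(double R, double betaRad) {
    return 2.0 * R * std::tan(betaRad / 2.0);
}
```

For the 90-degree turns used later in the paper, tan(45°) = 1, so *ds* = 2*R*, consistent with the margin *dy* = 2 × *ds* = 4 × *R* chosen in Section 3.2.2.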

#### (2) Waypoint design

When a UAV flew to its formation waypoint, we first took the UAV with the longest path to fly as the time base, and then planned detours in the horizontal plane for the other UAVs so that all UAVs reached their formation waypoints at the same time.

The longest path *C*max is:

$$\mathcal{C}\_{\text{max}} = \text{max}(\mathcal{C}\_i) + d\_y \tag{10}$$

where *dy* is the vertical flight distance margin ensuring that the vertical distance meets the minimum route length constraint *ds*. The turning angle of this method is a fixed value *β* = 90◦, so we set *dy* = 2 × *ds* = 4 × *R*. *Ci* is the route distance that UAV*i* needed to fly:

$$C\_i = \Delta x\_i + \Delta y\_i + \Delta t\_i \times V \tag{11}$$

where Δ*xi* and Δ*yi* are the horizontal distance and vertical distance of UAV*<sup>i</sup>* from the current position to the formation waypoint; *V* is the flight speed of the UAV; and Δ*ti* is the waiting time of UAV*i*, which can be selected according to the actual project. If it was a formation switch scenario, Δ*ti* = 0. If it was a formation assembly scenario, Δ*ti* was the launch interval of UAV*<sup>i</sup>* from the first UAV.
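The time-base selection of Equations (10) and (11) can be sketched as follows; the structure and function names (`RouteDemand`, `routeDistance`, `longestPath`) are illustrative, and the formation-switch case Δ*ti* = 0 is assumed unless a wait time is given:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Equation (11): C_i = dx_i + dy_i + dt_i * V, the route distance UAV_i
// must cover (horizontal plus vertical distance, plus the distance flown
// during its waiting time).
struct RouteDemand { double dx, dy, waitTime; };

double routeDistance(const RouteDemand& u, double V) {
    return u.dx + u.dy + u.waitTime * V;
}

// Equation (10): C_max = max_i(C_i) + d_y, with the vertical margin
// d_y = 4 * R guaranteeing the minimum route length at 90-degree turns.
double longestPath(const std::vector<RouteDemand>& uavs, double V, double R) {
    double cmax = 0.0;
    for (const auto& u : uavs)
        cmax = std::max(cmax, routeDistance(u, V));
    return cmax + 4.0 * R;
}
```

Every UAV then pads its own route up to this common length, so all UAVs arrive simultaneously despite different start positions.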

The dynamic constraints of the flight plan needed to consider the maximum turning angle constraint and the minimum route length constraint. The flight plan was divided into the following sections:

As shown in Figure 3, the flight plan was composed of four right-angle turning sections. The solid line is the planned route, consisting of segments 1, 2, 3, 4, and 5. The dashed line is the actual flight route considering the UAV turning process. The solid points are the UAV waypoints. To make the flying distances equal, the lengths of segments 1, 2, 3, 4, and 5 were designed as:

**Figure 3.** Dynamic constraints.

$$\begin{cases} L\_{1x,i} = L\_{5x,i} = r \\ L\_{3x,i} = \Delta x\_i - 4 \times r \\ L\_{2y,i} = L\_{4y,i} = \frac{C\_{\text{max}} - \Delta x\_i - \Delta y\_i}{2} \end{cases} \tag{12}$$

where *L*1*x*,*i*, *L*3*x*,*i*, and *L*5*x*,*i* are the horizontal distances of UAV*i* in segments 1, 3, and 5; and *L*2*y*,*i* and *L*4*y*,*i* are the vertical distances of UAV*i* in segments 2 and 4, respectively.

The local waypoints of UAV*i* could be calculated according to the starting waypoint and the lengths of segments 1, 2, 3, 4, and 5:

$$\begin{cases} P\_{i1}(\mathbf{x}\_{i1}, y\_{i1}) = P\_{i0}(\mathbf{x}\_{i0} + L\_{1x,i}, y\_{i0}) \\ P\_{i2}(\mathbf{x}\_{i2}, y\_{i2}) = P\_{i1}(\mathbf{x}\_{i1}, y\_{i1} + L\_{2y,i}) \\ P\_{i3}(\mathbf{x}\_{i3}, y\_{i3}) = P\_{i2}(\mathbf{x}\_{i2} + L\_{3x,i}, y\_{i2}) \\ P\_{i4}(\mathbf{x}\_{i4}, y\_{i4}) = P\_{i3}(\mathbf{x}\_{i3}, y\_{i3} - L\_{4y,i}) \end{cases} \tag{13}$$

At this point, all local route points could be obtained. The UAVs could arrive at the formation waypoints at the same time, thus forming the expected formation.
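The waypoint construction of Equations (12) and (13) is plain coordinate arithmetic; a sketch under the assumption that *r* is the fixed turning radius and that the start point, Δ*x*, Δ*y*, and *C*max are already known (names `Pt` and `localWaypoints` are illustrative):

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Pt { double x, y; };

// Local waypoints of Equations (12) and (13): from the start point P_i0,
// UAV_i flies segment lengths L1x = L5x = r and L3x = dx - 4r
// horizontally, and L2y = L4y = (Cmax - dx - dy) / 2 vertically,
// producing the four right-angle turn points P_i1..P_i4.
std::array<Pt, 4> localWaypoints(Pt p0, double dx, double dy,
                                 double cmax, double r) {
    const double L1x = r;
    const double L3x = dx - 4.0 * r;
    const double L2y = (cmax - dx - dy) / 2.0;  // equals L4y
    std::array<Pt, 4> p{};
    p[0] = {p0.x + L1x,   p0.y};          // P_i1
    p[1] = {p[0].x,       p[0].y + L2y};  // P_i2
    p[2] = {p[1].x + L3x, p[1].y};        // P_i3
    p[3] = {p[2].x,       p[2].y - L2y};  // P_i4
    return p;
}
```

Each UAV thus pads its route with a vertical detour of height *L*2*y*, trading path length for a common arrival time.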

In this section, a waypoint-based formation-switching control method for distributed ad hoc networks was derived from the consistency theory. It only needed to obtain the formation waypoint from the position information of the UAVs, and then plan the local waypoints from the current position to the formation waypoint under dynamic constraints. The UAV only needed to reach the online-planned waypoints under the control of its flight control system; it did not need to call the UAV's control system to track flight parameters in real time, and only required the design of four local waypoints. This method involves a small amount of calculation, is simple and practical, and is conducive to engineering implementation. It can also realize dynamic formation control of UAVs when parts of the communication network are lost and the topology structure changes.

#### **4. Waypoint-Based Formation Obstacle Avoidance Algorithm**

Traditional formation obstacle avoidance methods use an artificial potential field for obstacle avoidance or an intelligent algorithm to search for waypoints. However, considering the real-time nature of obstacle avoidance and engineering applications, the artificial potential field method must participate in the design of the control system; when facing a high-dynamics UAV, it is challenging to update flight parameters in real time, which leads to significant errors and low reliability. When an intelligent algorithm is used to plan trajectories with high precision, planning the trajectory of even a single UAV is too slow, and if many waypoints are planned, the result cannot meet the dynamic constraints of the UAV.

In this paper, the intelligent path search algorithm was used to search for obstacle avoidance channels through which UAVs could pass, and the UAV swarm was divided into multiple smaller formations according to the number of UAVs that could pass through the avoidance channel.

#### *4.1. Obstacle Avoidance Principles*

Obstacle-avoidance principles include the A\* algorithm and the smallest enclosing convex polygon of a set of points (SECP) decision principle.

#### (1) A\* algorithm

The A\* algorithm [29,30] is the most effective direct search method for solving the shortest path in a static road network. We only needed to search for the formation channel between the obstacle circles, and there was no need to search for high-precision track points, so we could use the traditional A\* algorithm:

$$f(n) = g(n) + h(n) \tag{14}$$

where *f*(*n*) is the cost estimate from the initial state to the target state via state *n*; *g*(*n*) is the actual cost from the initial state to state *n* in the state space; and *h*(*n*) is the estimated cost of the best path from state *n* to the target state. The shortest path could be determined by searching from the starting point to the ending point according to the valuation function in Equation (14).
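A minimal grid-based A* in the spirit of Equation (14), assuming a 4-connected grid with unit step cost and a Manhattan-distance heuristic; the grid encoding (1 = obstacle) and the name `aStar` are illustrative assumptions, not the paper's implementation:

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Each open node is ranked by f(n) = g(n) + h(n), with g the accumulated
// cost and h an admissible Manhattan heuristic. Returns the shortest
// path length in steps, or -1 if the goal is unreachable.
int aStar(const std::vector<std::vector<int>>& grid,
          int sr, int sc, int gr, int gc) {
    const int R = grid.size(), C = grid[0].size();
    auto h = [&](int r, int c) { return std::abs(r - gr) + std::abs(c - gc); };
    std::vector<std::vector<int>> g(R, std::vector<int>(C, -1));
    using Node = std::tuple<int, int, int>;  // (f, row, col)
    std::priority_queue<Node, std::vector<Node>, std::greater<Node>> open;
    g[sr][sc] = 0;
    open.emplace(h(sr, sc), sr, sc);
    const int dr[4] = {1, -1, 0, 0}, dc[4] = {0, 0, 1, -1};
    while (!open.empty()) {
        auto [f, r, c] = open.top();
        open.pop();
        (void)f;  // f only orders the queue
        if (r == gr && c == gc) return g[r][c];
        for (int k = 0; k < 4; ++k) {
            int nr = r + dr[k], nc = c + dc[k];
            if (nr < 0 || nr >= R || nc < 0 || nc >= C) continue;
            if (grid[nr][nc] == 1) continue;  // blocked by obstacle
            int ng = g[r][c] + 1;
            if (g[nr][nc] == -1 || ng < g[nr][nc]) {
                g[nr][nc] = ng;
                open.emplace(ng + h(nr, nc), nr, nc);
            }
        }
    }
    return -1;
}
```

The paper only needs such a coarse search between obstacle circles, which is why the traditional A* suffices without high-precision track points.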

(2) SECP determination method

Given a plane point set *A* = {(*xi*, *yi*) | *xi*, *yi* ∈ *R*}, we determined the smallest enclosing convex polygon (SECP) of the point set as follows: we connected the points two by two to form a line segment set *S* = {(*Ai*, *Aj*) | *Ai*, *Aj* ∈ *A*, *i* ≠ *j*}. If all the other line segments were on one side of the line on which a segment *Sconvex* lay (e.g., the dotted line in Figure 4), then the segment *Sconvex* was a side of the required convex polygon SECP.
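The SECP side test described above can be implemented with a 2D cross product: a segment belongs to the convex polygon iff every other point of the set lies on one side of its supporting line. This is a sketch; the names `P2`, `cross`, and `isHullEdge` are illustrative:

```cpp
#include <cassert>
#include <vector>

struct P2 { double x, y; };

// Signed area test: positive if c is to the left of the directed line
// a -> b, negative if to the right, zero if collinear.
double cross(P2 a, P2 b, P2 c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Segment (pts[i], pts[j]) is an SECP edge iff all remaining points lie
// on one side of its supporting line (collinear points are tolerated).
bool isHullEdge(const std::vector<P2>& pts, std::size_t i, std::size_t j) {
    bool pos = false, neg = false;
    for (std::size_t k = 0; k < pts.size(); ++k) {
        if (k == i || k == j) continue;
        double s = cross(pts[i], pts[j], pts[k]);
        if (s > 0) pos = true;
        if (s < 0) neg = true;
    }
    return !(pos && neg);  // points on both sides => not a hull edge
}
```

Checking all point pairs this way is O(n³), which is acceptable here because only the obstacle circle centers, not track points, are tested.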

**Figure 4.** Determination method.

#### *4.2. Formation Obstacle Avoidance Algorithm*

To obtain multiple formation passage paths, firstly, it was necessary to determine the entry points and the exit points according to the SECP determination method; secondly, to determine the formation obstacle avoidance path in combination with the A\* search method; and finally, to determine the obstacle avoidance waypoint of the UAV swarm. The algorithm flow is shown in Figure 5; the specific steps were as follows:

**Figure 5.** Principle flow chart of obstacle avoidance algorithm.

Step 1. Determine entry and exit points.

As shown in Figure 6, we connected the center points of the circles two by two to obtain the connecting line set *CirSeg* = {*CirSegmentk* | *k* = 1, 2 ··· , *Nk*}, obtained the entry and exit line segments according to the SECP determination method, and further obtained the entry-point set *EnPoint* and the exit-point set *ExPoint*. The specific process is shown in Algorithm 1:

**Figure 6.** Determination of entry and exit points.

```
Algorithm 1. Get the entry and exit points
Input: O: obstacle collection; S: obstacle avoidance area
Output: ExPoint = {ExitPointj | j = 1, 2, ···}, EnPoint = {EntryPointi | i = 1, 2, ···}
1: CirSeg ← connect the points in Ro = {Rom | m = 1, ···, n} two by two
2: MidLine ← connect the maximum and minimum of point Rom in the y-direction
3: for k = 1 to Nk do
4:   CirLinek ← CirSegmentk
5:   if {CirSeg − CirSegmentk} is on the same side of CirLinek then
6:     CirSegmentk ∈ SECP (e.g., SECP in Figure 6)
7:     PathSegment ← {CirSegmentk − CirSegmentk ∩ O}
8:     if PathSegment is on the left side of MidLine then
9:       EnPoint ← the middle point of PathSegment (e.g., A in Figure 6)
10:    else ExPoint ← the middle point of PathSegment (e.g., B in Figure 6)
11:    end if
12:  end if
13: end for
14: return EnPoint, ExPoint
```

Step 2. Determine formation avoidance path.

As shown in Figure 7, we combined the entry and exit point sets *EnPoint* and *ExPoint* from Algorithm 1 with the processed channel segments *CirSeg* (e.g., *M* in Figure 7) to find the obstacle avoidance path set *AvoidPath* using the A\* method. The specific process is shown in Algorithm 2:

**Figure 7.** Determination of the formation avoidance path.

```
Algorithm 2. Get the avoidance path
Input: O, S, ExPoint, EnPoint
Output: AvoidPath = {AvoidPath(i→j),q | q = 1, 2, ···, Nq}
1: for k = 1 to Nk do
2:   if CirSegmentk ∩ O then erase CirSegmentk
3:   else if CirSegmentk ∩ {CirSeg − CirSegmentk} then erase CirSegmentk
4:   end if
5: end for
6: use O and S to initialize the A* map
7: for i = 1 to Ni do
8:   for j = 1 to Nj do
9:     search Pathi→j from EntryPointi to ExitPointj by A* (e.g., S in Figure 7)
10:    for k = 1 to Nk do
11:      if Pathi→j ∩ CirSegmentk then
12:        PathSegment ← {CirSegmentk − CirSegmentk ∩ O}
13:        AvoidPath(i→j),q ← the middle point of PathSegment
14:      end if
15:    end for
16:  end for
17: end for
18: return AvoidPath (e.g., the blue line in Figure 7)
```
Step 3. Determining the UAV formation obstacle avoidance waypoint.

As shown in Figure 8, we extended *AvoidPath* to the boundary of the obstacle avoidance area (e.g., *P* in Figure 8), deleted the formation obstacle avoidance path that did not meet the UAV dynamic constraints, and calculated the number of UAVs that the path of the *i*th entry point could pass through:

**Figure 8.** Determining the UAV formation obstacle avoidance waypoint.

$$n\_{i \to j} = \min\_{q}\left( \left\lfloor \omega \times \frac{L\_{(i \to j),q} - 2 \times d\_{safe}}{d} \right\rfloor + 1\right) \quad q = 1, 2, \dots, N\_q \tag{15}$$

$$\begin{cases} n\_i = \max\_{j} n\_{i \to j} \quad j = 1, 2, \dots, N\_j \\ j\_{\max} = \operatorname\*{argmax}\_{j}\left(n\_{i \to j}\right) \\ N\_{pass} = \sum\_{i=1}^{N\_i} n\_i \end{cases} \tag{16}$$

where *d* is the vertical interval distance of the UAVs; *dsafe* is the safe distance of the UAV; *L*(*i*→*j*),*q* is the length of the *q*th channel segment of the formation path from the *i*th entry point to the *j*th exit point; *ni* is the number of UAVs that can pass along the *i*th formation path; and *ω* is the scaling factor, with 0 < *ω* ≤ 1, used to adjust the number of UAVs passing through the channels.
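The capacity computation of Equation (15) reduces to a floor over the narrowest channel segment; a sketch with the illustrative name `channelCapacity`:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Equation (15): the number of UAVs that can pass abreast through a
// formation channel is limited by its narrowest segment. 'L' holds the
// segment lengths L_(i->j),q; 'd' is the lateral UAV interval, 'dSafe'
// the safety margin, and 'omega' the scaling factor (0 < omega <= 1).
int channelCapacity(const std::vector<double>& L, double d,
                    double dSafe, double omega) {
    int nMin = std::numeric_limits<int>::max();
    for (double len : L) {
        int nq = static_cast<int>(std::floor(omega * (len - 2.0 * dSafe) / d)) + 1;
        nMin = std::min(nMin, nq);  // the narrowest segment dominates
    }
    return nMin;
}
```

Equation (16) then takes, for each entry point, the best exit point (largest capacity) and sums the capacities to obtain *Npass*; UAVs beyond *Npass* fly around the obstacle area, as in Algorithm 3.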

Then, we calculated the obstacle avoidance waypoints *PathPoint* of UAV*i* based on *ni* and the *AvoidPath* obtained from Algorithm 2; the specific process is shown in Algorithm 3:

```
Algorithm 3. Get the fly path points
Input: O, S, X(0), AvoidPath, v
Output: PathPoint = {PathPointuavi | uavi = 1, ···, N}
1: extend AvoidPathi→j to the boundary of S (e.g., P in Figure 8)
2: if AvoidPathi→j does not satisfy the constraints Rres and ds then erase it
3: end if
4: calculate ni by Equation (15)
5: CirLinek ← CirSegmentk
6: split the UAV swarm into Ni sub-formations
7: PathPoint ← calculate PathPointuavi based on AvoidPath(i→jmax),q
8: if Npass ≤ N then
9:   PathPoint ← N − Npass UAVs fly around the obstacle area
10: end if
11: for uavi = 1 to N do
12:   Tuavi = |PathPointuavi| / v
13: end for
14: PathPoint ← apply the formation conversion algorithm with Tuavi and PathPoint
15: return PathPoint
```

#### **5. Handling Exceptions**

This paper dealt with two kinds of abnormal situations during the formation flight of a distributed ad hoc network UAV swarm: flight abnormalities and communication abnormalities. The processing methods for each category are shown in Figure 9 and detailed below.

**Figure 9.** Handling exceptions.

#### *5.1. Flight Abnormalities*

For the abnormal situation in which the tracking error was too large, the abovementioned distributed ad hoc network online formation-switching algorithm could be used for dynamic adjustment, as it does not depend on the initial positions of the UAVs. In response to a UAV falling during the flight, the remaining UAVs switched formation online using the formation-switching algorithm; that is, formation reconstruction.

#### *5.2. Communication Abnormalities*

Communication abnormalities mainly involved three situations. First, excessive communication delay, which included two cases: a delay of the entire UAV swarm and a delay of some members of the swarm. Second, complete communication disconnection; that is, the communication links to the other UAVs in the swarm were completely disconnected and could not be restored. Third, partial communication interruption; that is, the communication links to some other UAVs in the swarm were temporarily disconnected and could be restored after some time.

If the communication of some UAVs was completely disconnected, this could be treated as UAV formation reconstruction. If there was a delay in communication, the offset error could be eliminated according to the online formation-switching algorithm. If the communication of some UAVs was temporarily interrupted, multiple iterative formation flights could be performed based on the distributed ad hoc network online formation-switching algorithm.

#### **6. Simulation Analysis**

In this paper, 12 UAVs were used for the simulations of formation assembly, formation switching, and formation flight under abnormal flight and communication conditions, and 8 UAVs were used for the formation obstacle avoidance simulation. In the simulation experiments, the algorithm was programmed in C++; the platform tool was Microsoft Visual Studio 2016; and the hardware environment was a PC with an Intel Core i5-4210 CPU (2.60 GHz, dual-core) and 8 GB of memory.

#### *6.1. Formation Assembly and Formation Switching*

The 12 UAVs took off in succession. After each UAV passed assembly point A, the formation-switching algorithm began to take over the UAV and plan its trajectory; the formation was assembled into an inverted V-shape, and then the formation-switching algorithm was enabled again to change the swarm into a V-shaped formation, as shown in Figure 10. The required simulation parameters are shown in Table 1. The simulation results of formation assembly and formation switching in the horizontal plane are shown in Figures 11 and 12.

**Figure 10.** UAV swarm formation assembly and formation switching.



**Figure 11.** Formation assembly and formation switching.

**Figure 12.** Assembly formation and switch formation.

The assembly formation was V-shaped, the formation switching changed the formation from V-shaped to a column formation, the vertical formation interval in the y-direction was 100 m, and the other parameters remained unchanged. The simulation is shown in Figures 13 and 14.

**Figure 13.** Formation assembly and formation switching—scene 2.

**Figure 14.** Assembly formation and switch formation—scene 2.

It can be seen in Figures 12 and 14 that when the communication topology of the UAV swarm was fully connected, formation assembly and formation switching could make the distances between the UAVs meet the desired requirements after a single formation-interval correction; namely, at communication event *k* = 0. If the communication topology was partially interrupted, the formation interval had to be corrected again. This method could simply and quickly converge to the final expected formation-interval values. It can be seen in Figures 11 and 13 that this formation method could quickly plan routes to the formation waypoints under the UAV dynamic constraints, and has a high engineering application value.

#### *6.2. Formation Obstacle Avoidance*

The obstacle avoidance area was rectangular, and the relevant parameters are shown in Table 2. Eight UAVs formed a column formation to enter the obstacle avoidance area. The obstacle information is shown in Table 3, and the initial position information of the UAVs is shown in Table 4. The maximum turning angle constraint was 90◦, and the minimum direct flight distance constraint was calculated according to Equation (9). The UAV swarm passed through the obstacles and was reconstructed into a V-shape. The other parameters of the UAV swarm are shown in Table 1. The horizontal plane simulation results are shown in Figure 15.

**Table 2.** Obstacle avoidance area parameters.


**Table 3.** Obstacle parameters.


**Table 4.** Initial position of UAV.


**Figure 15.** UAV formation obstacle avoidance.

We reset the obstacles for simulation; the obstacle information is shown in Table 5. The other parameters remained unchanged, and the horizontal plane simulation results are shown in Figure 16.


**Table 5.** Obstacle parameters—scene 2.

**Figure 16.** UAV formation obstacle avoidance—scene 2.

As can be seen in Figures 15 and 16, the UAV swarm could be intelligently divided into multiple smaller formations to pass between the obstacles, or fly around them, and could finally be reconstructed into the expected formation. The simulations showed that the algorithm could quickly and flexibly plan cooperative obstacle avoidance paths and had online formation obstacle avoidance capability.

#### *6.3. Handle Exceptions*

#### 6.3.1. Flight Abnormalities

Suppose the UAV swarm had 12 UAVs, and the assembly formation was an inverted V-shaped formation. After the formation assembly, four UAVs were randomly dropped or lost, and the remaining eight UAVs were reconstructed into a column formation. The other simulation parameters were as shown in Table 1, and the simulation results are shown in Figures 17 and 18:

**Figure 17.** Formation reconstruction.

**Figure 18.** Formation reconstruction process.

It can be seen in Figures 17 and 18 that after UAVs abnormally fell during the flight, the remaining UAVs could be reconstructed into the required formation according to this method.

#### 6.3.2. Communication Abnormalities

In actual flight, the communication topology may be temporarily interrupted. In that case, a single communication event, *k* = 0, is insufficient to achieve the expected formation, and the formation interval distance must be corrected again.

In this simulation, a formation switch was performed from an inverted V-shape to a V-shape. It was assumed that in the initial communication event (*k* = 0) the communication failure rate reached 18%, and in the second communication event (*k* = 1) it reached 9%. The other parameters were as shown in Table 1, and the results are shown in Figures 19 and 20.

**Figure 19.** Formation switch under abnormal communication.

**Figure 20.** Formation change process under abnormal communication.

It can be seen in Figures 19 and 20 that when some UAVs lost communication with the others in the network, the first formation-interval correction could not achieve the desired formation. During the second correction, the communication network still had local faults, but the expected formation was eventually formed. Thus, the formation-switching algorithm was effective in dealing with communication abnormalities.

#### *6.4. Simulation Comparison*

#### 6.4.1. Formation-Switching Method

The formation-switching control method proposed in this paper mainly dealt with the communication abnormalities of the UAVs during the flight. Therefore, the classic consistency control method [31], which can handle changes in the communication topology, was selected for comparison to illustrate the effectiveness of the algorithm.

We selected six UAVs for simulation and used the simulation scenario in Section 6.3.2, setting the consistency control step to meet the minimum algorithm stability requirement, which was 0.2 s. The simulation of the classical consistency control method is shown in Figure 21, and that of the method proposed in this paper is shown in Figure 22.

**Figure 21.** Formation switching of the classical consistency control method.

**Figure 22.** Formation switching.

The algorithm performance during the simulation process was recorded, as shown in Table 6.



The simulation results showed that when the step length was 0.5 s, in the case of abnormal communication, the classic consistent formation control method needed to communicate with neighboring UAVs 200 times before the formation could be changed from an inverted V to a V-shape. The route-based formation-switching method proposed in this paper only needed two communications to obtain the required formation waypoints, which greatly reduced the pressure on the airborne communication system.

In addition, the classic consensus algorithm obtained the coordinated route of the UAV by controlling the track deflection angle. Its average solution time per communication was long and it demanded high control-system performance, so the real-time performance of online formation flying could not be guaranteed. The solution proposed in this paper required little computation time and only required the UAV to perform straight-flight and turning maneuvers; it was easy to control during flight and easy to implement in engineering. Its disadvantage was that the planned path was relatively long.

#### 6.4.2. Formation Obstacle Avoidance Method

The sparse A\* algorithm [30] was selected for comparison with the formation obstacle avoidance algorithm proposed in this paper to illustrate its effectiveness. The simulation used scenario 2 as given in Section 6.2. The sparse A\* simulation results are shown in Figure 23.

**Figure 23.** The sparse A\* algorithm for formation obstacle avoidance.

In order to analyze the complexity of the two algorithms, the number of UAVs in the formation was increased successively, and both algorithms were used to simulate formation obstacle avoidance. The relationship between algorithm running time and the number of UAVs is shown in Figure 24.

**Figure 24.** The algorithm running time.

From Figures 16 and 24 we concluded that the obstacle avoidance path planned by the sparse A\* algorithm was basically the same as that planned by the algorithm in this paper, but the running time of the sparse A\* algorithm was much longer and doubled as the number of UAVs in the formation increased. In contrast, the geometry-based obstacle avoidance algorithm proposed in this paper had a shorter running time that did not change significantly with the number of UAVs. It can support online real-time trajectory planning and has strong engineering applicability.

#### **7. Conclusions**

This paper proposed a method of fixed-wing UAV swarm formation control based on distributed ad hoc networks. This method included the formation switching of UAVs, formation obstacle avoidance, and handling of abnormal conditions during flight.

Simulation results showed that, in formation switching, the ad hoc networked, high-dynamics, fixed-wing UAVs could switch formation using position information alone. The method was not sensitive to the initial position information of the UAVs, could eliminate errors during flight, and could handle temporarily interrupted communication topologies, UAV dropout, and other abnormal flight situations. In formation obstacle avoidance, the UAVs could be clustered into multiple subformations to pass through the obstacle area and then reconstruct the required formation. The formation technology was designed based on waypoints, which is versatile, simple, reliable, and easy to realize in engineering.

Compared with the classic consistent formation control algorithm and a standard obstacle avoidance algorithm, the formation technology proposed in this paper had lower complexity and higher timeliness and is suitable for online formation flying of highly dynamic UAVs in an ad hoc network. However, the routes planned by the method are not optimal, and it is only suitable for high-speed, constant-speed formation missions. For obstacle avoidance, the obstacle model is too simple, and there may be avoidance scenarios not covered by the algorithm. In future work, we will consider optimal route planning and extend the algorithm to three-dimensional flight scenes to increase its practicability.

#### **8. Patents**

Suo, W.B., Zhang, D., and Wang, M.Y. "A distributed unmanned aerial vehicle flying around formation method based on time consistency", C.N. Patent, 202010226440.4, issued 10 July 2020.

Zhang, D., Suo, W.B., and Wang, M.Y. "A distributed unmanned aerial vehicle dynamic formation switching method based on waypoints", C.N. Patent, 202010226439.1, issued 10 July 2020.

**Author Contributions:** Conceptualization, W.S. and D.Z.; methodology, W.S.; software, W.S.; validation, W.S.; formal analysis, W.S.; investigation, W.S. and D.Z.; resources, D.Z. and Z.Q.; data curation, W.S., M.W. and L.Y.; writing—original draft preparation, W.S.; writing—review and editing, W.S.; visualization, W.S.; supervision, D.Z.; project administration, Z.Q.; funding acquisition, Z.Q. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available upon request from the corresponding author.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Distributed Grouping Cooperative Dynamic Task Assignment Method of UAV Swarm**

**Boyu Qin 1,2, Dong Zhang 1,2,\*, Shuo Tang 1,2 and Mengyang Wang 1,2**


**Abstract:** Aiming at the problem of UAV swarms with distributed subsets performing cooperative reconnaissance-and-attack tasks on multiple targets in complex and uncertain combat scenarios, a distributed grouping cooperative dynamic task assignment method is proposed based on an extended contract network protocol. A dynamic task assignment model for a UAV swarm with the topology of distributed subsets is established, considering multiple constraints such as task cooperation, performing sequence, dynamic environment, communication topology, payload model, and UAV capability. According to the characteristics of multiple participants and multiple tasks in the process of a UAV swarm executing tasks, a determination mechanism for cooperators and a selection mechanism for sequential tasks are proposed, and the contract network protocol is extended accordingly. On this basis, an event-triggered task assignment strategy for dynamic tasks is designed. The simulation results show that the proposed method can achieve cooperative dynamic assignment of the UAV swarm performing reconnaissance-and-attack tasks on multiple targets in complex and uncertain combat scenarios, improve the adaptiveness of the swarm under sudden circumstances, and optimize the task execution efficiency of the UAV swarm.

**Keywords:** swarm control; distributed swarm; dynamic task planning; task assignment; event-trigger

#### **1. Introduction**

Advances in intelligent autonomous systems have made UAV swarm technology and its applications a current research hotspot [1,2]. Cooperative task assignment stands out as an essential component and a precondition of task accomplishment and autonomous control of UAV swarm systems [3].

Cooperative task assignment is to assign a considerable number of subtasks of different types, and their order, to each UAV in the swarm while meeting the task requirements with the UAV capabilities and the multiple constraints involved. In the past few decades, several main families of algorithms have been applied to UAV swarm cooperative task assignment, for example heuristic algorithms and market-based algorithms [4]. Heuristic algorithms generally search a certain range of the solution space in an acceptable time by simulating natural phenomena to obtain feasible solutions to optimization problems. Common methods include the genetic algorithm [5–8], the particle swarm optimization algorithm [9], the ant colony algorithm [10,11], and the wolf swarm algorithm [12]. For UAV swarms with a distributed architecture, heuristic algorithms generally need global information, and obtaining it consumes substantial communication and computing resources as well as the time needed to achieve global consensus, so heuristic algorithms are not well suited. Compared with heuristic algorithms, market-based approaches (such as the contract network protocol), distinctively characterized by distributed computing, require only local information of the swarm and have the advantages of flexibility, robustness, and high operation speed [13]; hence, they are more suitable for distributed UAV swarms. Meanwhile, with strong scalability, market-based approaches can well handle cooperative

**Citation:** Qin, B.; Zhang, D.; Tang, S.; Wang, M. Distributed Grouping Cooperative Dynamic Task Assignment Method of UAV Swarm. *Appl. Sci.* **2022**, *12*, 2865. https:// doi.org/10.3390/app12062865

Academic Editors: Vincent A. Cicirello and Seong-Ik Han

Received: 18 January 2022 Accepted: 9 March 2022 Published: 10 March 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

task assignments with complex constraints such as limited communication [14] and time windows [15], and they have been applied to dynamic task assignment [16–18]. In complex and changeable combat scenarios, such assignment problems are usually modeled as constraints or objectives from different perspectives.

The crucial topics to be investigated in this field include cooperative task assignment with the trajectory coupled [19,20] as well as under dynamic adversarial circumstances [21–23]. In [24], a scheme to assign tasks in UAV swarms based on the contract network protocol (CNP) is presented, in which an A\* algorithm is applied to flight path planning and path length estimation with no-fly zones and threats considered. Under this scheme, the coupling between task assignment and flight path planning can be solved; however, the work considers neither pop-up missions nor UAV faults. A study [25] proposed a novel model for UAV coalitions and an algorithm derived from basic geometry that generates a path based on the original Dubins curve for remote sensing missions of fixed-wing UAVs. Another study [26] proposed a UAV swarm task and resource dynamic assignment algorithm based on a task sequence mechanism. By establishing a task sequence, each UAV strictly separates the necessary task time and synchronization waiting time. For newfound targets, each UAV quickly determines its available time period. According to the available time and task resources, an auction algorithm and a consensus algorithm are used to decompose the task assignment into an initial distributed assignment phase and a swarm consensus phase to develop real-time conflict-free task solutions for UAV swarms. However, that work does not take into account the communication topology and time constraints. In [27], a CNP-based approach to multi-UAV task assignment is proposed, in which a flight path planning method based on PH curves is combined with cooperative particle swarm optimization (PSO), cooperative variables, and cooperative functions to achieve attack synchronization on certain targets. Nevertheless, it does not take into account no-fly zones, threats, communication topology, and time constraints, nor task reassignment in the case of UAV faults. On this basis, [27,28] introduced local communication constraints with communication distance and information hop times to determine whether other UAVs participate in the local task assignment; however, they neglected the communication constraint caused by the swarm's specific topology, which is crucial for certain command structures to improve operational effectiveness.

Compared with [21–28], the problem investigated in this paper is the dynamic task assignment of a heterogeneous UAV swarm consisting of distributed subsets with a specific topology. Common constraints such as time windows and the UAV capability model, as well as new constraints such as topology constraints, are combined to build a complex model of dynamic task assignment. The key contribution of this paper is a solution to the problem of heterogeneous UAV swarm cooperative dynamic task assignment with specific hierarchical communication topologies and other multiple constraints. An extended-CNP-based distributed assignment approach is proposed, with a distributed heterogeneous UAV swarm executing reconnaissance-and-attack tasks as the main scenario. The swarm consists of several subsets. When the swarm discovers new targets, it first assigns each target to a subset according to the communication topology, and then each subset assigns subtasks to the UAVs within the group. The modified artificial potential field method is adopted to preplan the threat avoidance flight path, and battlefield survivability and fuel consumption are introduced to describe the flight path's impact on the task assignment. Correspondingly, consumption penalty and threat penalty functions are designed to solve the coupling between task assignment and path planning. Meanwhile, constraints on time and cooperation are introduced to adjust the task executing sequence.

The rest of the paper is organized as follows. The distributed grouping UAV swarm task assignment problem with multiple constraints is described in Section 2. In Section 3, combined with the hierarchical communication topologies of the UAV swarm, the distributed assignment algorithm based on the extended contract network protocol is addressed in detail. In Section 4, the dynamic cooperative task assignment scheme based on an event-trigger strategy is proposed. Section 5 demonstrates the approach's effectiveness by numerical examples, both in uncertain environments and with UAV failure, and the whole work is concluded in Section 6.

#### **2. Problem Description**

#### *2.1. Mission Scenario Analysis*

There are heterogeneous UAV swarms consisting of distributed subsets in the mission area. Each subset is composed of several reconnaissance UAVs, attack UAVs, and reconnaissance-attack UAVs. The procedure of task assignment includes target allocation and subtask assignment. The swarm accomplishes the initial assignment, and then the UAVs cooperatively perform tasks. If a UAV discovers a new target, it relays the target information to others within a limited range according to hierarchical communication topologies, which triggers dynamic task assignment and then updates each UAV's task sequence.

#### *2.2. Hierarchical Communication Topology*

The UAV swarm is divided into several distributed subsets. Communication exists within each subset and among subsets; hence, a hierarchical communication topology is established. In application, the swarm can cooperatively assign and complete tasks through the two-layer mechanism of inter-group cooperation and intra-group coordination based on this hierarchical communication topology. The adopted structure of distributed subsets integrates the advantages of both the centralized and the fully distributed structures to realize "global centralization and local autonomy", which avoids the low redundancy and heavy central load of the fully centralized structure as well as the high individual capability requirements, communication complexity, and command conflicts of the fully distributed structure. The structure conforms to actual combat scenarios as well as the development status of UAV swarm technology and will become more normalized and practical [29–32].

Algebraic graph theory is used to describe the internal interaction of the UAV swarm system. Assume that the UAV swarm has *N* UAVs; each UAV is regarded as a node, and each communication link is regarded as an edge. A directed graph *G* = {*V*, *E*, *W*} consists of the node set *V* = {*v*1, *v*2, ... , *vN*}, the edge set *E* ⊆ {(*vi*, *vj*) : *vi*, *vj* ∈ *V*, *i* ≠ *j*}, and the adjacency matrix *W* = [*wij*] ∈ R*N*×*N* with nonnegative entries *wij*. The entries of *W* are defined by *wij* = 1 for (*vi*, *vj*) ∈ *E* and *wij* = 0 otherwise. In addition, *wii* = 0 for all *i* = 1, 2, ... , *N*. The neighbor set of node *vi* is *Ni* = {*vj* : (*vi*, *vj*) ∈ *E*}.

There is a top leader, *Nm* group leaders, and *Nf* followers in UAV swarms with distributed subsets, as shown in Figure 1. Each subset *i* has *Nfi* followers. The top leader is the highest leader node and leads the initial task assignment. The group leader, which obtains information of each UAV in the subset, is the leader node of the subset, participates in target allocation on behalf of the subset, and leads subtask assignment; the followers are members of the subset and participate in subtask assignment and execution.

Each node in the UAV swarm is numbered: the top leader index is 1, the indices of the group leaders are *i* = 2, ... , *Nm* + 1, and the indices of the followers are *i* = *Nm* + 2, ... , *Nm* + *Nf* + 1. The adjacency matrix *W* ∈ R*N*×*N* has the following form:

$$\mathbf{W} = \begin{bmatrix} 0 & \mathbf{W}^{TM} & 0 \\ \mathbf{W}^{MT} & \mathbf{W}^{M0} & \mathbf{W}^{MF} \\ 0 & \mathbf{W}^{FM} & \mathbf{W}^{F0} \end{bmatrix}_{N \times N} \tag{1}$$

where *WMT* ∈ R*Nm*×1 and *WTM* ∈ R1×*Nm* indicate the communication topology between the top leader and the group leaders; *WM*0 ∈ R*Nm*×*Nm* expresses the communication topology among the group leaders; the communication topology between the group leaders and their followers is denoted by *WMF* ∈ R*Nm*×*Nf* and *WFM* ∈ R*Nf*×*Nm*; and *WF*0 ∈ R*Nf*×*Nf* denotes the communication topology among the followers. Assuming that no direct communication exists between any follower and the top leader, the followers of each subset communicate only with UAVs in the same subset:

$$\begin{array}{l} \mathbf{W}^{FM} = \operatorname{diag}\left\{ \mathbf{W}^{FM_1}, \mathbf{W}^{FM_2}, \dots, \mathbf{W}^{FM_{N_m}} \right\} \\ \mathbf{W}^{MF} = \operatorname{diag}\left\{ \mathbf{W}^{MF_1}, \mathbf{W}^{MF_2}, \dots, \mathbf{W}^{MF_{N_m}} \right\} \\ \mathbf{W}^{F0} = \operatorname{diag}\left\{ \mathbf{W}^{F_1}, \mathbf{W}^{F_2}, \dots, \mathbf{W}^{F_{N_m}} \right\} \end{array} \tag{2}$$

where *WFMi* ∈ R*Nfi*×1 and *WMFi* ∈ R1×*Nfi* express the communication topology between the group leader and its followers in subset *i*, and *WFi* ∈ R*Nfi*×*Nfi* is the topology among the followers in subset *i*.
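The block structure of Equations (1) and (2) can be made concrete with a small sketch. The code below assembles the adjacency matrix *W* for one top leader, one group leader per subset, and a given follower count per subset. The connectivity choices (top leader linked to all group leaders, group leaders fully connected, followers fully connected within a subset) are illustrative assumptions for the sketch, not mandated by the paper.

```python
import numpy as np

def build_hierarchical_adjacency(nf_per_subset):
    """Assemble the block adjacency matrix W of Equation (1).

    nf_per_subset: follower count N_fi of each subset; subset i's group
    leader has index 1 + i, the top leader has index 0.
    """
    nm = len(nf_per_subset)            # number of group leaders / subsets
    nf = sum(nf_per_subset)            # total number of followers
    n = 1 + nm + nf
    W = np.zeros((n, n), dtype=int)
    # top leader <-> group leaders (blocks W^TM and W^MT)
    W[0, 1:1 + nm] = W[1:1 + nm, 0] = 1
    # group leaders communicate with each other (block W^M0, zero diagonal)
    W[1:1 + nm, 1:1 + nm] = 1 - np.eye(nm, dtype=int)
    # block-diagonal follower links per subset (W^MF, W^FM, W^F0 of Eq. (2))
    start = 1 + nm
    for i, k in enumerate(nf_per_subset):
        leader = 1 + i
        rows = slice(start, start + k)
        W[leader, rows] = W[rows, leader] = 1     # leader <-> its followers
        W[rows, rows] = 1 - np.eye(k, dtype=int)  # followers within subset
        start += k
    return W
```

By construction the top-left block row is `[0, W^TM, 0]`: followers never link directly to the top leader, matching the assumption above.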

**Figure 1.** The hierarchical communication topology of UAV swarms with a distributed group.

#### *2.3. UAV and Payload Model*

Denote the UAV set *V* = {*Vi* | *i* = 1, 2, ... , *N*}, in which any UAV *Vi* can be described as a seven-element combination < *Xi*, *hi*, *Ci*, *Pi*, *Fi*, *Di*,*max*, *Nvi*,*max* >. *Xi* = (*xi*, *yi*, *zi*, *vi*, *ϕi*, *ψi*) represents the kinetic parameters of UAV *Vi*, including position, velocity, flight path angle, and heading angle. *hi* ∈ [0, 1] represents the survivability of UAV *Vi*; *hi* = 0 indicates that *Vi* is destroyed or has encountered faults and has completely lost its mission capability. *Ci* = {*Ci*1, *Ci*2, ... , *Ci,nCi*} contains the indices of the UAVs that communicate with *Vi*. *Pi* = {*Pi*,*scout*, *Pi*,*attack*} is the executable task type of *Vi*, where *Pi*,*scout* ∈ {0, 1} and *Pi*,*attack* ∈ {0, 1} denote reconnaissance and attack capability, respectively; the value is 1 when the capability is available and 0 otherwise. *Fi* is the fuel consumption rate per unit air-range of *Vi*; *Di*,*max* is the maximum air-range; *Nvi*,*max* is the maximum number of executable tasks of *Vi*.
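The seven-element combination above maps naturally onto a record type. The following sketch shows one possible container; all field names are assumptions chosen for readability, not identifiers from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class UAV:
    """Illustrative container for the seven-element UAV description
    <X_i, h_i, C_i, P_i, F_i, D_i,max, N_vi,max>."""
    X: Tuple[float, float, float, float, float, float]  # (x, y, z, v, phi, psi)
    h: float = 1.0               # survivability in [0, 1]; 0 = mission capability lost
    C: List[int] = field(default_factory=list)  # indices of communicating UAVs
    P_scout: int = 0             # 1 if reconnaissance-capable, else 0
    P_attack: int = 0            # 1 if attack-capable, else 0
    F: float = 0.0               # fuel consumption rate per unit air-range
    D_max: float = 0.0           # maximum air-range
    N_max: int = 0               # maximum number of executable tasks
```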

The condition that a reconnaissance UAV discovers and confirms the threat or target is that it is located in the detection area of the reconnaissance payload, as shown in Figure 2a. Reference [33] gives the typical mathematic model of the detection area. The condition that an attack UAV can strike a target is that the target is located in the available area of the attack payload, as shown in Figure 2b. References [33,34] give the typical mathematic model of the available area.

**Figure 2.** (**a**) The detection area of reconnaissance payload; (**b**) the available area of attack payload.

#### *2.4. Threat Model*

Any threat can be described by a five-element combination < *qo*,*i*, *Ro*,*i*, *Ra*,*i*, *po*,*i*, *Fo*,*i* >, where *qo*,*i* is the position vector of the center of threat *Oi*, *Ro*,*i* is the no-fly zone radius of threat *Oi*, and *Ra*,*i* is the impact radius of threat *Oi*. When a UAV flies into the no-fly zone, it is destroyed; when it enters the impact area, it is affected by the threat. *po*,*i* ∈ [0, 1] expresses the estimation of the threat impact. *Fo*,*i* ∈ {0, 1} is the detected-status flag of the threat, where the value 0 means it is undetected and 1 otherwise.

The estimation *po*,*i*(*q*) of the threat *Oi* impact on any point in space is denoted as

$$p_{o,i}(q) = \begin{cases} 1 & 0 < \| q - q_{o,i} \| < R_{o,i} \\ A_i(q, q_{o,i}) & R_{o,i} \le \| q - q_{o,i} \| \le R_{a,i} \\ 0 & \| q - q_{o,i} \| > R_{a,i} \end{cases} \tag{3}$$

where *Ai*(*q*, *qo*,*i*) ∈ (0, 1), a function of the distance between the point and the threat center, represents the effect evaluation within the threat range; for convenience, *Ai*(*q*, *qo*,*i*) can be taken as a constant. When a UAV approaches threat *Oi* and enters its impact range, its survivability decreases. Assume the survivability of UAV *Vj* is *hj* and becomes *h′j* under the impact of threat *Oi*; this process can be expressed as

$$h'_j = h_j \cdot (1 - A_i) \tag{4}$$
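The piecewise impact model of Equation (3) and the survivability decay of Equation (4) can be sketched directly. The constant value `A = 0.3` taken for *Ai* inside the impact ring is a hypothetical choice for the example.

```python
import math

def threat_impact(q, q_o, R_o, R_a, A=0.3):
    """Piecewise threat-impact estimate p_o,i(q) of Equation (3).

    q, q_o: 2-D point and threat-center coordinates; R_o: no-fly zone
    radius; R_a: impact radius; A: constant effect evaluation A_i assumed
    within the impact ring.
    """
    d = math.dist(q, q_o)
    if d < R_o:
        return 1.0    # inside the no-fly zone: the UAV is destroyed
    if d <= R_a:
        return A      # inside the impact ring
    return 0.0        # outside the threat's influence

def update_survivability(h_j, A_i):
    """Survivability decay of Equation (4): h'_j = h_j * (1 - A_i)."""
    return h_j * (1.0 - A_i)
```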

#### *2.5. Dynamic Task Allocation Problem*

We adopt a seven-element combination {*V*, *Sg*, *T*, *O*, *Mt*, *R*, *C*} to describe the dynamic task assignment problem. *V* = {*V*1, *V*2, ... , *VN*} is the set of UAVs, where *N* represents the number of UAVs in the swarm; *Sg* = {*Sg*1, *Sg*2, ... , *Sg,NG*} is the set of subsets, where *NG* represents the number of UAV subsets in the swarm; *T* = {*T*1, *T*2, ... , *TNT*} is the set of targets, where *NT* is the number of targets; *O* = {*O*1, *O*2, ... , *ONobs*} is the set of obstacles, where *Nobs* is the number of obstacles; *Mt* = {*Mt*,1, *Mt*,2, ... , *Mt*,*Ntype*} is the task type set of each target, where *Ntype* is the number of types. For the "Reconnaissance-Attack" mission scenario, the task type set includes two elements, reconnaissance and attack, and can be expressed as *Mt* = {*Scout*, *Attack*}; *R* = {*R*1, *R*2, ... , *RNtype*} is the set of maximum task values, which for the "Reconnaissance-Attack" mission scenario is *R* = {*RS*, *RA*}; *C* is the set of multiple constraints, mainly including the UAV capability constraint, time window constraint, sequence constraint, and cooperation constraint. These constraints are described as follows:

**Definition 1** (UAV capability constraints). *The UAV capability constraints are mainly reflected in three aspects: the maximum range, the executable task types, and the maximum number of executable tasks.*

(a) (The maximum range constraint) Assume that the initial state of UAV *Vi* is *si*0, and the task sequence is *Seqi* = {*si*1, *si*2, ... , *si,Ns,i*}, where *Ns*,*<sup>i</sup>* is the number of tasks to be executed. From Section 2.3, the maximum range of UAV *Vi* is *Di*,*max* and the maximum range constraint can be expressed as

$$\sum_{j=1}^{N_{s,i}} L(s_{ij}, s_{i,j-1}) \le D_{i,\max} \tag{5}$$

where *L*(*sij*,*si*,*j*−1), which relates to task *si*,*j*−<sup>1</sup> and task *sij*, represents the air-range from the position of task *si*,*j*−<sup>1</sup> to the position of task *sij*. That means the whole air-range couples with the task sequence.

(b) (The executable task type constraint) In the process of swarm cooperative task execution, different sorts of UAVs perform different types of tasks. There is a mapping between the UAV capability *Pi* = {*Pi*,*scout*, *Pi*,*attack*} and *Mt* = {*Scout*, *Attack*}, which can be expressed as

$$P_i = \{ P_{i,scout}, P_{i,attack} \} = \begin{cases} \{1, *\} \to Scout \\ \{*, 1\} \to Attack \end{cases} \tag{6}$$

(c) (The task number constraint) The payload number and energy carried by UAVs have limits; thus, it is necessary to restrict the maximum number of tasks performed by the UAV. Assuming that the number of tasks assigned to UAV *Vi* is *Nvt*,*<sup>i</sup>* and the upper limit is *Nvi*,max, the constraint is expressed as

$$N_{vt,i} \le N_{vi,\max} \tag{7}$$
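The three capability checks of Definition 1 (Equations (5)–(7)) can be collected into one feasibility test for a single UAV. This is a hedged sketch; the argument names are assumptions, and the leg lengths stand in for the air-ranges *L*(*sij*, *si*,*j*−1) of the planned task sequence.

```python
def satisfies_capability_constraints(leg_lengths, D_max, P_scout, P_attack,
                                     task_types, N_max):
    """Check the UAV capability constraints of Definition 1.

    leg_lengths: air-range of each leg of the task sequence (Eq. 5).
    task_types:  'Scout' or 'Attack' for each assigned task (Eq. 6).
    N_max:       maximum number of executable tasks N_vi,max (Eq. 7).
    """
    # (a) maximum range: total air-range must not exceed D_i,max
    if sum(leg_lengths) > D_max:
        return False
    # (b) executable task type: capability must cover every assigned type
    for t in task_types:
        if t == "Scout" and not P_scout:
            return False
        if t == "Attack" and not P_attack:
            return False
    # (c) task number: N_vt,i <= N_vi,max
    return len(task_types) <= N_max
```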

**Definition 2** (Sequence constraint). *If there is a specific execution order between subtasks Taski and Taskj, there is sequence constraint between Taski and Taskj. Reference* [9] *gives the concrete model of sequence constraint.*

**Definition 3** (Time window constraint). *The start time at which task si*,*j in Seqi = {si*1*, si*2*,* ... *, si,Ns,i} is performed by UAV Vi must lie in the time window tsi*,*<sup>j</sup>* ∈ [*tb*,*j*, *te*,*j*]*, where tb*,*j is the earliest start time and te*,*j is the latest. The start time tsi*,*j depends on the last task and the preplanned flight path of the UAV. Suppose te*,*j*−<sup>1</sup> *is the time when the UAV accomplishes the last task and Li is the preplanned air-range from si*,*j*−<sup>1</sup> *to si*,*j; then the time window constraint can be expressed as*

$$\begin{cases} t_{si,j} \le t_{e,j} \\ t_{si,j} = \max\left\{ t_{b,j},\; t_{e,j-1} + \dfrac{L_i}{V} \right\} \end{cases} \tag{8}$$
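Equation (8) amounts to a small scheduling rule: start as soon as the window opens or as soon as the UAV can arrive, whichever is later, and reject the assignment if that exceeds the window's close. A sketch, with argument names assumed and `None` used as an added convention for an infeasible window:

```python
def earliest_start(t_b, t_e, t_prev_end, L_i, v):
    """Start time of Equation (8): max(t_b,j, t_e,j-1 + L_i / V),
    which must not exceed t_e,j; returns None if the window is missed."""
    t_start = max(t_b, t_prev_end + L_i / v)
    return t_start if t_start <= t_e else None
```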

**Definition 4** (Cooperation constraint). *For the process of several UAVs cooperatively performing reconnaissance or attack task Taskj, the expected number of participants Npe*,*<sup>j</sup> is introduced, which means that Taskj can be accomplished by Npe*,*<sup>j</sup> UAVs at most. The actual number of participants is Np*,*<sup>j</sup> and has*

$$0 \le N_{p,j} \le N_{pe,j} \tag{9}$$

**Definition 5** (The dynamic task allocation problem). *The objective is to find the best assignment; more concretely, to optimize the swarm's reward Bs during the task execution, such that:*

$$B_s = \max B_s(V, S_g, T, O, M_t, R) = \max \sum_{i=1}^{N} \sum_{j=1}^{N_{s,i}} B_{ij}\left( h_{i,est}, L_{ij}, t_{task_j} \right)$$

subject to the constraints:

$$\begin{cases} \sum_{j=1}^{N_{s,i}} L(s_{ij}, s_{i,j-1}) \le D_{i,\max} \\ P_i = \{ P_{i,scout}, P_{i,attack} \} = \begin{cases} \{1, *\} \to Scout \\ \{*, 1\} \to Attack \end{cases} \\ N_{vt,i} \le N_{vi,\max} \\ t_{si,j} \le t_{e,j} \\ t_{si,j} = \max\left\{ t_{b,j},\; t_{e,j-1} + \frac{L_i}{V} \right\} \\ 0 \le N_{p,j} \le N_{pe,j} \end{cases} \quad \forall i \in V$$

where the value of *Bij* indicates the reward of UAV *i* performing task *j*; *hi*,*est* is the survivability estimation, which reflects the expected loss to the UAV performing the task; *Lij* is the air-range for performing *Taskj*; and *ttaskj* is the start time of task *j*.

Because the objective function is coupled with the contract net protocol process, its detailed expression is designed in Section 3.2, where the bidding function of the market mechanism is introduced.

#### **3. Task Assignment Algorithm Based on Extended CNP**

This section designs a distributed dynamic task assignment algorithm based on extended contract network protocol. The core of contract net protocol (CNP) is to simulate the "bid–win" market mechanism and realize the optimization of task assignment based on the interaction of individuals. The classical CNP is not suitable for sequential task assignments with multiple constraints under multiple rounds, which means that it needs extending according to complex task constraints.

#### *3.1. Distributed Multi-Constraint Dynamic Task Assignment Algorithm*

The algorithm includes four steps: target information release, bidding scheme generation, bid winning authorization, and task execution. The process is shown in Figure 3. The following describes the algorithm process in turn.

Step 1: Release the information of targets

Assume that a UAV detects a sudden target *Tk* and reports the target information to the subset leader UAV *Vi*, which becomes the tendering UAV on behalf of the subset and sends a bidding invitation to each subset in the local communication network.

According to the hierarchical communication topology, the set of other subset leaders interacting with *Vi* is *Vp*,*i* = {*Vj* : *wij* = *wji* = 1, *j* ∈ [2, 1 + *Nm*], *j* ≠ *i*}. *Vp*,*i* composes the potential bidders of the assignment for target *Tk*. The tendering UAV *Vi* releases the tendering information *Ii* to each UAV in set *Vp*,*i*, where *Ii* is expressed as

$$I_i = \left( V_i, T_k, \{ Task_k \}, t_{now}, H_b \right) \tag{10}$$

where *Vi* is the index of the tendering UAV; *Tk* is the index of the sudden target; {*Taskk*} denotes the subtask set of target *Tk*; *tnow* is the releasing time; and *Hb* is the information state flag, whose value is 0 if the information has not been relayed and 1 otherwise. The information structure of *Taskk* can be denoted as:

$$Task_k = \left\{ M_{t,k},\; q_{t,k},\; TimeBar_k,\; R_{x,k},\; N_{pe,j},\; Th_k \right\} \tag{11}$$

where *Mt*,*k* means the task type; *qt*,*k* is the position coordinate; *TimeBark* = [*tb*,*k*, *te*,*k*] is the time window; *Rx*,*k* is the maximum task value; *Npe*,*j* is the expected number of participants; and *Thk* is the negotiation threshold.
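The message structures of Equations (10) and (11) can be sketched as plain records. Field names here are assumptions for illustration; only the element order and meaning follow the equations.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TaskInfo:
    """Subtask record Task_k of Equation (11)."""
    task_type: str                     # M_t,k: 'Scout' or 'Attack'
    position: Tuple[float, float]      # q_t,k
    time_window: Tuple[float, float]   # TimeBar_k = [t_b,k, t_e,k]
    max_value: float                   # R_x,k, maximum task value
    n_expected: int                    # N_pe,j, expected number of participants
    threshold: float                   # Th_k, negotiation threshold

@dataclass
class Tender:
    """Tendering information I_i of Equation (10)."""
    tenderer: int                      # V_i, index of the tendering UAV
    target: int                        # T_k, index of the sudden target
    tasks: List[TaskInfo]              # {Task_k}, subtask set of the target
    t_now: float                       # releasing time
    relayed: int = 0                   # H_b: 0 = not yet relayed, 1 = relayed
```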

**Figure 3.** The process of task assignments based on extended CNP.

As attached information during the assignment process of subtask *Taskk*, the negotiation threshold *Thk* is applied to preselect bidders from the potential bidder set *Vp*,*i*, reducing the negotiation scale as well as the consumption of communication resources and improving assignment efficiency. To adjust the threshold adaptively, the reward of subtask *Taskk* when performed by the subset to which UAV *Vi* belongs is selected as *Thk*. If a bidder's bidding value is greater than *Thk*, the swarm efficiency will be improved after *Taskk* is assigned to that bidder.

In the process of information release, different UAV groups with intersections of local communication topologies may discover the same target, causing multiple bidding UAVs to release information at the same time in their respective local communication topologies, resulting in system conflict and resource waste. In order to avoid this problem, it is necessary to ensure that different UAV subsets reach a consensus on tendering UAVs and tendering information, which is realized by Algorithm 1.

In Algorithm 1, "StopCmd" means to abort the negotiation command, and the operation "." represents the reference to elements in the information structure. For any UAV *Vi* in the local communication topology, the tendering information related to subtask *Taskk* will be released to its directly connected UAV if it is not empty and has not been transmitted (Line 1–4). Meanwhile, the UAV can also receive the tendering information related to *Taskk* transmitted by others directly connected with it (Line 6–11).

**Algorithm 1:** Release tendering information and achieve information consensus.

**1:** if *Ii* is not empty
**2:**   if *Ii*.*Hb* = 0
**3:**     broadcast *Ii* to *Vp,i*
**4:**   end if
**5:** end if
**6:** for *j* = 1 to number of tendering *Taskk* invitations received
**7:**   if *Ii*.*Taskk*.*Thk* < *Ij*.*Taskk*.*Thk*
**8:**     *Ii* = *Ij*
**9:**     broadcast StopCmd to *Vp,i* − *Vp,j*
**10:**   end if
**11:** end for

If UAV *Vi* finds that the negotiation threshold of UAV *Vj* about *Taskk* is greater, *Vi* saves the tendering information *Ij* and commands the UAV groups that interact with *Vi* but not with *Vj* to stop the negotiation, making them exit the assignment dominated by UAV *Vj*.

After Step 1, the tendering information released in the local communication network achieves consensus, that is *Ii* = *Ij*.
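The release-and-consensus logic of Algorithm 1 can be sketched as follows (a minimal Python sketch; the `Tender` record and the function name are hypothetical stand-ins for the paper's information structure *Ii* and the StopCmd handling):

```python
from dataclasses import dataclass

@dataclass
class Tender:
    uav: int          # index V_i of the tendering UAV
    target: int       # index T_k of the sudden target
    threshold: float  # negotiation threshold Th_k attached to the tender
    relayed: bool = False  # H_b: has this tender already been broadcast?

def release_and_merge(own, received):
    """Broadcast own tender once (lines 1-5), then adopt the received
    tender with the larger negotiation threshold (lines 6-11)."""
    stop_cmds = []
    if own is not None and not own.relayed:
        own.relayed = True              # broadcast to V_{p,i} exactly once
    for other in received:
        if other.target == own.target and other.threshold > own.threshold:
            # V_j dominates: save I_j and order the subsets only we reach
            # (V_{p,i} - V_{p,j}) to exit the negotiation
            stop_cmds.append(("StopCmd", own.uav, other.uav))
            own = other
    return own, stop_cmds
```

After repeated exchanges over the connected local topology, every UAV holds the same dominating tender, which is the consensus *Ii* = *Ij* described above.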

Step 2: Generate bidding scheme

On the basis of the target information and various constraints, each potential bidder first determines the bidding scheme for the target of the group, including whether the subset participates in the bidding for target *Tk*, subtasks that can be performed and the corresponding UAVs, and the reward for performing each subtask, and so forth.

Through the contract network in subsets, each subtask of target *Tk* is preassigned to form the bidding scheme for target *Tk*. The specific process is as follows:

(1) A subset leader UAV *Vj* releases subtask information to each follower UAV. In order to improve the assignment efficiency and shorten the time, the task concurrency mechanism is introduced to publish the information of each subtask at the same time;

(2) For subtask *Taskk*, the follower UAV *Vm* judges whether it has the corresponding type of payload, whether the number of payloads can meet the task execution requirements, and whether it meets the time window *TimeBark* according to the subtask information. If any constraint is not satisfied, UAV *Vm* will not bid for subtask *Taskk*;

(3) If the above constraints are met, UAV *Vm* preplans the flight path of *Taskk* based on the modified artificial potential field (MAP) method and substitutes the preplanned range *Lmk* and survivability estimation *hm,est* into the individual bidding function to calculate the bidding value *Bmk*;

(4) Compare the bidding value *Bmk* with the subtask negotiation threshold *Thk*. If *Bmk* < *Thk*, the system efficiency is not improved by using UAV *Vm* to perform subtask *Taskk*, and *Vm* actively abandons pre-assignment bidding for *Taskk*; otherwise, *Vm* participates in the bidding;

(5) Each follower UAV respectively performs steps (2)–(4) to complete the pre-assignment of each subtask. The subset leader UAV *Vj* generates the subset's bidding scheme for target *Tk* according to the pre-assignment results.
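The bid/abstain decision of steps 2–4 above can be sketched as a single check (a hedged Python sketch; all parameter names are assumptions, and the bidding value *Bmk* is taken as precomputed from the MAP path preplanning):

```python
def decide_bid(payload_ok, n_payload, n_required, t_ready, time_window,
               bidding_value, threshold):
    """Return the bid B_mk if every constraint holds and the bid beats the
    negotiation threshold Th_k; otherwise abstain (return None)."""
    t_begin, t_end = time_window
    if not payload_ok or n_payload < n_required:
        return None                      # payload/capability constraint violated
    if not (t_begin <= t_ready <= t_end):
        return None                      # time-window constraint violated
    if bidding_value < threshold:
        return None                      # swarm efficiency not improved
    return bidding_value
```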

The above is the basic process of bidding scheme generation. Due to the existence of multiple UAVs cooperatively performing subtasks and the introduction of the task concurrency mechanism, the generation of the bidding scheme needs to solve a "multi-participant–multi-task" assignment optimization subproblem.

In order to solve the subproblem, an assignment mechanism based on the contract network within subsets is constructed. For the subtasks that need to be executed by multiple UAVs, a cooperator determination mechanism is introduced to complete the task allocation; for the sequential subtasks with time windows, a sequential task selection mechanism is introduced. Details are as follows:

(1) The determination mechanism on cooperators

The process of the determination mechanism on cooperators is shown in Figure 4. To simplify the assignment process, give the tenderee the initiative, expand the set of bidders, and provide more feasible pre-assignment schemes, the tenderee bidding mechanism is introduced, that is, when the subtask tenderee (subset leader UAV *Vj*) meets conditions such as capability constraints and time constraints, it can participate in bidding.

**Figure 4.** The determination mechanism on cooperators.

If *Vj* meets the constraints for performing *Taskk* and decides to bid with the bidding value *Bjk*, then *Thk* is chosen as *Bjk*. UAV *Vj* sends the *Taskk* information and the negotiation threshold *Bjk* to each follower UAV. The bidding value of follower UAV *Vm* for *Taskk* is *Bmk*. If *Bmk* > *Bjk*, *Vm* decides to bid and feeds *Bmk* back to *Vj*. At this time, the set of all bidders participating in the *Taskk* assignment is

$$V\_{b,k} = \left\{ V\_m \,\middle|\, w\_{jm} = w\_{mj} = 1, j \neq m;\ B\_{mk} > B\_{jk} \right\} \cup \left\{ V\_j \right\}.$$

If tenderee *Vj* does not participate in the bidding, i.e., *Bjk* = ∅,

$$V\_{b,k} = \left\{ V\_m \,\middle|\, w\_{jm} = w\_{mj} = 1, j \neq m;\ B\_{mk} > 0 \right\}.$$

To sum up, after preselection of negotiation threshold, all bidders participating in the assignment for *Taskk* can be represented as

$$V\_{b,k} = \begin{cases} \left\{ V\_m \,\middle|\, w\_{jm} = w\_{mj} = 1, j \neq m;\ B\_{mk} > 0 \right\}, & B\_{jk} = \varnothing \\ \left\{ V\_m \,\middle|\, w\_{jm} = w\_{mj} = 1, j \neq m;\ B\_{mk} > B\_{jk} \right\} \cup \left\{ V\_j \right\}, & B\_{jk} \neq \varnothing \end{cases} \tag{12}$$

Consider the cooperative constraints of subtask *Taskk* and assume that *Taskk* requires the participation of *Np,k* UAVs. The bidding value set {*Bmk*} is sorted with quick sort, and the greedy algorithm is used to select the *Np,k* largest bidding values in the set to form the winning bidding value set (reward set) *Bk*

$$B\_k = \left\{ B\_{(1),k},\ B\_{(2),k},\ \ldots,\ B\_{(N\_{p,k}),k} \right\} \tag{13}$$

where *B*(*x*),*<sup>k</sup>* represents the *x*-*th* order element of {*Bmk*} in descending order. The winner set *Winnerk* is correspondingly:

$$Winner\_k = \mathrm{Index}\left(B\_{(1),k},\ B\_{(2),k},\ \ldots,\ B\_{(N\_{p,k}),k}\right) \tag{14}$$
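Equations (13) and (14) amount to a sort-then-take-top-*Np,k* selection; a minimal Python sketch (function and variable names assumed):

```python
def select_winners(bids, n_required):
    """bids: {uav_index: B_mk}. Returns the reward set B_k (the N_{p,k}
    largest bidding values, descending) and the corresponding winner set."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winners = ranked[:n_required]       # greedy: keep the N_{p,k} best bids
    return [b for _, b in winners], [m for m, _ in winners]
```

Python's built-in `sorted` is used in place of the quick sort named in the text; the greedy choice is identical.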

(2) Sequential task selection mechanism

Take the sequential task assignment process of a single follower UAV as an example. After receiving the task information, the bidder determines the executable tasks according to the negotiation threshold, UAV capability constraints, and the execution sequence and combines them to generate an alternative sequence without a time window conflict. Based on the bid winning situation, the optimal sequence is selected as the final signing one. The operation process is shown in Figure 5.

**Figure 5.** The process of task assignment based on extended CNP.

Assume that the concurrent task set $Task^{new} = \left\{Task\_1^{new}, \ldots, Task\_{N\_{task}}^{new}\right\}$ is received by the follower UAV *Vm*. The set of tasks that *Vm* can execute and decides to bid for is expressed as

$$Task^{a,m} = \left\{ Task\_k^{a,m} \,\middle|\, k = 1, 2, \ldots, N\_{a,m} \right\} \subseteq Task^{new} \tag{15}$$

where *Na,m* is the number of elements of *Taska,m*. Denote $Comb\_n(N\_{a,m}, N\_{vm,n})$ as a combination of $N\_{vm,n}$ arbitrary elements from the set $\{1, 2, \ldots, N\_{a,m}\}$, where $N\_{vm,n} \le N\_{a,m}$.

**Definition 6** (Task Sequence Alternative). *If* $\forall k \in Comb\_n(N\_{a,m}, N\_{vm,n})$ *and* $\forall k' \in Comb\_n(N\_{a,m}, N\_{vm,n}) - \{k\}$*, there is* $TimeBar\_k \cap TimeBar\_{k'} = \varnothing$*, then* $Comb\_n(N\_{a,m}, N\_{vm,n})$ *is called a sequence alternative, where* $TimeBar\_k$ *is the time window of subtask* $Task\_k^{a,m}$*. The corresponding task sequence alternative is defined as*

$$Seq\_n^m = \left\{ Task\_k^{a,m} \, \middle| \, k \in \text{Comb}\_n(N\_{a,m}, N\_{\text{vm},n}) \right\} \,, n \le N\_{\text{sm}}, Seq\_n^m \subseteq Task^{a,m} \tag{16}$$

*where Nsm is the number of task sequence alternatives*.

After the bidding is completed, the set of losing bidding tasks is denoted by *Taskde*,*m*. After the sequences containing losing bidding tasks are eliminated, the set composed of the remaining sequence alternatives is

$$SeqSet\_{m} = \left\{ Seq\_{n}^{m} \,\middle|\, n \le N\_{sm};\ \forall Task\_{k}^{a,m} \in Seq\_{n}^{m},\ Task\_{k}^{a,m} \notin Task^{de,m} \right\} \tag{17}$$

Assuming that the bidding value of *Vm* for any subtask $Task\_k^{a,m}$ in $Seq\_n^m$ is *Bmk*, where $Seq\_n^m \in SeqSet\_m$, the sum of the bidding values of the tasks in $Seq\_n^m$ is

$$B\_{m,n} = \sum\_{k} B\_{mk},\quad k \in Comb\_{n}(N\_{a,m}, N\_{vm,n}) \tag{18}$$

and the final signing sequence of follower UAV *Vm* is

$$Seq^{win,m} = Seq^{m}\_{n\_{best}}, n\_{best} = \underset{n}{\text{argmax}} \left( B\_{m,n} \right) \tag{19}$$
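The sequence selection of Equations (15)–(19) can be sketched by brute-force enumeration of conflict-free combinations (a Python sketch under the assumption that two time windows conflict iff they overlap; all names are illustrative):

```python
from itertools import combinations

def windows_disjoint(tasks):
    """True iff no two time windows in the combination overlap
    (TimeBar_k ∩ TimeBar_k' = ∅ for all pairs, as in Definition 6)."""
    spans = sorted(w for _, w, _ in tasks)
    return all(a_end <= b_start for (_, a_end), (b_start, _)
               in zip(spans, spans[1:]))

def best_sequence(tasks, lost):
    """tasks: list of (task_id, (t_b, t_e), B_mk); lost: ids that lost the bid.
    Eliminates sequences containing losing tasks (Eq. 17), sums bids per
    alternative (Eq. 18), and returns the signing sequence Seq^{win,m} (Eq. 19)."""
    alive = [t for t in tasks if t[0] not in lost]
    best, best_ids = float("-inf"), []
    for r in range(1, len(alive) + 1):
        for combo in combinations(alive, r):     # Comb_n(N_{a,m}, N_{vm,n})
            if windows_disjoint(combo):          # no time-window conflict
                value = sum(b for _, _, b in combo)
                if value > best:
                    best, best_ids = value, [tid for tid, _, _ in combo]
    return best_ids
```

The enumeration is exponential in the number of executable tasks, which is acceptable here only because each UAV can take on a small number of subtasks; this matches the arrangement–combination character of the subproblem noted in the complexity analysis below.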

Through the above mechanism, the subtask pre-assignment scheme of the subset in which *Vj* is located is generated, that is, the bidding scheme of the subset

$$BidSch\_j = \left\{ Seq^{win,m} \,\middle|\, m \in G\_j \right\} \tag{20}$$

After that, UAV *Vj* calculates the bidding function of the subset according to the bidding scheme and bids to the tenderee *Vi*.

Step 3: Determine the winners

After the bidding scheme generation and bidding application in Step 2, the tendering UAV *Vi* selects the scheme of which the efficiency is greatest according to the bidding value of each group so as to determine which subset executes subtasks of target *Tk*.

The set of bidding values of the subsets is {*Bg,jk*}. Apply the greedy algorithm and select the greatest value and the corresponding subset as the winning value (reward) *Bg,k* and the winner *Winnerg,k*. The process can be expressed mathematically as

$$B\_{g,k} = \max\left\{B\_{g,jk}\right\},\quad Winner\_{g,k} = \underset{j}{\operatorname{argmax}}\left(B\_{g,jk}\right) \tag{21}$$

After determining the winner, UAV *Vi* authorizes the subset *Winnerg*,*<sup>k</sup>* to perform each subtask of target *Tk*.

Step 4: Perform the tasks

The winner subset formally authorizes each follower UAV in the subset to perform the subtasks, and the pre-assignment scheme in Step 2 becomes the formal assignment scheme. Follower UAV *Vm* will add the signing subtask sequence *Seqwin*,*<sup>m</sup>* to its task sequence to be executed, that is:

$$A\_m = A\_m \oplus Seq^{win,m} \tag{22}$$

where *A<sup>m</sup>* is the task sequence to be executed, ⊕ expresses the operation that it adds the sequence *Seqwin*,*<sup>m</sup>* to the sequence *Am*. Each UAV performs each task according to sequence *Am* and the preplanning flight path in the specific time window.

The pseudo-code of the whole process described above is given in Algorithm 2:

**Algorithm 2:** The extended Contract Network Protocol.

**Input:** Task assignment combination {*V*, *Sg*, *T*, *O*, *Mt*, *R*, *C*}; topology *G0*; time *tnow*
**Output:** Task execution sequence set {*Am*}
**1:** /\* Step 1: Release the information of targets \*/
**2:** for *i* from the first index of leader UAVs to the last one
**3:**   if *Vi* is a target tenderee UAV
**4:**     {*Vp,i*} = detTargetPotentialBidders(*G0*, *C*) /\* determine the set of potential bidder subsets \*/
**5:**     for *k* = 1 to number of targets
**6:**       if *Tk* is tendered by leader UAV *i*
**7:**         {*Taskk*} = generateSubtasks(*Tk*) /\* tenderee UAV *i* generates subtask information \*/
**8:**       end if
**9:**     end for
**10:**     *Ii* = releaseTargetInfo({*Tk*}, {*Taskk*}, *tnow*, {*Vp,i*}) /\* tenderee UAV *i* releases target information \*/
**11:**   end if
**12:** end for
**13:** /\* Step 2: Generate bidding scheme \*/
**14:** for *j* from the first index of leader UAVs to the last one
**15:**   for *i* from the first index of tenderee leader UAVs to the last one
**16:**     if *Vj* belongs to {*Vp,i*}
**17:**       receive *Ii* and broadcast it to *Vj*'s followers
**18:**     end if
**19:**   end for
**20:**   for *m* from the first index of followers of *Vj* to the last one
**21:**     {*Taska,m*} = checkTaskConstraints({*Ii*.*Taskk*}) /\* select the executable subtasks from {*Taskk*} \*/
**22:**     {*Seqnm*} = combineTasks({*Taska,m*}) /\* generate the set of alternative sequences \*/
**23:**     {*Bm,n*} = biddingFuncCal({*SeqSetm*}) /\* calculate the bidding function of each sequence \*/
**24:**     *Seqwin,m* = *Seqm,nbest*, *nbest* = argmax*n*(*Bm,n*) /\* determine the sequence to execute \*/
**25:**   end for
**26:**   *Bg,jk* = ∑*m* max(*Bm,n*)
**27:** end for
**28:** /\* Step 3: Determine the winners \*/
**29:** for *i* = 1 to number of tenderee leader UAVs
**30:**   for *k* = 1 to number of targets tendered by leader UAV *i*
**31:**     *Bg,k* = max{*Bg,jk*}, *Winnerg,k* = argmax*j*(*Bg,jk*) /\* determine the winner of tendering for target *k* \*/
**32:**   end for
**33:** end for
**34:** /\* Step 4: Perform the tasks \*/
**35:** for *n* = 1 to number of subsets
**36:**   for *m* = 1 to number of follower UAVs of subset *n*
**37:**     *Am* = *Am* ⊕ *Seqwin,m* /\* add the signing subtask sequence to task execution sequence *Am* \*/
**38:**   end for
**39:** end for

Analyze the worst-case time complexity of the algorithm, taking task assignment by the top leader as an example.

The time complexity of the top leader issuing *NT* targets with subtask bidding information to *NG* subset leaders is *O*(*NTNG*); the complexity of the initial evaluation of the 4 types of constraints presented in Section 2.5 by the *NG* subset leaders is 4*O*(*NTNG*); the complexity of the bidding of the potential bidder subsets {*Vp,i*} is *O*(*K*), where *K* is the number of elements in {*Vp,i*}. After the *K* potential bidder subsets are determined, each potential bidder subset needs to pre-assign the subtasks and generate its bidding scheme.

Assume that each target has *Nt* subtasks, each subtask needs at most *Np* participants, and each subset consists of (*Nf* + 1) UAVs. Each subset generates the target scheme based on the CNP within the subset. The time complexity of the subset leaders issuing *Nt* subtasks to their *Nf* followers is *O*(*NTNtNf*). The sequential task selection is an arrangement–combination problem, of which the time complexity is *NT*·*O*(*Nt*³), and the time complexity of bidding to the subset leader in each subset is *O*(*NTNtNf*). Subset leaders selecting cooperators can be regarded as a sorting problem; adopting the quick sort algorithm, the time complexity is *NTNt*·*O*(*Np*log2*Np*). Therefore, the time complexity of the subtask pre-assignment is 2*O*(*NTNtNf*) + *NT*·*O*(*Nt*³) + *NTNt*·*O*(*Np*log2*Np*).

After the bidding scheme of each subset is generated, the winning subset is determined. This is a quick sort process, so the time complexity is *NT*·*O*(*NG*log2*NG*).

To conclude, the whole worst-case time complexity is 2*O*(*NTNtNf*) + *NT*·*O*(*Nt*³) + *NTNt*·*O*(*Np*log2*Np*) + *NT*·*O*(*NG*log2*NG*), which indicates that the algorithm is polynomial.

#### *3.2. Bidding Function Design*

As the core of the extended CNP, the bidding function is a form of objective function that must be designed on the basis of the concrete task assignment problem. According to the method described above, the whole task assignment is divided into the target assignment and the subtask assignment, both of which are based on the extended CNP. Hence, the bidding function of each subset is designed for the target assignment, while the individual bidding function within the group is designed for the subtask assignment.

#### 3.2.1. Individual Bidding Function

The individual bidding function *Bij* consists of the value function *Reij* and the penalty function *Peij*. Based on the analysis of the UAV swarm task assignment model in Section 2, the individual bidding function mainly relates to: (1) the maximum task value, which is denoted in Section 2.5; (2) the flight path; (3) the time windows and start time of each subtask.

Since the UAV task assignment couples with path planning, the bidding function needs to be designed in combination with the UAV flight path. By analyzing the scenario, it can be seen that the impact of the flight path on task assignment is mainly reflected in UAV survivability estimation *hi*,*est* and air-range *Lij* (fuel consumption), as shown in Figure 6.

**Figure 6.** The coupling of the UAV mission assignment and path planning.

Suppose that UAV *Vi* participates in the bidding process of subtask *Taskj* with time window *ttaskj* ∈ [*ts,j*, *te,j*], where *ttaskj* is the start time of task execution, of which the mathematical expression is given in Section 2.5, *ts,j* is the earliest start time, and *te,j* is the latest one.

To evaluate the impact of threats {*Ok* | *k* = 1, 2, ... , *Nobs*} on UAV *Vi* performing tasks, the survivability estimation of *Vi* is introduced, which can be expressed as

$$h\_{i,est} = h\_i \prod\_{k=1}^{N\_{app}} \left(1 - A\_{k,app}\right) \tag{23}$$

where *hi* is the current survivability of *Vi*, *Ak*,*app* is the impact of threats *Ok*,*app* ∈ {*Ok*} approached by *Vi*, *Ak*,*app* ∈ (0, 1), and *Napp* is the number of threats approached.

(1) Value function *Reij*

In the actual scenario, the value of a UAV performing any task is generally related to the start time at which the UAV performs the task. The earlier the start time *ttaskj* within the time window *TimeBark*, the greater the value for UAV *Vi* of performing the task; otherwise, the smaller the value. Hence, an item related to the start time *ttaskj* is introduced, and the value function of *Vi* performing *Taskj* is designed as

$$Re\_{ij}\left(h\_{i,est}, t\_{task\_j}\right) = h\_{i,est} \cdot R\_{x,j} \cdot e^{-\lambda \frac{t\_{task\_j} - t\_{s,j}}{t\_{e,j} - t\_{task\_j}}} \tag{24}$$

where *Rx,j* is the maximum value of *Taskj* and *λ* is the attenuation factor of the start time *ttaskj* on the task value. Apparently, *Reij* is a monotonically increasing function of battlefield survivability *hi,est* and decreases monotonically with the start time *ttaskj*. When *hi,est* : 1 → 0 and *ttaskj* : *ts,j* → *te,j*, there is *Reij* : *Rx,j* → 0.

(2) Penalty function *Peij*

The penalty function is composed of the consumption penalty and the threat penalty. Assuming that the per-unit fuel consumption of UAV *Vi* is *Fi*, the air-range performing *Taskj* is *Lij*, the estimation of survivability is *hi*,*est*, and the penalty function is designed as

$$Pe\_{ij}\left(h\_{i,est}, L\_{ij}\right) = \frac{1}{h\_{i,est}} \cdot F\_i L\_{ij} \tag{25}$$

Obviously, *Peij* is a monotonic function of battlefield survivability *hi,est*. When *hi,est* : 1 → 0+, there is *Peij* : *FiLij* → +∞.

The penalty indicates the coupling impact of flight path planning on task assignment. On one hand, the fuel consumption of performing subtask *Taskj* is directly related to the preplanned air-range *Lij*; on the other hand, the more threats approached by UAV *Vi* during the task execution, the lower the survivability estimation *hi*,*est*, which means more loss when the UAV performs the task.

Combining the value function and the penalty function, the individual bidding function *Bij* of UAV *Vi* for *Taskj* is denoted by

$$B\_{ij}\left(h\_{i,est}, L\_{ij}, t\_{task\_j}\right) = Re\_{ij} - Pe\_{ij} = h\_{i,est} \cdot R\_{x,j} \cdot e^{-\lambda \frac{t\_{task\_j} - t\_{s,j}}{t\_{e,j} - t\_{task\_j}}} - \frac{1}{h\_{i,est}} \cdot F\_i L\_{ij} \tag{26}$$

where a greater value of *Bij* indicates a greater reward for performing the task. *Bij* monotonically increases with *hi,est* and decreases with *Lij* and *ttaskj*, which agrees with the actual scenario and suggests that *Bij* has practical significance.
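Equations (23)–(26) combine into a single numeric bidding routine; a minimal Python sketch (all parameter names and the sample values are illustrative):

```python
import math

def survivability(h_i, threat_impacts):
    """Eq. (23): h_{i,est} = h_i * prod(1 - A_k) over approached threats.
    Each impact A_k lies in (0, 1)."""
    for a in threat_impacts:
        h_i *= (1.0 - a)
    return h_i

def bid(h_est, R_max, t_task, t_s, t_e, lam, fuel_rate, air_range):
    """Eq. (26): individual bid B_ij = Re_ij - Pe_ij.
    Reward (24) decays as the start time moves from t_s toward t_e;
    penalty (25) grows with fuel use and with low survivability."""
    reward = h_est * R_max * math.exp(-lam * (t_task - t_s) / (t_e - t_task))
    penalty = fuel_rate * air_range / h_est
    return reward - penalty
```

Note that the reward term vanishes as *ttaskj* approaches *te,j* (the exponent diverges to −∞) and the penalty diverges as survivability approaches 0, reproducing the two limit behaviors stated above.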

#### 3.2.2. Bidding Function of Each Subset

The bidding function of subsets is based on the individual function. For target *Tj*, supposing that the bidding value set is {*B1*, *B2*} after the subtask assignment in subset *Gi*, *B1*, *B2* respectively express the bidding values of subset *Gi* performing the reconnaissance subtasks and the attack ones. The bidding function of subsets is defined as the sum of the elements in {*B1*, *B2*}, which is

$$B\_{g,ij} = \sum\_{k} B\_{1,k} + \sum\_{k} B\_{2,k},\quad k = 1, 2, \ldots, N\_{newtask,j} \tag{27}$$

#### **4. Dynamic Assignment Strategy Based on Event Triggering**

*4.1. Event Trigger Conditions for Dynamic Tasks*

Due to the incomplete perception of the situation at the beginning and the occurrence of emergencies during the task execution, the initial assignment scheme needs to be adjusted. Therefore, the event trigger conditions are introduced to judge whether or not to carry out the dynamic task assignment. Event trigger conditions are the key to realizing dynamic task assignments, which need to be selected in combination with different scenarios. The dynamic task assignment process based on event triggering can be expressed as:

$$E: \text{if } g(s(t), s\_{t,j}(t)) \ge 0, \text{ then } Target\_j \leftarrow TargetInfo\_j^{\ast},\ Task\_{j,k} \leftarrow TaskInfo\_{j,k}^{\ast}\ (k = 1, 2, \ldots, N\_f)$$

where *g*(*s*(*t*), *st,j*(*t*)) ≥ 0 is the triggering condition, *s*(*t*) is the state of the UAV (such as the position), and *st,j*(*t*) is the status of target *Tj*. *Targetj* ← *TargetInfoj\** expresses the update of the set of targets, and *Taskj,k* ← *TaskInfoj,k\** expresses the update of the set of subtasks.

(1) The unknown targets appear

The initial target assignment is based on the initial situation awareness information, which can only cover the known targets and is not guaranteed to cover all the targets in the mission area. When a reconnaissance or reconnaissance-attack UAV *Vi* flies near an unknown target *Tj* during task execution, it will detect the target, obtain the target information *TargetInfonew,j*, and then update the target set and the subtask set. The event trigger condition can be described as

$$g(s(t), s\_{t,j}(t)) = DA\left(r\_i, r\_{nt,j}, R\_{detect,i}\right)$$

where *DA*(*ri*, *rnt,j*, *Rdetect,i*) is defined as the reconnaissance payload constraint, i.e., the constraint on the relative position and angle between UAV *Vi* and target *Tj*; the details are in reference [34]. When target *Tj* is located in the detection area of the reconnaissance-attack or reconnaissance UAV, *DA*(*ri*, *rnt,j*, *Rdetect,i*) ≥ 0. Here, *ri* is the position vector of UAV *Vi*, *rnt,j* is the position vector of target *Tj*, and *Rdetect,i* is the detection range of UAV *Vi*. The event-triggering process can be expressed as

$$E\_1: \text{if } DA(r\_i, r\_{nt,j}, R\_{detect,i}) \ge 0, \text{ then } Target\_{N\_{target}+j} \leftarrow TargetInfo\_{new,j},\ Task\_{N\_{target}+j,k} \leftarrow TaskInfo\_{new,j,k}$$

where *TargetNtarget+j* ← *TargetInfonew,j* indicates the update of the target set and *Ntarget* is the number of targets; *TaskNtarget+j,k* ← *TaskInfonew,j,k* indicates the update of the subtask set and *Nnewtask,j* is the number of subtasks of *Tj*.

(2) The UAV encounters faults and cannot perform tasks

During task execution, a UAV may encounter non-cooperative behaviors such as collision and attack and lose its mission capability. The tasks it has not yet executed will then be reassigned. The event trigger condition can be described as

$$g(s(t), s\_{t,j}(t)) = h\_{i,b} - h\_i$$

where *hi* is the survivability of *Vi*; *hi*,*<sup>b</sup>* is the minimum survivability of *Vi* with normal capability and is selected according to the concrete scenario. The event-triggering process can be expressed as

$$E\_2: \text{if } h\_{i,b} - h\_i \ge 0, \text{ then } Task\_{j,k} \leftarrow TaskInfo\_{i,j,k}\ (k = 1, 2, \ldots, N\_{i,j})$$

where *Taskj,k* ← *TaskInfoi,j,k* is the reassignment of the subtasks that *Vi* was to perform.
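The two trigger conditions can be sketched as simple predicates (a Python sketch in which the payload constraint *DA*(·) is reduced to a plain range test for illustration; names are assumptions):

```python
def detection_trigger(dist_to_target, detect_range):
    """E1 fires when the target enters the detection area; here DA(.) is
    reduced to detect_range - distance >= 0 for illustration (the paper's
    DA also involves angle constraints, see reference [34])."""
    return detect_range - dist_to_target >= 0

def failure_trigger(h_i, h_min):
    """E2 fires when survivability h_i drops to or below the minimum
    survivability h_{i,b} required for normal capability."""
    return h_min - h_i >= 0
```

In a dynamic-assignment loop, either predicate returning `True` would start a new round of the extended CNP for the affected targets or subtasks.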

#### *4.2. The Basic Process of Dynamic Task Assignment*

Before describing the basic process of cooperative dynamic task assignment, the following assumptions are given for the scenario:

1. Based on the early situation awareness, several targets and pieces of threat information have been obtained, including target location, number of UAVs required to perform each subtask, threat location, and impact range;

2. Due to incomplete situational awareness, there are unknown targets and threats. After the UAV detects an unknown target or threat, it broadcasts the threat information, transmits the target information to the group leader, and triggers the assignment process.

Based on the scenario and assumptions above, the basic process of cooperative dynamic task assignment is shown in Figure 7. The concrete steps are as follows:


**Figure 7.** The procedure for cooperative dynamic task assignment.

#### **5. Numerical Simulation**

It is assumed that the swarm consists of 1 top leader and 2 subsets, with a total of 11 UAVs, including 2 reconnaissance-attack UAVs, 4 reconnaissance UAVs, and 4 attack UAVs. Among them, reconnaissance-attack UAVs are the group leaders. Consider the UAV swarm communication topology *G*0, as shown in Figure 8.

**Figure 8.** The communication topology *G*<sup>0</sup> of the UAV swarm.

Each UAV can perform no more than two subtasks, and each subtask is performed jointly by, at most, two UAVs. For flight path preplanning, the UAV kinematics model in reference [35] and the modified artificial potential field method [36–38] are used to realize the air-range estimation considering threat avoidance. The capability parameters of each UAV under the scenario are shown in Table 1, and the parameters of reconnaissance payload and attack payload are shown in Tables 2 and 3, respectively.

**Table 1.** The table of capability parameters of UAVs.


**Table 2.** The parameter table of UAV reconnaissance payload.




The feasible area of UAV reconnaissance and attack for a target on the ground, based on the above parameter configuration, is shown in Figure 9.

**Figure 9.** The feasible area of payload: (**a**) the available detection area; (**b**) the available attack zone.

Consider a 5 km × 5 km mission scenario. The information of the known targets and the randomly generated time window constraints of the subtasks are shown in Table 4.



#### *5.1. The Initial Task Assignment for Known Targets*

Assume that the UAV swarm performs the reconnaissance-and-attack tasks at six enemy targets. The location of each UAV, threat, and target is shown in Figure 10.

**Figure 10.** The initial setting of the situation between both sides.

Purple threats represent known threats, and blue threats represent unknown ones; the coordinates of all threats are randomly generated. The targets are marked with "T"; each UAV is marked with "V", where "S" represents a reconnaissance UAV, "A" represents an attack UAV, and "SA" represents a reconnaissance-attack UAV.

In the simulation, the maximum value of the reconnaissance subtask is set at 100, the maximum value of the attack subtask is set at 150, and the attenuation factor *λ* is set at 0.9.

Adopting the designed UAV swarm cooperative dynamic task assignment approach, the sequential task assignment results for known targets are shown in Table 5. The task execution plan of each UAV is expressed in the format of (Target, Subtask type, Start time (s), Winning bidding value).



According to the known threat information and the unknown threat information detected by reconnaissance UAV during mission execution, the modified artificial potential field method is adopted to avoid local optimization, and the preplanning flight path considering threat avoidance during mission execution can be realized. Combined with the UAV swarm task assignment results, the preplanned path, and the task sequence, the swarm task execution diagram can be obtained, as shown in Figure 11.

**Figure 11.** (**a**) The execution sequence of the UAV swarm; (**b**) the diagram of the UAV swarm performing tasks.

As can be seen in Figure 11, the cooperative task assignment approaches can realize the "reconnaissance–attack" coverage of known targets under the consideration of flight path planning coupling and task time window constraints.

#### *5.2. Dynamic Task Assignment for Sudden Targets*

Considering two unknown targets, T7 and T8, the swarm discovers targets through cooperative detection and then assigns tasks. The information of T7 and T8 is shown in Table 6, and the initial operational situation is shown in Figure 12.


**Table 6.** The table of information on unknown targets.

**Figure 12.** The initial setting of the situation between both sides (considering unknown targets).

The time when unknown targets are discovered by the UAVs and the corresponding subsets that discover the targets are shown in Table 7. The generated subtask time window constraints are shown in Table 7 as well. The sequential task assignment results for the unknown targets are shown in Table 8.

**Table 7.** Information of unknown targets and corresponding subtasks.


The swarm realizes the coverage of sudden targets through the distributed cooperative dynamic task allocation mechanism after the reconnaissance UAV V8, belonging to Subset 1, and the reconnaissance UAV V5, belonging to Subset 2, detect sudden targets T7 and T8, respectively, during task execution.

The concrete analysis is as follows: The reconnaissance UAV V4 and attack UAV V6, belonging to Subset 1, are assigned to perform the subtasks of sudden target T8 successively in the corresponding time window. The reconnaissance UAV V8 and reconnaissance-attack UAV V3, belonging to Subset 2, successively perform the subtasks of sudden target T7 in the corresponding time window so as to realize the "reconnaissance–strike" coverage of each sudden target.


**Table 8.** The task assignment of the UAV swarm.

We combine the assignment results with the preplanning flight path and task execution sequence based on the artificial potential field method. The diagrams of UAV swarm task execution and the execution sequence under sudden targets are shown in Figure 13.

**Figure 13.** (**a**) The execution sequence of the UAV swarm; (**b**) the diagram of the UAV swarm performing tasks (considering unknown targets).

Figure 13 indicates that through the dynamic task assignment approach based on the extended CNP, the swarm can mobilize the UAVs with task capability and the best execution effect obtained after bidding negotiation to complete tasks for each sudden target according to the time window.

#### *5.3. Reassignment for Subtasks of Failed UAV*

During the simulation, the reconnaissance UAV V5, belonging to Subset 1, fails and loses the capability to perform tasks. The failure time is 97.5 s.

According to the assignment results in Section 5.1, the reconnaissance UAV V5 prepares to perform the reconnaissance subtask. The fault of V5 triggers dynamic task assignment, and the tasks (T2, s, 117.5, 62.14)→(T6, s, 167.0, 59.20) of V5 become dynamic tasks. The bidding values of each group of potential bidders (including V2, V4, V3, and V8) are shown in Table 9. The reassignment results of failed UAV subtasks obtained by the designed assignment approach are shown in Table 10.


**Table 9.** The bidding values of potential bidders for dynamic subtasks.

**Table 10.** Task assignment under the circumstance of UAV breakdown.


The task assignment results in Table 10 show that UAV V4 of Subset 1 and UAV V8 of Subset 2 replace UAV V5 and continue to perform its tasks, respectively.

Combined with the assignment results, the diagrams of UAV swarm task execution and the execution sequence under UAV V5 failure are shown in Figure 14.

**Figure 14.** (**a**) The execution sequence of the UAV swarm; (**b**) the diagram of the swarm performing tasks under V5 failure.

The explanation of the assignment result is as follows: after the reconnaissance UAV V5 loses its mission capability, its subtasks should preferentially be executed by the other reconnaissance or reconnaissance-attack UAVs (i.e., V2 and V4) in Subset 1. It can be seen from Figure 14a that after the initial assignment in Section 5.1, the number of tasks that can still be performed by UAV V4 is one; therefore, through the contract network within the group, it can undertake one reconnaissance subtask of V5. At this time, V2 is executing the attack subtask T1(a) according to the initial allocation results; during this execution it crosses several threat areas, resulting in a low survivability estimate and a negative bidding value, so it is not suitable to continue performing the subtask. Therefore, the leader UAV V2 of Subset 1 releases the information to Subset 2 for assistance through the contract network among the groups. Subset 2 conducts bidding negotiation through the contract network within the group and finally assigns UAV V8 to assist Subset 1 by taking over reconnaissance subtask T6(s).
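The intra-group-then-inter-group negotiation described above can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm: the function and variable names are hypothetical, and the real bidding function would also incorporate the sequence, flight-path, and threat constraints discussed earlier.

```python
# Minimal sketch of two-level contract-net reassignment: bid within the
# failed UAV's own subset first; if no bidder has a positive (acceptable)
# bid value, the leader releases the task to the other subsets.
def reassign(subtask, own_subset, other_subsets, bid):
    best = max(own_subset, key=lambda uav: bid(uav, subtask))
    if bid(best, subtask) > 0:
        return best
    # Escalate: each other subset nominates its best bidder.
    candidates = [max(s, key=lambda uav: bid(uav, subtask)) for s in other_subsets]
    best = max(candidates, key=lambda uav: bid(uav, subtask))
    return best if bid(best, subtask) > 0 else None

# Toy bid values keyed by (uav, task); negative means unsuitable.
bids = {("V2", "T6s"): -5.0, ("V4", "T6s"): -1.0,
        ("V8", "T6s"): 3.2, ("V3", "T6s"): 1.1}
winner = reassign("T6s", ["V2", "V4"], [["V8", "V3"]],
                  lambda u, t: bids[(u, t)])
print(winner)  # V8: no positive bid in Subset 1, so Subset 2 assists
```

The escalation mirrors the outcome in this subsection, where V8 of Subset 2 takes over subtask T6(s) after Subset 1's bidders fail to produce an acceptable bid.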

#### *5.4. The Analysis of the Real-Time Performance of the Assignment Approach*

All the simulation experiments have been implemented on a personal PC; the parameters are Intel Core i5-5350U CPU @ 1.80 GHz 8 GB RAM, and the programming environment is MATLAB 2018b.

#### 5.4.1. Real-Time Performance with Different Problem Scales

The real-time performance of the assignment approach is mainly influenced by three factors: (1) the number of subsets of the UAV swarm; (2) the number of UAVs in each subset; (3) the number of targets.

To illustrate the impact of the factors above, the variable-controlling method is adopted. The details of simulation cases to be compared are in Table 11. Correspondingly, the comparison of assignment times in different cases is shown in Figure 15.

**Table 11.** The simulation cases to be compared.


**Figure 15.** Assignment time varies with (**a**) the number of subsets, (**b**) the number of UAVs of each subset, and (**c**) the number of targets.

According to the comparison among the simulation cases, on the one hand, given the slowly increasing trend of the assignment time in Figure 15a,b, it can be concluded that the real-time performance of the proposed approach has low sensitivity to the size of the UAV swarm. On the other hand, comparing (a) with (b), the task assignment time is more sensitive to the number of subsets than to the number of UAVs in each subset.

Moreover, according to Figure 15c, the task assignment time is sensitive to the number of targets, which indicates that the performance of the approach may become worse with the number of targets increasing.

#### 5.4.2. Algorithm Comparison

In order to verify the effectiveness of the distributed approach based on the contract network protocol (CNP) in solving the UAV swarm task assignment problem, it is compared with the centralized task assignment approach used in reference [10]. Scenarios are designed with UAV swarms of different sizes performing multiple task assignments. From the simulation results, both the centralized approach (ACO) and the distributed approach (CNP) are able to obtain a solution to the UAV swarm task assignment problem; however, the distributed approach (CNP) has an obvious advantage in solution efficiency. The solution times for assignment problems of different sizes are shown in Table 12.


**Table 12.** The real-time performance between the proposed approach and ACO in reference [10].

**Remark**: *Generally, a global optimal solution to such problems can rarely be obtained. Therefore, feasible solutions to the task assignment problem obtained in the solving process are considered in general*.

As a heuristic algorithm, ACO searches in the solution space, learns from the results of each iteration, and finally converges to a feasible solution, which means that ACO requires a lot of computational resources and a much longer time. Simulation results show that this problem becomes more pronounced when the problem size (e.g., size of the UAV swarm, the number of targets) increases.

In contrast to heuristic algorithms such as ACO, CNP merely performs the optimization by bidding on each subset of UAVs as well as UAV individuals without the iterative process of searching for feasible solutions in a large solution space.

The results illustrate that, as a distributed assignment approach, the proposed method in this paper has advantages in real-time performance compared with the ACO proposed in reference [10], which proves the effectiveness of the method to some extent.

#### **6. Conclusions**

Aiming at the problem of the cooperative reconnaissance–attack task assignment of UAV swarms in complex environments, this paper proposes a distributed grouping cooperative dynamic task assignment approach by considering multiple constraints, realizes the effective assignment of cooperative reconnaissance–attack tasks to multiple targets, and optimizes the combat effectiveness of the swarm. The main conclusions include:

(1) The proposed extended CNP algorithm is based on the determination mechanism of cooperators and the selection mechanism of sequential tasks, with a bidding function that considers the constraints of sequence, flight path, and threat. It can realize the reasonable assignment of reconnaissance and attack tasks on multiple targets under multiple constraints and optimize the task execution efficiency of the UAV swarm.

(2) The proposed dynamic task assignment strategy based on the event-trigger mechanism constructs the overall architecture of cooperative dynamic task assignment for the distributed grouping of the UAV swarm and improves the adaptability of the swarm to the dynamic environment and sudden targets during task execution.

(3) Three typical simulation scenarios are designed. The simulation results show that the task assignment approach in this paper can solve the problem of cooperative sequential dynamic task assignment when a UAV swarm with subsets performs reconnaissance–attack tasks in a complex environment with incomplete situational awareness and sudden targets, and it realizes reconnaissance and attack coverage of each target. Moreover, the real-time performance of the assignment approach has been analyzed, which indicates that the proposed approach has low sensitivity to the size of the UAV swarm.

**Author Contributions:** Conceptualization, B.Q. and D.Z.; methodology, B.Q.; software, B.Q.; validation, B.Q.; formal analysis, B.Q.; investigation, B.Q. and D.Z.; resources, D.Z. and S.T.; data curation, B.Q. and M.W.; writing—original draft preparation, B.Q.; writing—review and editing, B.Q.; visualization, B.Q.; supervision, D.Z. and S.T.; project administration, D.Z. and S.T.; funding acquisition, D.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** The research was funded by the National Natural Science Foundation of China, grant number (61933010) and (61903301).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available upon request from the corresponding author.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **UAV-Cooperative Penetration Dynamic-Tracking Interceptor Method Based on DDPG**

**Yuxie Luo 1, Jia Song 1,\*, Kai Zhao <sup>1</sup> and Yang Liu <sup>2</sup>**


**\*** Correspondence: songjia@buaa.edu.cn

**Abstract:** The multi-UAV system has stronger robustness and better stability in combat. Therefore, the collaborative penetration of UAVs has been extensively studied in recent years. Compared with general static combat scenes, penetration against dynamic-tracking interception equipment is more difficult to achieve. To realize the coordinated penetration of the dynamic-tracking interceptor by the multi-UAV system, the intelligent UAV model is established using the deep deterministic policy-gradient algorithm, and the reward function is constructed from the cooperative parameters of multiple UAVs to guide the UAVs toward collaborative penetration. The simulation experiment proved that the UAVs finally evaded the dynamic-tracking interceptor and that multiple UAVs reached the target at the same time, realizing the time coordination of the multi-UAV system.

**Keywords:** multi-UAV; deep deterministic policy gradient; cooperative penetration; dynamic-tracking interceptor

#### **1. Introduction**

Compared with traditional manned aerial vehicles, unmanned aerial vehicles (UAVs) can be autonomously or remotely controlled, place low requirements on the combat environment, have strong battlefield survivability, and can be used to perform a variety of complex tasks [1,2]. Therefore, UAVs have been widely studied and applied [3]. With the continuous application of UAVs in the military field, systems composed of a single UAV have gradually revealed problems of poor flexibility and low stability [4,5]. Cooperative combat using a multi-UAV system composed of multiple UAVs has become a new main research direction [6,7]. Under the conditions of the modernized and networked battlefield, an air cluster composed of multiple UAVs has the capability to continuously launch the required strikes, forcing the enemy to spend more resources and deal with more fighters, thereby enhancing the overall capability and performance in military combat confrontation.

Multi-UAV-cooperative penetration is one of the key issues in achieving multi-UAV-cooperative combat. Multiple UAVs start from the same location or different locations and finally arrive at the same place. At present, UAV penetration-trajectory-planning methods mainly include the A\* algorithm [8], the artificial potential field method [9,10], and the RRT algorithm [11]. Most application scenarios of these methods are environments with static no-fly zones, and dynamic threats are rarely considered. The A\* algorithm is a typical grid method: it rasterizes the map for planning, but the size of the grid has a great impact on the result and is difficult to determine. The artificial potential field method easily falls into local optima, which can make the target unreachable. When there is a dynamic-tracking interceptor in the environment, the environment information becomes complicated, and real-time planning requirements are imposed on the UAVs; therefore, traditional algorithms cannot meet the requirements.

**Citation:** Luo, Y.; Song, J.; Zhao, K.; Liu, Y. UAV-Cooperative Penetration Dynamic-Tracking Interceptor Method Based on DDPG. *Appl. Sci.* **2022**, *12*, 1618. https://doi.org/ 10.3390/app12031618

Academic Editors: Wei Huang and Dong Zhang

Received: 24 December 2021 Accepted: 1 February 2022 Published: 3 February 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Multi-UAV-cooperative penetration generally requires that multiple UAVs achieve time coordination, penetrate defenses along different trajectories, and finally reach the target area at the same time or in a certain time sequence. When the UAVs depart from different locations, the difficulty of collaboration greatly increases. Multi-UAV-coordination algorithms can be obtained by adapting traditional single-UAV-penetration algorithms to the multi-UAV environment. Chen uses the optimal-control method to improve the artificial potential field method to achieve multi-UAV coordination [11,12], and Kothari improves the RRT algorithm to achieve multi-UAV coordination [13]. At the same time, there are a large number of methods for the cooperative control of UAVs based on graph theory [3]. Li proposed a multi-UAV-collaboration method based on graph theory [14]. Ruan proposed a multi-UAV-coordination method based on multi-objective optimization [15]. The above methods all realize the coordination of multiple UAVs, but they lack research on dynamic environments and cannot adapt to the complex and dynamic battlefield environment. Aiming at environments with dynamic threats, this paper proposes a method based on deep reinforcement learning to achieve multi-UAV-cooperative penetration.

Reinforcement learning is an important branch of machine learning. Its main feature is to evaluate the action policy of the agent based on the final rewards, through interaction and trial and error between the agent and the environment. Reinforcement learning is a far-sighted machine-learning method that considers long-term rewards [16,17] and is often used to solve sequential decision-making problems. It can be applied not only in static environments but also in dynamic environments where the parameters of the environment are constantly changing [16–18]. Research on reinforcement learning is mostly concentrated in the field of single agents, but there is also a large body of research on reinforcement-learning algorithms for multi-agent systems, as well as much research on applying reinforcement learning to UAVs. Pham successfully applied deep reinforcement learning to UAVs to realize autonomous navigation [19]. Wang also studied the autonomous navigation of UAVs in large, unknown, and complex environments based on deep reinforcement learning [20]. Wang applied reinforcement learning to the target search and tracking of UAVs [21]. Based on deep reinforcement learning, Yang studied the task-scheduling problem of UAV clusters and solved the application problem of reinforcement learning in multi-UAV systems [22]. From the relevant literature, it can be found that applying deep reinforcement learning to multi-UAV systems is feasible, can be used to achieve complex multi-UAV-system tasks, and has great research potential.

The main work of this paper takes the multi-UAV-cooperative penetration of a dynamic-tracking interceptor as the scenario. Based on the deep reinforcement-learning DDPG algorithm, we establish the intelligent UAV model and realize the multi-UAV-cooperative penetration of the dynamic-tracking interceptor by designing a reward function related to coordination and penetration. The simulation results show that the trained multi-UAV system can achieve cooperative attack tasks from different initial locations, which proves the application potential of artificial-intelligence methods such as reinforcement learning in the implementation of coordinated tasks in UAV clusters.

#### **2. Problem Statement**

#### *2.1. Motion Scene*

This paper solves the problem of multi-UAV-cooperative penetration of dynamic interceptors. We assume multiple UAVs are respectively numbered as *L* = {1, 2, . . . , *n*}. The scene is set as a two-dimensional engagement plane. Figure 1 is the schematic diagram of the motion scene of the multi-UAVs.

**Figure 1.** The motion scene of the multi-UAVs.

Based on reinforcement learning algorithms, the following are the requirements:


#### *2.2. UAV Movement Model*

First, a two-dimensional movement model of the UAV is established. Figure 2 is the schematic diagram of the movement of the UAV. It is assumed that the linear velocity of the UAV is constant, and the angular velocity, *ω*, is a continuously variable value. It is assumed that the angle between the movement direction of the UAV and the *x*-axis is the azimuth angle, *θ*.

**Figure 2.** Schematic diagram of UAV movement.

The movement of the UAV is divided into *x* and *y* directions. First, the current azimuth angle is obtained by integrating the angular velocity of the UAV, and then the velocity is decomposed on the coordinate axis by the azimuth angle, *θ*, and finally, the position information of the UAV is obtained through integration. The mathematical model is shown in (1):

$$\begin{cases} \theta = \theta\_0 + \int\_0^t \omega \, dt \\ v\_x = v \sin \theta \\ v\_y = v \cos \theta \\ x = x\_0 + \int\_0^t v\_x \, dt \\ y = y\_0 + \int\_0^t v\_y \, dt \end{cases} \tag{1}$$
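As a concrete illustration, the kinematics of Eq. (1) can be integrated numerically with a simple Euler step; this is a sketch with arbitrary step size and values, not part of the paper's simulation code.

```python
import numpy as np

def uav_step(x, y, theta, omega, v, dt):
    """One Euler step of the planar UAV model in Eq. (1): integrate the
    heading from the angular rate, then decompose the constant speed
    along the axes (note the paper's convention: v_x = v*sin(theta),
    v_y = v*cos(theta), so theta = 0 points along +y)."""
    theta = theta + omega * dt
    vx, vy = v * np.sin(theta), v * np.cos(theta)
    return x + vx * dt, y + vy * dt, theta

# Straight flight (omega = 0) at 10 m/s from the origin for 1 s:
x, y, th = 0.0, 0.0, 0.0
for _ in range(100):
    x, y, th = uav_step(x, y, th, omega=0.0, v=10.0, dt=0.01)
print(round(x, 6), round(y, 6))  # 0.0 10.0
```

With theta = 0 the UAV moves purely along +y, consistent with the sin/cos convention above.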

#### *2.3. Dynamic-Interceptor Design*

Compared with the static environment, the position of the dynamic interceptor changes in real time, so the UAV is required to perform real-time planning. In this paper, the dynamic interceptor is defined as a tracking interceptor following the proportional-guidance pursuit law. Compared with most common dynamic interceptors with simple motion rules, such as interceptors that cycle through a uniform linear or circular motion, the tracking interceptor has stronger uncertainty and is difficult to evade by predicting its motion, so the requirements on the movement of the multi-UAV system are much higher. In this paper, it is assumed that the linear velocity of the tracking interceptor is constant, and the angular velocity is calculated by proportional guidance. The schematic diagram of its movement is shown in Figure 3.

**Figure 3.** Schematic diagram of interceptor's movement.

The basic principle of the proportional-guidance method is to make the interceptor's rotational angular velocity proportional to the line-of-sight angular velocity. Next, the proportional-guidance mathematical model of the interceptor is introduced.

Assuming that the position of the interceptor is $x\_b, y\_b$, its speed is $v\_{x\_b}, v\_{y\_b}$, the position of the UAV is $x, y$, and its speed is $v\_x, v\_y$, the relative position and speed of the interceptor with respect to the UAV are shown in (2):

$$\begin{cases} x\_r = x\_b - x \\ y\_r = y\_b - y \\ v\_{x\_r} = v\_{x\_b} - v\_x \\ v\_{y\_r} = v\_{y\_b} - v\_y \end{cases} \tag{2}$$

The interceptor's line of sight angle to the UAV can be obtained as:

$$q = \arctan\left(\frac{x\_r}{y\_r}\right) \tag{3}$$

To obtain the line-of-sight angular velocity, (3) is differentiated with respect to time, as shown in (4):

$$\dot{q} = \frac{v\_{y\_r} x\_r - v\_{x\_r} y\_r}{x\_r^2 + y\_r^2} \tag{4}$$

The rotational angular velocity of the dynamic-tracking interceptor can be obtained by (5):

$$\omega\_b = K\dot{q} \tag{5}$$

*K* is the proportional guidance coefficient, taking *K* = 2.

The interceptor performs a two-dimensional movement according to the angular velocity of rotation calculated by the proportional guidance. The movement model is similar to that of a UAV:

$$\begin{cases} \theta\_b = \theta\_{b0} + \int\_0^t \omega\_b \, dt \\ v\_{bx} = v\_b \sin \theta\_b \\ v\_{by} = v\_b \cos \theta\_b \\ x\_b = x\_{b0} + \int\_0^t v\_{bx} \, dt \\ y\_b = y\_{b0} + \int\_0^t v\_{by} \, dt \end{cases} \tag{6}$$
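Equations (2)–(5) can be combined into a single guidance-update function. The sketch below is illustrative (hypothetical names, planar kinematics only): it computes the interceptor's commanded angular rate from the relative state, using the paper's line-of-sight rate expression and K = 2.

```python
K = 2.0  # proportional-guidance coefficient, as in Eq. (5)

def interceptor_omega(xb, yb, vxb, vyb, x, y, vx, vy):
    """Angular rate of the tracking interceptor from Eqs. (2)-(5):
    relative state -> line-of-sight rate q_dot -> omega_b = K * q_dot."""
    xr, yr = xb - x, yb - y            # Eq. (2): relative position
    vxr, vyr = vxb - vx, vyb - vy      # Eq. (2): relative velocity
    q_dot = (vyr * xr - vxr * yr) / (xr**2 + yr**2)  # Eq. (4)
    return K * q_dot

# Head-on closing geometry: interceptor at the origin climbing at 10 m/s,
# UAV stationary directly above it. The line of sight does not rotate.
omega = interceptor_omega(0.0, 0.0, 0.0, 10.0, 0.0, 100.0, 0.0, 0.0)
print(omega)  # 0.0
```

Once `omega` is available, the interceptor's position is propagated with the same planar kinematics as the UAV, per Eq. (6).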

#### **3. Deep Deterministic Policy-Gradient Algorithm**

The DDPG algorithm is a branch of reinforcement learning [5]. The basic process of reinforcement-learning training is that the agent performs an action based on the current observation state. This action acts on the agent's training environment and returns a reward and a new state observation. The goal of training is to maximize the final reward. Reinforcement learning does not need to give any artificial strategies and guidance during training but only needs to give the reward function when the environment is in various states. This is also the only part of training that can be adjusted artificially. Figure 4 shows the basic process of reinforcement learning.

**Figure 4.** The basic process of reinforcement learning.

The DDPG algorithm is an actor-critic algorithm that solves the problem of applying reinforcement learning in continuous action spaces. There are two networks in DDPG: the state-action value (critic) network $Q(s, a \mid \theta^Q)$ with parameters $\theta^Q$ and the policy (actor) network $\mu(s \mid \theta^{\mu})$ with parameters $\theta^{\mu}$. At the same time, two concepts are introduced: the target network and experience replay. When the value function network is updated, the current value function is used to fit the future state-action value function; if both state-action value functions use the same network, training is difficult to stabilize. Therefore, the target network is introduced: it has the same structure as the state-action value function network to be updated, but it is not updated in real time; instead, it is updated from the state-action value function network once the latter has been updated to a certain extent. The policy network adopts the same scheme in DDPG. Experience replay is a mechanism for storing state transitions $(s\_t, a\_t, r\_t, s\_{t+1})$: every time the agent performs an action that causes a state transition, the transition is stored in the experience replay pool. When the value function is updated, it is not updated directly from the action of the current policy; instead, state transitions are sampled from the replay pool for updating. The advantage of such an update is that the training and learning of the network is more efficient.

Before training, the value function network $Q(s, a \mid \theta^Q)$ and the policy network $\mu(s \mid \theta^{\mu})$ are first randomly initialized, and then the target networks $Q'$ and $\mu'$ are initialized from them. It is also necessary to initialize an action exploration noise, $\mathcal{N}$, which is conducive to the agent's exploration. During training, the agent selects and executes actions $a\_t = \mu(s\_t \mid \theta^{\mu}) + \mathcal{N}\_t$ based on the current policy network and the action noise, and it receives a reward, $r\_t$, and a new state observation, $s\_{t+1}$, from the environment. The state transition $(s\_t, a\_t, r\_t, s\_{t+1})$ is stored in the experience replay pool. After that, $N$ state transitions $(s\_i, a\_i, r\_i, s\_{i+1})$ are randomly sampled to update the value function network. The principle of updating the value function network is to minimize the loss function. The mathematical expression of the loss function is as follows:

$$L = \frac{1}{N} \sum\_{i} \left( y\_i - Q\left(s\_i, a\_i \mid \theta^Q \right) \right)^2 \tag{7}$$

where $y\_i = r\_i + \gamma Q'\big(s\_{i+1}, \mu'(s\_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\big)$; since $y\_i$ is computed with the target networks, it is treated as independent of $\theta^Q$.

Assuming the objective function of training is the following:

$$J(\theta^{\mu}) = E\_{\theta^{\mu}} \left[ r\_1 + \gamma r\_2 + \gamma^2 r\_3 + \cdots + \gamma^{N-1} r\_N \right] \tag{8}$$

where *γ* is the discount factor. The policy network is updated according to the gradient of the objective function, and its mathematical expression is as follows:

$$\nabla\_{\theta^{\mu}} J \approx \frac{1}{N} \sum\_{i} \nabla\_{a} Q \left( s, a \mid \theta^{Q} \right) \Big|\_{s=s\_i, a=\mu(s\_i)} \nabla\_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \Big|\_{s\_i} \tag{9}$$

After training and finally updating the target network, the mathematical expression is as follows:

$$\begin{cases} \theta^{Q'} \leftarrow \tau \theta^Q + (1 - \tau)\theta^{Q'} \\ \theta^{\mu'} \leftarrow \tau \theta^{\mu} + (1 - \tau)\theta^{\mu'} \end{cases} \tag{10}$$
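A minimal numerical sketch of the three update rules above: the TD target used in Eq. (7), the critic loss of Eq. (7), and the soft target update of Eq. (10). The NumPy arrays stand in for network parameters and outputs, which is an illustrative simplification rather than a full DDPG implementation.

```python
import numpy as np

def td_targets(r, gamma, q_next_target):
    """y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})): the bootstrap value
    comes from the target networks, so y_i is a constant w.r.t. theta^Q."""
    return r + gamma * q_next_target

def critic_loss(y, q):
    """Mean-squared TD error over the sampled minibatch, Eq. (7)."""
    return np.mean((y - q) ** 2)

def soft_update(theta_target, theta, tau):
    """Polyak averaging of Eq. (10): theta' <- tau*theta + (1-tau)*theta'."""
    return tau * theta + (1.0 - tau) * theta_target

# Two sampled transitions with rewards [1, 0] and target-Q values [2, 3]:
y = td_targets(np.array([1.0, 0.0]), gamma=0.5, q_next_target=np.array([2.0, 3.0]))
print(critic_loss(y, np.array([2.0, 1.5])))          # 0.0 (critic matches targets)
print(soft_update(np.array([0.0]), np.array([1.0]), tau=0.5))  # [0.5]
```

In a full implementation, `q_next_target` would come from the target critic evaluated at the target actor's action, and `soft_update` would be applied element-wise to every network weight.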

To extend DDPG to a multi-UAV system, multiple actors and a critic must exist in the system. During each training, the value function network evaluates the policies of all UAVs in the environment, and the UAVs update their respective policy networks based on the evaluation and independently choose to execute actions. Figure 5 shows the structure of the multi-UAV DDPG algorithm.

**Figure 5.** Multi-UAV DDPG algorithm.

The algorithm design also needs to construct the UAV's action space, state space, reward function, and termination conditions. In this paper, all UAVs are tested and simulated using small UAV models for the purpose of preliminary verification of the algorithm.

#### *3.1. Action-Space Design*

It can be seen from the motion model of the UAV that the action the UAV can perform is to change its angular velocity, so the action space of multiple UAVs is designed as $A = \{\omega\_1, \omega\_2, \dots, \omega\_n\}$, the collection of the angular velocities of the multiple UAVs.

#### *3.2. State-Space Design*

To realize the coordinated penetration of UAVs, the design of the state space should include the UAVs' positions, $x\_i, y\_i$, speeds, $v\_{x\_i}, v\_{y\_i}$, and the central position of the target area, $x\_t, y\_t$. At the same time, it is necessary to introduce the state observation of the interceptor position, $x\_b, y\_b$. Therefore, the state space is set to $S = \{x\_i, y\_i, v\_{x\_i}, v\_{y\_i}, x\_t, y\_t, x\_b, y\_b\}$, $i \in L$.
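For illustration, the state can be assembled into a flat observation vector; the field ordering below is an assumption for this sketch, not specified by the paper.

```python
import numpy as np

def observation(uavs, target, interceptor):
    """Flatten S = {x_i, y_i, v_xi, v_yi, x_t, y_t, x_b, y_b}: per-UAV
    position and velocity, followed by the target-area center and the
    interceptor position (field order is an illustrative choice)."""
    per_uav = [c for (x, y, vx, vy) in uavs for c in (x, y, vx, vy)]
    return np.array(per_uav + list(target) + list(interceptor), dtype=float)

# Two UAVs, one target center, one interceptor position:
s = observation([(0, 0, 1, 2), (5, 5, -1, 0)],
                target=(100, 100), interceptor=(50, 50))
print(s.size)  # 12 = 4 fields x 2 UAVs + 2 target coords + 2 interceptor coords
```

The same builder scales with the number of UAVs, matching the index set $i \in L$.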

#### *3.3. Termination Conditions Design*

The termination conditions are divided into four items, namely, out of bounds, collision, timeout, and successful arrival.


When the distance between a UAV and the interceptor satisfies $d\_b = \sqrt{(x - x\_b)^2 + (y - y\_b)^2} \le d\_{collision}$, it is regarded as a mission failure; an end signal and a failure signal are given.


When any UAV in the environment triggers a termination condition, the episode ends for all UAVs, and a failure or success signal is given according to the distance from the target point.

#### *3.4. Reward-Function Design*

The sparse-reward problem is a common problem when designing the reward function. It affects the training process of the UAV, prolongs the training time, and may even prevent the training goal from being achieved. To better achieve collaborative tasks and alleviate sparse rewards, the reward function is divided into four parts: the distance reward, $R\_d$, related to the distance between the UAV and the target; the cooperative reward, $R\_{co}$, used to constrain the positions of the UAVs; the mission success reward, $R\_s$; and the mission failure reward, $R\_{fail}$. The reward function is linearized to improve the efficiency of UAV cluster training. One of the UAVs is taken as an example to introduce the design of the reward function.

Assume $d\_i = \sqrt{(x\_i - x\_t)^2 + (y\_i - y\_t)^2}$, $i \in L$, represents the distance between one UAV and the target area, and $d\_{target}$ represents the distance between the UAV's initial location and the target area.

The distance reward, *Rd*, is related to the distance from the UAV to the target area. The closer the distance, the greater the distance-reward value. This type of reward is the key reward on whether the UAV can reach the target area. The specific form is shown in (11):

$$R\_d = 0.6 - d\_i / d\_{target} \tag{11}$$

The cooperative reward is related to the cooperative parameters in the UAV cluster. Here, the difference between the farthest and closest distance between the UAV and the target area is selected as the coordination parameter. Its specific expression is shown in (12). When there are two UAVs in the cluster, the distribution diagram is shown in Figure 6.

$$R\_{co} = R\_d \times \left(1 - (d\_{max} - d\_{min}) / d\_{target}\right) \tag{12}$$

where $d\_{max} = \max\{d\_1, d\_2, \dots, d\_n\}$ and $d\_{min} = \min\{d\_1, d\_2, \dots, d\_n\}$, respectively, represent the maximum and minimum distances between the UAVs and the target area in the environment. It can be seen from the mathematical expression and the distribution diagram that the cooperative reward has two major distribution trends. When the maximum distance difference within the UAV cluster is smaller, its value is larger, which leads the UAVs towards time coordination. When a UAV is closer to the target area, its value is larger, guiding the UAV to reach the target area.

**Figure 6.** Co-reward distribution.

When the UAV receives the success signal and finally reaches the target area, the mission is successful, and a success reward is given. The success reward is also related to the farthest distance between the UAV cluster and the target point; the closer the distance, the greater the success reward, as shown in (13).

$$R\_s = 250 + 2500 / d\_{max} \tag{13}$$

When the final mission of the UAV fails, that is, when it is intercepted by an interceptor and fails to reach the target area, a negative failure reward is given, as shown in (14).

$$R\_{fail} = -1000 \tag{14}$$

By linearly superimposing the above multiple reward functions, the final reward function can be obtained. The reward function, R, of the UAV is shown in (15).

$$R = \beta\_1 R\_d + \beta\_2 R\_{co} + \beta\_3 R\_s + \beta\_4 R\_{fail} \tag{15}$$

where *β*<sup>1</sup> + *β*<sup>2</sup> + *β*<sup>3</sup> + *β*<sup>4</sup> = 1.
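The four reward terms and their linear combination in Eq. (15) can be sketched as follows. The β weights and the per-step bookkeeping are illustrative assumptions, and the success/failure terms apply only at episode termination.

```python
def reward(d, d_target, d_collision_hit, reached,
           betas=(0.25, 0.25, 0.25, 0.25)):
    """Combined per-step reward in the spirit of Eqs. (11), (12), (15).
    `d` is the list of UAV-to-target distances; `d_target` is the initial
    distance to the target area. The beta weights sum to 1; the equal
    split used here is an illustrative choice, not the paper's tuning."""
    b1, b2, b3, b4 = betas
    d_i = d[0]  # reward is computed per UAV; take the first as the example
    r_d = 0.6 - d_i / d_target                          # distance reward, Eq. (11)
    r_co = r_d * (1 - (max(d) - min(d)) / d_target)     # cooperative reward, Eq. (12)
    r_s = 250 + 2500 / max(d) if reached else 0.0       # success reward (terminal)
    r_fail = -1000 if d_collision_hit else 0.0          # failure reward (terminal)
    return b1 * r_d + b2 * r_co + b3 * r_s + b4 * r_fail  # Eq. (15)

# Mid-flight step: both UAVs 300 m out of an initial 600 m, no terminal event.
print(reward([300.0, 300.0], d_target=600.0,
             d_collision_hit=False, reached=False))
```

Because the UAVs are equidistant in this example, the cooperative term equals the distance term, which is exactly the behavior Eq. (12) encourages.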

#### **4. Experimental-Simulation Analysis**

#### *4.1. Simulation-Scene Settings*

Based on environment modeling, a simulation experiment of multi-UAV-cooperative penetration was carried out. To simplify the training, the scene and the speed of the UAVs are set to small values. The number of UAVs in the cluster is set to two. The environment contains as many dynamic interceptors as UAVs: each dynamic interceptor is responsible for tracking one UAV, that is, each interceptor calculates its proportional-guidance angular velocity with respect to one UAV. The initial positions of the two interceptors are at the center of the target point, and the radius of the target area is set to 60 m. Table 1 shows the simulation parameters during training.


**Table 1.** Simulation Parameters.

During the training process, the learning rates of the actor network and the critic network are set to $\alpha\_1 = 0.0001$ and $\alpha\_2 = 0.001$, the discount factor is set to $\gamma = 0.9$, and the action noise is set to 0.3.

#### *4.2. Simulation Results and Analysis*

After 10,000 training episodes, the simulation results are shown in Figure 7.

The two UAVs are able to bypass the dynamic interceptors and finally reach the target area at the same time; the difference between their arrival times does not exceed one simulation step (1 s).

Figures 8 and 9 show the angular-velocity change curves of the UAVs and interceptors and the line-of-sight change curves of the interceptors.

**Figure 7.** Experiment Result.

**Figure 8.** The parameter change curve of UAV A and Interceptor A during training.

**Figure 9.** The parameter change curve of UAV B and Interceptor B during training.

It can be seen from the figures that the UAVs performed wide-angle maneuvers, the interceptors' line of sight to the UAVs continued to increase, and the interceptions failed.

Figure 10 shows the distance curve between the two UAVs and the interceptor. The black line at the bottom of the figure represents the distance captured by the interceptor.

**Figure 10.** The distance between the UAV and the interceptor.

It can be seen from the figure that the minimum distance between the two UAVs and the interceptors is above the black line, that is, they are not captured by the interceptors.

Figure 11 shows the distances between the two UAVs and the target point. In the figure, there is a gap of about 10 m between the initial distances of the two UAVs to the target area.

**Figure 11.** The distance curve between the UAV and the target.

It can be seen from the figure that the final difference between the distances of the two UAVs to the target point is almost zero, whereas at the beginning of the movement this difference is about 3 m. UAV A deliberately takes a longer path while remaining in motion, so that both UAVs can finally reach the target point at the same time.

The simulation experiment shows that, through training of the UAV cluster, the DDPG algorithm can complete cooperative penetration against dynamic-tracking interceptors, meeting the high-performance requirements of the multi-UAV system; it is therefore a method with considerable application potential.

#### **5. Conclusions**

Based on the deep reinforcement-learning DDPG algorithm, this paper studies the UAV-cooperative penetration. Based on the mission scenario model, the UAV's action space, state space, termination conditions, and reward functions are designed. Through the training of the UAV, the coordinated penetration of multiple aircraft was realized. The main conclusions of this paper are as follows:

The collaborative method based on the DDPG algorithm designed in this paper can achieve coordinated penetration between UAVs. After the UAVs are trained, they can cooperatively evade the interceptors without being intercepted. Compared with traditional algorithms, the UAVs' penetration performance is stronger and the applicable environments are more complex, giving the method great application prospects.

The reinforcement-learning collaboration method in this paper realizes time coordination between UAVs on the premise that state information is exchanged between the aircraft: starting from different initial locations, the UAVs finally reach the target area at the same time. However, this paper does not consider factors such as communication delays or failures between UAVs, which could cause a UAV to receive incorrect information. This problem may be addressed by predicting the UAVs' states and by information fusion; further research can be carried out in follow-up work.

This paper only considers motion in the engagement plane. In follow-up work, the motion can be extended to three dimensions; multi-UAV cooperative penetration in a 3D environment will place higher demands on the algorithm.

**Author Contributions:** Conceptualization, J.S. and Y.L. (Yuxie Luo); methodology, Y.L. (Yuxie Luo) and K.Z.; software, Y.L. (Yuxie Luo); validation, Y.L. (Yuxie Luo), K.Z. and Y.L. (Yang Liu); formal analysis, J.S.; investigation, Y.L. (Yuxie Luo); resources, K.Z. and Y.L. (Yang Liu); data curation, J.S.; writing—original draft preparation, Y.L. (Yuxie Luo); writing—review and editing, Y.L. (Yuxie Luo); visualization, Y.L. (Yuxie Luo); supervision, J.S.; project administration, J.S.; funding acquisition, J.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China (General Program), grant number 62073020.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Optimal Unmanned Ground Vehicle—Unmanned Aerial Vehicle Formation-Maintenance Control for Air-Ground Cooperation**

**Jingmin Zhang 1,2, Xiaokui Yue 1,\*, Haofei Zhang <sup>2</sup> and Tiantian Xiao <sup>2</sup>**


**Abstract:** This paper investigates the air–ground cooperative time-varying formation-tracking control problem of a heterogeneous cluster system composed of an unmanned ground vehicle (UGV) and an unmanned aerial vehicle (UAV). Initially, the structure of the UAV–UGV formation-control system is analyzed from the perspective of a cooperative combat system. Next, based on the motion relationship between the UAV and UGV in a relative coordinate system, the relative motion model between them is established, which clearly reveals the physical meaning of the relative motion process in the UAV–UGV system. Then, under the premise that the control system of the UAV is closed-loop stable, the motion state of the UGV is modeled as an input perturbation. Finally, using linear quadratic optimal control theory, a UAV–UGV formation-maintenance controller is designed to track the reference trajectory of the UGV based on the UAV–UGV relative motion model. The simulation results demonstrate that the proposed controller can overcome input perturbations, constant model perturbations, and linearization biases. Moreover, it can achieve fast and stable adjustment and maintenance control of the desired UAV–UGV formation proposed by the cooperative combat mission-planning system.

**Keywords:** UAV–UGV; cooperative engagement; optimal control; time-varying output formation; formation keeping

#### **1. Introduction**

In the wake of rapid development of information technology (IT), artificial intelligence (AI), and unmanned equipment, as well as their common application in the military field, the traditional combat style, in which each platform uses its own sensors and weapon systems to detect, track, and strike targets individually, can no longer meet the needs of digital warfare. As an emerging combat style, cooperative operations can organically link various geographically dispersed sensors, command and control systems, and weapons systems into a cross-platform information network. In this manner, it is possible to connect and share battlefield information, obtain real-time situational awareness of the battlefield, and improve the integrated combat effectiveness. Cooperative combat has received increasing academic attention in recent years. However, current research on cooperative operations mainly focuses on multi-missile cooperative guidance [1–3], formation control [4–6], formation design [7–9], path planning [10–12], and multitarget assignment [13–15]. Research on air–ground cooperative operations is currently lacking.

Unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) are the two most representative objects in a cluster system. Although UAVs can quickly reconnoiter a wide area, they are limited by endurance and flight altitude, which prevents them from carrying out their given tasks in special environments. UGVs, by contrast, can approach targets at close range and have high endurance, but are slower and have a limited

**Citation:** Zhang, J.; Yue, X.; Zhang, H.; Xiao, T. Optimal Unmanned Ground Vehicle—Unmanned Aerial Vehicle Formation-Maintenance Control for Air-Ground Cooperation. *Appl. Sci.* **2022**, *12*, 3598. https://doi.org/10.3390/app12073598

Academic Editor: Yosoon Choi

Received: 6 March 2022 Accepted: 29 March 2022 Published: 1 April 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

field of view. The advantages and disadvantages of UAVs and UGVs are summarized in Table 1. For a UAV–UGV cooperative formation, they can make full use of the advantages of spatial distribution, parallel execution of tasks, functional distribution, and fault tolerance, so that they can complement each other to compensate for the shortcomings of a single type of object and effectively improve the outcome of cooperative operations. For example, in the reconnaissance of complex terrain conditions, combat reconnaissance vehicles are often unable to detect effectively owing to obstacle blockages in the surrounding environment, and may even suffer communication losses between multiple vehicles. Introducing multiple UAVs to allow for formation control to achieve companion-flight cooperative reconnaissance can provide a large range of environmental information for the overall reconnaissance decisions of ground vehicles. Additionally, UAVs can be used as a communication relay in the case of ground communication obstacles to realize the complementary advantages and cooperation of the cluster system [16,17].

**Table 1.** Advantages and disadvantages of UAVs and UGVs.


Based on this demand for engineering applications, many scholars have conducted research on the air–ground cooperative control between UAVs and UGVs. As an emerging research field, its theoretical knowledge and technical aspects involve the intersection of several disciplines and technical fields, such as embedded systems, aerodynamics, and control theory, which includes path tracking, formation clustering, automatic landing, and other research fields characterized by many research contents and difficulties [18,19]. This highlights that many problems need to be solved and explored in this area. In particular, the heterogeneity of UAVs and UGVs, for example, having different working fields, different physical models, and different technical indicators [20–22], poses new challenges to cooperative control between UAVs and UGVs [23,24].

Hence, this study investigates the air–ground cooperative time-varying formation-keeping control problem of a heterogeneous cluster system composed of a UGV and a UAV in the scenario of air–ground cooperative combat, which enables the UAV to track the motion trajectory of the UGV.

#### **2. Related Work**

Grocholsky et al. [25] proposed the use of an air–ground cooperative model to address the shortcomings of a single platform, e.g., the low accuracy of UAVs when locating ground targets and the narrow view of UGVs when observing distant obstacles. They designed a cooperative control framework and algorithm to search for and locate targets in a specified area, providing ideas for cooperative air–ground platforms to search for targets. Manyam et al. [26] considered the problem of communication constraints between two UAVs that cooperate to complete a mission. To address this issue, they proposed using UAVs to collect information and developed a branch-and-cut algorithm to solve the path-planning problem of UGVs and UAVs, offering a new idea for the cooperative path planning of UGVs and UAVs under communication constraints. In addition, they presented a cooperative reconnaissance and data-collection method for UGVs and UAVs. To address the problem that the line of sight of a ground vehicle's sensors is limited by the geometry of the environment, Peterson et al. [27] presented a method to capture aerial images of the mission area using multi-rotors, construct a terrain map, and then navigate the ground robot to the destination. This method provides a theoretical and practical basis for the application of air–ground cooperative systems for cooperative navigation in outdoor environments.

However, the aforementioned studies on air–ground cooperation focus mostly on image transmission, map construction, and target localization, and do not consider the impact of UAV–UGV formation on the overall quality of cooperative operations. In fact, when UGVs and UAVs perform cooperative combat missions, formation adjustments are required, including initial formation generation, formation maintenance, contraction, expansion, and reconfiguration. Therefore, research on the formation control of UGVs and UAVs is of great significance, and can even affect whether a combat mission can be successfully completed.

UAVs and UGVs have completely different physical structures, and their established kinematic and dynamic models are entirely different. Moreover, UAVs perform three-dimensional movements in the air, whereas UGVs perform two-dimensional movements on the ground. As a result, studying the time-varying formation-tracking control of heterogeneous cluster systems is a key problem to solve in cross-media heterogeneous collaboration, including air–ground collaboration, which has great theoretical research value and engineering significance.

Currently, common formation-control strategies include behavior-, virtual structure-, and consistency-based methods [28–30]. However, behavior-based formation methods rely on qualitative behavior rules, making it difficult to establish a quantitative model of the entire system. Consequently, such a method cannot guarantee the stability of the formation movement of an entire system. Virtual structure-based methods require centralized control by a central node, and cannot be implemented in a distributed manner. In addition, most existing methods can only achieve time-invariant formation configurations, whereas in practical applications, heterogeneous cluster systems must be able to dynamically adjust their formations in real time to cope with complex external environments and changes in tasks.

In the past, scholars have mainly used two methods to establish the relative motion models of UGVs and UAVs. One method is to directly analyze the geometric relationship of the motion of a single vehicle in a two-dimensional plane and establish a relative-motion model in a two-dimensional plane. The other is to derive the difference between the different vehicle positions in an inertial coordinate system and then obtain an expression of the relative motion state in the inertial space. This description cannot directly reflect the motion characteristics of UGVs and UAVs in a relative coordinate system, and their relative motion processes are insufficiently clear.

Based on the above research background, this study investigates the UAV–UGV formation-maintenance control problem for cooperative combat mission requirements. First, from the perspective of a UAV–UGV cooperative combat system, the architecture of the control system and the relationship between the sub-modules are analyzed. Subsequently, a relative motion model between the UGV and the UAV is established. The optimal UAV–UGV cooperative formation-maintenance control is then realized based on the proportional-integral (PI) optimal control theory. The main contributions of this study are as follows:


nonlinear-model linearization bias. This controller can potentially achieve fast and stable optimal control of UAV–UGV formation.

#### **3. Relative Kinematic-Equation Building for UAV–UGV Formations**

In this study, we investigate the UAV–UGV formation-maintenance control problem for cooperative combat mission requirements. Considering the actual motion state of the UGV as the input perturbation of the formation-maintenance controller, we assume that the control system of the UAV is closed-loop stable, i.e., the UAV velocity command, trajectory-inclination command, and trajectory-declination command can be stably followed. The three channels are modeled as the following first-order systems [31]:

$$\begin{cases} \dot{V}\_f = -\frac{1}{\tau\_{vf}} \left( V\_f - V\_{fc} \right) \\ \dot{\theta}\_f = -\frac{1}{\tau\_{\theta f}} \left( \theta\_f - \theta\_{fc} \right) \\ \dot{\psi}\_{vf} = -\frac{1}{\tau\_{\psi vf}} \left( \psi\_{vf} - \psi\_{vfc} \right) \end{cases} \tag{1}$$

where *τ<sub>vf</sub>*, *τ<sub>θf</sub>*, and *τ<sub>ψvf</sub>* denote the inertia time constants of the velocity-control, trajectory-inclination-control, and trajectory-declination-control channels of the UAV, respectively; *V<sub>f</sub>*, *θ<sub>f</sub>*, and *ψ<sub>vf</sub>* are the actual velocity magnitude, trajectory-inclination angle, and trajectory-declination angle of the UAV; and *V<sub>fc</sub>*, *θ<sub>fc</sub>*, and *ψ<sub>vfc</sub>* are the corresponding commanded values.
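Each command channel in Equation (1) is a simple first-order lag, which can be integrated numerically. The sketch below uses explicit Euler integration; the step size and command values are illustrative, not taken from the paper.

```python
def channel_step(x, x_cmd, tau, dt=0.01):
    """One explicit-Euler step of the first-order channel model
    x_dot = -(x - x_cmd) / tau from Equation (1)."""
    return x + dt * (-(x - x_cmd) / tau)

# Example: a velocity channel with tau = 3 s tracking a 10 m/s command
v, v_cmd, tau_vf = 0.0, 10.0, 3.0
for _ in range(3000):              # 30 s of simulated time at dt = 0.01 s
    v = channel_step(v, v_cmd, tau_vf)
# after 10 time constants, v has converged close to the commanded 10 m/s
```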

To study the relative motion between the UGV and UAV, the following coordinate systems are first defined:

(1) Inertial coordinate system *O*1*X*1*Y*1*Z*1: The origin *O*<sup>1</sup> is located at a fixed point on the ground, the *O*1*X*<sup>1</sup> axis points to the target, the *O*1*Y*<sup>1</sup> axis is vertically upward, and the *O*1*Z*<sup>1</sup> axis forms a right-handed coordinate system with the first two axes.

(2) Relative coordinate system *OrXrYrZr*: The origin *Or* is located at the center of mass of the UGV, the *OrXr* axis points in the direction of the UGV velocity, the *OrYr* axis is in the vertical plane perpendicular to the *OrXr* axis, and the *OrZr* axis forms a right-handed coordinate system with the first two axes.

(3) Vehicle-body coordinate system *O*2*X*2*Y*2*Z*2: The origin *O*<sup>2</sup> is located at the center of mass of the UAV, the *O*2*X*<sup>2</sup> axis points in the direction of the UAV velocity, the *O*2*Y*<sup>2</sup> axis is in the vertical plane perpendicular to the *O*2*X*<sup>2</sup> axis, and the *O*2*Z*<sup>2</sup> axis forms a right-handed coordinate system with the first two axes.

In the relative coordinate system, according to Coriolis theory, the motions of the UGV and UAV are related as follows:

$$\frac{d\mathbf{r}}{dt} = \mathbf{V}\_{fr} - \mathbf{V}\_{lr} = \mathbf{V}\_r + \boldsymbol{\omega} \times \mathbf{r} \tag{2}$$

where *r* is the relative radius vector between the UGV and UAV; *V<sub>fr</sub>* and *V<sub>lr</sub>* are the velocity vectors of the UAV and UGV in the relative coordinate system, respectively; *V<sub>r</sub>* is the derivative of the radius vector *r* taken in the relative coordinate system; *ω* is the angular velocity of rotation of the relative coordinate system with respect to the inertial coordinate system.

To obtain the absolute velocity of the UGV and UAV in the relative coordinate system, the following transformation can be performed:

$$\begin{cases} \mathbf{V}\_{fr} = \Phi\_1^r \Phi\_2^I \mathbf{V}\_{f2} \\ \mathbf{V}\_{lr} = \mathbf{V}\_{l2} \end{cases} \tag{3}$$

where Φ<sup>*r*</sup><sub>1</sub> is the conversion matrix from the inertial to the relative coordinate system; Φ<sup>*I*</sup><sub>2</sub> is the conversion matrix from the UAV body coordinate system to the inertial coordinate system; *V<sub>f2</sub>* is the velocity vector in the UAV body coordinate system; *V<sub>l2</sub>* is the velocity vector in the UGV body coordinate system. These variables take the following values:

$$\Phi\_1^r = \begin{bmatrix} \cos\theta\_l \cos\psi\_{vl} & \sin\theta\_l & -\cos\theta\_l \sin\psi\_{vl} \\ -\sin\theta\_l \cos\psi\_{vl} & \cos\theta\_l & \sin\theta\_l \sin\psi\_{vl} \\ \sin\psi\_{vl} & 0 & \cos\psi\_{vl} \end{bmatrix} \tag{4}$$

$$\Phi\_2^I = \begin{bmatrix} \cos\theta\_f \cos\psi\_{vf} & -\sin\theta\_f \cos\psi\_{vf} & \sin\psi\_{vf} \\ \sin\theta\_f & \cos\theta\_f & 0 \\ -\cos\theta\_f \sin\psi\_{vf} & \sin\theta\_f \sin\psi\_{vf} & \cos\psi\_{vf} \end{bmatrix} \tag{5}$$

$$\mathbf{V}\_{l2} = \begin{bmatrix} V\_l \\ 0 \\ 0 \end{bmatrix} \tag{6}$$

$$\mathbf{V}\_{f2} = \begin{bmatrix} V\_f \\ 0 \\ 0 \end{bmatrix} \tag{7}$$

where *V<sub>l</sub>* denotes the velocity magnitude of the UGV in the inertial coordinate system; *θ<sub>l</sub>* denotes the inclination angle of the UGV; *ψ<sub>vl</sub>* denotes the deflection angle of the UGV; *V<sub>f</sub>* is the velocity magnitude of the UAV in the inertial coordinate system; *θ<sub>f</sub>* is the inclination angle of the UAV trajectory; *ψ<sub>vf</sub>* is the deflection angle of the UAV trajectory. The relative velocity of the UGV and UAV in the relative coordinate system is expressed as follows:

$$\mathbf{V}\_{\mathbf{r}} = \begin{bmatrix} \ \dot{\mathbf{x}} \\ \ \dot{y} \\ \dot{z} \end{bmatrix} \tag{8}$$

where *x*, *y*, and *z* are the components of the position vector *r* of the UAV along each axis in the relative coordinate system. The angular velocity of rotation *ω* in the relative coordinate system with respect to the inertial space can be expressed as follows:

$$\boldsymbol{\omega} = \begin{bmatrix} \dot{\psi}\_{vl}\sin\theta\_l \\ \dot{\psi}\_{vl}\cos\theta\_l \\ \dot{\theta}\_l \end{bmatrix} \tag{9}$$

Then, based on Equation (2), the following equation can be obtained:

$$\mathbf{V}\_r = \left(\mathbf{V}\_{fr} - \mathbf{V}\_{lr}\right) - \boldsymbol{\omega} \times \mathbf{r} \tag{10}$$

The relative motion relationship between the UAV and UGV in three-dimensional space can be obtained by combining Equations (2), (8), and (10):

$$\begin{cases} \dot{x} = V\_f \cos\theta\_f \cos\theta\_l \cos\psi\_{\varepsilon} + V\_f \sin\theta\_f \sin\theta\_l - V\_l + y\dot{\theta}\_l - z\dot{\psi}\_{vl}\cos\theta\_l \\ \dot{y} = -V\_f \cos\theta\_f \sin\theta\_l \cos\psi\_{\varepsilon} + V\_f \sin\theta\_f \cos\theta\_l - x\dot{\theta}\_l + z\dot{\psi}\_{vl}\sin\theta\_l \\ \dot{z} = V\_f \cos\theta\_f \sin\psi\_{\varepsilon} - y\dot{\psi}\_{vl}\sin\theta\_l + x\dot{\psi}\_{vl}\cos\theta\_l \\ \psi\_{\varepsilon} = \psi\_{vl} - \psi\_{vf} \end{cases} \tag{11}$$
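The relative kinematics can be evaluated directly in code. The sketch below implements the right-hand side of the relative-motion equations; the symbol names mirror those in the text, and it is an illustration rather than the authors' implementation.

```python
import numpy as np

def relative_rates(x, y, z, V_f, theta_f, V_l, theta_l,
                   dtheta_l, dpsi_vl, psi_eps):
    """Time derivatives (dx, dy, dz) of the UAV position in the relative
    frame; psi_eps = psi_vl - psi_vf is the heading difference."""
    ct_f, st_f = np.cos(theta_f), np.sin(theta_f)
    ct_l, st_l = np.cos(theta_l), np.sin(theta_l)
    dx = (V_f * ct_f * ct_l * np.cos(psi_eps) + V_f * st_f * st_l
          - V_l + y * dtheta_l - z * dpsi_vl * ct_l)
    dy = (-V_f * ct_f * st_l * np.cos(psi_eps) + V_f * st_f * ct_l
          - x * dtheta_l + z * dpsi_vl * st_l)
    dz = (V_f * ct_f * np.sin(psi_eps)
          - y * dpsi_vl * st_l + x * dpsi_vl * ct_l)
    return dx, dy, dz
```

As a sanity check, for level flight with aligned headings and a non-rotating relative frame, only the along-track closing rate V_f − V_l remains.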

#### **4. Optimal Control Modeling for the UAV–UGV Formation-Maintenance Controller**

The design goal of the UAV–UGV formation-maintenance controller is to produce the flight commands *V<sub>fc</sub>*, *θ<sub>fc</sub>*, and *ψ<sub>vfc</sub>* for the UAV to keep it at the desired relative distance from the UGV.

#### *4.1. Linearization of the Relative Equations of Motion*

During the UAV–UGV formation-maintenance process, assuming that *θ<sub>f</sub>*, *θ<sub>l</sub>*, and *ψ<sub>ε</sub>* = *ψ<sub>vl</sub>* − *ψ<sub>vf</sub>* can be considered small quantities, while treating the state of the UGV as an input, Equation (11) is converted to

$$\begin{cases} \dot{x} = V\_f + V\_f \theta\_f \theta\_l - V\_l + y\dot{\theta}\_l - z\dot{\psi}\_{vl} \\ \dot{y} = -V\_f \theta\_l + V\_f \theta\_f - x\dot{\theta}\_l + z\dot{\psi}\_{vl}\theta\_l \\ \dot{z} = V\_f \psi\_{\varepsilon} - y\dot{\psi}\_{vl}\theta\_l + x\dot{\psi}\_{vl} \\ \psi\_{\varepsilon} = \psi\_{vl} - \psi\_{vf} \end{cases} \tag{12}$$

Using the small-perturbation linearization method, Equation (12) can be linearized as

$$\begin{cases} \dot{x} = \dot{\theta}\_l y - \dot{\psi}\_{vl} z + \left(1 + \theta\_{f0}\theta\_l\right) V\_f + V\_{f0}\theta\_l \theta\_f - V\_l \\ \dot{y} = -\dot{\theta}\_l x + \dot{\psi}\_{vl}\theta\_l z + \left(\theta\_{f0} - \theta\_l\right) V\_f + V\_{f0}\theta\_f \\ \dot{z} = \dot{\psi}\_{vl} x - \dot{\psi}\_{vl}\theta\_l y + \left(\psi\_{vl} - \psi\_{vf0}\right) V\_f - V\_{f0}\psi\_{vf} \end{cases} \tag{13}$$

where *V<sub>f0</sub>*, *θ<sub>f0</sub>*, and *ψ<sub>vf0</sub>* are the state equilibrium points selected during linearization. Writing Equation (13) in state-space form (SSF) yields

$$\begin{cases} \dot{X} = AX + BU + \bar{B}W \\ Y = CX \end{cases} \tag{14}$$

where

$$\mathbf{A} = \begin{bmatrix} 0 & \dot{\theta}\_l & -\dot{\psi}\_{vl} \\ -\dot{\theta}\_l & 0 & \theta\_l \dot{\psi}\_{vl} \\ \dot{\psi}\_{vl} & -\theta\_l \dot{\psi}\_{vl} & 0 \end{bmatrix} \tag{15}$$

$$\mathbf{B} = \begin{bmatrix} 1 + \theta\_{f0}\theta\_l & V\_{f0}\theta\_l & 0 \\ \theta\_{f0} - \theta\_l & V\_{f0} & 0 \\ \psi\_{vl} - \psi\_{vf0} & 0 & -V\_{f0} \end{bmatrix} \tag{16}$$

$$\mathbf{C} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{17}$$

$$\bar{\mathbf{B}} = \begin{bmatrix} -1 \\ 0 \\ 0 \end{bmatrix} \tag{18}$$

where *X* = [*x y z*]<sup>T</sup> denotes the state variable; *U* = [*V<sub>f</sub> θ<sub>f</sub> ψ<sub>vf</sub>*]<sup>T</sup> denotes the control variable of the formation controller, i.e., the motion state of the UAV; *Y* = [*x y z*]<sup>T</sup> denotes the output variable; and the perturbation variable is the UGV velocity, *W* = *V<sub>l</sub>*.
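The linearized state-space matrices can be assembled directly from the chosen equilibrium point. The sketch below is illustrative; all arguments are linearization-point values, not live states.

```python
import numpy as np

def ssf_matrices(theta_l, dtheta_l, dpsi_vl, V_f0, theta_f0, psi_vl, psi_vf0):
    """Assemble A, B, C of the linearized UAV-UGV relative-motion model
    around the given equilibrium (a sketch of the matrices in the text)."""
    A = np.array([[0.0,        dtheta_l,           -dpsi_vl],
                  [-dtheta_l,  0.0,                 theta_l * dpsi_vl],
                  [dpsi_vl,   -theta_l * dpsi_vl,   0.0]])
    B = np.array([[1 + theta_f0 * theta_l, V_f0 * theta_l, 0.0],
                  [theta_f0 - theta_l,     V_f0,           0.0],
                  [psi_vl - psi_vf0,       0.0,           -V_f0]])
    C = np.eye(3)   # full-state output
    return A, B, C
```

Note that A is skew-symmetric, as expected for a pure frame-rotation coupling term.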

#### *4.2. Optimal Formation-Maintenance Controller Design*

According to the state-space equations of the system, the controller design problem for UAV–UGV formations is a non-zero set-point output-regulation problem with constant or slowly varying perturbations. This problem can be solved in two steps: first, design an optimal output-regulation controller that rejects constant or slowly varying perturbations; second, design an optimal controller for the non-zero set-point problem so that the UAV–UGV formation can be stably maintained at the desired formation position.

#### 4.2.1. PI Optimal Formation-Maintenance Controller Design

A perturbed system of the type shown in Equation (14), with constant or slowly varying perturbations, can be handled using a PI type of control, similar to that in classical control theory. This section analyzes the design of PI optimal controllers with zero-valued output-regulation set points.

First, the system perturbation is transformed into a perturbation of the control input, at which point, system (14) becomes

$$\dot{X} = AX + B\left(U + \tilde{W}\right) \tag{19}$$

where $\tilde{W}$ denotes the transformed form of the original system perturbation, which can be obtained from system (14) as follows:

$$\tilde{W} = B^{-1}\bar{B}W \tag{20}$$

Next, we design the PI optimal controller of the perturbed system (19), which can be augmented as

$$\begin{cases} \begin{aligned} \dot{X}\_1 &= A\_1 X\_1 + B\_1 \mathcal{U}\_1 \\ Y\_1 &= \mathcal{C}\_1 X\_1 \end{aligned} \end{cases} \tag{21}$$

where

$$X\_1 = \begin{bmatrix} X \\ \mathbf{U} + \tilde{\mathbf{W}} \end{bmatrix} \tag{22}$$

$$A\_1 = \begin{bmatrix} A & B \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \tag{23}$$

$$\mathbf{B}\_1 = \begin{bmatrix} \mathbf{0} \\ I \end{bmatrix} \tag{24}$$

$$\mathbf{C}\_{1} = \begin{bmatrix} I & \mathbf{0} \ \end{bmatrix} \tag{25}$$

For the augmented system (21), the quadratic performance index is given as

$$J = \int\_{t\_0}^{t\_f} \left[ \mathbf{X}\_1^T \mathbf{Q}\_1 \mathbf{X}\_1 + \mathbf{U}\_1^T \mathbf{R}\_1 \mathbf{U}\_1 \right] dt \tag{26}$$

where *Q*<sup>1</sup> is the state-regulation power-coefficient matrix of the augmented system and *R*<sup>1</sup> is the control energy power coefficient matrix of the augmented system. When the system (21) is controllable, according to optimal control theory, the minimum optimal control to achieve the quadratic performance index (26) is

$$
\mathcal{U}\_1^\* = -\mathcal{R}\_1^{-1} \mathcal{B}\_1^T \mathcal{P} \mathcal{X}\_1 \tag{27}
$$
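In practice, the gain K = R₁⁻¹B₁ᵀP in the optimal control law is obtained by solving the continuous-time algebraic Riccati equation. A sketch using SciPy's solver follows; the double-integrator system at the end is a toy check with purely illustrative values, not the paper's UAV–UGV model.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def lqr_gain(A1, B1, Q1, R1):
    """Solve the continuous-time ARE for the augmented system and return
    the optimal state-feedback gain K = R1^{-1} B1^T P."""
    P = solve_continuous_are(A1, B1, Q1, R1)
    return np.linalg.solve(R1, B1.T @ P)

# Toy check on a double integrator with Q1 = I, R1 = I
A1 = np.array([[0.0, 1.0], [0.0, 0.0]])
B1 = np.array([[0.0], [1.0]])
K = lqr_gain(A1, B1, np.eye(2), np.eye(1))
```

For this classic example the analytical gain is K = [1, √3], so the solver output can be verified exactly.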

Let us further analyze the quadratic performance index (26). Based on the composition of the state variable *X*1, *Q*<sup>1</sup> can be decomposed as follows:

$$Q\_1 = \begin{bmatrix} Q & \mathbf{0} \\ \mathbf{0} & \mathbf{R} \end{bmatrix} \tag{28}$$

Then, we can have

$$\mathbf{X}\_1^T \mathbf{Q}\_1 \mathbf{X}\_1 = \mathbf{X}^T \mathbf{Q} \mathbf{X} + (\mathbf{U} + \tilde{\mathbf{W}})^T \mathbf{R} (\mathbf{U} + \tilde{\mathbf{W}}) \tag{29}$$

where *Q* is the state-regulation power-coefficient matrix of the original system (14) and *R* is its control energy power coefficient matrix.

According to system (14), we can have

$$\mathbf{X}^T \mathbf{Q} \mathbf{X} = \mathbf{X}^T \mathbf{C}^T \mathbf{Q}\_Y \mathbf{C} \mathbf{X} = \mathbf{Y}^T \mathbf{Q}\_Y \mathbf{Y} \tag{30}$$

where *Q<sup>Y</sup>* is the output regulation power-coefficient matrix of the original system.

Thus, the quadratic optimal state-regulation performance index described in Equation (26) becomes a quadratic optimal output-regulation problem:

$$J = \int\_{t\_0}^{t\_f} \left[ Y^T Q\_Y Y + \left(U + \tilde{W}\right)^T R \left(U + \tilde{W}\right) + U\_1^T R\_1 U\_1 \right] dt \tag{31}$$

Assuming that the perturbation $\tilde{W}$ is a slowly varying variable, it follows that

$$\dot{\tilde{W}} = \mathbf{0} \tag{32}$$

Then, we have

$$\mathcal{U}\_1 = \dot{\mathcal{U}} \tag{33}$$

Thus, the quadratic optimal output performance index of the system (21) can be described as

$$J = \int\_{t\_0}^{t\_f} \left[ Y^T Q\_Y Y + \left(U + \tilde{W}\right)^T R \left(U + \tilde{W}\right) + \dot{U}^T R\_1 \dot{U} \right] dt \tag{34}$$

For the optimal control quantity (27), the following equation can be obtained by combining it with (33):

$$U\_1^\* = \dot{U}^\* = -R\_1^{-1} B\_1^T P X\_1 \tag{35}$$

where *P* is the minimum solution of the Riccati optimal control equation for achieving the performance index in Equation (26). Then, Equation (35) can be further expanded as

$$\begin{split} \dot{U}^{\*} &= -R\_{1}^{-1} \begin{bmatrix} \mathbf{0} & I \end{bmatrix} \begin{bmatrix} P\_{11} & P\_{12} \\ P\_{21} & P\_{22} \end{bmatrix} \begin{bmatrix} X \\ U^{\*} + \tilde{W} \end{bmatrix} \\ &= -R\_{1}^{-1} P\_{21} X - R\_{1}^{-1} P\_{22} \left( U^{\*} + \tilde{W} \right) \end{split} \tag{36}$$

The above equation gives the minimum optimal control required to achieve the zero-valued set point in Equation (26).

#### 4.2.2. Design of Non-Zero Set-Point Optimal Controllers

To maintain the output quantity *Y* = [*x y z*]<sup>T</sup> at a non-zero set point, the state and control input of the system at steady state must also be non-zero. The optimal control input based on Equation (35) should take the following form:

$$
\dot{\mathcal{U}}^\* = -\mathcal{R}\_1^{-1} \mathcal{B}\_1^T P \mathcal{X}\_1 + \mathcal{U}\_0' = -\mathcal{K} \mathcal{X}\_1 + \mathcal{U}\_0' \tag{37}
$$

where *U*′<sub>0</sub> is the additional control quantity that stabilizes the non-zero set point. The output equation of the augmented system is

$$\mathcal{Y} = \mathcal{C}\_1 \mathcal{X}\_1 \tag{38}$$

Expanding the state *X*<sub>1</sub> of the augmented system gives

$$\mathbf{X}\_1 = \begin{bmatrix} x \\ y \\ z \\ V\_f + \tilde{W}(1) \\ \theta\_f + \tilde{W}(2) \\ \psi\_{vf} + \tilde{W}(3) \end{bmatrix} \tag{39}$$

To ensure that the output of the augmented system is consistent with the output of the original system, the output matrix of the augmented system is set to

$$\mathbf{C}\_{1} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix} \tag{40}$$

By substituting the control quantity (37) into the augmented system (21), we obtain

$$
\dot{X}\_1 = (A\_1 - B\_1 \mathbf{K})X\_1 + B\_1 \mathbf{U}\_0' \tag{41}
$$

Because the closed-loop system (41) is asymptotically stable, the following condition should exist at steady state:

$$\lim\_{t \to \infty} \dot{\mathcal{X}}\_1(t) = 0 \tag{42}$$

This leads to the following asymptotically stable form.

$$\mathbf{0} = (A\_1 - B\_1 \mathbf{K})\mathbf{X}\_{10} + B\_1 \mathbf{U}\_0' \tag{43}$$

where $X_{10}$ is the steady-state value of the state $X_1$.

If all eigenvalues of the matrix $(A_1 - B_1K)$ lie in the open left half of the complex plane, then $(A_1 - B_1K)$ is non-singular, and combining this with Equation (43) yields

$$X\_{10} = -\left(A\_1 - B\_1 \mathbf{K}\right)^{-1} B\_1 \mathbf{U}\_0' \tag{44}$$

Then, based on Equation (38), the relationship between the non-zero set point $Y_{10}^*$ and the steady-state value $X_{10}$ of the state $X_1$ is obtained as

$$Y\_{10}^\* = \mathcal{C}\_1 \mathcal{X}\_{10} \tag{45}$$

Finally,

$$\dot{U}^{*} = -R_{1}^{-1} B_{1}^{T} P X_{1} + \left[ C_{1} \left( B_{1} R_{1}^{-1} B_{1}^{T} P - A_{1} \right)^{-1} B_{1} \right]^{-1} Y_{10}^{*} \tag{46}$$

can be used to achieve optimal control of the relative motion between the UGV and the UAV described in Equation (14), thus maintaining the UAV–UGV formation in the desired state.
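As a sketch of the feedforward computation in Equations (44)–(46), the following NumPy snippet computes the additional input $U_0'$ and the resulting steady state for a toy stabilized system; the matrices below are illustrative and are not the paper's UAV–UGV model.

```python
import numpy as np

def setpoint_feedforward(A, B, C, K, y_ref):
    """Feedforward input for a non-zero set point (structure of Eqs. (43)-(46)).

    Returns u0 such that the closed-loop steady state
    x_ss = -(A - B K)^{-1} B u0 (Eq. (44)) satisfies C x_ss = y_ref."""
    Acl = A - B @ K                                       # assumed Hurwitz
    N = np.linalg.inv(C @ np.linalg.inv(B @ K - A) @ B)   # inverse of the closed-loop static gain
    u0 = N @ y_ref
    x_ss = -np.linalg.solve(Acl, B @ u0)                  # Eq. (44)
    return u0, x_ss

# Toy single-axis double integrator (illustrative only, NOT the paper's model):
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
K = np.array([[2.0, 3.0]])        # places the closed-loop poles at -1 and -2
u0, x_ss = setpoint_feedforward(A, B, C, K, np.array([[1.0]]))
print(C @ x_ss)                   # steady output equals the set point
```

Here `K` plays the role of $R_1^{-1}B_1^T P$ and is simply given, since solving the Riccati equation is outside the scope of this sketch.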

#### **5. Simulation Analysis**

In this section, we describe numerical simulations of the UAV–UGV formation-maintenance control system, which consists of a UGV with a given motion state, a UAV with a first-order stability control system, and the optimal maintenance controller. The inertial time constants of the three channels of the UGV and UAV are assumed to be *τvf* = 3 s, *τθf* = 1 s, and *τψvf* = 1 s. The initial positions of the UAV and UGV in the inertial coordinate system are (−5, 20, 10) m and (0, 0, 0) m, respectively. The UAV–UGV formation is assumed to move for 150 s, and the simulation step size is 0.01 s. Under the control law of Equation (46), the motion of the UAV is adjusted and maintained from its initial position to the desired position of (−5, 20, 10) m. In all simulations, the weighting matrices of the control system required in Equation (34) are as shown in Equation (47). Considering the feasible motion envelope of the UAV, limits are set on the control amount of the formation controller and the commanded motion state of the UAV, as shown in Equation (52).

$$\mathbf{Q}\_{\mathcal{Y}} = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 6 \end{bmatrix}, \mathbf{R} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \mathbf{R}\_{1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{47}$$

To verify the effectiveness of the proposed UGV–UAV formation-maintenance controller, we consider two cases in which the UGV adopts different motion modes. The motion of the UGV is set as in Equations (48) to (51), and the simulation results are shown in Figures 1 and 2.

Case 1:

$$V\_l(t) = \begin{cases} 2 + \sin(0.1t) \text{ m/s, } 0 \le t < 50 \text{ s} \\ 2 \text{ m/s, } 50 \le t < 100 \text{ s} \\ 2 + \sin(0.1t) \text{ m/s, } 100 \le t < 150 \text{ s} \end{cases} \tag{48}$$

$$\begin{cases} \ \theta\_l(t) = 0 \text{ rad} \\ \ \psi\_{vl}(t) = 0.5 \cos(0.1t + 0.7728) \text{ rad} \end{cases} \tag{49}$$

Case 2:

$$V\_l(t) = \begin{cases} 1.5 \text{ m/s}, 0 \le t < 50 \text{ s} \\ 2 + \sin(0.1t) \text{ m/s}, 50 \le t < 100 \text{ s} \\ 1.5 \text{ m/s}, 100 \le t < 150 \text{ s} \end{cases} \tag{50}$$

$$\begin{cases} \ \theta\_l(t) = 0 \text{ rad} \\ \ \psi\_{vl}(t) = 0.2 \sin(0.15t + 0.5523) \text{ rad} \end{cases} \tag{51}$$

$$\begin{cases} 1 \text{ m/s} \le V_{fc} \le 3 \text{ m/s} \\ -15^{\circ} \le \theta_{fc} \le 25^{\circ} \\ -80^{\circ} \le \psi_{vfc} \le 80^{\circ} \end{cases} \tag{52}$$
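For reference, the piecewise UGV commands of Case 1 in Equations (48) and (49) and the command limits of Equation (52) can be written directly in code; this is a sketch, and the function names are our own.

```python
import math

def ugv_commands_case1(t):
    """UGV motion commands of Case 1, Eqs. (48)-(49); t in seconds."""
    if 50.0 <= t < 100.0:
        v = 2.0                                # m/s
    else:                                      # 0 <= t < 50 and 100 <= t < 150
        v = 2.0 + math.sin(0.1 * t)
    theta = 0.0                                # rad, Eq. (49)
    psi_v = 0.5 * math.cos(0.1 * t + 0.7728)   # rad, Eq. (49)
    return v, theta, psi_v

def _clamp(x, lo, hi):
    return max(lo, min(hi, x))

def limit_uav_commands(v_c, theta_c, psi_c):
    """Command limits of Eq. (52); angles in radians."""
    return (_clamp(v_c, 1.0, 3.0),
            _clamp(theta_c, math.radians(-15.0), math.radians(25.0)),
            _clamp(psi_c, math.radians(-80.0), math.radians(80.0)))
```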

It can be seen from Figures 1 and 2 that, under the control command of Equation (46), the UGV can guide the UAV to fly at a fixed altitude in accordance with the specified motion, which maintains a fixed formation in three-dimensional space. In addition, the velocity and deflection-angle commands of the UAV vary slightly and reasonably, without jitter, and thus satisfy physical constraints. Moreover, the motion of the UGV perturbs the UAV–UGV formation-maintenance control: as the UGV changes its motion form at 50 and 100 s, the motion trajectory of the UAV fluctuates at the corresponding moments. Nevertheless, the formation-maintenance controller quickly restores the desired motion state. In Case 1, the maximum motion-state errors of the UAV in the *Xr*, *Yr*, and *Zr* directions are 1.2 m, 0 m, and 1.8 m, respectively; in Case 2, they are 0.6 m, 0 m, and 0.6 m, respectively.

**Figure 1.** Simulation results of case 1. (**a**) Three-dimensional motion trajectory of UGV and UAV; (**b**) velocity of UAV; (**c**) deflection angle of UAV; (**d**) motion trajectory in *Xr*-axis direction; (**e**) motion trajectory in *Yr*-axis direction; (**f**) motion trajectory in *Zr*-axis direction.

**Figure 2.** Simulation results of case 2. (**a**) Three-dimensional motion trajectory of UGV and UAV; (**b**) velocity of UAV; (**c**) deflection angle of UAV; (**d**) motion trajectory in *Xr*-axis direction; (**e**) motion trajectory in *Yr*-axis direction; (**f**) motion trajectory in *Zr*-axis direction.

#### **6. Conclusions**

This study analyzed the structure of the UAV–UGV formation motion control system and the relationships between its subsystems. A UAV–UGV formation-maintenance controller was designed based on optimal control theory. In addition, simulations were performed for the UAV–UGV formation-maintenance control system, consisting of a UGV with a given motion state, a UAV with a first-order stability control system, and the optimal maintenance controller. The following conclusions were drawn.


**Author Contributions:** The contributions of the authors are as follows: conceptualization, all authors; methodology, H.Z. and X.Y.; software, T.X.; writing, J.Z.; investigation, T.X. All authors have read and agreed to the published version of the manuscript.

**Funding:** The authors appreciate the financial support from the National Natural Science Foundation of China (NSFC) (62173274), Natural Science Foundation of Shaanxi Province (2020JQ-219), and Aviation Fund (20200001053005).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** All data used during the study appear in the submitted article.

**Conflicts of Interest:** All of the authors declare that they have no known competing financial interests or personal relationships that could appear to influence the work reported in this paper.

#### **References**


## *Article* **Optimal Cruise Characteristic Analysis and Parameter Optimization Method for Air-Breathing Hypersonic Vehicle**

**Hesong Li 1, Yunfan Zhou 1, Yi Wang 1,\*, Sha Du <sup>2</sup> and Shangcheng Xu <sup>1</sup>**


**Abstract:** There is an optimal cruise point with the lowest fuel consumption when a hypersonic vehicle performs steady-state cruise. The optimal cruise point is composed of the optimal cruise altitude and the optimal cruise Mach number, and its position is closely related to the aircraft parameters. This article explores the relationship between the optimal cruise point and the relevant aircraft parameters, establishes a model to describe it, and then develops an aircraft parameter optimization method that adjusts the optimal cruise point to a target position, with validation by numerical simulation. Firstly, a parameterized model of a hypersonic vehicle is established as a basis; the optimal cruise point is then obtained by an optimization method, and the influence of each single aircraft parameter on the optimal point is investigated. To model the relationship between the aircraft parameters and the optimal cruise point, a neural network is employed. Finally, the model is used to optimize the aircraft parameters under multiple constraints. The results show that, after aircraft parameter optimization, the optimal cruise point is located at the predetermined position and the fuel consumption is lower, which provides a new perspective for the design of aircraft.

**Keywords:** hypersonic vehicle; steady-state cruise; aircraft parameter; neural network

#### **1. Introduction**

An air-breathing hypersonic vehicle generally refers to an aircraft powered by air-breathing engines and flying at a Mach number above 5 [1]. It has a series of advantages, such as high flight altitude, high speed, and strong penetration capability, which have a far-reaching impact on the future development of the aerospace field [2,3]. Therefore, hypersonic technology has received extensive attention from researchers all over the world.

In the cruise phase of a hypersonic vehicle, steady-state cruise refers to the cruise mode in which altitude and speed remain constant [4]. Steady-state cruise is simple and direct and offers high stability for a hypersonic vehicle [5]. To save fuel and increase the range of the aircraft, the fuel consumption averaged by range is an important indicator on which many trajectory-optimization studies focus. Research has shown that, when the aircraft performs steady-state cruise at different altitudes or Mach numbers, the fuel consumption differs [6–9]. For a given aircraft model, there is an optimal steady-state cruise point composed of the optimal cruise Mach number and the optimal cruise altitude. When the aircraft performs steady-state cruise at the Mach number and altitude of the optimal point, the fuel consumption averaged by range is lowest [10]. To obtain the optimal cruise point, a great deal of research has been carried out.

Based on a parametric model of HL-20, a hypersonic vehicle widely used in trajectory optimization research, Liu et al. [11] developed a two-level optimization algorithm combining particle swarm optimization and sequential quadratic programming to solve for the optimal steady-state cruise point; Gao et al. [12] employed the fmincon function in MATLAB to determine the position of the optimal cruise point. In [13], it was pointed out that a higher altitude is beneficial for reducing drag but at the same time reduces the specific impulse; the optimal cruise point is therefore a combined result of these two effects.

**Citation:** Li, H.; Zhou, Y.; Wang, Y.; Du, S.; Xu, S. Optimal Cruise Characteristic Analysis and Parameter Optimization Method for Air-Breathing Hypersonic Vehicle. *Appl. Sci.* **2021**, *11*, 9565. https://doi.org/10.3390/app11209565

Academic Editors: Giovanni Bernardini, Wei Huang and Dong Zhang

Received: 14 September 2021; Accepted: 12 October 2021; Published: 14 October 2021

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Most of this research is based on the HL-20 aircraft model. However, the parametric model of HL-20 is an idealized model whose specific impulse is independent of the equivalence ratio, whereas it has been found that the specific impulse should be negatively correlated with the equivalence ratio in a ramjet engine [14]. As hypersonic vehicles enter the stage of engineering practice, it is important to clarify their cruise characteristics based on a more realistic aircraft model.

In addition, the positions of the optimal cruise points differ between aircraft models, which reveals that the position of the optimal cruise point is closely related to the aircraft parameters. Even though some methods have been developed to solve for the optimal cruise point, there is little research on how the aircraft parameters influence it. In the aircraft design process, once the aircraft parameters are determined, there is a corresponding optimal steady-state cruise point. However, if the desired cruise Mach number and altitude deviate from those of the optimal cruise point, the fuel consumption cannot be fully reduced. The optimal cruise point can instead be taken into account in aircraft design to guide the optimization of the aircraft parameters: after parameter optimization, the optimal cruise point can be shifted to coincide with the desired cruise point, which is of great benefit for saving fuel. Therefore, exploring how the aircraft parameters influence the optimal cruise point and carrying out research on aircraft parameter optimization is of great significance and provides a new perspective for aircraft design.

Regarding the analysis and optimization of aircraft parameters, a great deal of research has been carried out. In the early days, researchers focused on analyzing a single aircraft parameter to investigate its impact on the range, lift-to-drag ratio, and other aspects of performance [15]. The control variable method was widely adopted, especially in research on range. To study the relationship between the range of a hypersonic vehicle and the aircraft parameters, the Breguet formula was employed [16,17], and methods such as the cell mapping method in [18] were applied as well. Moreover, the aerodynamic layout of the hypersonic vehicle was analyzed to clarify its influence on the lift-to-drag ratio and other indicators [19,20].

However, because of the strong interactions between the aircraft parameters, the hypersonic vehicle is a complex system, and the analysis and optimization of the aircraft parameters cannot focus on a single factor alone; the coupling between parameters also needs to be taken into consideration. Therefore, in recent years, surrogate model technology has been widely employed in the analysis of aircraft parameters. The relationship between the aircraft parameters and performance indicators can be described by a surrogate model, and parameter optimization can then be carried out based on it [21]. To establish a surrogate model, polynomial regression, the Kriging model, radial basis functions, and neural networks can be employed [22]. In addition, many statistical methods have been introduced into surrogate-model-based research. In the analysis of aircraft parameters, sensitivity analysis [23] and correlation analysis have been widely used to identify the important parameters whose influence is most obvious [24,25]. Orthogonal tests and some novel methods, such as the technology identification, evaluation, and selection method in [26] and the hybrid algorithm in [27], have also been employed to explore the optimal design space and for parameter estimation in aircraft parameter analysis.

Based on the surrogate model, optimization methods have been widely employed in parameter optimization research [28]. Parameter optimization methods can be mainly divided into gradient-based methods and intelligent methods. With the development of computing science, intelligent methods play an increasingly important role: genetic algorithms (GA), particle swarm optimization (PSO), differential evolution (DE), and others have been widely developed [29]. For an optimization problem with high dimensions, the PSO algorithm has been demonstrated to be suitable and is also easy to program, so a great deal of research has adopted it [30,31].

In this paper, in order to describe the influence of aircraft parameters on the optimal cruise point, sensitivity analysis and surrogate model technology together with an optimization method are employed. Firstly, based on a parametric hypersonic vehicle model, the position of the optimal cruise point is quickly solved by the PSO algorithm. On this basis, the influence of each single aircraft parameter on the optimal cruise point is explored, and a model of the relationship between the aircraft parameters and the optimal cruise point is established by a neural network. Based on this model, the aircraft parameters are optimized under various constraints to adjust the optimal cruise point to a given position. The proposed aircraft parameter optimization method reveals how the aircraft parameters should be adjusted, which can provide guidance in the design of a hypersonic vehicle.

#### **2. Models**

#### *2.1. Dynamic Equations*

For simplicity, Earth is considered a homogeneous sphere with a radius of 6378 km. The Coriolis inertial force is ignored, and the gravitational acceleration is regarded as a constant, 9.8 m/s². The force analysis of the aircraft is shown in Figure 1, where *T* is the thrust vector generated by the engine, *R* is the aerodynamic force vector containing lift and drag, and *G* is the gravity vector.

**Figure 1.** Force analysis diagram of aircraft.

The dynamic equation of the aircraft mass center in a form of vectors is as follows:

$$m\frac{d^2\vec{r}}{dt^2} = \vec{T} + \vec{R} + \vec{G} \tag{1}$$

In the velocity coordinate system, Equation (1) can be decomposed into

$$\begin{cases} m\dfrac{dv}{dt} = T\cos\alpha - D - G\sin\gamma \\ m\left(v\dfrac{d\gamma}{dt} - \dfrac{v^2\cos\gamma}{r}\right) = T\sin\alpha + L - G\cos\gamma \end{cases} \tag{2}$$

where *v* denotes the flight velocity, *γ* is the trajectory elevation angle, *α* is the angle of attack, and *m* is the mass of the aircraft. Because

$$\begin{cases} h = r - R_e \\ v = Ma \cdot c \end{cases} \tag{3}$$

where *h* denotes the flight altitude, *Ma* is the Mach number, *c* is the local sound velocity, and $R_e$ is the radius of Earth, the dynamic equations of the cruise phase are [6]:

$$\begin{cases} \dfrac{dh}{dt} = Ma \cdot c \cdot \sin\gamma \\ \dfrac{dMa}{dt} = \dfrac{T\cos\alpha - D - mg\sin\gamma}{m \cdot c} \\ \dfrac{d\gamma}{dt} = \dfrac{T\sin\alpha + L}{m \cdot Ma \cdot c} + \cos\gamma\left(\dfrac{Ma \cdot c}{R_e + h} - \dfrac{g}{Ma \cdot c}\right) \end{cases} \tag{4}$$

Based on the dynamic equation, the trajectory can be solved by numerical method. Before that, the magnitude of lift, drag, and thrust need to be determined.
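The right-hand side of Equation (4) and a generic fourth-order Runge–Kutta step can be sketched as follows; the force values are supplied externally, and the numbers in the consistency check are illustrative, not the paper's vehicle data.

```python
import math

R_E = 6378e3   # Earth radius used in the paper, m
G0 = 9.8       # gravitational acceleration, m/s^2

def cruise_derivatives(h, Ma, gamma, alpha, T, L, D, m, c):
    """Right-hand side of the cruise dynamics, Eq. (4).
    h [m], gamma and alpha [rad], forces T, L, D [N], mass m [kg], sound speed c [m/s]."""
    v = Ma * c
    dh = v * math.sin(gamma)
    dMa = (T * math.cos(alpha) - D - m * G0 * math.sin(gamma)) / (m * c)
    dgamma = (T * math.sin(alpha) + L) / (m * v) \
             + math.cos(gamma) * (v / (R_E + h) - G0 / v)
    return dh, dMa, dgamma

def rk4_step(f, y, dt):
    """One classical Runge-Kutta step for y' = f(y), with y a list of floats."""
    k1 = f(y)
    k2 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
    k4 = f([yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6.0 * (a + 2 * b + 2 * c_ + d)
            for yi, a, b, c_, d in zip(y, k1, k2, k3, k4)]

# Consistency check with illustrative numbers: with T = D = 0, alpha = gamma = 0,
# and lift balancing gravity minus the centrifugal relief, all rates vanish.
m, Ma, c, h = 1000.0, 5.0, 300.0, 25000.0
v = Ma * c
L = m * (G0 - v * v / (R_E + h))
dh, dMa, dgamma = cruise_derivatives(h, Ma, 0.0, 0.0, 0.0, L, 0.0, m, c)
```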

#### *2.2. Aerodynamic Force Calculation*

The magnitude of lift and drag is calculated by lift coefficient and drag coefficient, which are denoted by *CL* and *CD*, respectively, as follows:

$$\begin{cases} L = C_L \cdot \frac{1}{2}\rho\left(Ma \cdot c\right)^2 \cdot S \\ D = C_D \cdot \frac{1}{2}\rho\left(Ma \cdot c\right)^2 \cdot S \end{cases} \tag{5}$$

where *ρ* is the atmospheric density and *S* is the reference area of the aircraft. The lift coefficient and drag coefficient under different angles of attack, rudder angles, and incoming Mach numbers are obtained by CFD simulation. Then, in the process of trimming angle of attack, the rudder angle is adjusted to make the pitching moment equal to 0, and the lift coefficient and drag coefficient at this moment are regarded as the data under this angle of attack in steady state [32]. Subsequently, the lift coefficient and drag coefficient are parameterized by polynomial fitting. The fitting polynomials are as follows, and the values of coefficients are given in Tables 1 and 2.

$$\begin{aligned} C_L &= \sum_{i=0,\,j=0}^{i+j \le 3} A_{ij} \cdot Ma^i \cdot \alpha^j \\ C_D &= \sum_{i=0,\,j=0}^{i \le 3,\; i+j \le 4} B_{ij} \cdot Ma^i \cdot \alpha^j \end{aligned} \tag{6}$$

**Table 1.** Fitting coefficients of *CL*.


**Table 2.** Fitting coefficients of *CD*.


Figure 2 shows the comparison between the original data and the fitted results. It can be seen that, with the increase of angle of attack, the lift coefficient and drag coefficient increase, and, with the increase of Mach number, both of them decrease slowly.
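The polynomial surfaces in Equation (6) can be fitted by linear least squares over the monomial basis. The sketch below uses synthetic data with assumed coefficients, since Tables 1 and 2 are not reproduced here.

```python
import numpy as np

def fit_poly_surface(Ma, alpha, CL, max_deg=3):
    """Least-squares fit of CL = sum A_ij * Ma^i * alpha^j over i + j <= max_deg
    (the C_L form of Eq. (6)). Returns a dict mapping (i, j) to the coefficient A_ij."""
    terms = [(i, j) for i in range(max_deg + 1)
             for j in range(max_deg + 1) if i + j <= max_deg]
    X = np.column_stack([Ma ** i * alpha ** j for i, j in terms])
    coef, *_ = np.linalg.lstsq(X, CL, rcond=None)
    return dict(zip(terms, coef))

# Synthetic data from a known surface (assumed coefficients, not Table 1):
rng = np.random.default_rng(0)
Ma = rng.uniform(4.0, 7.0, 200)
alpha = rng.uniform(0.0, 10.0, 200)
CL = 0.02 + 0.05 * alpha - 0.003 * Ma * alpha
coeffs = fit_poly_surface(Ma, alpha, CL)
```

Because the synthetic data lie exactly in the span of the basis, the fit recovers the generating coefficients, which is a convenient sanity check for the implementation.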

**Figure 2.** The change of *CL* and *CD* with angle of attack and Mach number.

The atmospheric parameters vary considerably with altitude, which has a significant impact on the calculation of aerodynamic forces. In this paper, the U.S. Standard Atmosphere model is employed. Firstly, a geopotential height is defined [33]:

$$H = \frac{h}{1 + h/R_e} \tag{7}$$

where $R_e$ denotes the radius of Earth. When the altitude is within 20 to 32 km, the atmospheric density and temperature are calculated by the following formulas:

$$\begin{aligned} W &= 1 + \frac{H - 24.9021}{221.552} \\ \frac{\rho}{\rho_0} &= 2.5158 \times 10^{-2}\, W^{-34.1629} \\ T &= 221.552\, W \text{ (K)} \end{aligned} \tag{8}$$

where $\rho_0 = 1.225\ \mathrm{kg/m^3}$ and $H$ is expressed in kilometres.

The sound velocity calculation formula is:

$$c = \sqrt{1.4 \times 287 \times T} \tag{9}$$

According to the formulas above, the lift and drag of aircraft can be calculated.
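Equations (7)–(9) can be combined into a small helper. The constant 24.9021 km is the standard-atmosphere layer constant; the density coefficient and exponent are taken as printed in Equation (8).

```python
import math

RHO0 = 1.225      # sea-level density, kg/m^3
R_E_KM = 6378.0   # Earth radius, km

def atmosphere_20_32km(h_km):
    """Atmosphere fit for 20-32 km altitude (Eqs. (7)-(9)); h_km in kilometres.
    Returns (density in kg/m^3, temperature in K, sound speed in m/s)."""
    H = h_km / (1.0 + h_km / R_E_KM)          # geopotential height, Eq. (7)
    W = 1.0 + (H - 24.9021) / 221.552         # layer fit variable, Eq. (8)
    rho = RHO0 * 2.5158e-2 * W ** -34.1629    # coefficient/exponent as printed
    T = 221.552 * W                           # temperature, K
    c = math.sqrt(1.4 * 287.0 * T)            # sound speed, Eq. (9)
    return rho, T, c

rho, T, c = atmosphere_20_32km(25.0)
```

At 25 km this gives a temperature near 222 K and a sound speed near 298 m/s, consistent with the standard-atmosphere layer the formula fits.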

#### *2.3. Thrust Calculation*

The thrust of the hypersonic vehicle is generated by a ramjet: air flow is captured by the inlet, mixed with fuel, and burned to generate thrust. The air mass flow rate captured by the engine inlet, denoted by $\dot{m}_{inlet}$, is evaluated as

$$
\dot{m}\_{inlet} = \varphi\_{inlet} \cdot \rho \cdot Ma \cdot c \cdot A \tag{10}
$$

where *A* denotes the inlet area and *ϕinlet* is the inlet flow capture coefficient, which reflects the ability to capture air flow under different Mach numbers and angles of attack.

The calculation of *ϕinlet* is shown in Equation (11), and the values of coefficients are given in Table 3.

$$\varphi_{inlet} = K_1 \cdot \alpha^2 + K_2 \cdot Ma^2 + K_3 \cdot \alpha + K_4 \cdot Ma + K_5 \cdot Ma \cdot \alpha + K_6 \tag{11}$$

**Table 3.** Fitting coefficients of *ϕinlet*.


Since the kind of fuel in the ramjet is fixed, there is a fixed relationship between the fuel mass flow rate and the air mass flow rate in complete combustion, as displayed in Equation (12), where $k$ is a proportionality coefficient with a value of 0.07 and $\dot{m}_{fst}$ denotes the fuel mass flow rate required for complete combustion.

$$
\dot{m}\_{fst} = k \cdot \dot{m}\_{inlet} \tag{12}
$$

The ratio of current fuel mass flow rate to the fuel mass flow rate required for complete combustion is called the equivalence ratio, which is denoted by *Φ* [34]:

$$
\Phi = \frac{\dot{m}\_{fuel}}{\dot{m}\_{fst}} \tag{13}
$$

When $\Phi = 1$, the current amount of air flow can be completely consumed; when $\Phi < 1$, the combustion in the engine is in an oxygen-enriched state. However, when $\Phi < 0.45$, the fuel is insufficient and the engine cannot work normally. Therefore, the range of $\Phi$ is between 0.45 and 1.

Combining Equations (12) and (13), the fuel mass flow rate is:

$$
\dot{m}\_{fuel} = \Phi \cdot k \cdot \dot{m}\_{inlet} \tag{14}
$$

Then, the thrust is:

$$T = \dot{m}\_{fuel} \cdot I\_{sp} = \Phi \cdot k \cdot \dot{m}\_{inlet} \cdot I\_{sp} \tag{15}$$

where *Isp* denotes the specific impulse, a parameter related to equivalence ratio and Mach number:

$$I\_{sp} = f(\Phi) \cdot \mathcal{g}(Ma) \tag{16}$$

*f*(*Φ*) and *g*(*Ma*) are shown in Equation (17), and the values of coefficients are given in Table 4.

$$\begin{aligned} f(\Phi) &= (1 + \frac{1 - \Phi}{2}) \\ g(Ma) &= \sum\_{i=1}^{4} \mathbb{C}\_{i} \cdot Ma^{i} \end{aligned} \tag{17}$$

**Table 4.** Fitting coefficients of *g*(*Ma*).


*f*(*Φ*) reflects the influence of the equivalence ratio on the specific impulse. If the equivalence ratio is lower, the fuel is more likely to burn completely because of the more adequate oxygen supply; thus, a unit mass of fuel generates larger thrust, and the specific impulse is larger as a result. Compared with the HL-20 aircraft model, whose specific impulse is independent of the equivalence ratio [4,6], this model is closer to the actual situation.

*g*(*Ma*) reflects the influence of the Mach number on the specific impulse: as the Mach number increases, the specific impulse decreases.
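Equations (10) and (14)–(17) chain together as follows. The $C_i$ of Table 4 are not reproduced here, so placeholder coefficients are used; the helper names are our own.

```python
def specific_impulse(phi, Ma, g_coeffs):
    """Isp = f(phi) * g(Ma), Eqs. (16)-(17). g_coeffs stands in for the C_i of
    Table 4, which is not reproduced here."""
    f = 1.0 + (1.0 - phi) / 2.0                           # leaner mixture -> higher Isp
    g = sum(ci * Ma ** i for i, ci in enumerate(g_coeffs, start=1))
    return f * g

def thrust(phi, Ma, rho, c, A, phi_inlet, g_coeffs, k=0.07):
    """Eqs. (10), (14), (15): T = mdot_fuel * Isp. Returns (thrust, fuel mass flow)."""
    mdot_inlet = phi_inlet * rho * Ma * c * A             # captured air flow, Eq. (10)
    mdot_fuel = phi * k * mdot_inlet                      # Eq. (14)
    return mdot_fuel * specific_impulse(phi, Ma, g_coeffs), mdot_fuel
```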

Figure 3 shows the comparison between the parametric thrust model and the original data at an angle of attack of 5◦. It can be seen that the parametric model can accurately reflect the trend of thrust with Mach number and equivalence ratio: with the increase of Mach number, the thrust increases gradually; when the equivalence ratio increases, more fuel is burned and more thrust is generated.

**Figure 3.** Comparison between the parametric model and original data.

#### *2.4. Fuel Consumption Calculation*

To evaluate the flight cost during steady-state cruise, the fuel consumption averaged by range is employed as an indicator. If the fuel consumption averaged by range is smaller, the range is larger using the same mass of fuel. The time derivative of range is [35]:

$$\frac{dr}{dt} = Ma \cdot c \cdot \cos\gamma\left(\frac{R_e}{R_e + h}\right) \tag{18}$$

Combined with Equations (10) and (14), the derivative of fuel consumption to range can be obtained as follows:

$$\frac{dm}{dr} = \frac{\Phi \cdot k \cdot \varphi_{inlet} \cdot \rho \cdot A}{\cos\gamma}\left(1 + \frac{h}{R_e}\right) \tag{19}$$

Because the Mach number and altitude remain unchanged during steady-state cruise, and the fuel consumption per second is small compared with the total aircraft mass, all the parameters in Equation (19) are considered to remain essentially unchanged [11]; thus, the fuel consumption averaged by range can be calculated directly from Equation (19).

So far, the parametric aircraft model is established. Given the angle of attack, Mach number, and equivalence ratio, the aerodynamic force and thrust can be calculated, then the cruise trajectory can be obtained by the dynamic equation. Finally, the fuel consumption averaged by the range can be solved.

In the dynamic equation and parametric aircraft model, the relevant aircraft parameters include lift coefficient, drag coefficient, inlet area, specific impulse, aircraft mass, and reference area. In this paper, these six aircraft parameters are regarded as the parameter group that is mainly focused on.

#### **3. Solution of Optimal Cruise Point**

Previous studies have found that, when the aircraft is cruising at different Mach numbers and altitudes, its fuel consumption averaged by the range is different, and there is a certain value of altitude and Mach number to minimize the fuel consumption. According to the aircraft model in this paper, the steady-state cruise characteristics at different Mach numbers and altitudes are analyzed.

#### *3.1. Steady-State Cruise Parameter Solution*

In steady-state cruise, the trajectory elevation angle is constant at 0, and the altitude and Mach number remain unchanged; thus, Equation (4) simplifies to:

$$\begin{cases} \dfrac{dMa}{dt} = \dfrac{T\cos\alpha - D - mg\sin\gamma}{m \cdot c} = 0 \\ \dfrac{d\gamma}{dt} = \dfrac{T\sin\alpha + L}{m \cdot Ma \cdot c} + \cos\gamma\left(\dfrac{Ma \cdot c}{R_e + h} - \dfrac{g}{Ma \cdot c}\right) = 0 \end{cases} \tag{20}$$

Lift and drag are related to altitude, Mach number, and angle of attack, while thrust is related to altitude, Mach number, angle of attack, and equivalence ratio. In short, there are two equations with four unknowns. From Equation (20), it can be obtained that

$$\begin{cases} T\cos\alpha - D - mg\sin\gamma = 0 \\ T\sin\alpha + L + \cos\gamma \cdot \dfrac{m \cdot (Ma \cdot c)^2}{R_e + h} - mg\cos\gamma = 0 \end{cases} \tag{21}$$

Because the trajectory elevation angle is zero in steady-state cruise, multiplying the upper equation in Equation (21) by tan *α* and substituting it into the lower equation eliminates *T*:

$$D\tan\alpha + L - mg + m\frac{\left(Ma \cdot c\right)^2}{R_e + h} = 0 \tag{22}$$

Then, substitute Equation (5) into Equation (22) and it can be obtained that

$$C_D \cdot \frac{1}{2}\rho\left(Ma \cdot c\right)^2 \cdot S \cdot \tan\alpha + C_L \cdot \frac{1}{2}\rho\left(Ma \cdot c\right)^2 \cdot S - mg + m\frac{\left(Ma \cdot c\right)^2}{R_e + h} = 0 \tag{23}$$

If the values of *h* and *Ma* are given, *ρ* and *c* can be obtained. Because *CL* and *CD* are functions of *Ma* and *α*, Equation (23) then depends only on *α* and can be solved by the bisection method. After *α* is obtained, only one unknown remains, and the equivalence ratio can be solved from Equation (21), which allows the fuel consumption to be computed.
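A minimal bisection solver of the kind used for Equation (23) looks like the following. The residual below is an illustrative stand-in with hypothetical numbers, since evaluating Equation (23) itself requires the fitted coefficient tables.

```python
def bisect(f, lo, hi, tol=1e-10, max_iter=200):
    """Bisection root finder for f(x) = 0 on [lo, hi], assuming f changes sign."""
    flo = f(lo)
    if flo == 0.0:
        return lo
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if abs(hi - lo) < tol or fmid == 0.0:
            return mid
        if (flo > 0.0) == (fmid > 0.0):   # same sign at lo and mid: root in [mid, hi]
            lo, flo = mid, fmid
        else:                              # sign change in [lo, mid]
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative residual standing in for Eq. (23); root at alpha = 8 (degrees, say):
residual = lambda a: 0.05 * a - 0.4
alpha_trim = bisect(residual, 0.0, 20.0)
```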

The fuel consumption averaged by the range is solved in the range of Mach 4 to 7 and 20 km to 29 km by the traversing method at a Mach number interval of 0.1 and an altitude interval of 200 m. The contour of fuel consumption at different altitudes and Mach numbers is displayed in Figure 4. Since the equivalence ratio ranges from 0.45 to 1, there are boundaries in the contour caused by the constraints of the equivalence ratio, which means there is no solution of Equation (21) and steady-state cruise cannot be achieved in the region outside the boundaries. The distribution of fuel consumption is similar to a basin. If the Mach number or altitude is too high or too low, the fuel consumption will increase up to 1.2 kg/km, while the minimum is only about 0.4 kg/km. The group of the Mach number and altitude with the minimum fuel consumption is called the optimal steady-state cruise point and is denoted by (*Maopt*, *hopt*), which is in the range of Mach 4.5 to 5.5 and 24 km to 26 km.

**Figure 4.** Contour of fuel consumption averaged by range.

#### *3.2. Fast Solution of Optimal Cruise Point*

Although a rough position of the optimal cruise point can be obtained by the traversing method, it is time-consuming and of limited accuracy. In order to determine the position of the optimal cruise point quickly and accurately, an optimization algorithm is employed.

According to the analysis, there are actually only two free variables when solving the steady-state cruise: given the cruise Mach number and altitude, the other trajectory parameters can be calculated. Therefore, the solution of the optimal cruise point can be regarded as an optimization problem with two variables in which the fuel consumption is the optimization objective and the cruise Mach number and altitude are the optimization variables. The mathematical expression of the optimization problem is:

$$\begin{cases} \min f = \dfrac{\Phi \cdot k \cdot \varphi_{inlet} \cdot \rho \cdot A}{\cos\gamma}\left(1 + \dfrac{h}{R_e}\right) \\ s.t.\; \begin{cases} 22\ \text{km} \le h \le 28\ \text{km} \\ 4 \le Ma \le 7 \\ \dfrac{T\cos\alpha - D - mg\sin\gamma}{m \cdot c} = 0 \\ \dfrac{T\sin\alpha + L}{m \cdot Ma \cdot c} + \cos\gamma\left(\dfrac{Ma \cdot c}{R_e + h} - \dfrac{g}{Ma \cdot c}\right) = 0 \end{cases} \end{cases} \tag{24}$$

The particle swarm optimization (PSO) algorithm is used to solve the optimization problem. The optimization algorithm process is displayed in Figure 5, and the details are as follows:

Step 1: randomly generate an initial particle swarm, and each particle is two-dimensional and represents a group of Mach number and altitude;

Step 2: calculate the steady-state cruise trajectory parameters at the Mach number and altitude represented by a particle;

Step 3: calculate the fuel consumption averaged by range at the altitude and Mach number represented by a particle as the fitness function value;

Step 4: update the velocity and position of particles according to the fitness function value to obtain a new iteration of particle swarm;

Step 5: repeat Steps 2 to 4 until the termination conditions are satisfied.
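Steps 1–5 can be sketched as a minimal PSO over the two cruise variables. The fitness below is a stand-in with a known minimum near the paper's reported optimum (Ma 4.95, 24.95 km), not the actual fuel-consumption model.

```python
import random

def pso(fitness, bounds, n_particles=10, n_iter=30, w=0.7, c1=1.5, c2=1.5, seed=1):
    """Minimal particle swarm optimization over box bounds [(lo, hi), ...]."""
    dim = len(bounds)
    rnd = random.Random(seed)
    pos = [[rnd.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal bests
    pbest_f = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]         # global best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rnd.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rnd.random() * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f

# Stand-in fitness with a known minimum near (Ma, h) = (4.95, 24.95 km):
toy = lambda x: (x[0] - 4.95) ** 2 + 0.1 * (x[1] - 24.95) ** 2
best, best_f = pso(toy, [(4.0, 7.0), (20.0, 29.0)])
```

The swarm size (10) and iteration count (30) mirror the settings reported below; in the paper, the fitness evaluation would be the steady-state cruise solution plus Equation (19).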

**Figure 5.** Optimization process.

Based on the optimization process, the optimal cruise point can be obtained. The size of the particle swarm is 10, and the maximum number of iterations is 30. Figure 6 shows the change of the fitness function value. The algorithm converges after 10 iterations. Finally, the altitude of the optimal cruise point is 24.95 km, and the Mach number is 4.95. Figure 7 displays the position of the optimal cruise point in the fuel consumption contour. The angle of attack corresponding to the optimal cruise point is 9.47◦, and the equivalence ratio is 0.556. The minimal fuel consumption averaged by the range is 0.396 kg/km.

**Figure 6.** The change of fitness function value during optimization.

**Figure 7.** The location of optimal cruise point in the contour of fuel consumption.

Therefore, if the aircraft parameters are determined, the position of the optimal cruise point can be obtained quickly and accurately by this PSO algorithm. On this basis, the aircraft parameters can be adjusted and corresponding optimal cruise points can be obtained, then the influence of the aircraft parameters on the optimal cruise point can be explored.

#### **4. Analysis and Modeling of Aircraft Parameters**

According to previous analysis, for a determined parametric aircraft model, there is a corresponding optimal steady-state cruise point, and the position of the optimal cruise point is determined by the aircraft parameters. How the aircraft parameters affect the position of the optimal cruise point is also a problem worthy of exploration, which has guiding significance for the design of aircraft.

#### *4.1. Analysis of Single Aircraft Parameter*

The aircraft parameters group studied in this paper includes lift coefficient, drag coefficient, inlet area, specific impulse, aircraft mass, and reference area. These six parameters can be divided into three aspects, as shown in Figure 8: aerodynamic shape (lift coefficient and drag coefficient), propulsion system (inlet area and specific impulse), and structural design (aircraft mass and reference area).

**Figure 8.** Considered parameters in this paper.

The values of these six aircraft parameters are adjusted to explore their influence on the optimal cruise point denoted by (*Maopt*, *hopt*). The ratio between the value after adjustment and the original value is denoted by *XL*, *XD*, *XA*, *XI*, *Xm*, and *XS*, respectively.

Within the range of 0.8 to 1.5, the influence of a single factor in (*XL*, *XD*, *XA*, *XI*, *Xm*, *XS*) is investigated. Only one variable is changed at a time, and the rest remain at 1. After the aircraft parameters are adjusted, the corresponding optimal cruise points are obtained by the PSO algorithm in Figure 5.
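The one-factor-at-a-time protocol above can be sketched as follows. The call that would re-run the PSO of Figure 5 on the adjusted aircraft model is a placeholder here, since the full cruise model is not reproduced; the sampled levels are illustrative.

```python
# One-at-a-time sweep: each ratio in (XL, XD, XA, XI, Xm, XS) is varied over
# [0.8, 1.5] while the other five remain at 1.0.
FACTORS = ["XL", "XD", "XA", "XI", "Xm", "XS"]

def optimal_cruise_point(ratios):
    """Placeholder for the PSO of Figure 5 run on the adjusted aircraft
    model; returns (Ma_opt, h_opt). Here it simply echoes the baseline."""
    return 4.95, 24.95

def single_factor_sweep(levels=(0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5)):
    results = {}
    for name in FACTORS:
        rows = []
        for x in levels:
            ratios = {f: 1.0 for f in FACTORS}  # all other factors stay at 1
            ratios[name] = x
            rows.append((x, optimal_cruise_point(ratios)))
        results[name] = rows
    return results

sweep = single_factor_sweep()
```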

Figure 9 shows the influence of the different factors. From Figure 9a, *Maopt* is positively correlated with *XA* and *XI*, and their effects are very similar: if the magnitude of thrust, determined by *XA* and *XI*, is larger, *Maopt* increases as well. Conversely, *Maopt* is negatively correlated with *XD* and *XS*, whose effects are also similar: if the magnitude of drag, determined by *XD* and *XS*, is larger, *Maopt* is lower. In contrast, *XL* and *Xm* have little effect on *Maopt*.

**Figure 9.** Influence of different factors on the optimal cruise point: (**a**) Influence on *Maopt*; (**b**) Influence on *hopt*.

It can be seen from Figure 9b that *hopt* is sensitive to all the factors. In detail, *hopt* is positively correlated with *XL*, *XA*, *XI*, and *XS*, and the effects of *XA* and *XI* on *hopt* are very similar, which indicates that the magnitude of thrust determined by *XA* and *XI* also has a great impact on *hopt*, while *hopt* has a negative correlation with *XD* and *Xm*.

A multiple regression analysis is employed to quantify the influence of the six factors on the optimal cruise point, and the regression coefficient is normalized as an index of sensitivity. Figure 10 shows the sensitivity percentages of the different factors, where a negative sign indicates a negative correlation between the factor and the dependent variable. It can be seen that *XA* and *XI* have the largest impact on the optimal cruise Mach number, accounting for nearly 30%, followed by *XD* and *XS*, which are negatively correlated, while the sensitivity of *XL* and *Xm* is almost 0; *XA*, *XI* and *Xm* have the greatest influence on the optimal cruise altitude, accounting for about 20%, where the influence of *Xm* is negative.
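The regression-based sensitivity index can be illustrated with a small stdlib-only least-squares sketch. The factor coefficients below are synthetic stand-ins chosen only to reproduce the sign pattern described above, not the paper's regression data.

```python
import random

def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved with Gaussian elimination and partial pivoting (stdlib only)."""
    n, p = len(X), len(X[0])
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
    b = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    for col in range(p):                       # forward elimination
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            m = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= m * A[col][c]
            b[r] -= m * b[col]
    coef = [0.0] * p
    for r in range(p - 1, -1, -1):             # back substitution
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, p))) / A[r][r]
    return coef

def sensitivity_percent(coef):
    """Normalize regression coefficients so |.| sums to 100%, keeping sign."""
    total = sum(abs(c) for c in coef)
    return [100.0 * c / total for c in coef]

rng = random.Random(1)
X = [[rng.uniform(0.8, 1.5) for _ in range(4)] for _ in range(50)]
true_coef = [0.30, -0.20, 0.30, -0.15]  # illustrative signs/magnitudes only
y = [sum(t * x for t, x in zip(true_coef, row)) for row in X]
pct = sensitivity_percent(fit_linear(X, y))
```

A negative percentage marks a negative correlation, mirroring the convention of Figure 10.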

**Figure 10.** Sensitivity of different factors on the optimal cruise point: (**a**) Sensitivity on *Maopt*; (**b**) Sensitivity on *hopt*.

Based on the conclusions above, the inlet area, specific impulse, and drag coefficient have the largest impact on the optimal steady-state cruise Mach number. To raise the optimal cruise Mach number, the inlet area and specific impulse should be enlarged or the drag of the aircraft reduced. The inlet area, specific impulse, and aircraft mass have the largest impact on the optimal cruise altitude; to raise it, the most effective way is to increase the inlet area and specific impulse or to reduce the weight of the aircraft.

Figure 11 shows the influence of the six factors on fuel consumption averaged by the range. It can be seen that *XD*, *Xm*, and *XS* are positively correlated with fuel consumption, while *XL*, *XA*, and *XI* are negatively correlated with it. Therefore, in order to reduce the fuel consumption, the lift coefficient and specific impulse as well as the inlet area should be increased, where the effect of increasing specific impulse is the best; in addition, the mass and drag coefficient, as well as the reference area of the aircraft, should be reduced, where the effect of drag coefficient reduction is the best.

**Figure 11.** Influence of different influence factors on the fuel consumption.

However, compared with the altitude and fuel consumption of the optimal cruise point, the optimal cruise Mach number varies over the smallest range, which means that it is difficult to change the optimal cruise Mach number substantially by adjusting the aircraft parameters within the current range.

#### *4.2. Modeling of Aircraft Parameters*

Through the analysis of a single factor, the influence of different aircraft parameters on the optimal cruise point is clarified. However, due to the complex interaction among these parameters, the influence of a single factor has limited ability to guide aircraft design, and it is necessary to consider simultaneous changes of different aircraft parameters. Therefore, it is of significance to establish a mathematical model that can comprehensively describe the relationship between the aircraft parameters and optimal cruise point.

A neural network (NN) is a complex network system that simulates the human brain; it consists of a large number of simple, interconnected neurons [36–38]. A feedforward neural network (FFNN) is one kind of neural network that has been widely used in recent years for data prediction and related tasks [39]. In this paper, because there are six parameters, it is difficult to fit the influence law explicitly with polynomials. Therefore, an FFNN is employed to establish a mathematical model relating (*Maopt*, *hopt*) to (*XL*, *XD*, *XA*, *XI*, *Xm*, *XS*); the model has six inputs and two outputs.

Firstly, 500 sample points are generated for the six inputs by optimal Latin hypercube sampling [40] in the range of 0.8 to 1.5, and the values of (*Maopt*, *hopt*) for the 500 samples are obtained by the PSO for the optimal cruise point in Figure 5. Of these, 400 samples are used as the training set, 75 as the validation set, and the remaining 25 as the test set. The logsig function is employed in the hidden layer, and the purelin (linear) function in the output layer. The Levenberg–Marquardt (L-M) method is employed for training. The structure of the FFNN is displayed in Figure 12.
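A network of this shape (logsig hidden layer, purelin output) can be sketched in pure Python. Plain stochastic gradient descent stands in for the L-M method here, and the 6-input/2-output training data is a synthetic smooth map, not the paper's sample set; both are simplifying assumptions.

```python
import math, random

def logsig(x):
    return 1.0 / (1.0 + math.exp(-x))

class FFNN:
    """Single-hidden-layer network: logsig hidden units, purelin (linear)
    output. Trained with plain SGD as a simple stand-in for L-M."""
    def __init__(self, n_in=6, n_hid=8, n_out=2, seed=0):
        r = random.Random(seed)
        self.W1 = [[r.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
        self.b1 = [0.0] * n_hid
        self.W2 = [[r.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [logsig(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        y = [sum(w * hi for w, hi in zip(row, h)) + b
             for row, b in zip(self.W2, self.b2)]
        return h, y

    def train_step(self, x, t, lr=0.05):
        h, y = self.forward(x)
        err = [yi - ti for yi, ti in zip(y, t)]       # dL/dy for 0.5*SSE
        # hidden deltas use the pre-update output weights (logsig' = h*(1-h))
        dh = [sum(err[o] * self.W2[o][j] for o in range(len(err))) * hj * (1 - hj)
              for j, hj in enumerate(h)]
        for o, e in enumerate(err):
            for j, hj in enumerate(h):
                self.W2[o][j] -= lr * e * hj
            self.b2[o] -= lr * e
        for j, d in enumerate(dh):
            for i, xi in enumerate(x):
                self.W1[j][i] -= lr * d * xi
            self.b1[j] -= lr * d
        return 0.5 * sum(e * e for e in err)

rng = random.Random(1)
data = [[rng.uniform(0.8, 1.5) for _ in range(6)] for _ in range(200)]

def toy_target(x):
    # Illustrative smooth stand-in for (Ma_opt, h_opt), scaled to O(1).
    return [0.3 * (x[2] + x[3]) - 0.2 * x[1],
            0.4 * x[0] - 0.3 * x[4] + 0.2 * x[5]]

net = FFNN()
losses = []
for epoch in range(100):
    total = sum(net.train_step(x, toy_target(x)) for x in data)
    losses.append(total / len(data))
```

In practice the data would be the 500 PSO-labeled samples, split 400/75/25 as described above.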

The regression results after training are illustrated in Figure 13. It can be seen that the target value and output results are basically on the same line, and the value of the regression coefficient is close to 1, which shows a good training effect.

The model estimated values and accurate values of the 25 points in the test set are compared in Figure 14. It can be seen that the estimated values are within the deviation range of 3% of the accurate value, indicating that the trained neural network model can accurately describe the relationship between (*Maopt*, *hopt*) and (*XL*, *XD*, *XA*, *XI*, *Xm*, *XS*). Based on this model, given the value of the six factors, the values of (*Maopt*, *hopt*) can be quickly obtained.

**Figure 12.** The structure of FFNN.

**Figure 13.** Regression results of FFNN after training.

**Figure 14.** Comparison between estimated value and accurate value: (**a**) Estimated value and accurate value of *Maopt*; (**b**) Estimated value and accurate value of *hopt*.

#### **5. Aircraft Parameter Optimization Method**

After the model is obtained, given the values of (*XL*, *XD*, *XA*, *XI*, *Xm*, *XS*), (*Maopt*, *hopt*) can be obtained quickly. Therefore, if various parameters change simultaneously, the response of the optimal cruise point can be studied. In this way, if the subjectively desired cruise point does not coincide with the actual optimal cruise point, the optimal cruise point can be adjusted to the target position by optimizing the aircraft parameters.

However, due to the interaction between the aircraft parameters, the coupling relationship between the aircraft parameters needs to be considered when optimization is carried out.

Since there is a complex coupling relationship among the aircraft parameters, the following constraints are taken as examples to illustrate the optimization method of the aircraft parameters based on the neural network model:


These constraints can be adjusted according to the actual situation. Since it is difficult to adjust the optimal cruise Mach number greatly, the optimal cruise altitude is the main focus. The target of the optimal cruise point is set at (*Ma*5, 26 km).

However, if the aircraft parameters can be changed simultaneously, there is more than one plan for adjusting the optimal cruise altitude to the desired value, since six relevant aircraft parameters are considered, and the fuel consumption after adjustment differs between plans. Therefore, this paper seeks the plan with the minimum fuel consumption after adjustment: under the condition that the adjusted optimal cruise point is located at (*Ma*5, 26 km), reducing the fuel consumption is a further objective. The location condition is incorporated into the fitness function in the form of a penalty function, so the fitness function is written as:

$$fitness = F\_{h} + F\_{M} + \frac{\widetilde{T}}{g\,\widetilde{I}\_{sp}\,\widetilde{Ma} \cdot c \cdot \cos\gamma} \left(1 + \frac{\widetilde{h}}{R\_c}\right) \tag{25}$$

where "~" indicates the value of a parameter after adjustment, and the penalty functions *Fh* and *FM* concern the altitude and Mach number, respectively.

$$F\_h = \begin{cases} \lambda\_1 \cdot \frac{|\widetilde{h}\_{opt} - 26|}{26}, & \text{if } \frac{|\widetilde{h}\_{opt} - 26|}{26} > \varepsilon \\ 0, & \text{else} \end{cases} \qquad F\_M = \begin{cases} \lambda\_2 \cdot \frac{|\widetilde{Ma}\_{opt} - 5|}{5}, & \text{if } \frac{|\widetilde{Ma}\_{opt} - 5|}{5} > \varepsilon \\ 0, & \text{else} \end{cases} \tag{26}$$

*λ*1 and *λ*2 are both large positive numbers, and *ε* is a small tolerance.
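The penalty terms of (26) translate directly into code. The concrete values of *λ*1, *λ*2, and *ε* below are illustrative assumptions; the paper only requires the first two to be large and the last to be small.

```python
# Penalty terms of Eq. (26); lambda_1, lambda_2 large, eps a small tolerance.
# The concrete values here are illustrative, not taken from the paper.
LAM1, LAM2, EPS = 1e4, 1e4, 0.01

def F_h(h_opt, target_h=26.0):
    rel = abs(h_opt - target_h) / target_h
    return LAM1 * rel if rel > EPS else 0.0

def F_M(ma_opt, target_ma=5.0):
    rel = abs(ma_opt - target_ma) / target_ma
    return LAM2 * rel if rel > EPS else 0.0

def fitness(h_opt, ma_opt, fuel_per_km):
    """Eq. (25): fuel consumption averaged by range plus the two penalties."""
    return F_h(h_opt) + F_M(ma_opt) + fuel_per_km
```

Any candidate whose optimal cruise point strays from (*Ma*5, 26 km) by more than the tolerance is heavily penalized, so the search is driven toward the target before fuel consumption dominates.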

Therefore, the aircraft parameter optimization problem can be expressed mathematically as:

$$\begin{aligned} \text{minimize } fitness &= F\_h + F\_M + \frac{\widetilde{T}}{g\,\widetilde{I}\_{sp}\,\widetilde{Ma} \cdot c \cdot \cos\gamma} \left(1 + \frac{\widetilde{h}}{R\_c}\right) \\ \text{subject to } & \begin{cases} X\_L \le 1.05 \\ X\_D \ge 1 + \frac{1}{2}(X\_L - 1) \\ X\_A \le 1.3 \\ X\_m \ge 1 + \frac{1}{3}(X\_A - 1) \\ X\_S \ge 1 + \frac{1}{4}(X\_A - 1) \\ X\_I \le 1.1 \end{cases} \end{aligned} \tag{27}$$

The particle swarm optimization algorithm is employed, whose process is outlined in Figure 5. Each particle in the algorithm represents a group of (*XL*, *XD*, *XA*, *XI*, *Xm*, *XS*), and the objective is to minimize the fitness of the particles. The optimal solution should be a group of (*XL*, *XD*, *XA*, *XI*, *Xm*, *XS*), with the lowest fitness computed by (25).
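The coupled bounds of (27) that each candidate particle must satisfy can be checked with a small helper; this is a sketch of the constraint logic only, with a tolerance argument added to absorb rounding.

```python
def feasible(XL, XD, XA, XI, Xm, XS, tol=1e-9):
    """Coupled bounds of Eq. (27): the drag coefficient must grow with the
    lift coefficient, and the mass and reference area with the inlet area."""
    return (XL <= 1.05 + tol
            and XD >= 1 + 0.5 * (XL - 1) - tol
            and XA <= 1.3 + tol
            and Xm >= 1 + (XA - 1) / 3 - tol
            and XS >= 1 + (XA - 1) / 4 - tol
            and XI <= 1.1 + tol)
```

In the PSO, an infeasible particle would be clipped back into the feasible region or penalized before its fitness is computed by (25).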

The results are displayed in Table 5. The lift coefficient needs to increase by 5%, while the drag coefficient increases by 2.5% as a result; the inlet area should increase by 25.93%, while the mass increases by 8.64% and the reference area by 12.91%; the specific impulse needs to increase by 10%. The lift coefficient and specific impulse reach their boundaries, while the inlet area and reference area remain within the feasible range. If the constraints were different, the optimization results would change as well. After the aircraft parameters are adjusted, the optimal steady-state cruise point estimated by the neural network model is located at (*Ma*4.99, 25.95 km).

**Table 5.** Optimized results of aircraft parameters.


Under the optimized aircraft parameters, the position of the accurate optimal cruise point and contour of fuel consumption are displayed in Figure 15. It can be seen that the actual optimal cruise point after adjustment is located at (*Ma*4.99, 25.92 km), which basically meets the goal of adjusting the optimal cruise point from (*Ma*4.95, 24.95 km) to (*Ma*5, 26 km). In addition, the adjustment adopts a plan with the lowest fuel consumption at 0.3642 kg/km with the constraints satisfied. Therefore, the method of optimizing the aircraft parameters based on the neural network model has been demonstrated to be effective.

**Figure 15.** Optimal point and contour of fuel consumption after adjustment.

So far, the optimal cruise point has been adjusted to the target position, and the fuel consumption after adjustment is the lowest among the feasible plans. However, it is difficult to adjust the optimal cruise Mach number greatly under the current constraints. If the technical bottleneck of the ramjet can be broken through, yielding an engine with a larger size and a significantly increased specific impulse, the optimal cruise Mach number could be adjusted over a much larger range.

#### **6. Conclusions**

There is an optimal cruise point for hypersonic vehicles performing steady-state cruise, at which the fuel consumption is the lowest. The position of the optimal cruise point is closely related to the aircraft parameters. In this paper, to clarify the influence of the aircraft parameters on the optimal cruise point, the optimal cruise point of a basic parametric aircraft model was first solved by an optimization algorithm; the influence of the different parameters was then investigated and modeled with a neural network. In order to make the aircraft cruise at the desired Mach number and altitude with the lowest fuel consumption, an aircraft parameter optimization method, whose aim is to adjust the optimal cruise point to a given position, was explored with numerical verification. The main conclusions of this paper are as follows:


The parameter analysis and optimization method can be applied in other aircraft models, and the constraints of the aircraft parameters can be changed according to the actual situation as well, which can reveal how the aircraft parameters should be adjusted and thus provide guidance in the design of aircraft.

**Author Contributions:** Conceptualization, H.L. and Y.W.; Investigation, Y.W.; Methodology, H.L. and Y.Z.; Software, Y.Z. and S.D.; Visualization, S.D.; Writing—original draft, H.L.; Writing—review & editing, S.X. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Postgraduate Scientific Research Innovation Project of Hunan Province, grant number CX20200084.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Acknowledgments:** The authors want to express their thanks to Tao Tang.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

