*2.1. AUV Nonlinear Model*

To study the motion of the AUV, the fixed coordinate system {E} and the motion coordinate system {O} established in this section are shown in Figure 1.

**Figure 1.** Coordinate system diagram, where *E* − *ξηζ* is the fixed coordinate system, *ξ* points due north, *η* points due east, O − xyz is the motion coordinate system, and O coincides with the center of gravity of the AUV, where the *x*-axis points to the bow of the vehicle.

A fixed point at sea level is usually chosen as the origin of the fixed coordinate system, with the *ξ* axis pointing due north and the *η* axis pointing due east. To simplify the nonlinear model of the AUV, the center of gravity of the AUV is chosen as the origin of the motion coordinate system {O}; the *x* axis lies in the longitudinal mid-profile and points to the bow of the AUV, and the *y* axis is perpendicular to the longitudinal mid-profile and points to the starboard side of the AUV.

In building the model, the AUV studied in this paper is assumed to be a rigid body with a fixed mass distribution, and its roll motion is neglected when it operates underwater, i.e., the roll angle and roll angular velocity are assumed to remain at their desired values. The nonlinear model of the AUV and the feedback linearization process that follow are based on this assumption.

For the purpose of the following study, the following motion variables are defined:

The position and attitude vector in the fixed coordinate system is $\eta = [x, y, z, \theta, \psi]^T \in R^3 \times S^2$, where the position is $\eta\_1 = [x, y, z]^T \in R^3$ and the attitude angle is $\eta\_2 = [\theta, \psi]^T \in S^2$. The velocity vector in the motion coordinate system is $v = [u, v, w, q, r]^T \in R^5$, where the linear velocity is $v\_1 = [u, v, w]^T \in R^3$ and the angular velocity is $v\_2 = [q, r]^T \in R^2$. The forces and moments in the motion coordinate system are $T = [X, Y, Z, M, N]^T \in R^5$, where the force is $T\_1 = [X, Y, Z]^T \in R^3$ and the moment is $T\_2 = [M, N]^T \in R^2$.

Here $R^3$ denotes the three-dimensional Euclidean space and $S^2$ denotes the two-dimensional torus, i.e., two angles each in the range $[0, 2\pi]$.

Combining the AUV kinematic model and dynamics model, the AUV nonlinear mathematical model vector expression can be obtained as:

$$\begin{aligned} \dot{\eta} &= J(\eta)v\\ M\_R \dot{v} + M\_A \dot{v} + C\_R(v)v + Y(v) + g(\eta) &= T + \lambda \end{aligned} \tag{1}$$

The derivation of the kinematic and dynamic models of the AUV and the model parameters are given in [19].

#### *2.2. AUV Feedback Linearization Model*

As can be seen from Equation (1), the nonlinear model of the AUV is still very complicated even if it is written in vector form. In this subsection, we simplify the AUV nonlinear model by using the transformation method to make the complex problem simple. By coordinate transformation, we can transform the nonlinear model of the AUV in the motion coordinate system to a specific coordinate system, in which the nonlinear model will realize the decoupling of each control channel and transform into a second-order integral model.

According to the literature [20], the AUV model is transformed appropriately:

$$\begin{cases} \dot{\eta} = J(\eta)v\\ \dot{v} = M^{-1}N(\eta, v) + M^{-1}T \end{cases} \tag{2}$$

where $M = M\_R + M\_A$ is the sum of the inertia matrix and the added inertia matrix, and $T$ denotes the control input forces and moments. Collecting the three model terms $C\_R(v)v$, $Y(v)$ and $g(\eta)$ into a column vector $N(\eta, v)$, Equation (2) can be written as:

$$
\begin{bmatrix} \dot{\eta} \\ \dot{v} \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & M^{-1} \end{bmatrix} \begin{bmatrix} J(\eta)v \\ N(\eta, v) \end{bmatrix} + \begin{bmatrix} 0 \\ M^{-1} \end{bmatrix} T \tag{3}
$$

In Equation (3), a model with three axial thrusters and two rudders is considered, so the control input $T$ is replaced by the thrusts of the axial thrusters $X\_{prop}$, $Y\_{prop}$, $Z\_{prop}$ and the rudder angles $\delta\_r$, $\delta\_s$. The vector $\xi = [\eta^T, v^T]^T$ is formed from $\eta$ and $v$. The two matrices in Equation (3) are denoted $M\_1 = \begin{bmatrix} I & 0 \\ 0 & -M^{-1} \end{bmatrix} \in R^{10 \times 10}$ and $M\_2 = \begin{bmatrix} 0 \\ M^{-1} \end{bmatrix} \in R^{10 \times 5}$, respectively. Equation (3) is then written in the following vector form for model linearization:

$$\dot{\xi} = f(\xi) + M\_2 g'(\xi)\hat{u} \tag{4}$$

where $f(\xi) = M\_1 \begin{bmatrix} J(\eta)v \\ N(\eta, v) \end{bmatrix} \in R^{10 \times 1}$, $g'(\xi) = [g'\_{ij}(\xi)] \in R^{5 \times 5}$, and $\hat{u} = [X\_{prop}, Y\_{prop}, Z\_{prop}, \delta\_s, \delta\_r]^T$.

Vector field: the nonlinear first-order model is taken as the following equation:

$$\begin{array}{l} \dot{\mathbf{x}} = f(\mathbf{x}) + \mathbf{g}(\mathbf{x})u \\ \mathbf{y} = h(\mathbf{x}) \end{array} \tag{5}$$

where $f(x)$, $g(x)$ and $h(x)$ are sufficiently smooth on the domain $D \subset R^n$, and the mappings $f : D \to R^n$ and $g : D \to R^n$ are vector fields on $D$.

Lie derivative: differentiating $y$ in Equation (5) with respect to time gives:

$$\dot{y} = \frac{\partial h}{\partial \mathbf{x}}[f(\mathbf{x}) + \mathbf{g}(\mathbf{x})u] = L\_f h(\mathbf{x}) + L\_g h(\mathbf{x})u \tag{6}$$

where $L\_f h(x) = \frac{\partial h}{\partial x} f(x)$ is called the Lie derivative of $h$ along the smooth vector field $f$, and $L\_g h(x) = \frac{\partial h}{\partial x} g(x)$ is likewise the Lie derivative of $h$ along $g$.
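The Lie derivative in Equation (6) can be checked numerically. The following is a minimal sketch that approximates $\frac{\partial h}{\partial x} f(x)$ by central differences; the pendulum-like system `f`, `g`, `h` and the helper name `lie_derivative` are illustrative assumptions and are not the AUV model of this paper.

```python
import math

def lie_derivative(h, field, x, eps=1e-6):
    """Approximate L_field h(x) = (dh/dx) * field(x) by central differences."""
    fx = field(x)
    total = 0.0
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += eps
        xm[k] -= eps
        dh_dxk = (h(xp) - h(xm)) / (2.0 * eps)  # k-th partial derivative of h
        total += dh_dxk * fx[k]
    return total

# Hypothetical example system (a pendulum), used only for illustration.
def f(x):
    return [x[1], -math.sin(x[0])]   # drift vector field

def g(x):
    return [0.0, 1.0]                # input vector field

def h(x):
    return x[0]                      # output function

x0 = [0.5, -0.2]
Lf_h = lie_derivative(h, f, x0)      # analytically equal to x0[1]
Lg_h = lie_derivative(h, g, x0)      # analytically equal to 0
```

For this example $L\_g h = 0$, so the input does not appear in the first derivative of the output and a second differentiation is needed, which is the situation handled by the transformation in Equations (8) to (10).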

Define the output function *ζ* = *h*(*ξ*), then the dynamics of the AUV are modeled as:

$$\begin{array}{l} \dot{\xi} = f(\xi) + M\_2 g'(\xi)\hat{u} \\ \zeta = h(\xi) \end{array} \tag{7}$$

The basic idea of feedback linearization is to find an appropriate coordinate transformation and a control law expressed in the transformed coordinates.

Select the coordinate transformation *z* = *ϕ*(*x*).

$$\begin{aligned} z\_1 &= \left[ h\_1(\mathbf{x}), h\_2(\mathbf{x}), h\_3(\mathbf{x}), h\_4(\mathbf{x}), h\_5(\mathbf{x}) \right]^T \\ z\_2 &= \left[ L\_f h\_1(\mathbf{x}), L\_f h\_2(\mathbf{x}), L\_f h\_3(\mathbf{x}), L\_f h\_4(\mathbf{x}), L\_f h\_5(\mathbf{x}) \right]^T \end{aligned} \tag{8}$$

From transforming the coordinates, we have:

$$\begin{aligned} z\_1 &= h(\mathbf{x}) \\ z\_2 &= L\_f h(\mathbf{x}) \end{aligned} \tag{9}$$

The transformation gives:

$$\begin{array}{l}\dot{z}\_1 = z\_2\\\dot{z}\_2 = L\_f^2 h(\mathbf{x}) + L\_g L\_f h(\mathbf{x})\hat{u}\end{array} \tag{10}$$

To obtain a simpler form in the new coordinate system, let the new control input $\mu$ equal $L\_f^2 h(x) + L\_g L\_f h(x)\hat{u}$, which gives:

$$\mu = B(\mathbf{x}) + \Gamma(\mathbf{x})\hat{u} = L\_f^2 h(\mathbf{x}) + L\_g L\_f h(\mathbf{x})\hat{u} \tag{11}$$

Solving Equation (11) for $\hat{u}$ gives the control law under which Equation (10) becomes a second-order integrator model of the AUV in the new coordinate system:

$$\hat{u} = \Gamma^{-1}(\mathbf{x})(\mu - B(\mathbf{x})) \tag{12}$$

The AUV linearized mathematical model can be obtained as:

$$\begin{array}{l}\dot{z}\_1 = z\_2\\\dot{z}\_2 = \mu\end{array} \tag{13}$$

where $z\_1$ is the position information of the AUV after the coordinate transformation, $z\_2$ is the velocity information of the AUV after the coordinate transformation, and $\mu$ is the control input of the AUV after the coordinate transformation.
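To make the step from Equation (11) to Equation (13) concrete, the sketch below applies $\hat{u} = \Gamma^{-1}(x)(\mu - B(x))$ to a simple pendulum-like system and integrates it with the explicit Euler method. The plant, the PD law chosen for $\mu$, the gains, and the function names are assumptions made for illustration only; they are not the AUV model or the controller of this paper.

```python
import math

def B(x):
    """L_f^2 h(x) for the assumed plant x1' = x2, x2' = -sin(x1) + u_hat, h = x1."""
    return -math.sin(x[0])

def Gamma(x):
    """L_g L_f h(x) for the assumed plant (constant and nonzero here)."""
    return 1.0

def linearizing_input(x, mu):
    """Equation (12): u_hat = Gamma^{-1}(x) * (mu - B(x))."""
    return (mu - B(x)) / Gamma(x)

def simulate(x0, mu_law, dt=1e-3, steps=5000):
    """Euler simulation of the plant under the linearizing feedback."""
    x = list(x0)
    for _ in range(steps):
        mu = mu_law(x)                      # outer-loop input on the integrator chain
        u_hat = linearizing_input(x, mu)
        dx1 = x[1]
        dx2 = -math.sin(x[0]) + u_hat       # equals mu, as in Equation (13)
        x[0] += dt * dx1
        x[1] += dt * dx2
    return x

# Assumed PD law for mu: closed loop z'' = -4 z - 4 z' (double pole at -2).
final_state = simulate([1.0, 0.0], lambda z: -4.0 * z[0] - 4.0 * z[1])
```

Because the nonlinear term is cancelled exactly in this sketch, the closed loop behaves like the linear system chosen for $\mu$, and the state decays toward the origin over the 5 s of simulated time.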

#### **3. CNN-LSTM Prediction Model**

*3.1. Pre-Requisite Knowledge*

### 3.1.1. Convolutional Neural Network

In underwater formations of multiple AUVs, the transmitted track data from the leader to the follower may be subject to both delay and noise interference caused by various factors such as oceanic noise. To enable accurate trajectory prediction, the data must be filtered prior to analysis. In this study, a convolutional neural network is employed to filter the data and extract the relevant trajectory data features. The basic structure diagram of the network is illustrated in Figure 2.

**Figure 2.** CNN structure schematic.

Figure 2 shows the structure of a convolutional neural network (CNN), which consists of an input layer, a convolutional layer, a ReLU layer, a pooling layer, and a fully connected layer. CNNs differ from traditional neural networks in two main ways:


### 3.1.2. Long Short-Term Memory

For time-series problems such as AUV formation tracking, traditional neural network algorithms such as CNNs are not fully applicable. Long short-term memory (LSTM) networks are better suited to these problems because of their memory effect. An LSTM uses memory modules, which are recursively interconnected subnetworks, in place of conventional hidden units. Each memory module contains gates that control the flow of information, allowing stored information to influence neuron nodes over longer time intervals. The three gates of an LSTM cell are the input gate, the output gate, and the forget gate; together they control what information enters, is retained in, and leaves the core cell unit. The cell structure of the LSTM is shown in Figure 3. The activation function plays an important role in the neural network by introducing nonlinearity into the model, enabling it to perform well on problems for which a linear model is unsuitable.

**Figure 3.** LSTM cell structure diagram.

In Figure 3, $f\_t$ represents the forget gate, $i\_t$ the input gate, and $o\_t$ the output gate. $x\_t$ denotes the input to the input layer at time $t$, $h\_t$ the output at time $t$, $c\_t$ the state value of the memory cell at time $t$, and $\sigma$ the sigmoid function. The mathematical expressions for $\sigma$ and $\tanh$ in the figure are as follows:

$$
\sigma(z) = \frac{1}{1 + e^{-z}} \tag{14}
$$

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{15}$$

The LSTM processes the data internally as follows:

$$f\_t = \sigma \left( W\_{xf} x\_t + W\_{hf} h\_{t-1} + b\_f \right) \tag{16}$$

$$i\_t = \sigma(W\_{xi} x\_t + W\_{hi} h\_{t-1} + b\_i) \tag{17}$$

$$o\_t = \sigma(W\_{xo} x\_t + W\_{ho} h\_{t-1} + b\_o) \tag{18}$$

$$c\_t = f\_t \cdot c\_{t-1} + i\_t \cdot \tanh(W\_{xc} x\_t + W\_{hc} h\_{t-1} + b\_c) \tag{19}$$

$$h\_t = o\_t \cdot \tanh(c\_t) \tag{20}$$

where $W$ denotes a weight matrix, $\cdot$ denotes the element-wise product, and $b$ is a bias vector.

From Equations (16)–(20), it can be seen that the LSTM first computes the forget gate, input gate, output gate, and candidate state from the previous external state $h\_{t-1}$ and the input $x\_t$ at the current moment. Next, the forget gate, the input gate, and the candidate state are combined with the previous internal state $c\_{t-1}$ to update the internal state $c\_t$. Finally, the information is passed to the external state $h\_t$ via the current internal state and the output gate.
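The gate computations described above can be written out as a single LSTM time step. The sketch below uses plain Python lists rather than a deep learning library; the parameter dictionary layout, the helper names, and the all-zero example weights are assumptions chosen so that the result is easy to verify by hand.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gate(Wx, Wh, b, x_t, h_prev, act):
    """act(Wx x_t + Wh h_prev + b), applied element-wise."""
    a, r = matvec(Wx, x_t), matvec(Wh, h_prev)
    return [act(ai + ri + bi) for ai, ri, bi in zip(a, r, b)]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following the forget/input/output gate equations."""
    f_t = gate(p["Wxf"], p["Whf"], p["bf"], x_t, h_prev, sigmoid)    # forget gate
    i_t = gate(p["Wxi"], p["Whi"], p["bi"], x_t, h_prev, sigmoid)    # input gate
    o_t = gate(p["Wxo"], p["Who"], p["bo"], x_t, h_prev, sigmoid)    # output gate
    g_t = gate(p["Wxc"], p["Whc"], p["bc"], x_t, h_prev, math.tanh)  # candidate state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

# Assumed toy sizes: 1 input feature, 2 hidden units, all parameters zero.
zeros_x = [[0.0], [0.0]]
zeros_h = [[0.0, 0.0], [0.0, 0.0]]
params = {}
for name in ("f", "i", "o", "c"):
    params["Wx" + name] = zeros_x
    params["Wh" + name] = zeros_h
    params["b" + name] = [0.0, 0.0]

h_t, c_t = lstm_step([1.0], [0.0, 0.0], [1.0, -2.0], params)
```

With zero weights every gate evaluates to 0.5 and the candidate state to 0, so the cell state is simply halved; this makes the role of the forget gate in the internal state update easy to see.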

#### *3.2. CNN-LSTM Prediction Model Building*

This paper proposes a neural network prediction model that combines the advantages of CNN feature extraction and noise filtering with LSTM temporal memory. The model is designed by connecting the CNN and LSTM layers in series, and its structure is depicted in Figure 4.

**Figure 4.** CNN-LSTM prediction model diagram.

The proposed structure is composed of two main modules: the data processing module and the model prediction module. Upon receiving the leader state information, the data are first preprocessed and then fed into the prediction model. As illustrated in Figure 4, the CNN module is composed of three convolutional layers, a BatchNorm layer, a dropout layer, an expansion layer, and a fully connected layer; it receives the preprocessed data and extracts data features. The LSTM module consists of two LSTM layers, which analyze the features extracted by the CNN, explore the time-series relationships in the data, and predict multiple future points.

The overall prediction process is as follows: the leader state information is preprocessed by the data processing module, and the processed data are passed to the CNN module for filtering and spatial feature learning. The CNN generates a sequence of high-level features and passes it to the tensor processing layer, which reshapes the output of the CNN so that it can be accepted by the LSTM module. Finally, the LSTM module learns the time-series dependencies of the delayed data and outputs the predicted values for the current moment.

#### **4. Predictive Control of Multi-AUV Formations Based on CNN-LSTM Models**

*4.1. Multi-AUV Formation Controller Design under Ideal Communication Conditions*

Assume that there are five AUVs in the formation: one leader and four followers. The formation to be formed and maintained is an isosceles triangle (shown in Figure 5), and the AUVs are required to maintain it even during a spiral dive.

**Figure 5.** Formation diagram.

As shown in Figure 5, $L$ denotes the leader, and $F\_1$, $F\_2$, $F\_3$ and $F\_4$ are the follower AUVs. To describe the desired formation, we introduce the variables $R$ and $\beta$: the distance from the leader to the line through followers $F\_1$ and $F\_2$ is $R$, the distance from the line through $F\_1$ and $F\_2$ to the line through $F\_3$ and $F\_4$ is also $R$, and $\beta$ is the formation-keeping attitude angle. The formation constraints proposed in this paper are:

$$\begin{cases} \eta\_{F1} + d\_1 = \eta\_L \\ \eta\_{F2} + d\_2 = \eta\_L \\ \eta\_{F3} + d\_3 = \eta\_L \\ \eta\_{F4} + d\_4 = \eta\_L \end{cases} \qquad \begin{cases} \dot{\eta}\_{F1} + dv\_1 = \dot{\eta}\_L \\ \dot{\eta}\_{F2} + dv\_2 = \dot{\eta}\_L \\ \dot{\eta}\_{F3} + dv\_3 = \dot{\eta}\_L \\ \dot{\eta}\_{F4} + dv\_4 = \dot{\eta}\_L \end{cases} \tag{21}$$

where $d\_1, \dots, d\_4$ and $dv\_1, \dots, dv\_4$ are given by:

$$\begin{aligned} &\begin{cases} d\_{1} = \left(-(\cos\beta)^{-1}R\cos(\psi\_{L}-\beta-\frac{\pi}{2}),\ (\cos\beta)^{-1}R\cos(\psi\_{L}+\beta-\frac{\pi}{2}),\ 0,\ 0,\ 0\right)^{T} \\ d\_{2} = \left(-(\cos\beta)^{-1}R\cos(\psi\_{L}+\beta-\frac{\pi}{2}),\ -(\cos\beta)^{-1}R\cos(\psi\_{L}-\beta-\frac{\pi}{2}),\ 0,\ 0,\ 0\right)^{T} \\ d\_{3} = \left(-(\cos\beta)^{-1}2R\cos(\psi\_{L}-\beta-\frac{\pi}{2}),\ (\cos\beta)^{-1}2R\cos(\psi\_{L}+\beta-\frac{\pi}{2}),\ 0,\ 0,\ 0\right)^{T} \\ d\_{4} = \left(-(\cos\beta)^{-1}2R\cos(\psi\_{L}+\beta-\frac{\pi}{2}),\ -(\cos\beta)^{-1}2R\cos(\psi\_{L}-\beta-\frac{\pi}{2}),\ 0,\ 0,\ 0\right)^{T} \end{cases} \\ &\begin{cases} dv\_{1} = l\_{\eta}(r\_{L}\tan(-\beta)\times R, 0, 0, 0, 0)^{T} \\ dv\_{2} = l\_{\eta}(r\_{L}\tan(\beta)\times R, 0, 0, 0, 0)^{T} \\ dv\_{3} = l\_{\eta}(r\_{L}\tan(-\beta)\times 2R, 0, 0, 0, 0)^{T} \\ dv\_{4} = l\_{\eta}(r\_{L}\tan(\beta)\times 2R, 0, 0, 0, 0)^{T} \end{cases} \end{aligned} \tag{22}$$
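As a worked illustration of the position offsets in Equation (22), the sketch below evaluates $d\_1$ to $d\_4$ for a given leader heading $\psi\_L$, angle $\beta$ and spacing $R$, and then obtains the follower target positions from $\eta\_{Fi} = \eta\_L - d\_i$ in Equation (21). The function names and the numerical values of $\psi\_L$, $\beta$, $R$ and $\eta\_L$ are assumptions chosen for illustration; the velocity offsets $dv\_i$ are not computed here.

```python
import math

def formation_offsets(psi_L, beta, R):
    """Position offsets d_1..d_4 of Equation (22) as 5-element lists."""
    s = 1.0 / math.cos(beta)
    a = math.cos(psi_L - beta - math.pi / 2.0)
    b = math.cos(psi_L + beta - math.pi / 2.0)
    d1 = [-s * R * a, s * R * b, 0.0, 0.0, 0.0]
    d2 = [-s * R * b, -s * R * a, 0.0, 0.0, 0.0]
    d3 = [-s * 2.0 * R * a, s * 2.0 * R * b, 0.0, 0.0, 0.0]
    d4 = [-s * 2.0 * R * b, -s * 2.0 * R * a, 0.0, 0.0, 0.0]
    return [d1, d2, d3, d4]

def follower_targets(eta_L, offsets):
    """Target pose of each follower from eta_Fi + d_i = eta_L."""
    return [[l - d for l, d in zip(eta_L, d_i)] for d_i in offsets]

# Assumed example values: heading 60 deg, beta 30 deg, spacing 10 m.
offsets = formation_offsets(math.pi / 3.0, math.pi / 6.0, 10.0)
eta_L = [100.0, 50.0, 5.0, 0.0, math.pi / 3.0]   # hypothetical leader pose
targets = follower_targets(eta_L, offsets)
```

The offsets of $F\_3$ and $F\_4$ are exactly twice those of $F\_1$ and $F\_2$, which reflects the second row of followers lying a further distance $R$ behind the first.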

In leader-follower formation control with five AUVs, the motion state vector of the $i$-th follower AUV at time $t$ is $\varepsilon\_i(t) = \eta\_i(t)$ and the motion state vector of the leader is $\varepsilon\_L(t) = \eta\_L(t)$. If the formation satisfies Equation (23), it is said to achieve formation keeping with stable convergence.

$$\begin{cases} \lim\_{t \to \infty} \left| \varepsilon\_i(t) - \varepsilon\_L(t) + d\_i \right| = 0\\ \lim\_{t \to \infty} \left| \dot{\varepsilon}\_i(t) - \dot{\varepsilon}\_L(t) + dv\_i \right| = 0 \end{cases} \quad i = 1, 2, 3, 4 \tag{23}$$

In the leader-follower formation control of the AUVs, let $z\_{i1}$ denote the position and attitude vector of the $i$-th follower AUV at time $t$, and let $z\_{1d}$ denote that of the leader AUV.

Define the trajectory tracking error of the $i$-th follower AUV as $z\_{i1e} = z\_{i1} - z\_{1d} + d\_i$; then $\dot{z}\_{i1e} = z\_{i2} - \dot{z}\_{1d}$.

Define the following Lyapunov function:

$$V\_{i1} = \frac{1}{2} z\_{i1e}^2 \tag{24}$$

Define $z\_{i2} = z\_{i2e} + \dot{z}\_{1d} - c\_{i1} z\_{i1e}$, where $c\_{i1}$ is a positive constant and $z\_{i2e}$ is the intermediate virtual control term. Then $z\_{i2e} = z\_{i2} - \dot{z}\_{1d} + c\_{i1} z\_{i1e}$, and differentiation gives $\dot{z}\_{i1e} = z\_{i2} - \dot{z}\_{1d} = z\_{i2e} - c\_{i1} z\_{i1e}$.

Differentiating $V\_{i1}$ gives:

$$
\dot{V}\_{i1} = z\_{i1e} \dot{z}\_{i1e} = z\_{i1e} z\_{i2e} - c\_{i1} z\_{i1e}^2 \tag{25}
$$

Define the switching function as:

$$
\sigma\_i = k\_{i1} z\_{i1e} + z\_{i2e} \tag{26}
$$

where $k\_{i1} > 0$. Since $\dot{z}\_{i1e} = z\_{i2e} - c\_{i1} z\_{i1e}$, we obtain:

$$
\sigma\_i = k\_{i1} z\_{i1e} + z\_{i2e} = k\_{i1} z\_{i1e} + \dot{z}\_{i1e} + c\_{i1} z\_{i1e} = (k\_{i1} + c\_{i1}) z\_{i1e} + \dot{z}\_{i1e} \tag{27}
$$

Since $k\_{i1} + c\_{i1} > 0$, if $\sigma\_i = 0$ then $z\_{i1e}$ and $z\_{i2e}$ converge to zero and $\dot{V}\_{i1} \le 0$. To drive $\sigma\_i$ to zero, the next design step is needed.

Define the following Lyapunov function.

$$V\_{i2} = V\_{i1} + \frac{1}{2}\sigma\_i^2\tag{28}$$

Differentiating $V\_{i2}$ gives:

$$\begin{aligned} \dot{V}\_{i2} &= \dot{V}\_{i1} + \sigma\_{i}\dot{\sigma}\_{i} \\ &= z\_{i1e}z\_{i2e} - c\_{i1}z\_{i1e}^{2} + \sigma\_{i}\dot{\sigma}\_{i} \\ &= z\_{i1e}z\_{i2e} - c\_{i1}z\_{i1e}^{2} + \sigma\_{i}(k\_{i1}\dot{z}\_{i1e} + \dot{z}\_{i2e}) \\ &= z\_{i1e}z\_{i2e} - c\_{i1}z\_{i1e}^{2} + \sigma\_{i}(k\_{i1}(z\_{i2e} - c\_{i1}z\_{i1e}) + \dot{z}\_{i2} - \ddot{z}\_{1d} + c\_{i1}\dot{z}\_{i1e}) \\ &= z\_{i1e}z\_{i2e} - c\_{i1}z\_{i1e}^{2} + \sigma\_{i}(k\_{i1}(z\_{i2e} - c\_{i1}z\_{i1e}) + U\_{i} + F - \ddot{z}\_{1d} + c\_{i1}\dot{z}\_{i1e}) \end{aligned} \tag{29}$$

where *Ui* is the expression of the controller to be designed. *F* is the total uncertainty of the system.

The controller of the $i$-th follower is designed as:

$$U\_{i} = -k\_{i1}(z\_{i2e} - c\_{i1}z\_{i1e}) - \overline{F}\tanh(\sigma\_{i}) + \ddot{z}\_{1d} - c\_{i1}\dot{z}\_{i1e} - h\_{i}(\sigma\_{i} + \beta\_{i}\tanh(\sigma\_{i})) \tag{30}$$

where $h\_i$ and $\beta\_i$ are positive constants and $\overline{F}$ denotes the upper bound of the uncertainty $F$.

Substituting Equation (30) into $\dot{V}\_{i2}$ yields:

$$\begin{aligned} \dot{V}\_{i2} &= z\_{i1e}z\_{i2e} - c\_{i1}z\_{i1e}^2 - h\_{i}\sigma\_{i}^{2} - h\_{i}\beta\_{i}|\sigma\_{i}| + F\sigma\_{i} - \overline{F}|\sigma\_{i}| \\ &\leq -c\_{i1}z\_{i1e}^2 + z\_{i1e}z\_{i2e} - h\_{i}\sigma\_{i}^{2} - h\_{i}\beta\_{i}|\sigma\_{i}| \end{aligned} \tag{31}$$

Let *Qi* be equal to the following matrix.

$$Q\_i = \begin{bmatrix} c\_{i1} + h\_i k\_{i1}^2 & h\_i k\_{i1} - \frac{1}{2} \\ h\_i k\_{i1} - \frac{1}{2} & h\_i \end{bmatrix} \tag{32}$$

Since

$$\begin{aligned} z\_{ie}^{T} Q\_{i} z\_{ie} &= \begin{bmatrix} z\_{i1e} & z\_{i2e} \end{bmatrix} \begin{bmatrix} c\_{i1} + h\_{i}k\_{i1}^{2} & h\_{i}k\_{i1} - \frac{1}{2} \\ h\_{i}k\_{i1} - \frac{1}{2} & h\_{i} \end{bmatrix} \begin{bmatrix} z\_{i1e} & z\_{i2e} \end{bmatrix}^{T} \\ &= c\_{i1}z\_{i1e}^{2} - z\_{i1e}z\_{i2e} + h\_{i}k\_{i1}^{2}z\_{i1e}^{2} + 2h\_{i}k\_{i1}z\_{i1e}z\_{i2e} + h\_{i}z\_{i2e}^{2} \\ &= c\_{i1}z\_{i1e}^{2} - z\_{i1e}z\_{i2e} + h\_{i}\sigma\_{i}^{2} \end{aligned} \tag{33}$$

where $z\_{ie}^{T} = [z\_{i1e}, z\_{i2e}]$, if $Q\_i$ is guaranteed to be a positive definite matrix, then

$$\dot{V}\_{i2} \le -z\_{ie}^{T} Q\_{i} z\_{ie} - h\_{i}\beta\_{i} \left| \sigma\_{i} \right| \le 0 \tag{34}$$

Since

$$\left| Q\_i \right| = h\_i \left( c\_{i1} + h\_i k\_{i1}^2 \right) - \left( h\_i k\_{i1} - \frac{1}{2} \right)^2 = h\_i (c\_{i1} + k\_{i1}) - \frac{1}{4} \tag{35}$$

$\dot{V}\_{i2} \le 0$ can be guaranteed by choosing the values of $h\_i$, $c\_{i1}$ and $k\_{i1}$ such that $|Q\_i| > 0$, i.e., such that $Q\_i$ is a positive definite matrix.
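The gain condition from Equation (35) is easy to check before running a simulation. The sketch below builds $Q\_i$ from Equation (32), tests positive definiteness through its leading principal minors, and compares the determinant with the closed form $h\_i(c\_{i1} + k\_{i1}) - \frac{1}{4}$. The function names and the two sets of gain values are assumptions used only for illustration.

```python
def q_matrix(c1, k1, h):
    """Q_i of Equation (32) as a 2x2 nested list."""
    off = h * k1 - 0.5
    return [[c1 + h * k1 ** 2, off], [off, h]]

def det_2x2(Q):
    return Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]

def is_positive_definite(Q):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    return Q[0][0] > 0.0 and det_2x2(Q) > 0.0

def det_closed_form(c1, k1, h):
    """Right-hand side of Equation (35)."""
    return h * (c1 + k1) - 0.25

good = q_matrix(c1=1.0, k1=1.0, h=1.0)   # assumed gains satisfying |Q| > 0
bad = q_matrix(c1=0.1, k1=0.1, h=1.0)    # assumed gains violating |Q| > 0

good_ok = is_positive_definite(good)
bad_ok = is_positive_definite(bad)
```

For positive $c\_{i1}$ and $h\_i$ the first leading minor is always positive, so the determinant condition is the only one that actually restricts the gains.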

According to LaSalle's invariance principle, $\dot{V}\_{i2} \equiv 0$ implies $z\_{ie} \equiv 0$ and $\sigma\_i \equiv 0$. Hence, as $t \to \infty$, $z\_{i1e} \to 0$ and $z\_{i2e} \to 0$, from which it follows that $\dot{z}\_{i1} \to \dot{z}\_{1d}$.

In summary, the Lyapunov functions $V\_{i1}$ and $V\_{i2}$ are positive definite, and the values of $h\_i$, $c\_{i1}$ and $k\_{i1}$ can be reasonably chosen to ensure that $\dot{V}\_{i1}$ and $\dot{V}\_{i2}$ are negative definite, so the designed AUV formation controller (30) is stable and convergent.
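The behavior of controller (30) can be illustrated on a single scalar channel of the linearized model, $\dot{z}\_{i1} = z\_{i2}$, $\dot{z}\_{i2} = U\_i + F$. The sketch below is a rough Euler simulation under assumed conditions: a sinusoidal leader trajectory, a bounded sinusoidal disturbance $F$, a zero offset $d\_i$, and hand-picked gains. None of these values come from the paper.

```python
import math

def simulate_follower(c1=2.0, k1=2.0, h=5.0, beta=0.1, F_bar=0.2,
                      dt=1e-3, steps=10000, d=0.0):
    """Track z1d(t) = sin(t) with the backstepping sliding mode law (30)."""
    z1, z2, t = 1.0, 0.0, 0.0               # assumed initial follower state
    for _ in range(steps):
        z1d, dz1d, ddz1d = math.sin(t), math.cos(t), -math.sin(t)
        z1e = z1 - z1d + d                   # tracking error
        dz1e = z2 - dz1d                     # its time derivative
        z2e = dz1e + c1 * z1e                # virtual control error
        sigma = k1 * z1e + z2e               # switching function (26)
        U = (-k1 * (z2e - c1 * z1e) - F_bar * math.tanh(sigma)
             + ddz1d - c1 * dz1e - h * (sigma + beta * math.tanh(sigma)))
        F = 0.1 * math.sin(3.0 * t)          # assumed disturbance, |F| <= F_bar
        z1 += dt * z2
        z2 += dt * (U + F)
        t += dt
    return abs(z1 - math.sin(t) + d)         # final tracking error magnitude

final_error = simulate_follower()
```

In this sketch the error does not reach exactly zero because `tanh` replaces the discontinuous switching term; it settles in a small neighborhood of the origin whose size shrinks as $h\_i$ grows.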

#### *4.2. Sliding Window-Based Predictive Control of Multi-AUV Formations under Communication Constraints*

In the previous section, the backstepping sliding mode control method was used to design the formation controller according to the formation constraints. In practice, the communication delay between the leader and the follower, together with the inability of the hydroacoustic sonar to transmit at high frequency, lengthens the communication interval between them, so the follower may not receive the real-time status information of the leader. The controller for a follower AUV in the formation with a time-delayed state is:

$$U\_{i} = -k\_{i1}(z\_{i2e} - c\_1 z\_{i1e}) - \overline{F} \tanh(\sigma\_{i}) + \ddot{z}\_{1d} - c\_1 \dot{z}\_{i1e} - h(\sigma\_{i} + \beta \tanh(\sigma\_{i})) \tag{36}$$

where $z\_{i1e} = z\_{i1} - z\_{1d}(t - \tau) + d\_i$, $z\_{i2e} = z\_{i2} - z\_{2d}(t - \tau) + c\_1 z\_{i1e} + dv\_i$, $\sigma\_i = k\_1 z\_{i1e} + z\_{i2e}$, and $\tau$ is the communication delay time between the leader and the follower.

To illustrate the effect of communication delay on formation control and to prepare for the new predictive control strategy, the following assumptions are made about the communication delay between the leader and the follower and the transmission interval of the hydroacoustic sonar:

**Assumption 1.** *The leader and the follower are close to each other, and sound propagates in water at about 1500 m/s, so the delay caused by signal transmission is small. The delay between the leader broadcasting its status information and the follower receiving it and completing the measurement solution is assumed to be 1 s.*

**Assumption 2.** *Because of the limited communication bandwidth, the leader cannot send too many beats of historical status data to the follower at one time; it is assumed that the leader can send five beats of status data to the follower at one time.*

**Assumption 3.** *The hydroacoustic sonar cannot transmit at high frequency, and the transmission time depends on the size of the data sent; the communication interval of the hydroacoustic sonar is assumed to be 6–9 s.*

To solve the communication delay and communication interval problems between the leader and the follower, this section proposes a sliding-window formation control strategy with multi-step prediction. It iterates over the historical state information of the leader to predict the leader's current state step by step, which reduces computational cost and adapts better than the observer-based iterative prediction method. The principle of the strategy is described below.

At time *M*, the leader sends its status data {*Z*<sub>1</sub>, *Z*<sub>2</sub>, …, *Z*<sub>*M*−1</sub>, *Z*<sub>*M*</sub>} from the previous *M* time steps to the followers in the formation. Because of the communication transmission delay *τ*<sub>*tran*</sub> and the time consumed by the hydroacoustic sonar transmission *τ*<sub>*inter*</sub>, a fixed time delay *τ*<sub>*once\_tal*</sub> = *τ*<sub>*tran*</sub> + *τ*<sub>*inter*</sub> is defined. The follower receives the status information of the leader at time *M* + *τ*<sub>*once\_tal*</sub>, but what it receives is the leader's status at time *M*. The follower therefore needs to predict the leader's state at time *M* + *τ*<sub>*once\_tal*</sub>, based on the leader's state at time *M*, and use it as the tracking target.

The second sonar transmission starts immediately after the first. Since the transmission delay after the first transmission is included in the elapsed time of the second, the follower needs a time *τ*<sub>*inter*</sub> to receive the leader's information for the second time. Therefore, after receiving the leader's status information at time *M* + *τ*<sub>*once\_tal*</sub>, the follower first has to predict the leader's current status from the status at time *M* as the tracking target; second, since it will not receive any new status information from the leader during the following *τ*<sub>*inter*</sub>, it must go on to predict the leader's status over that interval. A schematic diagram of the information transfer process is shown in Figure 6.

**Figure 6.** Information transmission diagram.

This paper focuses on predicting the real-time state of the leader in AUV formation by using delayed state data received by the followers as input to the prediction model. The delayed data are in the form of a time series, and to achieve continuous prediction, a sliding window approach is designed where the delay information is fed into the window as input and the real information as output, as illustrated in Figure 7. To evaluate the model's performance, a delay time of 10 s is set, and the size of the sliding window, which corresponds to the time step of the input data, is set to 5. The prediction equation is given as follows:

$$z(t) = f(\{z(t-14), \cdots, z(t-11), z(t-10)\})\tag{37}$$

where $z = [z\_1, z\_2]$ denotes the state vector of the leader over time. $z\_1 = [x, y, \mathit{depth}, \theta, \psi]^T$, where $x$, $y$ and $\mathit{depth}$ represent the displacement in the three coordinate directions, and $\theta$ and $\psi$ represent the pitch and heading angles. $z\_2 = [u, v, w, q, r]^T$, where $u$, $v$ and $w$ are the longitudinal, lateral and vertical velocities, respectively, and $q$ and $r$ are the pitch and yaw angular velocities.
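The pairing of delayed inputs with current targets in Equation (37) can be sketched as a small dataset builder. With a delay of 10 and a window of 5, each sample takes $z(t-14), \dots, z(t-10)$ as input and $z(t)$ as target. The function name `make_windows` and the integer toy series are assumptions for illustration; real samples would be the 10-element state vectors described above.

```python
def make_windows(series, delay=10, window=5):
    """Build (input window, target) pairs: inputs end `delay` steps before the target."""
    inputs, targets = [], []
    for t in range(delay + window - 1, len(series)):
        start = t - delay - window + 1          # index of z(t - delay - window + 1)
        inputs.append(series[start:t - delay + 1])
        targets.append(series[t])               # z(t), the current leader state
    return inputs, targets

# Toy series in which the value equals its own time index.
series = list(range(20))
inputs, targets = make_windows(series)
```

For the toy series the first usable target is $z(14)$, whose input window is $z(0), \dots, z(4)$; the first 14 samples cannot be used as targets because no complete delayed window exists for them.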


**Figure 7.** Schematic diagram of the sliding window.

The prediction strategy designed in this paper has two main phases: fixed delay prediction and communication interval prediction. In the fixed delay prediction phase, the follower puts the state information sent by the leader into the sliding window and uses the prediction model to predict the state *Ẑ*<sub>*M*+1</sub> at time *M* + 1 from the first *M* time steps of data in the window. *Ẑ*<sub>*M*+1</sub> is then put into the sliding window, the window moves forward, and *Ẑ*<sub>*M*+2</sub> is predicted from *Ẑ*<sub>*M*+1</sub> and the historical states. Continuing this iteration finally yields the state prediction *Ẑ*<sub>*M*+*τ*<sub>*once\_tal*</sub></sub> at time *M* + *τ*<sub>*once\_tal*</sub>.

Meanwhile, because of the time consumed by the hydroacoustic sonar transmission, the follower only receives the next status data from the leader at time *M* + *τ*<sub>*once\_tal*</sub> + *τ*<sub>*inter*</sub>. During this period the follower continues to predict iteratively from the compensated state *Ẑ*<sub>*M*+*τ*<sub>*once\_tal*</sub></sub>, obtaining *Ẑ*<sub>*M*+*τ*<sub>*once\_tal*</sub>+1</sub>, *Ẑ*<sub>*M*+*τ*<sub>*once\_tal*</sub>+2</sub>, …, *Ẑ*<sub>*M*+*τ*<sub>*once\_tal*</sub>+*τ*<sub>*inter*</sub></sub>, and outputs them in turn until it again receives delayed status data from the leader. With this strategy the follower obtains a predicted value of the leader's state at the current moment; the controller of the follower in the AUV formation then becomes:
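The two prediction phases can be sketched as one loop that feeds each prediction back into the sliding window. In the sketch below a simple linear extrapolation stands in for the trained CNN-LSTM one-step predictor, and the delays are expressed as whole time steps; the predictor, the function names, and the values of `tau_once_tal` and `tau_inter` are all assumptions for illustration.

```python
from collections import deque

def linear_extrapolation(window):
    """Hypothetical stand-in for the CNN-LSTM one-step predictor."""
    w = list(window)
    return 2.0 * w[-1] - w[-2]

def iterate_prediction(history, n_steps, predictor, window_size=5):
    """Predict n_steps ahead by sliding the window over its own predictions."""
    window = deque(history[-window_size:], maxlen=window_size)
    predictions = []
    for _ in range(n_steps):
        z_next = predictor(window)
        window.append(z_next)        # the oldest entry drops out of the window
        predictions.append(z_next)
    return predictions

received = [1.0, 2.0, 3.0, 4.0, 5.0]   # five beats Z_1..Z_M sent by the leader
tau_once_tal, tau_inter = 2, 3          # assumed delays, in time steps

preds = iterate_prediction(received, tau_once_tal + tau_inter, linear_extrapolation)
fixed_delay_estimate = preds[tau_once_tal - 1]   # estimate at M + tau_once_tal
interval_estimates = preds[tau_once_tal:]        # estimates until the next packet
```

In practice the stand-in predictor would be replaced by the trained network, and the window would be reset with fresh data each time a new packet arrives from the leader, which limits the accumulation of iterated prediction error.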

$$U\_{i} = -k\_{i1}(\hat{z}\_{i2e} - c\_1 \hat{z}\_{i1e}) - \overline{F} \tanh(\hat{\sigma}\_{i}) + \ddot{z}\_{1d} - c\_1 \dot{\hat{z}}\_{i1e} - h(\hat{\sigma}\_{i} + \beta \tanh(\hat{\sigma}\_{i})) \tag{38}$$

where $\hat{z}\_{i1e} = z\_{i1} - \hat{z}\_{1d}(t) + d\_i$, $\hat{z}\_{i2e} = z\_{i2} - \hat{z}\_{2d}(t) + c\_1 \hat{z}\_{i1e} + dv\_i$, $\hat{\sigma}\_i = k\_1 \hat{z}\_{i1e} + \hat{z}\_{i2e}$, and $\hat{z}\_{1d}(t)$ and $\hat{z}\_{2d}(t)$ are the values predicted by the CNN-LSTM model.

The block diagram of CNN-LSTM-based multi-AUV formation prediction control under communication constraints is shown in Figure 8. Under the leader-follower formation control strategy, hydroacoustic communication introduces a delay when the follower AUV receives the position and speed information from the leader. In this paper, a CNN-LSTM prediction model is established to make predictions based on the historical information of the leader, which offsets the effects of noise and communication delay on formation control. The prediction information and feedback information are used as the input of the AUV formation controller to realize AUV formation prediction control.

**Figure 8.** AUV formation prediction control block diagram.
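The iterative prediction strategy described above can be sketched as follows. This is a hedged illustration rather than the authors' implementation: `extrapolate` is a hypothetical stand-in for the trained CNN-LSTM model (simple linear extrapolation), the state is a scalar, one prediction step is assumed to correspond to 1 s, and the delay values are those used in the formation simulation (*τ*<sub>once\_tal</sub> = 5 s, *τ*<sub>inter</sub> = 4 s).

```python
from collections import deque


def extrapolate(window):
    """Hypothetical stand-in for the CNN-LSTM: linear extrapolation."""
    return 2 * window[-1] - window[-2]


def rolling_predict(history, steps, predictor=extrapolate):
    """Predict `steps` future values, feeding each prediction back in."""
    window = deque(history, maxlen=len(history))
    outputs = []
    for _ in range(steps):
        nxt = predictor(window)
        outputs.append(nxt)
        window.append(nxt)  # the oldest sample drops out automatically
    return outputs


received = [0.0, 1.0, 2.0, 3.0, 4.0]  # delayed leader states up to time M
tau_once_tal, tau_inter = 5, 4        # assumed to be whole prediction steps

predictions = rolling_predict(received, tau_once_tal + tau_inter)
compensated = predictions[tau_once_tal - 1]  # estimate at M + tau_once_tal
held_over = predictions[tau_once_tal:]       # outputs until the next packet
```

Here `compensated` plays the role of the delay-compensated state, and `held_over` contains the values output one by one while the follower waits for the next broadcast.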

#### **5. Simulation Verification and Analysis**

#### *5.1. Simulation Results and Analysis of CNN-LSTM Model*

The trajectory data of a small AUV, consisting of longitude and latitude measurements from multiple positioning systems, as well as values from GPS, bathymetry, and Doppler measurements with a maximum depth of 20 m, were selected as the training set for this study. The relevant information of the training set is shown in Table 1.

**Table 1.** AUV status information.


These training data were obtained from the trajectory data of an AUV on-lake experiment, and some of its trajectories are shown in Figure 9. The raw data were preprocessed and used for the training of the CNN-LSTM model.

**Figure 9.** AUV partial trajectory data.

The CNN designed in this paper contains three convolutional layers with filter sizes of (2, 1), (3, 1) and (3, 1), respectively, followed by a dropout layer to prevent overfitting. The extracted features are passed to a two-layer LSTM, and the predictions are output by the last LSTM layer. Through repeated tuning, it was found that the network trained well with 125 and 128 neurons in the two layers, respectively. Additionally, to further prevent overfitting, a dropout layer with probability 0.3 is placed after the hidden layer. The Adam algorithm is used for optimization with a learning rate of 0.012, a learning-rate drop period of 100 and a learning-rate drop factor of 0.8. Finally, the gradient threshold is set to 1 to prevent gradient explosion.
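The training hyperparameters above can be illustrated with a short sketch. The text only gives the numbers, so two points here are assumptions: the learning rate is taken to drop in a staircase fashion once per period (counted in epochs), and the gradient threshold is taken to act on the L2 norm of the gradient. The function names are hypothetical.

```python
import math


def learning_rate(epoch, initial=0.012, drop_factor=0.8, drop_period=100):
    """Staircase schedule (assumed): multiply by 0.8 every 100 epochs."""
    return initial * drop_factor ** (epoch // drop_period)


def clip_by_norm(grad, threshold=1.0):
    """Rescale the gradient if its L2 norm exceeds the threshold (assumed)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold:
        return list(grad)
    return [g * threshold / norm for g in grad]


rates = [learning_rate(e) for e in (0, 99, 100, 200)]
clipped = clip_by_norm([3.0, 4.0])   # norm 5 is scaled down to norm 1
kept = clip_by_norm([0.3, 0.4])      # norm 0.5 is left unchanged
```

Clipping by norm preserves the gradient direction and only limits its length, which is what keeps a single large gradient from destabilizing training.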

After processing the delayed data according to the aforementioned data processing steps, they are fed into the CNN-LSTM model using the sliding window format. The performance of the model is then evaluated by computing the mean square error (*MSE*) and maximum absolute error (MAXERR) between the predicted values and the actual trajectory data. The evaluation metrics can be formulated as follows:

$$MSE = \frac{1}{N} \sum\_{t=1}^{N} (\text{observed}\_t - \text{predicted}\_t)^2 \tag{39}$$

where *N* indicates the number of samples.

$$\text{MAXERR} = \max \left| \frac{\text{observed}\_t - \text{predicted}\_t}{\text{observed}\_t} \right| \tag{40}$$
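As a quick illustration of Eqs. (39) and (40), the sketch below computes both metrics for a toy series. The function names and sample values are hypothetical, and the denominator of Eq. (40) is assumed to be the observed value at the same sample, so observed values must be nonzero.

```python
def mse(observed, predicted):
    """Mean square error, Eq. (39)."""
    n = len(observed)
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n


def maxerr(observed, predicted):
    """Largest error relative to the observed value, Eq. (40)."""
    return max(abs((o - p) / o) for o, p in zip(observed, predicted))


# Toy data, not taken from the paper.
observed = [10.0, 20.0, 40.0]
predicted = [9.0, 22.0, 40.0]

mse_value = mse(observed, predicted)
maxerr_value = maxerr(observed, predicted)
```

The *MSE* summarizes the average squared deviation over all samples, whereas MAXERR is driven by the single worst sample.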

According to Assumption 2, the leader broadcasts the data of the past five time steps to each follower at a time, so the sliding-window size is set to 5. The prediction model is evaluated under a fixed delay of 2 s and a communication interval of 7 s. The selected navigator trajectory is a spiral dive, with Gaussian white noise of amplitude 0.003 superimposed on the trajectory data, and an LSTM prediction model is used as the baseline for comparison. The parameters of the two models are shown in Table 2.
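The sliding-window format with a window size of 5 can be sketched as below. This is a minimal illustration under the assumption that each window of five consecutive samples is paired with the next sample as the prediction target; the helper name `sliding_windows` and the toy series are hypothetical.

```python
def sliding_windows(series, window=5):
    """Pair each run of `window` samples with the sample that follows it."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]


samples = list(range(8))          # toy series 0, 1, ..., 7
pairs = sliding_windows(samples)  # three (input window, target) pairs
```

Each new sample shifts the window forward by one step, so consecutive inputs overlap in four of their five entries.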


**Table 2.** Parameters of the two prediction models.

Since the velocities in the selected trajectories are kept constant, only the predictions of the navigator state quantities *z* = [*x*, *y*, *depth*, *θ*, *ψ*] are compared in order to assess the two prediction models objectively. The simulation results are shown in Figure 10.

**Figure 10.** AUV trajectory prediction error: (**a**) northward trajectory prediction error, (**b**) eastward trajectory prediction error, (**c**) vertical trajectory prediction error, (**d**) longitudinal inclination angle prediction error and (**e**) bow angle prediction error.

Figures 10 and 11 show that the CNN-LSTM model predicts a trajectory closer to the actual one, with a smoother prediction curve and smaller error fluctuations, i.e., higher accuracy and stability. The *MSE* values of the states predicted by the LSTM model are 1.7911, 1.7947, 1.1921, 1.6871, and 0.2564, while those of the CNN-LSTM model are 0.6868, 0.6315, 0.0664, 1.3078, and 0.1139. The CNN-LSTM model thus has the lower *MSE* for every state, indicating better prediction results.

**Figure 11.** AUV trajectory prediction: (**a**) horizontal plane trajectory prediction and (**b**) 3D trajectory prediction.
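The relative improvement implied by the reported *MSE* values can be worked out directly. The sketch below assumes that the five values are listed in the order of the state vector *z* = [*x*, *y*, *depth*, *θ*, *ψ*], which the text does not state explicitly; the variable names are hypothetical.

```python
states = ["x", "y", "depth", "theta", "psi"]  # assumed ordering
mse_lstm = [1.7911, 1.7947, 1.1921, 1.6871, 0.2564]
mse_cnn_lstm = [0.6868, 0.6315, 0.0664, 1.3078, 0.1139]

# Fractional reduction of the MSE achieved by CNN-LSTM relative to LSTM.
reduction = {s: (a - b) / a
             for s, a, b in zip(states, mse_lstm, mse_cnn_lstm)}

for s, r in reduction.items():
    print(f"{s}: {100 * r:.1f}% lower MSE")
```

Under that ordering assumption, the gain is largest for the third state and smallest for the fourth.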

#### *5.2. Simulation Results and Analysis of Multi-AUV Formation Control*

To verify the effectiveness of the CNN-LSTM prediction model in the formation-forming and formation-holding phases of the multi-AUV formation, the communication transmission delay *τ*<sub>tran</sub> is set to 1 s and the hydroacoustic sonar sounding delay *τ*<sub>inter</sub> is set to 4 s, i.e., the fixed time delay *τ*<sub>once\_tal</sub> is 5 s and the maximum total time delay is 9 s. The formation design is consistent with Figure 5.

The navigational track of the navigator is

$$\begin{cases} x\_p = 60 \cos(2\pi t/1000) \\ y\_p = 60 \sin(2\pi t/1000) \\ z\_p = -0.3t \end{cases} \tag{41}$$
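For reference, the navigator track of Eq. (41) can be sampled with a few lines of code. This is a minimal sketch assuming that *t* is in seconds and positions are in metres; the function name `leader_position` is hypothetical.

```python
import math


def leader_position(t):
    """Navigator position (x_p, y_p, z_p) at time t, following Eq. (41)."""
    x = 60.0 * math.cos(2.0 * math.pi * t / 1000.0)
    y = 60.0 * math.sin(2.0 * math.pi * t / 1000.0)
    z = -0.3 * t
    return x, y, z


# Sample one full revolution of the spiral at quarter-turn intervals.
track = [leader_position(t) for t in range(0, 1001, 250)]
```

The horizontal projection is a circle of radius 60 completed every 1000 time units, while *z*<sub>p</sub> changes linearly with time, which together give the spiral dive.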

The initial state of the AUV is as follows: the initial position *x*(0) is randomly taken in the range of [55, 65] m, *y*(0) is randomly taken in the range of [−10, 10] m, *x*(0) is 65 m, the depth is 0 m, the initial pitch angle *θ*(0) is 0 rad, the bow angle *ψ*(0) is 4π/3 rad, the longitudinal velocity *u*(0) is 0.5 m/s, and all other velocities are initialized to zero. The controller parameters are *h* = 1, *k*<sub>1</sub> = 0.3, *c*<sub>1</sub> = 0.3, *F̄* = 0.02, *β* = 0.5.

The simulation results are shown in Figures 12 and 13.

In Figure 12, (a) to (e) show the position information of the AUV formation: under the action of the formation controller, the leader and the followers remain consistent in position, pitch angle and bow angle throughout the spiral dive. Panels (f) to (j) show the velocity information: the longitudinal, lateral and vertical speeds of the followers fluctuate somewhat, but the overall velocity remains stable. Figure 13 shows the 3D trajectory of the AUV formation and its projection on the horizontal plane, from which it can be seen that the followers track the leader accurately and that multi-AUV formation control is realized in a 3D environment. The simulation results illustrate that the formation control method designed in this paper, which combines CNN-LSTM prediction with backstepping sliding mode control, achieves three-dimensional predictive control of a multi-AUV formation under communication constraints.

**Figure 12.** Simulation diagrams of formation position and velocity information: (**a**) AUV northward trajectory, (**b**) AUV eastward trajectory, (**c**) AUV vertical trajectory, (**d**) AUV longitudinal inclination angle state, (**e**) AUV bow angle state, (**f**) AUV longitudinal velocity, (**g**) AUV lateral velocity, (**h**) AUV vertical velocity, (**i**) AUV longitudinal inclination angle velocity and (**j**) AUV bow angle velocity.

**Figure 13.** AUV formation 3D trajectory diagram and its horizontal projection: (**a**) 3D trajectory diagram and (**b**) horizontal projection diagram.

Figure 14 shows the position and attitude errors of the AUV formation under CNN-LSTM prediction, and Figure 15 shows the corresponding errors under communication delay without prediction. Comparing Figure 14a,b with Figure 15a,b, the northward and eastward errors of the AUV formation under predictive control are much smaller than those under delay, indicating that the CNN-LSTM prediction-based AUV formation control method better overcomes the effect of communication delay on formation control.

**Figure 14.** Errors of follower AUV under CNN-LSTM model prediction: (**a**) northward error, (**b**) eastward error, (**c**) vertical error, (**d**) longitudinal inclination angle error and (**e**) bow angle error.

#### **6. Conclusions**

This paper addresses the multi-AUV formation control problem under communication constraints. First, a five-degree-of-freedom nonlinear model of the AUV is established and transformed by feedback linearization into a second-order integral model. A sliding-window-based formation prediction control strategy is then designed, which iteratively predicts the current state of the leader step by step from its historical states; the method reduces computation and adapts well. Because AUV trajectories are time series, a CNN-LSTM prediction model is chosen to predict the trajectory state of the navigator, which compensates for the influence of communication delay on formation control. The backstepping method and sliding mode control are combined to design the formation controller, which improves its robustness, and the stability of the control is proved using Lyapunov stability theory. The effectiveness of the CNN-LSTM prediction model and the designed controller is verified by simulation.

**Author Contributions:** Conceptualization, J.L. and Z.T.; methodology, J.L.; software, J.L.; validation, J.L. and Z.T.; formal analysis, J.L.; investigation, J.L. and Z.T.; resources, J.L.; data curation, G.Z.; writing—original draft preparation, J.L.; writing—review and editing, W.L. and Z.T.; visualization, G.Z. and Z.T.; supervision, J.L.; project administration, J.L.; funding acquisition, J.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China (Grant No. 5217110503), the Research Fund from Science and Technology on Underwater Vehicle Technology (Grant No. JCKYS2021SXJQR-09) and the Natural Science Foundation of Shandong Province (Grant No. ZR202103070036).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
