*2.2. Stacked Denoising Autoencoder*

The autoencoder is a commonly used learning model in deep learning; its structure is shown in Figure 4. The stacked denoising autoencoder network is built from such autoencoders, each of which must learn to recover noise-free input from noisy data. Unlike supervised learning models such as the CNN and the Recurrent Neural Network (RNN) [34], it combines unsupervised feature extraction with supervised fine-tuning of the whole network, so it can denoise and reduce the dimensionality of features extracted from high-noise data. Its structure is shown in Figure 5. The stacked denoising autoencoder, like the basic autoencoder, consists of an encoder and a decoder, which extract the hidden features of the samples and reconstruct the input.

**Figure 4.** The structure of the autoencoder.

Assuming that *C*(*x*|*x̃*) represents the error between the original data *x* and the noisy data *x̃*, the DAE parameters are optimized by back propagation and gradient descent. After one DAE is trained, its hidden layer can be regarded as the input of the next DAE, and stacking multiple DAEs in this way forms the stacked denoising autoencoder [37].

**Figure 5.** The structure of the denoising autoencoder.
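The training loop described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the data are synthetic stand-ins for flight signals, and the layer sizes and learning rate are arbitrary choices. A single DAE is trained by gradient descent to reconstruct the clean input from a corrupted copy; its hidden activations would then serve as the input to the next DAE in the stack.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for flight data: 200 samples with 20 features.
# X_noisy is the corrupted copy the DAE sees; X is the clean target.
X = rng.normal(size=(200, 20))
X_noisy = X + 0.3 * rng.normal(size=X.shape)   # additive Gaussian corruption

n_in, n_hid, lr = 20, 10, 0.05
W1 = 0.1 * rng.normal(size=(n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = 0.1 * rng.normal(size=(n_hid, n_in)); b2 = np.zeros(n_in)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

losses = []
for _ in range(500):
    h = sigmoid(X_noisy @ W1 + b1)        # encoder: hidden representation
    X_hat = h @ W2 + b2                   # linear decoder: reconstruction
    err = X_hat - X                       # error C(x|x~) w.r.t. the clean data
    losses.append(np.mean(err ** 2))
    # Backpropagation of the mean-squared reconstruction loss
    gW2 = h.T @ err / len(X); gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * h * (1.0 - h)     # sigmoid derivative = h * (1 - h)
    gW1 = X_noisy.T @ dh / len(X); gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# After training, h (the hidden layer) is the input to the next DAE.
```

Stacking amounts to repeating this loop with `h` in place of `X`, then fine-tuning the whole stack end to end.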

### **3. Proposed Convolutional Neural Network with Stacked Pruning Sparse Denoising Autoencoder**

This paper proposes an intelligent fault diagnosis method for quadrotor UAVs based on a stacked pruning sparse denoising autoencoder (sPSDAE) and a convolutional neural network. The sPSDAE serves as the first stage of the network, denoising and reducing the dimensionality of the raw data; its introduction improves the generalization ability of the model and suppresses over-fitting. A convolutional neural network (CNN) is then used to extract and classify the system features. The algorithm model is shown in Figure 6:

**Figure 6.** sPSDAE-CNN algorithm model.

First, the flight data of the UAV are collected. To simulate the blade damage that occurs in actual flight, we collected data after artificially damaging the blades of a quadrotor UAV in a laboratory environment. Individual blades were given different degrees and types of damage; the main types and degrees are listed in Table 1 below:

**Table 1.** Main types and degrees of damage.


The eight different types and degrees of blade damage are shown in Figure 7:

**Figure 7.** Eight different types of blade damage.

A quadrotor UAV with a Pixhawk 4 flight controller as the main control board was chosen for data collection. We flew the quadrotor in different health states, collected the data, and converted them into two-dimensional grayscale images. From the flight log we selected the outputs of the four actuators, the quaternion representing the attitude of the UAV, the angular velocities about the three coordinate axes, and the position, velocity, and acceleration of the flight along the X, Y, and Z axes. Taking 20 sampling periods as one data state yields a 20 × 20 two-dimensional matrix, which is converted into a 20 × 20 grayscale image, as shown in Figure 8.

**Figure 8.** Converting one-dimensional time-domain signals to two-dimensional gray-scale images.
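The conversion from a block of one-dimensional signals to a grayscale image can be sketched as below. This is an illustrative assumption of the usual procedure (stack 20 channels over 20 sampling periods, then min-max scale to 8-bit pixel values); the paper does not state its exact scaling, and the random data here stand in for the real flight log.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical flight-log signals: 20 channels (actuator outputs, quaternion,
# angular rates, XYZ position/velocity/acceleration) over 20 sampling periods.
frame = rng.normal(size=(20, 20))   # one "data state" = 20 periods x 20 channels

# Min-max scale the frame to [0, 255] and quantize to an 8-bit grayscale image.
lo, hi = frame.min(), frame.max()
gray = np.round(255 * (frame - lo) / (hi - lo)).astype(np.uint8)

assert gray.shape == (20, 20)
```

Each such 20 × 20 image is then one training sample for the sPSDAE-CNN.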

### *3.1. Proposed sPSDAE-CNN Model Structure*

We convert the UAV flight data, after batch normalization (BN), into a grayscale image. The stacked pruning sparse denoising autoencoder reduces the dimensionality of the raw data, removes noise, and performs an initial feature extraction; its output is fed directly into the convolutional neural network. Overall, the structure of the proposed sPSDAE-CNN is roughly the same as that of a traditional convolutional neural network. The main difference is the introduction of the stacked denoising autoencoder, which increases the complexity and computational cost of the network; a sparse pruning operation is therefore added to reduce that complexity. The denoising autoencoder improves the adaptability of the network to high-noise data, and the pruning operation greatly improves the computational efficiency of the encoder. The specific structure of sPSDAE-CNN is shown in Figure 9.

**Figure 9.** The specific structure of sPSDAE-CNN.

Finally, in the classification stage of the model, the softmax function in Equation (5) transforms the classification logits into a probability distribution over the eight quadrotor UAV health states.

$$q(z_j) = \frac{e^{z_j}}{\sum_{k=1}^{8} e^{z_k}},\tag{5}$$

where *z<sub>j</sub>* represents the logit value of the *j*th output neuron and *q*(*z<sub>j</sub>*) is the corresponding class probability.
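Equation (5) can be computed directly; the sketch below uses hypothetical logits for the eight health states (the max-subtraction is a standard numerical-stability step, not part of the paper's formula, and does not change the result).

```python
import numpy as np

def softmax(z):
    """Softmax of Equation (5): q(z_j) = exp(z_j) / sum_k exp(z_k)."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

# Hypothetical logits for the eight quadrotor health states
logits = np.array([2.0, 0.5, 0.1, -1.0, 0.0, 0.3, -0.5, 1.2])
q = softmax(logits)           # probability distribution over the 8 states
```

The predicted health state is simply the index of the largest probability, `q.argmax()`.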

### *3.2. Construction of the Sparse Denoising Autoencoder Network*

To explore the deep-level features of the time-domain sequence signal, we convert the one-dimensional signal into a two-dimensional grayscale image by matrix transformation. Figure 9 shows the structure of a stacked denoising autoencoder with four hidden layers. Since each layer of a traditional stacked denoising autoencoder affects all of its subsequent layers, we use pruning to cut off the connections that have no effect on the training of the next layer while preserving the maximum information flow in the network. Each layer can therefore obtain the maximum effective information from the preceding layers, which improves the training speed and the feature extraction performance. A schematic diagram of the stacked pruning sparse denoising autoencoder (sPSDAE) fully connected network model, built on the DAE model, is shown in Figure 10:

**Figure 10.** Schematic diagram of the sPSDAE fully connected network model.

The sPSDAE adopts feature fusion for information sharing, which reduces the loss of information and broadens the transmission paths of the network. As the number of layers increases, the computational cost grows sharply and the model becomes prone to over-fitting. We therefore introduce sparse pruning operations, which reduce the amount of computation while suppressing over-fitting.

From Figure 10, we can see that the model of the *i*th layer is related to the first *i* unit nodes during training. To introduce sparsity into the sPSDAE, we randomly select some features of the input layer in each training loop and discard them according to Formula (6) [38]; sparse operations are then introduced periodically in the training of subsequent nodes until all units have been trained.

$$\begin{aligned} \upsilon &= \mathrm{Bernoulli}(1-p_1) \\ \overline{\beta_i^*} &= \upsilon \times \overline{\beta_i} \end{aligned} \tag{6}$$

where *p*<sub>1</sub> is the probability that the current training unit is discarded, *β<sub>i</sub>* is the input matrix before discarding, and *β<sub>i</sub>*<sup>∗</sup> is the input matrix after random discarding in one cycle.
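The random discarding of Equation (6) can be sketched as a Bernoulli mask, as below. The matrix size and the value of *p*<sub>1</sub> are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

p1 = 0.3                             # probability of discarding a unit
beta = rng.normal(size=(8, 16))      # input matrix beta_i before discarding

# v ~ Bernoulli(1 - p1): each entry is kept (1) with probability 1 - p1
v = rng.binomial(1, 1 - p1, size=beta.shape)
beta_star = v * beta                 # beta_i* : input after random discarding
```

Equation (7) in the fine-tuning stage has the same form, with *p*<sub>2</sub> and the network output *X<sub>i</sub>* in place of *p*<sub>1</sub> and *β<sub>i</sub>*.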

After the sPSDAE training is finished, backpropagation is performed with a Back Propagation Neural Network (BPNN) [39] to fine-tune the parameters and weights of the network. In this process, the discarded units are re-introduced through Equation (7), which further reduces possible over-fitting of the model.

$$\begin{aligned} \pi &= \mathrm{Bernoulli}(1-p_2) \\ \overline{X_i^*} &= \pi \times \overline{X_i} \end{aligned} \tag{7}$$

where *p*<sub>2</sub> is the probability of discarding irrelevant nodes during fine-tuning, *X<sub>i</sub>* is the output of the network during fine-tuning, and *X<sub>i</sub>*<sup>∗</sup> is the input data after random discarding in one fine-tuning cycle.

### *3.3. The Influence of Various Parts of the Model on the Results*

### 3.3.1. The Effect of the Sparse Pruning Denoising Autoencoder on the Results

The stacked sparse denoising autoencoder reduces the original 20 × 20 grayscale images to 10 × 10 by dimensionality reduction, which dramatically lowers the computational cost of the subsequent convolutional neural network. At the same time, it filters out the noise contained in the data, reconstructing the original signal from its noise-corrupted version. By training the model parameters, the model can ultimately predict the original signal accurately and largely eliminate the interference of noise, which effectively improves the final diagnosis performance of the model.

### 3.3.2. The Effect of Convolutional Neural Networks on Results

The convolutional neural network takes the dimensionality-reduced output of the stacked sparse denoising autoencoder as its input and extracts the features of the data collected by the UAV. By mapping the high-dimensional input data to the low-dimensional UAV health states, it readily converts the raw data into a health-state classification. It also has a strong non-linear fitting ability, which is very beneficial for the fault diagnosis of the quadrotor UAV and improves the adaptive ability of the model to a certain extent.
