## 4.2. Experimental Settings

This paper compares our method with a traditional convolutional neural network, an SVM [44], and a traditional unsupervised stacked autoencoder. We first consider the impact of datasets of different sizes on the performance of the neural network, then compare the performance of the network before and after the sparse pruning operation. Finally, experiments are carried out at different noise levels to compare and analyze the anti-noise ability of the sPSDAE-CNN model.

### 4.2.1. Parameters of the Proposed Network

The sPSDAE-CNN network model proposed in this paper consists of an sPSDAE sparse-pruning denoising autoencoder and a convolutional neural network. The sPSDAE consists of one input layer, one output layer, and four hidden layers; its specific structure is shown in Figure 9. The specific structural parameters of the convolutional neural network are shown in Table 3.


**Table 3.** Structural parameters of convolutional neural networks.

The introduced pruning operation also improves the training speed of the network. The output of the sPSDAE is used as the input of the CNN. The main structure of the convolutional network is three convolution layers, each followed by a pooling layer, then a fully connected hidden layer, and finally a softmax output layer. Small convolution kernels are chosen for the convolutions, the pooling layers use max pooling, and the activation function of the network is the ReLU function. To improve the performance of the network, a batch normalization operation is added after each convolution layer and the fully connected layer. Batch normalization accelerates the convergence of neural network training and suppresses over-fitting in the network. The convolution and pooling parameters of the convolutional neural network are detailed in Table 3.
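As an illustration of the building blocks named above, the following is a minimal NumPy sketch (not the actual network implementation) of the ReLU activation, 2 × 2 max pooling, and a simplified, inference-style batch normalization without learned scale and shift parameters:

```python
import numpy as np

def relu(x):
    # ReLU activation applied after each convolution layer
    return np.maximum(x, 0.0)

def batch_norm(x, eps=1e-5):
    # Normalize each feature over the batch dimension (simplified:
    # no learned gamma/beta); this is what stabilizes and speeds up training
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

def max_pool_2x2(fmap):
    # 2x2 max pooling on a (H, W) feature map with even H and W
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```

In a PyTorch model these correspond to `nn.ReLU`, `nn.BatchNorm2d`, and `nn.MaxPool2d(2)` placed after each convolution, as described above.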

### 4.2.2. Hyperparameter Optimization of the Proposed Network

We use PyTorch, a deep learning framework developed by Facebook, to conduct our experiments. To minimize the loss function, this paper uses stochastic gradient descent to optimize the convolutional neural network model; in the actual experiments, we choose the Adam optimizer as the final hyperparameter optimization method. Adam combines the advantages of the AdaGrad and RMSProp algorithms, adaptively computing the update step from the first-order and second-order moment estimates of the gradient of the loss function. Adam is simple to implement, computationally efficient, and has low memory requirements. In addition, its parameter updates are invariant to rescaling of the gradient, it is well suited to problems with large-scale data and parameters, and it performs well even under large gradient noise. We set the learning rate of the Adam optimizer to 0.001 and train with the cross-entropy loss as the objective function. See [45] for details.
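The Adam update described above can be sketched as follows. This is a minimal NumPy illustration of the standard Adam rule with the paper's learning rate of 0.001, not the PyTorch implementation actually used:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam update: exponential moving averages of the gradient (m,
    # first moment) and squared gradient (v, second moment), each with
    # bias correction, determine an adaptive step size.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)        # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)        # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

In PyTorch, the equivalent configuration is simply `torch.optim.Adam(model.parameters(), lr=0.001)` combined with `nn.CrossEntropyLoss()`.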

### 4.2.3. The Effect of the Number of Training Data on the Results

As a type of convolutional network, the sPSDAE-CNN model contains a large number of parameters that must be determined during the training process. To improve the recognition accuracy of the network and suppress over-fitting, a large amount of experimental data is needed to train it. To study the training results of the neural network under different numbers of training samples, the training-set size is set to 100, 200, 300, 900, 1500, 3000, 6000, 12,000, 15,000, and 20,000 samples. In deep learning, datasets may be balanced or unbalanced; as Table 2 shows, our dataset is fully balanced, so accuracy can be used to evaluate the algorithm.

Because the dataset is completely balanced, the number of data samples for the UAV under each fault condition is the same. In the actual experiment, the first three datasets do not use the sliding-window method to enhance and expand the data. To reduce the influence of the random initial values of the neural network on the training results, the experiment for each training-set size was repeated 30 times and the average was taken. The experiments use an AMD Ryzen 5 4600H processor, an NVIDIA GTX1650 graphics card, and 16 GB of memory. The test data are taken from DataSet D in Table 3, and the test results are shown in Figure 13.

**Figure 13.** Diagnostic results for different data volumes.

In Figure 13, it is clear that the accuracy on the test dataset increases significantly as the number of training samples grows. When the training data increase from 100 to 300 samples, the accuracy on the test set improves by about 20%. As the training data increase further, the accuracy of the neural network gradually approaches and converges to 100%, and the standard deviation converges to 0. We can also observe that the average time for the trained model to diagnose one signal is 4 ms, which meets the requirements of the test. By comparing the training time on different test sets, we find that increasing the number of training samples has little effect on the test time.

In Figure 14, points of different colors represent UAV blades in different fault states. At first, due to the small number of training samples, it is not easy to separate the characteristics of the different data classes. As the number of training samples increases, data under the same fault condition begin to cluster, and the characteristics of different fault classes become easier to separate. This also shows that using the data-enhancement method on the original data collected by the UAV greatly increases the scale and diversity of the training samples, which further improves the generalization ability of the model. Therefore, in subsequent experiments, this paper uses 20,000 training samples.

**Figure 14.** Visualizing test samples from the last hidden fully connected layer with t-SNE under different training data numbers.

In the subsequent model training, we choose 20,000 training samples. The parameters of the neural network model are determined through the training set, and t-SNE is then used to visualize the output of each layer of the network, as shown in Figure 15. It can be seen from the figure that the separability of different features in the unprocessed original data is very poor. After passing through each successive layer of the neural network, the different features in the data begin to separate. In the last layer of the network, the different classes of features are completely separated, and the different types of UAV faults are finally diagnosed through the softmax layer.

In order to evaluate the accuracy of the model for different fault types, we introduce the Confusion Matrix, which evaluates the performance of a classification model by counting the numbers of correct and incorrect classifications. The Confusion Matrix of the model is shown in Figure 16. It can be seen from the figure that the diagnosis accuracy for the different fault types of the four-rotor UAV remains basically the same. At the same time, the Confusion Matrix shows that the final model reaches about 98% accuracy in fault diagnosis for the four-rotor UAV, which meets the needs of our actual projects and experiments.
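To make the role of the Confusion Matrix concrete, the following minimal NumPy sketch shows how such a matrix is tallied and how overall accuracy is read off its diagonal (an illustration, not the code used to produce Figure 16):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    # cm[i, j] counts samples whose true class is i and predicted class
    # is j; the diagonal holds the correct classifications.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def accuracy_from_cm(cm):
    # Overall accuracy: diagonal (correct) over all predictions
    return np.trace(cm) / cm.sum()
```

Because the dataset is balanced, this overall accuracy is a fair summary; the per-row breakdown additionally shows whether any single fault type is diagnosed worse than the others.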

**Figure 15.** t-SNE Visualization of each layer of neural network.

**Figure 16.** Confusion Matrix of the proposed model.

### 4.2.4. Training Speed of sPSDAE-CNN

Because this paper adds a stacked denoising autoencoder in front of the neural network, it not only improves the performance of the network but also increases the time cost of model training. The pruning operation proposed for the stacked denoising autoencoder retains its noise-reduction capability while reducing the training time of the network as much as possible. During training we find that, for the same amount of data, the training speed of the network with the pruning operation is basically the same as that of a network without any stacked denoising autoencoder, and much faster than that of the network without the pruning operation. The specific network training speeds are shown in Figure 17.
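The details of the sparse pruning operation belong to the sPSDAE design; as a hypothetical illustration of why pruning reduces training cost, the sketch below shows generic magnitude-based pruning, which zeroes the smallest-magnitude weights so that fewer effective multiply-accumulate operations remain:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    # Zero out the fraction `sparsity` of weights with the smallest
    # absolute values. Fewer active weights means less computation per
    # forward/backward pass, which is what speeds up training.
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask
```

This is only a sketch of the general principle; the paper's sparse pruning operates on the connections of the stacked denoising autoencoder as described in the earlier sections.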

**Figure 17.** Comparison of training time required by different neural network models under different data scales.

### 4.2.5. Performance under Different Noise Interferences

The collected UAV data serve as the original data. During actual missions, the drone may be disturbed by various signals, which introduce noise into its sensor data. Since it is impossible to obtain all possible noisy data through experiments, Gaussian white noise is artificially added to the original data to simulate the noise interference that may appear on an actual drone, yielding signals with different signal-to-noise ratios. The SNR is defined in Equation (8):

$$SNR_{dB} = 10 \log_{10} \left( \frac{P_{signal}}{P_{noise}} \right) \tag{8}$$

where *Psignal* and *Pnoise* represent the energy of the signal and the noise, respectively. It can be seen in Figure 18 that the UAV data collected in the laboratory environment are ideal and contain relatively little noise. To simulate flight data under different interference in a real flight environment, Gaussian white noise of different levels is added to the data, since Gaussian white noise is the most common noise signal in nature. We thus obtain aircraft data with different amounts of Gaussian white noise, that is, data with different signal-to-noise ratios. As Figure 18 shows, the data with added Gaussian white noise are closer to UAV data from an actual flight environment. We evaluate the performance of the proposed model in different noise environments by studying the performance of the algorithm at signal-to-noise ratios from −4 dB to 10 dB.
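A minimal NumPy sketch of this noise-injection step, scaling Gaussian white noise so that the mixture matches a target SNR as defined in Equation (8) (the function name and seeding here are our own, for illustration):

```python
import numpy as np

def add_gaussian_noise(signal, snr_db, rng=None):
    # Scale white Gaussian noise so the result has the requested SNR:
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>
    # P_noise = P_signal / 10^(SNR_dB / 10)
    rng = np.random.default_rng(0) if rng is None else rng
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise
```

Sweeping `snr_db` from −4 to 10 reproduces the range of noise conditions studied above.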

In order to verify the efficiency of our proposed algorithm, we use the same test data to test the performance of CNN, SVM, and SDAE, as shown in Figure 19:

As can be seen from Figure 19, firstly, because the proposed sparse-pruning denoising autoencoding convolutional neural network has good noise-reduction characteristics, its fault diagnosis performance is clearly better than that of the other intelligent fault diagnosis methods when the noise in the signal is considerable. Secondly, the sparse pruning operation introduced into the stacked denoising autoencoder improves the computational efficiency of the network to a certain extent, and the proposed model still performs very well in the low-noise case.

**Figure 18.** Original UAV data, noise data to be added, and final synthesized data containing Gaussian white noise.

**Figure 19.** Comparison of accuracy of different fault diagnosis algorithms under different noise levels.

The experimental conditions are mainly outdoor flight experiments, and fault diagnosis is carried out using the UAV data collected and saved by the Pixhawk 4 flight control board. We artificially add different levels of Gaussian white noise to the collected data to simulate actual noise signals, obtaining experimental data with different degrees of noise.

For the SVM baseline, the 20 × 20 gray image obtained from the UAV data processing is flattened into a feature vector of length 400. A multi-class support vector machine with a radial basis function (RBF) kernel is used, and gamma is set to its best value of 0.001 through repeated experiments, from which the experimental results in this paper are obtained. For the convolutional neural network baseline, we directly use the convolutional neural network proposed in this article, adding a convolution layer in front of it to extract data features and reduce the dimension, converting the original 20 × 20 images into 10 × 10 gray images before the subsequent operations and classification. When the DAE is used for recognition and classification, it performs dimensionality reduction and optimization of the original data; BCE loss and the Adam optimizer are used in training, and a convolutional neural network then carries out feature extraction and classification.
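For reference, the RBF kernel used by the SVM baseline, with gamma = 0.001 and length-400 feature vectors from flattened 20 × 20 images, can be sketched in NumPy as follows (an illustration of the kernel only; the actual experiments use a full multi-class SVM implementation):

```python
import numpy as np

def rbf_kernel(x1, x2, gamma=0.001):
    # RBF kernel K(x1, x2) = exp(-gamma * ||x1 - x2||^2), with gamma
    # set to 0.001 as in the SVM baseline described above.
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

# A 20x20 gray image is flattened into a length-400 feature vector
image = np.zeros((20, 20))
feature_vector = image.ravel()   # shape (400,)
```

With small gamma the kernel varies slowly with distance, so the reported tuning of gamma over many experiments is what adapts the decision boundary to the 400-dimensional feature space.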
