#### 5.1.3. Fully Connected Layer

The fully connected layer is located at the end of the convolutional neural network and is used to calculate the output of the entire network. After downsampling is completed, a number of feature maps are obtained. All of their pixels are then arranged in columns to form a feature vector, which is fully connected to the output layer. Softmax is used as the classifier.
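As a minimal sketch of the two operations described above, the snippet below flattens a set of feature maps into a column vector and applies a Softmax classifier. The function names and the tiny 2 × 2 feature maps are illustrative, not taken from the paper.

```python
import math

def flatten(feature_maps):
    """Arrange the pixels of all feature maps into a single feature vector."""
    return [px for fmap in feature_maps for row in fmap for px in row]

def softmax(scores):
    """Softmax classifier: turn output-layer scores into class probabilities."""
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two hypothetical 2x2 feature maps -> an 8-element feature vector
maps = [[[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6], [0.7, 0.8]]]
vec = flatten(maps)                 # length 8
probs = softmax([2.0, 1.0, 0.1])    # probabilities sum to 1
```

The flattened vector plays the role of the feature vector layer, and the Softmax output gives one probability per fault class.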

#### 5.1.4. Convolutional Neural Network Structure

Convolutional neural networks can determine the basic parameters of the corresponding structure of the network by analyzing specific problems. Figure 10 shows the structure of a basic convolutional neural network. Its overall architecture is similar to the convolutional neural network model corresponding to the CNN code in Deep Learning Toolbox.

**Figure 10.** Convolutional neural network structure.

It can be clearly seen from Figure 10 that there is one input layer, one output layer and one feature vector layer, together with two convolutional layers and two downsampling layers; connected together, they form a complete network. When a *p* × *p* pixel block is used as the input layer signal, *m* convolution kernels of size *k*<sub>1</sub> × *k*<sub>1</sub> perform the corresponding convolution operation with a step size of one. After the activation function, *m* feature maps of size (*p* − *k*<sub>1</sub> + 1) × (*p* − *k*<sub>1</sub> + 1) are obtained, forming the convolution layer *C*1. Pooling over regions of size *c* × *c* then completes the pooling process and yields the sampling layer *S*2; there are still *m* feature maps, but the side length becomes (*p* − *k*<sub>1</sub> + 1)/*c*. Next, *n* convolution kernels of size *k*<sub>2</sub> × *k*<sub>2</sub> are convolved with the *S*2 layer, and the activation function gives the convolution layer *C*3. After the pooling process is completed again, the sampling layer *S*4 is obtained. Finally, all *n* feature maps of *S*4 are arranged in columns to obtain the required feature vector layer *V*5, which is fully connected to the output layer to produce the output.
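The size bookkeeping in this walk-through can be sketched as a short helper that tracks the feature-map side length through *C*1 → *S*2 → *C*3 → *S*4 → *V*5. The symbols follow the text; the concrete numbers in the example are illustrative, not the paper's parameters.

```python
def cnn_shapes(p, k1, m, c1, k2, n, c2):
    """Track feature-map side lengths through C1 -> S2 -> C3 -> S4 -> V5.
    Assumes each side length is divisible by the pooling size, as in the text."""
    s_c1 = p - k1 + 1            # C1: m maps of side (p - k1 + 1), stride-1 convolution
    s_s2 = s_c1 // c1            # S2: pooling over c x c regions shrinks the side by c
    s_c3 = s_s2 - k2 + 1         # C3: n maps after convolving with k2 x k2 kernels
    s_s4 = s_c3 // c2            # S4: second pooling step
    v5 = n * s_s4 * s_s4         # V5: all n maps arranged in columns as one vector
    return s_c1, s_s2, s_c3, s_s4, v5

# Illustrative numbers: 20x20 input, 5x5 then 3x3 kernels, 2x2 pooling twice
shapes = cnn_shapes(p=20, k1=5, m=6, c1=2, k2=3, n=12, c2=2)
# -> sides 16, 8, 6, 3 and a 108-element feature vector
```

Each line mirrors one sentence of the description: convolution shrinks the side by *k* − 1, pooling divides it by *c*, and the final vector length is the map count times the squared side.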

### *5.2. Attribute Reduction Based on Rough Set*

The aforementioned six sets of sample attributes *Xf*, *Cf*, *Kf*, *If*, *Sf* and *Xp* have different abilities to distinguish the fault mode. In order to reduce unnecessary attribute calculation, on the premise of ensuring the correct rate of fault diagnosis, the attributes that contribute less to fault diagnosis can be deleted. This paper is based on the rough set theory [30–32] for attribute reduction.

For a fixed decision system *S* = (*U*, *C* ∪ *D*, *V*, *f*), if *B*<sub>1</sub> ⊆ *B*<sub>2</sub> ⊆ *C*, then |*Pos*<sub>*B*<sub>1</sub></sub>(*D*)| ≤ |*Pos*<sub>*B*<sub>2</sub></sub>(*D*)| ≤ |*Pos*<sub>*C*</sub>(*D*)| and *γ*<sub>*B*<sub>1</sub></sub>(*D*) ≤ *γ*<sub>*B*<sub>2</sub></sub>(*D*) ≤ *γ*<sub>*C*</sub>(*D*); that is, the dependency formula intuitively shows that the dependency degree does not decrease as the condition attribute set grows. The calculation method based on attribute dependency is as follows [33,34]:

(1) Calculation method in the case of deleting attributes: for a fixed decision system *DS* = (*U*, *C* ∪ *D*, *V*, *f*), ∀*B* ⊆ *C*; once the attribute *a* ∈ *B*, then the calculation method of the importance of the conditional attribute *a* for the decision attribute *D* is as follows:

$$\operatorname{Sign}(a, B, D) = \gamma_B(D) - \gamma_{B - \{a\}}(D) \tag{12}$$

According to the formula, when attribute *a* is removed from the condition attribute set, the resulting decrease in the dependency of the decision attribute on the condition attributes measures the importance of *a* to the decision attribute.

(2) Calculation method in the case of adding attributes: for a fixed decision system *DS*, if the attribute *a* ∈ *C* but *a* ∉ *B*, the importance of the conditional attribute *a* relative to the conditional attribute set *B* for the decision attribute *D* is expressed as follows:

$$\operatorname{Sign}(a, B, D) = \gamma_{B \cup \{a\}}(D) - \gamma_B(D) \tag{13}$$

In the same way, when attribute *a* is added to the condition attribute set, the resulting increase in the dependency of the decision attribute on the condition attributes measures the importance of *a* to the decision attribute.
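Equations (12) and (13) can be illustrated on a toy decision table. The sketch below computes the dependency degree *γ*<sub>*B*</sub>(*D*) as the fraction of objects whose *B*-equivalence class is pure in the decision attribute, then evaluates both importance measures. The table, attribute names and the convention *γ*<sub>∅</sub>(*D*) = 0 are assumptions for illustration only.

```python
from collections import defaultdict

def dependency(table, cond, dec):
    """gamma_B(D): fraction of objects whose B-equivalence class is pure in D."""
    if not cond:
        return 0.0                       # convention: empty attribute set -> no dependency
    classes = defaultdict(list)
    for row in table:
        classes[tuple(row[a] for a in cond)].append(row[dec])
    pos = sum(len(v) for v in classes.values() if len(set(v)) == 1)
    return pos / len(table)

def sign_remove(table, B, a, dec):
    """Eq. (12): importance of a as the drop in dependency when a is deleted from B."""
    return dependency(table, B, dec) - dependency(table, [x for x in B if x != a], dec)

def sign_add(table, B, a, dec):
    """Eq. (13): importance of a as the gain in dependency when a is added to B."""
    return dependency(table, B + [a], dec) - dependency(table, B, dec)

# Hypothetical decision table: condition attributes x, y and decision attribute d
U = [
    {"x": 0, "y": 0, "d": 0},
    {"x": 0, "y": 1, "d": 0},
    {"x": 1, "y": 0, "d": 1},
    {"x": 1, "y": 1, "d": 1},
]
g = dependency(U, ["x", "y"], "d")        # 1.0: {x, y} fully determines d
s = sign_remove(U, ["x", "y"], "y", "d")  # 0.0: y adds nothing once x is known
```

Here *y* can be reduced away (its importance is zero) while *x* is indispensable, which is exactly the criterion used to delete low-contribution attributes in this section.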

In this paper, the attribute reduction in the neighborhood rough set was applied to the fault diagnosis of the shift hydraulic system, and the attribute reduction in the training sample set collected by the experiment was performed.

After calculation, the attribute importance values of the characteristic attributes are shown in Table 3. Obviously, after attribute reduction, mode T1 retains one characteristic attribute, *Xp*; mode T2 retains two characteristic attributes, *Xf* and *Xp*; mode T3 retains two characteristic attributes, *Xf* and *Xp*; mode T4 retains three characteristic attributes, *Xf*, *Sf* and *Xp*; and mode T5 retains two characteristic attributes, *Sf* and *Xp*.

**Table 3.** Calculated results of attributes' importance values.


As can be seen from Table 3, among the six characteristic attributes of the five failure modes, after attribute reduction the attribute importance of the root mean square value of the pressure *Xp* exceeds 50%, making it a common and indispensable characteristic attribute across the fault modes. That is to say, compared with the flow data, the pressure data have the greatest influence on recognizing the fault mode of the shift hydraulic system. Therefore, provided a high fault diagnosis accuracy is maintained, only the raw pressure data of the shift hydraulic system need be used as the input to train the convolutional neural network model and obtain an effective fault diagnosis model.

### *5.3. Fault Diagnosis Results Based on CNN and Neighborhood Rough Set*

Convolutional neural network training generally requires a large data set, and the input is a two-dimensional image, so this paper uses the translation transformation method to augment the 1000 raw pressure samples obtained experimentally in each of the five modes. That is, in each mode, groups of 400 consecutive samples are taken from the 1000 samples at intervals of five, and each group is converted into a 20 × 20 two-dimensional grayscale image in PNG format. After translation transformation, 120 grayscale images were generated in each fault mode; 80 groups were randomly assigned as training samples and 40 groups as test samples. The two-dimensional grayscale images under the five fault modes are shown in Figure 11.
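The translation transformation described above is a sliding-window scheme and can be sketched as follows. The function name and the synthetic pressure record are illustrative; with a window of 400, a stride of 5 and 1000 samples, the window count (1000 − 400)/5 + 1 = 121 is on the order of the 120 images reported.

```python
def translate_windows(signal, window=400, stride=5, side=20):
    """Slide a 400-sample window over the pressure record in steps of five and
    reshape each window into a side x side gray image (row-major)."""
    assert window == side * side
    images = []
    for start in range(0, len(signal) - window + 1, stride):
        win = signal[start:start + window]
        images.append([win[r * side:(r + 1) * side] for r in range(side)])
    return images

# Synthetic stand-in for one mode's 1000 measured pressure samples
record = [i * 0.001 for i in range(1000)]
imgs = translate_windows(record)   # each entry is a 20 x 20 "image"
```

Writing each 20 × 20 array out as a PNG (e.g. via an imaging library) would reproduce the grayscale samples used for training.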

**Figure 11.** Two-dimensional gray image under five types of fault modes. (**a**) Normal mode T1, (**b**) spring fault T2, (**c**) seal ring fault T3, (**d**) blockage of oil passage T4 and (**e**) cavitation seal T5.

This paper used an eight-layer convolutional neural network model consisting of an input layer, three convolutional layers, two pooling layers, a fully connected layer and an output layer. We used Softmax as the classifier to obtain the classification results and fault diagnosis accuracy. The specific flowchart is shown in Figure 12.
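The final step of the flow, turning Softmax outputs into a fault recognition rate, can be sketched as below. The function name and the three example probability vectors over the five modes T1–T5 are hypothetical, not results from the paper.

```python
def diagnose_accuracy(probabilities, labels):
    """Fault recognition rate: fraction of test samples whose argmax class
    from the Softmax output matches the true fault mode index."""
    correct = sum(1 for p, y in zip(probabilities, labels)
                  if max(range(len(p)), key=p.__getitem__) == y)
    return correct / len(labels)

# Hypothetical Softmax outputs for three test samples over five fault modes
probs = [
    [0.90, 0.05, 0.02, 0.02, 0.01],  # predicted mode T1
    [0.10, 0.70, 0.10, 0.05, 0.05],  # predicted mode T2
    [0.20, 0.10, 0.10, 0.50, 0.10],  # predicted mode T4
]
acc = diagnose_accuracy(probs, [0, 1, 3])   # all three predictions correct
```

Averaging this rate over the test samples of each failure mode gives the per-mode diagnosis accuracy reported in Figure 13.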

In Figure 12, the fault signal is the oil pressure signal during the shifting process, collected by the clutch branch oil pressure sensor. We transformed the oil pressure signal data into a two-dimensional gray image and adjusted it to a size of 20 × 20 for use as the input training sample. After training the established convolutional neural network model, we input the test samples to obtain the sample classification and its fault recognition rate via the Softmax classifier. The diagnosis results are shown in Figure 13.

**Figure 12.** Flowchart of fault diagnosis for shift hydraulic system.

**Figure 13.** Fault diagnosis effect diagram of shift hydraulic system.

As can be seen from Figure 13, compared with the shallow optimized BP neural network model, the hydraulic system fault diagnosis model based on the convolutional neural network achieves an average diagnosis accuracy of 97.5% across the failure modes with few iterations. This shows not only that it is feasible to use the pressure data of the hydraulic system as the only input for identifying the fault mode of the HMCVT shift hydraulic system, but also that the convolutional neural network, as a deep learning algorithm, significantly outperforms the shallow optimized BP neural network in fault diagnosis of the shift hydraulic system.
