### *2.1. Gated Recurrent Unit*

The spatiotemporal feature fusion network extracts the temporal features of motor vibration signals through the gated recurrent unit (GRU), a variant of the recurrent neural network. The GRU introduces a gating mechanism that improves the recurrent neural network: it can selectively forget unimportant information while memorizing the state of the previous moment. This alleviates the gradient vanishing of recurrent neural networks and solves the problem of untimely updates of the network parameters. The GRU controls the input, output, and state information of the hidden layer through the update gate $z_t$ and the reset gate $r_t$. The internal structure is shown in Figure 1.

The update gate $z_t$ combines the current input $x_t$ and the previous moment's information $h_{t-1}$ through a weighting operation, and the sigmoid function then maps the result to a value in [0, 1]. This value controls how strongly the historical information affects the state of the hidden layer at the current moment. The equation is as follows:

$$z_t = \sigma(W_{tz} \cdot [h_{t-1}, x_t] + b_z) \tag{1}$$

where $\sigma$ is the sigmoid function, $W_{tz}$ is the weight matrix, $b_z$ is the bias, $h_{t-1}$ is the output at the previous moment, and $x_t$ is the input at the current moment.
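As a concrete illustration, here is a minimal NumPy sketch of this gate computation; the sizes and values are hypothetical, and the same concatenate-weight-squash pattern recurs in the reset gate below:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: a 3-dimensional input and a 4-dimensional hidden state.
h_prev = np.zeros(4)                  # h_{t-1}: hidden state of the previous moment
x_t = np.array([0.5, -1.0, 2.0])      # x_t: input at the current moment
W_tz = np.random.default_rng(0).normal(0.0, 0.1, (4, 7))  # weights over [h_{t-1}, x_t]
b_z = np.zeros(4)

# Equation (1): z_t = sigmoid(W_tz . [h_{t-1}, x_t] + b_z); every entry lies in [0, 1]
z_t = sigmoid(W_tz @ np.concatenate([h_prev, x_t]) + b_z)
```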

The reset gate $r_t$ combines the current input $x_t$ and the previous moment's information $h_{t-1}$ with different weights, so that the model selectively forgets historical information that is irrelevant to the result. The equation is as follows:

$$r_t = \sigma(W_{tr} \cdot [h_{t-1}, x_t] + b_r) \tag{2}$$

The candidate state of the node at the current moment is

$$\hat{h}_t = \tanh(W \cdot [r_t \times h_{t-1}, x_t] + b) \tag{3}$$

The final output of the hidden layer, $h_t$, is the weighted sum of the information kept from the current moment and the information kept from the previous moment:

$$h_t = (1 - z_t) \times h_{t-1} + z_t \times \hat{h}_t \tag{4}$$
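To make the recurrence concrete, the following is a minimal NumPy sketch of one GRU step covering Equations (1)-(4); the class name `GRUCell`, the parameter shapes, and the random initialization are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """One recurrence step implementing Equations (1)-(4).

    W_z, W_r, W_h correspond to W_tz, W_tr, and W in the text."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        concat = hidden_size + input_size   # each gate acts on [h_{t-1}, x_t]
        self.W_z = rng.normal(0.0, 0.1, (hidden_size, concat))
        self.W_r = rng.normal(0.0, 0.1, (hidden_size, concat))
        self.W_h = rng.normal(0.0, 0.1, (hidden_size, concat))
        self.b_z = np.zeros(hidden_size)
        self.b_r = np.zeros(hidden_size)
        self.b_h = np.zeros(hidden_size)

    def step(self, x_t, h_prev):
        hx = np.concatenate([h_prev, x_t])
        z_t = sigmoid(self.W_z @ hx + self.b_z)      # Eq. (1): update gate
        r_t = sigmoid(self.W_r @ hx + self.b_r)      # Eq. (2): reset gate
        # Eq. (3): candidate state; the reset gate scales the historical state
        h_hat = np.tanh(self.W_h @ np.concatenate([r_t * h_prev, x_t]) + self.b_h)
        # Eq. (4): blend previous state and candidate through the update gate
        return (1.0 - z_t) * h_prev + z_t * h_hat
```

Iterating `step` over a sequence produces the per-moment states $G_t$ used in Section 2.2.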

### *2.2. GRU Temporal Module Based on Attention Mechanism*

The time series of motor vibration signals are much longer than the texts handled in natural language processing. Although the GRU alleviates the gradient vanishing of recurrent neural networks in long-sequence learning, it still cannot retain all the key information when the sequence is too long. Therefore, this paper not only selects the state output of the GRU at the last moment but also combines the state features of the GRU at every moment. Moreover, an attention mechanism is introduced to assign a weight coefficient to the output of the GRU at each moment, so that the neural network adaptively attends to the data features output at different moments. The GRU temporal module based on the attention mechanism is shown in Figure 2.

**Figure 2.** GRU temporal module based on an attention mechanism.

During the analysis of the vibration sequence, the output state of the GRU at the final moment determines the result of the fault diagnosis. However, the states at the other moments also have a positive effect on the performance of the network. Therefore, the network does not rely only on the output at the final moment but considers the states of every moment comprehensively. The vibration signal $X_t$ is fed into the GRU, which captures the vibration characteristics of the signal at each moment. The GRU outputs the state $G_t$ at each moment as

$$G_t = GRU(X_t) \tag{5}$$
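In code, Equation (5) amounts to unrolling the cell over the whole signal and stacking every hidden state instead of keeping only the last one; this sketch reuses the hypothetical `GRUCell` from Section 2.1:

```python
import numpy as np

def gru_states(cell, X):
    """Equation (5): collect the GRU state G_t at every moment t.

    X has shape (T, input_size); the result has shape (T, hidden_size)."""
    h = np.zeros(len(cell.b_z))          # initial hidden state
    states = []
    for x_t in X:                        # one recurrence step per moment
        h = cell.step(x_t, h)
        states.append(h)
    return np.stack(states)              # every per-moment state, not just the last
```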

However, for different types of motor faults, the output of the GRU at each moment influences the diagnosis result to a different degree. Therefore, the states at each moment of the GRU are selected by the attention mechanism: states with high relevance are kept, and states with low relevance are weakened. The weight of each moment's state is then obtained through the fully connected (FC) layer and the sigmoid function. The weight parameters $w_1$ are

$$w_1 = \sigma(w(G_t) + b) \tag{6}$$

Finally, the output of the GRU at each moment is multiplied by its weight parameter to obtain the output result $O$:

$$O = w_1 \times G_t \tag{7}$$
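A minimal sketch of Equations (6) and (7), assuming a fully connected layer that maps each per-moment state to a single scalar weight (the function name and the parameters `w`, `b` are hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_output(G, w, b):
    """Equations (6)-(7): weight each per-moment GRU state.

    G: (T, hidden_size) states from Equation (5);
    w: (hidden_size,) FC-layer weights and b: scalar bias (both hypothetical)."""
    w1 = sigmoid(G @ w + b)       # Eq. (6): one attention weight in [0, 1] per moment
    return w1[:, None] * G        # Eq. (7): O = w1 * G_t; relevant moments dominate
```

For instance, `attention_output(gru_states(cell, X), w, b)` chains the sketches above into one forward pass over a vibration window `X`.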
