3.1. Bidirectional Long Short-Term Memory Network
The gating mechanism of LSTM enables it to store long-term memory, but the information contained in the last state of an LSTM is often incomplete [24]. In one-way learning, LSTM has limited ability to learn advanced information features and cannot effectively use backward information, which reduces the accuracy of the model. When diagnosing a main drive shaft bearing fault, the condition of the parts both before and after the failure is reflected in the subsequent vibration data, so learning features from the data after the fault is also necessary.
Building on the forward-learning LSTM, the Bidirectional Long Short-Term Memory Network (BiLSTM) adds a backward-learning LSTM [25]. BiLSTM is a recurrent neural network that uses both the past and the future hidden layer information of the current state. By connecting past and future fault state information, the network improves both the utilization of the data and the accuracy of the model. The BiLSTM network structure is shown in Figure 8.
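The forward calculation of the BiLSTM network can be described as follows:
$$\overrightarrow{h}_t = f\left(\overrightarrow{W} x_t + \overrightarrow{V}\,\overrightarrow{h}_{t-1} + \overrightarrow{b}\right)$$
where $x_t$ is the input at time t; $\overrightarrow{W}$ is the input weight of the forward LSTM layer; $\overrightarrow{V}$ is the weight of the forward LSTM layer at time t − 1; $\overrightarrow{b}$ is the bias of the forward LSTM layer; $f$ is the adopted activation function; and $\overrightarrow{h}_t$ is the forward calculation hidden vector of the forward LSTM layer.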
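The backward calculation of the BiLSTM network can be described as follows:
$$\overleftarrow{h}_t = f\left(\overleftarrow{W} x_t + \overleftarrow{V}\,\overleftarrow{h}_{t+1} + \overleftarrow{b}\right)$$
where $x_t$ is the input at time t; $\overleftarrow{W}$ is the input weight of the backward LSTM layer; $\overleftarrow{V}$ is the weight of the backward LSTM layer at time t + 1, since the backward layer runs from the future toward the present; $\overleftarrow{b}$ is the bias of the backward LSTM layer; $f$ is the adopted activation function; and $\overleftarrow{h}_t$ is the backward calculation hidden vector of the backward LSTM layer.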
The output expression for the hidden layer is as follows:
$$H_t = \overrightarrow{h}_t + \overleftarrow{h}_t$$
where $H_t$ is the output of the hidden layer, which is synthesized from the output value $\overrightarrow{h}_t$ of the forward hidden layer at each moment and the output value $\overleftarrow{h}_t$ of the backward hidden layer at each moment.
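The bidirectional recursion described above can be illustrated with a minimal sketch. It is not the authors' implementation: a scalar tanh recurrence stands in for a full gated LSTM cell, and the names `rnn_pass` and `bidirectional`, the weights, and the example signal are all hypothetical. The forward and backward outputs are summed, following the description of the BiLSTM hidden layer later in this section.

```python
import math

def rnn_pass(xs, w_in, w_rec, bias, reverse=False):
    """Scalar tanh recurrence over xs (a simplified stand-in for an LSTM layer).

    With reverse=False the state flows from past to present (uses h at t-1);
    with reverse=True it flows from future to present (uses h at t+1).
    """
    order = range(len(xs) - 1, -1, -1) if reverse else range(len(xs))
    hidden = [0.0] * len(xs)
    prev = 0.0  # assumed zero initial hidden state
    for t in order:
        prev = math.tanh(w_in * xs[t] + w_rec * prev + bias)
        hidden[t] = prev
    return hidden

def bidirectional(xs, fwd_params, bwd_params):
    """Combine the forward and backward hidden states by summation."""
    h_fwd = rnn_pass(xs, *fwd_params)
    h_bwd = rnn_pass(xs, *bwd_params, reverse=True)
    return [f + b for f, b in zip(h_fwd, h_bwd)]

# Hypothetical four-sample vibration window and illustrative weights.
signal = [0.1, -0.4, 0.9, 0.3]
outputs = bidirectional(signal, (0.5, 0.8, 0.0), (0.5, 0.8, 0.0))
```

In this sketch, changing the last sample of `signal` alters the combined output at the first time step only through the backward pass, which is the sense in which the network uses future information.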
3.2. Construction of the Fault Diagnosis Model for the Main Drive Shaft Bearing
A deep autoencoder (DAE) can learn the fault features in bearing vibration signal samples and restore the original signal from these features. After the DAE model is trained, the bearing vibration signal samples are dimensionally reconstructed according to the learning results and used as the input to the BiLSTM network. The BiLSTM learns the fault features in the vibration signal through its forward and backward hidden layers, which improves the diagnostic accuracy of the model. By using the vibration characteristics before and after the occurrence of related faults, the BiLSTM network can effectively learn the characteristic information of spindle-bearing vibration data and establish a reliable fault diagnosis model for early warning. In practical application, the model can, on the one hand, combine sensing data for real-time fault diagnosis; on the other hand, by identifying abnormal signals, it can infer that components may be operating abnormally or in an early fault state, so that warning information is sent out in time to alert managers.
The workflow for fault diagnosis and early warning of the main drive shaft bearing with the DAE-BiLSTM model is shown in Figure 9. First, the DAE network is constructed and the coding dimension of the hidden layer is determined. The data set is input into the DAE model, which is trained through the encoder and the decoder. The training effect is verified by observing how the reconstruction error curve changes. After training, the encoder is retained and the training samples are dimensionally reconstructed. Second, the BiLSTM network is built and its parameters are set, including the number of samples, batch size, and number of hidden layer neurons. The data reconstructed by the DAE encoder are input into the BiLSTM network for learning and training. After training, the diagnostic effect of the model is evaluated on the validation set, the best model parameters are obtained by grid search, and finally the fault diagnosis model for the key components of the escalator is obtained. The specific process is as follows:
Step 1: Obtain the training sample. Use the vibration sensor to collect the vibration information of the main drive shaft bearing of the escalator.
Step 2: Preprocess the input samples. The raw vibration signal samples were normalized, and the normalization formula was expressed as follows:
Step 3: Divide the data set. In the process of learning, the data set is usually divided into three parts: the training set, validation set, and test set. The training set is used for model fitting, where gradient descent is applied to the training error to adjust the weights and biases of the network; the validation set is used to adjust the hyperparameters of the model, to prevent overfitting, and to evaluate the model performance; and the test set is used to evaluate the diagnostic effect of the model. In this paper, the data set is divided into the training set, the validation set, and the test set in the proportion 7:2:1.
Step 4: DAE model training. Input the bearing vibration samples into the DAE model, set parameters such as the number of training iterations and the number of hidden layer cells, and initialize the weight matrix W and bias vector b. First, perform forward propagation to obtain the output value; then, update the weights and biases of the network through error backpropagation. After the DAE model is trained, the encoder part is retained, and the vibration signal sample data are dimensionally reconstructed.
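Steps 2 and 3 above can be sketched as follows. The normalization formula is not reproduced in this text, so min-max scaling to [0, 1] is used here purely as an assumption; the function names `min_max_normalize` and `split_7_2_1` are hypothetical, and the split keeps the samples in their original order, which is also an assumption rather than something the text specifies.

```python
def min_max_normalize(samples):
    """Scale samples to [0, 1]. Min-max scaling is an assumed choice here."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        # A constant signal carries no range information; map it to zeros.
        return [0.0 for _ in samples]
    return [(x - lo) / (hi - lo) for x in samples]

def split_7_2_1(items):
    """Split items into training, validation, and test sets in a 7:2:1 ratio.

    Integer arithmetic avoids floating-point rounding in the set sizes.
    Any remainder from the integer division goes to the test set.
    """
    n = len(items)
    n_train = n * 7 // 10
    n_val = n * 2 // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

# Hypothetical raw vibration amplitudes.
raw = [2.0, 4.0, 6.0, 3.0, 5.0, 2.5, 4.5, 5.5, 3.5, 6.0]
normalized = min_max_normalize(raw)
train_set, val_set, test_set = split_7_2_1(normalized)
```

With ten samples, the sketch yields seven training samples, two validation samples, and one test sample.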
2. BiLSTM Fault Diagnosing
Step 1: Input the hidden layer output (embedding) of the DAE network into the BiLSTM network. The BiLSTM network processes the dimensionally reconstructed vibration information from the main drive shaft bearing, using forward and backward layers to extract past and future hidden information. Specifically, the forward layer processes the information sequentially from the past to the present, while the backward layer processes the information in reverse, from the future to the present. Through this bidirectional processing, the network learns the patterns of different fault vibration signals, calculates the forward and backward outputs separately, and then sums them to produce the final output of the BiLSTM hidden layer.
Step 2: Connect the Softmax network layer, obtain the prediction probability matrix from the output of the hidden layer, and classify the fault vibration samples.
Step 3: Iteratively train and optimize the network model with the gradient descent algorithm to minimize the cross-entropy loss between the actual and theoretical outputs, and select the best network parameters, thereby enabling more accurate fault diagnosis of the main drive shaft bearing.
Step 4: Test the model by inputting the test data into it, and use the cross-entropy loss as the evaluation metric to assess the diagnostic effect of the model.
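The Softmax classification and cross-entropy evaluation in Steps 2 to 4 can be sketched as below. This is an illustrative sketch only: the three-class scores, the labels, and the function names are hypothetical, and the scores stand in for the BiLSTM hidden layer output after a final linear mapping.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    """Cross-entropy loss for one sample with a one-hot true label."""
    return -math.log(probs[true_index])

def mean_cross_entropy(batch_logits, labels):
    """Average cross-entropy loss over a batch of samples."""
    losses = [cross_entropy(softmax(z), y) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)

def predict(logits):
    """Return the index of the most probable fault class."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical scores for three fault classes on two samples.
batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
labels = [0, 2]
loss = mean_cross_entropy(batch_logits, labels)
predictions = [predict(z) for z in batch_logits]
```

A lower mean cross-entropy indicates that the predicted probabilities are concentrated on the correct fault classes, which is why it can serve both as the training objective and as the evaluation metric.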