1. Introduction
Rolling bearings are key components of rotating machinery, and their safe operation directly affects the operating efficiency of the machine; timely detection of bearing faults is therefore of great practical significance for maintaining the operation of rotating machinery [1].
Generally speaking, rolling bearing faults are mostly caused by surface defects of the inner ring, outer ring and rolling elements. When a machine runs with these faults, strong impulse components appear in the signal, and these impulses serve as important indicators for evaluating the severity of mechanical faults [2]. Since the vibration signal is the carrier of fault information, fault diagnosis based on vibration signals has become an important approach; it generally includes the steps of signal processing, feature extraction and pattern recognition [3]. Bearing fault signals are generally non-stationary, which hinders the extraction of fault features [4]. Therefore, decomposition methods such as empirical mode decomposition (EMD) and ensemble empirical mode decomposition (EEMD) have been widely used in bearing signal processing [5,6], structural safety level assessment [7] and high-speed railway grid fault identification [8]. For example, in [6], based on EEMD and singular value entropy theory, the singular value entropy is utilized to effectively distinguish different bearing fault states. Although EMD and EEMD have achieved good performance in processing fault signals, they suffer from problems such as mode aliasing, an insufficient theoretical foundation, and the inability to select modal components independently. To address these problems, the empirical wavelet transform (EWT) was proposed. The EWT algorithm combines the advantages of the wavelet transform and EMD: the signal spectrum is adaptively segmented, and appropriate empirical wavelet filter banks are constructed according to the Meyer wavelet construction method to extract compactly supported empirical mode functions, effectively avoiding mode aliasing and end effects [9]. Because of its good adaptability and low computational complexity, EWT has been widely used in the field of fault diagnosis [10,11,12,13,14]. In this paper, EWT is introduced to decompose the bearing fault signal and reduce the influence of signal non-stationarity on feature extraction.
Feature extraction is a key step in bearing fault diagnosis, and the quality of the extracted features directly affects the performance of fault recognition [15]. In the field of fault diagnosis, various entropies are often extracted as features of the fault signals. For example, Liu et al. [16] and Zair et al. [17] used the sample entropy and fuzzy entropy as features of vibration signals to effectively distinguish different classes of bearing faults. However, the sample entropy and fuzzy entropy take a long time to process long time series [18]. Compared with these two entropies, the permutation entropy proposed by Bandt [19] is widely used in fault diagnosis due to its simple computation [20,21]. However, the permutation entropy tends to ignore differences between signal amplitudes, which can cause the loss of effective information. To overcome the shortcomings of the above entropies, Yang et al. proposed the attention entropy, a new measure of signal complexity [22]. Different from traditional entropies, which focus on the frequency distribution of all data points in a time series, the attention entropy focuses only on the frequency distribution of the intervals between peaks in the time series. Therefore, the attention entropy requires no parameter adjustment, is fast to compute, and is robust to the length of the time series. This paper puts forward a feature extraction method combining EWT and the attention entropy, which uses EWT to decompose the vibration signal and then extracts the attention entropy of each intrinsic mode function (IMF) as the feature vector.
Fault diagnosis is essentially a pattern recognition problem. Utilizing the fault features, traditional classifier algorithms such as the support vector machine (SVM) and artificial neural network (ANN) have been widely used in fault pattern recognition [23,24,25,26,27]. However, both SVM and ANN are shallow models, and their diagnostic accuracy is often unsatisfactory. Compared with SVM and ANN, the extreme learning machine (ELM) provides better performance in terms of training speed and generalization ability. To address the accuracy problem caused by the random initialization of the ELM algorithm, Huang et al. [28] proposed the kernel extreme learning machine (KELM), which replaces the random mapping with a kernel mapping, thus effectively improving the performance of the ELM model. With the continuous development of deep learning, various deep learning models have been developed; the classic models include the deep auto-encoder (DAE), convolutional neural network (CNN) and deep belief network (DBN). Among them, the DAE is an unsupervised feature learning model that extracts deep features of the input data by transforming its feature space [29]. The deep kernel extreme learning machine (DKELM) is a deep neural network model that combines KELM with the DAE model [30]. Compared with the KELM model, DKELM can mine feature information at a deeper level and thus improve model accuracy. Therefore, DKELM has found broad applications in financial market prediction [31], hyperspectral image classification [32], multi-classification problems [33] and water quality prediction [34].
Even though the DKELM model can mine the deep features of the data, the settings of its hyperparameters, such as the number of hidden layer nodes, the regularization parameters of the hidden layers, the kernel parameters and the kernel function penalty coefficient, make the accuracy of the DKELM model highly variable. To address this problem, this paper introduces the marine predators algorithm (MPA) [35] to optimize the hyperparameters of the DKELM model, so as to achieve adaptive parameter selection and significantly reduce the time spent tuning the DKELM parameters. By analyzing the fault diagnosis performance of the ADKELM model with different kernel functions and different numbers of hidden layers, the optimal ADKELM model is determined. The simulation results show that the ADKELM model proposed in this paper outperforms the DKELM model.
This paper proposes a bearing fault diagnosis method combining the attention entropy and the ADKELM model. First, to address the problem that heavy noise masks the effective fault signals, the wavelet threshold denoising method is used to effectively eliminate the influence of noise. Second, EWT is used to decompose the denoised signal into IMFs of different frequency bands, and the impulse characteristics of the different IMFs are captured through the attention entropy; these attention entropies are used as the feature vector. Then, the MPA algorithm is introduced to optimize the hyperparameters of the DKELM model, so as to realize the adaptive adjustment of its parameters. Finally, with the efficient fault feature capture capability of the attention entropy (AE) and the powerful recognition performance of ADKELM, accurate recognition of rolling bearing faults can be realized. The results of the simulation experiments show that the proposed method achieves high diagnosis accuracy.
Section 2 introduces the basic principles of the algorithm. Section 3 describes the simulation experiments. Finally, Section 4 draws the conclusions.
2. Fault Diagnosis Method of EWT-AE-ADKELM
2.1. Empirical Wavelet Transform (EWT)
Gilles [9] proposed the EWT based on the wavelet analysis framework. Reasonable segmentation of the signal spectrum is critical to EWT: a set of wavelet filters is constructed on the segments to extract the different AM-FM components of the signal. Suppose that the signal's Fourier support $[0, \pi]$ is divided into $N$ continuous segments, and the midpoint $\omega_n$ between adjacent local maxima of the spectrum is used as the boundary of the segments (with $\omega_0 = 0$ and $\omega_N = \pi$). Then, the $n$-th segment can be expressed as:

$$\Lambda_n = [\omega_{n-1}, \omega_n], \quad n = 1, 2, \ldots, N, \qquad \bigcup_{n=1}^{N} \Lambda_n = [0, \pi] \quad (1)$$

where $\Lambda_n$ represents the $n$-th segment frequency band.
Based on the segments $\Lambda_n$, a band-pass filter is constructed on each segment. Following the construction idea of the Meyer wavelet, the empirical wavelet function $\hat{\psi}_n(\omega)$ and the empirical scaling function $\hat{\phi}_1(\omega)$ can be obtained by Equations (2) and (3):

$$\hat{\psi}_n(\omega) = \begin{cases} 1, & (1+\gamma)\omega_n \le |\omega| \le (1-\gamma)\omega_{n+1} \\ \cos\left[\frac{\pi}{2}\beta\left(\frac{1}{2\gamma\omega_{n+1}}\left(|\omega| - (1-\gamma)\omega_{n+1}\right)\right)\right], & (1-\gamma)\omega_{n+1} \le |\omega| \le (1+\gamma)\omega_{n+1} \\ \sin\left[\frac{\pi}{2}\beta\left(\frac{1}{2\gamma\omega_n}\left(|\omega| - (1-\gamma)\omega_n\right)\right)\right], & (1-\gamma)\omega_n \le |\omega| \le (1+\gamma)\omega_n \\ 0, & \text{otherwise} \end{cases} \quad (2)$$

$$\hat{\phi}_1(\omega) = \begin{cases} 1, & |\omega| \le (1-\gamma)\omega_1 \\ \cos\left[\frac{\pi}{2}\beta\left(\frac{1}{2\gamma\omega_1}\left(|\omega| - (1-\gamma)\omega_1\right)\right)\right], & (1-\gamma)\omega_1 \le |\omega| \le (1+\gamma)\omega_1 \\ 0, & \text{otherwise} \end{cases} \quad (3)$$

where $\gamma < \min_n \frac{\omega_{n+1} - \omega_n}{\omega_{n+1} + \omega_n}$ ensures that adjacent transition bands do not overlap. In Equations (2) and (3), $\beta(x) = x^4(35 - 84x + 70x^2 - 20x^3)$.
According to the construction idea of the wavelet transform, the detail coefficients of the empirical wavelet transform of the signal $f(t)$ can be obtained by Equation (4):

$$W_f^{\varepsilon}(n, t) = \left\langle f, \psi_n \right\rangle = \mathcal{F}^{-1}\left[ \hat{f}(\omega)\, \overline{\hat{\psi}_n(\omega)} \right] \quad (4)$$

where $W_f^{\varepsilon}(n, t)$ represents the detail coefficient; $\langle \cdot, \cdot \rangle$ stands for the inner product; $\mathcal{F}^{-1}$ represents the inverse Fourier transform; $\hat{f}(\omega)$ is the Fourier transform of $f(t)$; and $\overline{\hat{\psi}_n(\omega)}$ is the complex conjugate of $\hat{\psi}_n(\omega)$.
The approximation coefficients of the empirical wavelet transform are calculated as follows:

$$W_f^{\varepsilon}(0, t) = \left\langle f, \phi_1 \right\rangle = \mathcal{F}^{-1}\left[ \hat{f}(\omega)\, \overline{\hat{\phi}_1(\omega)} \right] \quad (5)$$

where $W_f^{\varepsilon}(0, t)$ is the approximation coefficient; $\hat{\phi}_1(\omega)$ represents the Fourier transform of $\phi_1(t)$; and $\overline{\hat{\phi}_1(\omega)}$ is the complex conjugate of $\hat{\phi}_1(\omega)$.
The reconstruction of the original signal $f(t)$ is as follows:

$$f(t) = W_f^{\varepsilon}(0, t) * \phi_1(t) + \sum_{n=1}^{N} W_f^{\varepsilon}(n, t) * \psi_n(t) \quad (6)$$

where $*$ represents the convolution operator. According to Equation (6), the empirical modes can be obtained by the empirical wavelet decomposition:

$$f_0(t) = W_f^{\varepsilon}(0, t) * \phi_1(t), \qquad f_k(t) = W_f^{\varepsilon}(k, t) * \psi_k(t), \quad k = 1, \ldots, N \quad (7)$$
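To illustrate the spectrum-segmentation idea behind EWT, the following sketch uses ideal (rectangular) band-pass filters instead of the smooth Meyer-type filters of Equations (2) and (3); the function name `ewt_decompose` and the rule of keeping the largest spectral maxima are illustrative simplifications, not Gilles's full algorithm.

```python
import numpy as np

def ewt_decompose(x, n_modes=3):
    """Simplified EWT sketch: locate the n_modes largest local maxima of
    the magnitude spectrum, place segment boundaries at the midpoints
    between adjacent maxima, and extract each band with an ideal
    (rectangular) filter.  The full EWT uses smooth Meyer-type filters."""
    N = len(x)
    X = np.fft.rfft(x)
    mag = np.abs(X)
    # strict local maxima of the magnitude spectrum (endpoints excluded)
    peaks = [k for k in range(1, len(mag) - 1)
             if mag[k] > mag[k - 1] and mag[k] > mag[k + 1]]
    # keep the n_modes largest maxima, in ascending frequency order
    peaks = sorted(sorted(peaks, key=lambda k: mag[k], reverse=True)[:n_modes])
    # boundaries at the midpoints between adjacent retained maxima
    bounds = [0] + [(a + b) // 2 for a, b in zip(peaks, peaks[1:])] + [len(mag)]
    modes = []
    for lo, hi in zip(bounds, bounds[1:]):
        Y = np.zeros_like(X)
        Y[lo:hi] = X[lo:hi]        # ideal band-pass in the frequency domain
        modes.append(np.fft.irfft(Y, n=N))
    return modes
```

Because the ideal filters partition the spectrum exactly, the extracted modes sum back to the original signal; the Meyer-type filters of EWT achieve the same tight-frame reconstruction while avoiding the ringing of sharp spectral cuts.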
2.2. Attention Entropy (AE)
Unlike traditional entropies, which analyze the frequency distribution of all signal points in the time series, the attention entropy effectively distinguishes different time series by analyzing the frequency distribution of the intervals between key data points in the signal [22]. Therefore, this paper calculates the attention entropy of each empirical mode component obtained by the EWT decomposition and uses it as the feature vector. The attention entropy computation generally consists of the following three steps:
(1) Define the key patterns.
(2) Calculate the intervals between two adjacent key patterns.
(3) Calculate the Shannon entropy of the intervals.
Each point in the vibration signal can be regarded as a state of the system. The peak points represent the local upper and lower limits of the state, so the peaks are defined as the potential key mode points. Based on the following four strategies: the interval from local maximum to local maximum, from local maximum to local minimum, from local minimum to local maximum, and from local minimum to local minimum, the entropy of the interval distribution of the key mode points is calculated using the Shannon entropy formula:
$$H = -\sum_{i} p_i \log_2 p_i \quad (8)$$

where $H$ represents the Shannon entropy of the signal; $p_i$ represents the probability of the $i$-th interval value, with $p_i = n_i / \sum_j n_j$; and $n_i$ represents the number of intervals between the key modes that take the $i$-th value. Finally, the average of the entropies under the four strategies is used as the attention entropy:

$$AE = \frac{1}{4} \sum_{j=1}^{4} H_j \quad (9)$$

In Equation (9), $AE$ represents the attention entropy of the signal; $H_j$ represents the Shannon entropy of the signal under the $j$-th strategy.
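A minimal sketch of this computation might look as follows; the base-2 logarithm and the convention of measuring the distance from each key point to the next key point of the target type are assumptions, and the function name is illustrative.

```python
import numpy as np

def attention_entropy(x):
    """Attention entropy sketch: Shannon entropy of the interval
    distributions between local maxima/minima under the four
    max-max, max-min, min-max and min-min strategies, averaged."""
    x = np.asarray(x, dtype=float)
    maxima = np.array([i for i in range(1, len(x) - 1)
                       if x[i - 1] < x[i] > x[i + 1]])
    minima = np.array([i for i in range(1, len(x) - 1)
                       if x[i - 1] > x[i] < x[i + 1]])

    def intervals(src, dst):
        # distance from each key point in src to the next key point in dst
        out = []
        for s in src:
            nxt = dst[dst > s]
            if len(nxt):
                out.append(nxt[0] - s)
        return out

    def shannon(vals):
        if not vals:
            return 0.0
        _, counts = np.unique(vals, return_counts=True)
        p = counts / counts.sum()
        return float(-np.sum(p * np.log2(p)))

    strategies = [(maxima, maxima), (maxima, minima),
                  (minima, maxima), (minima, minima)]
    return float(np.mean([shannon(intervals(a, b)) for a, b in strategies]))
```

A strictly periodic signal yields identical intervals under every strategy and hence an attention entropy of zero, while irregular signals yield positive values; note that no tolerance, embedding dimension or other parameter has to be chosen.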
2.3. Adaptive Deep Extreme Kernel Learning Machine (ADKELM)
2.3.1. Extreme Learning Machine (ELM)
As a feedforward neural network, the extreme learning machine differs from the BP neural network, which repeatedly adjusts the weights and biases of the input and hidden layers. ELM randomly selects the weights and biases of the input layer and the hidden layer, and the weights between the hidden layer and the output layer are determined directly according to the least-squares principle. The sample dataset $\{(x_i, t_i)\}_{i=1}^{N}$ is input into the ELM, where $x_i$ is the input vector, $t_i$ is the output vector, and $N$ is the number of samples. The ELM output can be obtained by Equation (10):

$$o_j = \sum_{i=1}^{L} \beta_i \, g(w_i \cdot x_j + b_i), \quad j = 1, 2, \ldots, N \quad (10)$$

where $\beta_i$ is the output weight of the $i$-th hidden node; $g(\cdot)$ represents the activation function; $w_i$ is the input weight of the $i$-th hidden node; $b_i$ is the input bias of the $i$-th hidden node; and $L$ is the number of hidden nodes. The training objective of the ELM algorithm is to minimize the error between the actual output and the expected output:

$$\sum_{j=1}^{N} \left\| o_j - t_j \right\| = 0 \quad (11)$$
Rewriting Equation (11) in matrix form gives:

$$H \beta = T \quad (12)$$

In Equation (12), $H = \left[ g(w_i \cdot x_j + b_i) \right]_{N \times L}$ represents the output matrix of the hidden layer nodes; $\beta$ is the output weight matrix; $T$ is the expected output matrix.
By solving Equation (12), the output weight matrix $\beta$ can be obtained as:

$$\beta = H^{\dagger} T \quad (13)$$

where $H^{\dagger}$ is the Moore–Penrose generalized inverse of the output matrix $H$. To improve the generalization ability of ELM, the regularization parameter $C$ is introduced, and Equation (13) is rewritten to obtain a new $\beta$:

$$\beta = H^{T} \left( \frac{I}{C} + H H^{T} \right)^{-1} T \quad (14)$$

where $I$ represents the identity matrix. With the output weight $\beta$ obtained by Equation (14), inputting a new sample dataset into the ELM network yields:

$$y = h(x) \, \beta \quad (15)$$

where $y$ is the actual output for the new sample dataset $x$, and $h(x)$ is the random mapping matrix of the hidden layer.
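As a sketch of Equations (10)-(15), a minimal regularized ELM can be written as follows; the sigmoid activation, the hidden-layer size, the weight range and the solver form $\beta = (H^{T}H + I/C)^{-1}H^{T}T$ (algebraically equivalent to Equation (14)) are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, L=100, C=1e3, seed=0):
    """Train a basic ELM: the random input weights and biases are fixed,
    and only the output weights are learned by a regularized solve."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-5, 5, (X.shape[1], L))   # random input weights
    b = rng.uniform(-5, 5, L)                 # random hidden biases
    H = sigmoid(X @ W + b)                    # hidden-layer output matrix
    # beta = (H^T H + I/C)^{-1} H^T T, the regularized form of Eq. (13)
    beta = np.linalg.solve(H.T @ H + np.eye(L) / C, H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta
```

Only $\beta$ is learned; the random hidden layer is never updated, which is why ELM training reduces to a single linear solve and is much faster than iterative backpropagation.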
2.3.2. Autoencoder-Extreme Learning Machine (AE-ELM)
As an unsupervised learning algorithm, the auto-encoder maps the input features to the hidden layer through the encoder and then uses the decoder to reconstruct the feature vector. In this way, the features of the data can be effectively learned through the encoding and decoding processes. According to Equation (16), the AE-ELM model randomly generates orthogonal input weights and biases:

$$W^{T} W = I, \quad b^{T} b = 1 \quad (16)$$

The input sample dataset $X$ is mapped to the hidden layer through encoding, and the output matrix $H$ can be calculated by Equation (17):

$$H = g(X W + b) \quad (17)$$
The transposed matrix $\beta^{T}$ of the output weight matrix $\beta$ is used as the input weight of the next layer, and $H$ is the input matrix of the next network layer.
Figure 1 shows the basic network structure of AE-ELM.
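A minimal sketch of one AE-ELM layer follows, assuming a sigmoid activation and QR-based orthogonalization of the random weights (Equation (16)); whether the next layer receives $H$ or the re-encoded features varies between implementations, and the re-encoding $g(X\beta^{T})$ used here is one common choice.

```python
import numpy as np

def elm_ae_layer(X, L, C=1e3, seed=0):
    """One AE-ELM layer: orthogonal random encoding, then output weights
    beta solved so that H @ beta reconstructs X.  beta^T becomes the
    deterministic input weight of the next layer."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    A = rng.standard_normal((max(d, L), min(d, L)))
    Q, _ = np.linalg.qr(A)                 # orthonormal columns
    W = Q if d >= L else Q.T               # shape (d, L), per Eq. (16)
    b = rng.standard_normal(L)
    b /= np.linalg.norm(b)                 # unit-norm bias, per Eq. (16)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    # beta solves (H^T H + I/C) beta = H^T X, so that H @ beta ~ X
    beta = np.linalg.solve(H.T @ H + np.eye(L) / C, H.T @ X)
    next_input = 1.0 / (1.0 + np.exp(-(X @ beta.T)))
    return next_input, beta
```

The decoder weights $\beta$ are the only trained quantity; reusing $\beta^{T}$ as the next layer's input weight is what removes the randomness from the stacked feature mapping.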
2.3.3. Deep Extreme Kernel Learning Machine (DKELM)
The DK-ELM model is a deep model that combines KELM with multiple stacked AE-ELM models, which together form a neural network composed of multiple hidden layers. By mapping the initial features to a new feature space, the DK-ELM model can efficiently extract the effective features of the samples. Moreover, to increase stability, a kernel function is introduced to eliminate the shortcomings of random mapping. As shown in Figure 2, the principle of DK-ELM is as follows:
(1) Input the original feature matrix $X$, and calculate the output matrix $H_1$ and reconstruction matrix $\beta_1$ of the first hidden layer according to Section 2.3.2.
(2) $H_1$ is used as the input of the second hidden layer, the transposed matrix $\beta_1^{T}$ of $\beta_1$ is used as the input weight of the second hidden layer, and the output matrix $H_2$ is calculated according to Equation (17).
(3) Repeat the operation in Step 2: use $H_i$ as the input of the $(i+1)$-th hidden layer, and calculate the output matrix $H_{i+1}$ and reconstruction matrix $\beta_{i+1}$ of the $(i+1)$-th hidden layer.
(4) Map the feature samples processed by the AE-ELM model using the kernel function to obtain the kernel matrix:

$$\Omega_{i,j} = K(x_i, x_j) \quad (18)$$

where $x_i$ and $x_j$ are the $i$-th and $j$-th input samples; $K(\cdot, \cdot)$ is the kernel function, for which the radial basis function (RBF), polynomial kernel function (POLY) and wavelet kernel function (WAVE) are selected. The expression of RBF is $K(x_i, x_j) = \exp\left(-\|x_i - x_j\|^2 / \sigma^2\right)$, the expression of POLY is $K(x_i, x_j) = (x_i \cdot x_j + a)^d$, and the expression of WAVE is $K(x_i, x_j) = \cos\left(1.75 \frac{\|x_i - x_j\|}{m}\right) \exp\left(-\frac{\|x_i - x_j\|^2}{2m^2}\right)$, where $\sigma$, $a$, $d$ and $m$ are the parameters of the different kernel functions.
(5) Combining the kernel function, calculate the final output weight $\alpha$ and the final output $Y$ according to Equation (19):

$$\alpha = \left( \frac{I}{C} + \Omega \right)^{-1} T, \qquad Y = \left[ K(x, x_1), \ldots, K(x, x_N) \right] \alpha \quad (19)$$
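Putting Steps (1)-(5) together, a compact DK-ELM sketch with the RBF kernel might read as follows; the layer sizes, the sigmoid activation and the parameter values are illustrative, and a single shared regularization constant is used for all AE-ELM layers for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rbf(A, B, gamma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

class DKELM:
    """DK-ELM sketch: stacked AE-ELM feature layers followed by a KELM
    output layer with alpha = (I/C + Omega)^{-1} T, as in Eq. (19)."""

    def __init__(self, hidden=(20, 20), C_ae=1e3, C=1e2, gamma=1.0, seed=0):
        self.hidden, self.C_ae, self.C = hidden, C_ae, C
        self.gamma, self.seed = gamma, seed

    def fit(self, X, T):
        rng = np.random.default_rng(self.seed)
        self.betas, Z = [], X
        for L in self.hidden:
            W = rng.uniform(-1, 1, (Z.shape[1], L))
            b = rng.uniform(-1, 1, L)
            H = sigmoid(Z @ W + b)
            # AE-ELM reconstruction weights: H @ beta ~ Z
            beta = np.linalg.solve(H.T @ H + np.eye(L) / self.C_ae, H.T @ Z)
            self.betas.append(beta)
            Z = sigmoid(Z @ beta.T)        # deterministic re-encoding
        self.Z_train = Z
        Omega = rbf(Z, Z, self.gamma)      # kernel matrix, Eq. (18)
        self.alpha = np.linalg.solve(Omega + np.eye(len(Z)) / self.C, T)

    def predict(self, X):
        Z = X
        for beta in self.betas:
            Z = sigmoid(Z @ beta.T)
        return rbf(Z, self.Z_train, self.gamma) @ self.alpha
```

The hyperparameters exposed in the constructor (layer sizes, regularization constants and kernel parameter) are exactly the quantities the MPA optimization of Section 2.4 is meant to tune.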
2.4. Marine Predators Algorithm (MPA)
Inspired by the optimal foraging strategies of marine predators (Levy flight and Brownian motion) and the velocity-change policy after predator and prey encounter, Faramarzi et al. developed the marine predators algorithm (MPA) by simulating the predation process of marine predators while considering the influence of eddy currents. The specific process of MPA is as follows:
(1) Set the population number pop and the maximum iterations M_Iter.
(2) Initialize the population parameters, calculate the fitness value of the population, and find the individual Xopt in the population under the optimal fitness value.
(3) If the current iteration number $t$ satisfies $t < \frac{1}{3} M\_Iter$, update the locations of the individuals in the population according to Equation (20); otherwise, go to Step 4:

$$\vec{s}_i = \vec{R}_B \otimes \left( \vec{X}_{opt} - \vec{R}_B \otimes \vec{X}_i \right), \qquad \vec{X}_i = \vec{X}_i + P \cdot \vec{R} \otimes \vec{s}_i \quad (20)$$

where $\vec{s}_i$ is the step size; $\vec{R}_B$ is a vector of normally distributed random numbers representing Brownian motion; $\otimes$ denotes entry-wise multiplication; $\vec{R}$ is a random vector within the range of (0, 1); and $P = 0.5$.
(4) If the current iteration number $t$ satisfies $\frac{1}{3} M\_Iter \le t < \frac{2}{3} M\_Iter$, the population is updated in two parts: the first half of the population is updated according to Equation (21), and the remaining half according to Equation (22); otherwise, go to Step 5:

$$\vec{s}_i = \vec{R}_L \otimes \left( \vec{X}_{opt} - \vec{R}_L \otimes \vec{X}_i \right), \qquad \vec{X}_i = \vec{X}_i + P \cdot \vec{R} \otimes \vec{s}_i \quad (21)$$

where $\vec{R}_L$ is a random vector representing Levy flight.

$$\vec{s}_i = \vec{R}_B \otimes \left( \vec{R}_B \otimes \vec{X}_{opt} - \vec{X}_i \right), \qquad \vec{X}_i = \vec{X}_{opt} + P \cdot CF \otimes \vec{s}_i \quad (22)$$

where $CF = \left( 1 - \frac{t}{M\_Iter} \right)^{2 \frac{t}{M\_Iter}}$ is a dynamic parameter that controls the step size.
(5) If the current iteration number $t$ satisfies $t \ge \frac{2}{3} M\_Iter$, update the locations of the population particles according to Equation (23); otherwise, go to Step 6:

$$\vec{s}_i = \vec{R}_L \otimes \left( \vec{R}_L \otimes \vec{X}_{opt} - \vec{X}_i \right), \qquad \vec{X}_i = \vec{X}_{opt} + P \cdot CF \otimes \vec{s}_i \quad (23)$$
(6) Output the best location Xopt found by the population.
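The three iteration phases can be sketched as a simple minimizer, shown below. The Levy-step generator follows Mantegna's algorithm with exponent 1.5; the FADs/eddy perturbation of the full MPA and the per-particle memory-saving step are omitted, so this is a reduced illustration rather than the complete algorithm.

```python
import math
import numpy as np

def mpa_minimize(f, dim, lb, ub, pop=20, iters=90, seed=0):
    """Simplified MPA: Brownian exploration (first third), mixed
    Levy/Brownian motion (middle third), and Levy exploitation with a
    shrinking step factor CF (final third)."""
    rng = np.random.default_rng(seed)
    beta = 1.5  # Levy exponent
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2) /
             (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)

    def levy(n):  # Mantegna's algorithm for Levy-distributed steps
        return rng.normal(0, sigma, n) / np.abs(rng.normal(0, 1, n)) ** (1 / beta)

    X = lb + rng.random((pop, dim)) * (ub - lb)
    fit = np.array([f(x) for x in X])
    best, best_f = X[fit.argmin()].copy(), fit.min()
    P = 0.5
    for t in range(iters):
        CF = (1 - t / iters) ** (2 * t / iters)
        for i in range(pop):
            R = rng.random(dim)
            if t < iters / 3:                       # phase 1: Brownian, Eq. (20)
                RB = rng.normal(size=dim)
                X[i] += P * R * RB * (best - RB * X[i])
            elif t < 2 * iters / 3:                 # phase 2: mixed, Eqs. (21)-(22)
                if i < pop // 2:
                    RL = levy(dim)
                    X[i] += P * R * RL * (best - RL * X[i])
                else:
                    RB = rng.normal(size=dim)
                    X[i] = best + P * CF * RB * (RB * best - X[i])
            else:                                   # phase 3: Levy, Eq. (23)
                RL = levy(dim)
                X[i] = best + P * CF * RL * (RL * best - X[i])
            X[i] = np.clip(X[i], lb, ub)
        fit = np.array([f(x) for x in X])
        if fit.min() < best_f:
            best_f, best = fit.min(), X[fit.argmin()].copy()
    return best, best_f
```

In the fault diagnosis setting, `f` would be the misjudgment rate of the validation set evaluated at a candidate DK-ELM parameter vector, and `lb`/`ub` the parameter bounds of Section 2.5.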
2.5. The Proposed Method
Combining the auto-encoder and KELM, DK-ELM can obtain more effective feature samples by transforming the sample feature space, thus improving the accuracy of KELM. However, in the DK-ELM model, the number of nodes at each hidden layer, the regularization parameters of the AE-ELM model, and the kernel parameters and penalty coefficient of the KELM at the top layer all affect the performance of the DK-ELM model. If the parameters are set based on experience, not only does tuning take a lot of time, but the best combination of parameters may never be found. To address this problem, this paper introduces the MPA algorithm to optimize the parameters of the DK-ELM model, which achieves adaptive selection of the DK-ELM parameters and effectively improves the performance of DK-ELM.
As shown in
Figure 3, the main steps of the proposed method are as follows:
(1) The bearing vibration signals are denoised by the wavelet threshold denoising method.
(2) Utilize EWT to decompose the bearing signal into different empirical modal components, and extract the attention entropy of the components as the feature samples of ADK-ELM.
(3) Divide the feature samples into the training set, test set and validation set according to the ratio of 3:1:1.
(4) Set the misjudgment rate of the validation set as the fitness function; select the number of hidden layers and the kernel function of the DK-ELM model; and set the upper and lower limits of the parameters: the number of nodes at each hidden layer is within the range $(0, 100)$, the regularization parameters and penalty coefficient of the AE-ELM model are within the range $(10^{-3}, 10^{3})$, and the kernel parameters are within the range $(10^{-7}, 10^{-3})$.
(5) In the MPA algorithm, the population size is 20 and the maximum number of iterations is 50.
(6) According to the rules of MPA population optimization described in
Section 2.4, find the optimal parameter combination.
(7) Substitute the optimal parameters into the DK-ELM model to obtain the output of the test set.
4. Conclusions
To address the difficulty of early fault diagnosis of rolling bearings, this paper proposes a bearing fault diagnosis model combining the IMF attention entropy and the adaptive deep kernel extreme learning machine. First, the wavelet threshold denoising method is adopted to effectively eliminate the noise in the vibration signal. Second, the denoised signal is decomposed by EWT, and the attention entropies of the IMF components are extracted and used as the feature vectors. Then, to address the difficulty of determining the parameters of DK-ELM, the MPA algorithm is employed to set the parameters of the DK-ELM model adaptively. Finally, the ADK-ELM model is used to achieve effective recognition of the bearing faults. The main conclusions of this paper are as follows:
(1) As the traditional entropy is extremely sensitive to the parameters, this paper introduces the attention entropy for feature extraction of the IMF components. The simulation results show that the attention entropy can effectively distinguish various fault signals.
(2) The MPA optimization algorithm is used to optimize the node number at the hidden layers of the DK-ELM model, the regularization parameters of the AE-ELM, the kernel parameters of the kernel function and the penalty coefficient, which can achieve adaptive adjustment of the parameters of the DK-ELM model.
(3) The diagnostic performance of the ADK-ELM model with different kernel functions and different numbers of hidden layers is investigated. The analysis of the simulation results shows that the ADK-ELM model with the RBF kernel achieved the best diagnostic performance with four hidden layers.