1. Introduction
Bearing reliability evaluation and remaining useful life (RUL) prediction have received extensive attention due to the increasingly extreme working conditions of the entire system [
1,
2]. Generally, bearing life follows a specific statistical distribution that can be inferred from a large number of life data samples. However, due to the influence of assembly errors, material defects, and load fluctuations, in practice, bearing life has strong randomness. Studies carried out using public bearing life datasets can be taken as examples. In the accelerated degradation datasets of bearings collected by FEMTO-ST on the PRONOSTIA experimental platform, the RUL of the test bearing under specific load and speed conditions falls within the range of 2–7 h [
3]. The results of the datasets provided by the BPC system from Sumyoung Technology Co. are similar, where the life spans of the test bearings are between 30 min and 8 hours [
4]. In addition to the accelerated degradation experiments mentioned above, the RULs obtained from the natural degradation experiments are also different. For example, in the life test of the Rexnord ZA-2115 double-row bearing carried out on the durability test rig by NASA, the results are between seven days and one month [
5]. Therefore, it is highly challenging to predict bearing RUL from a statistical point of view accurately. For this reason, in engineering practice, research focus has shifted to the study of the RUL of individual bearing, considering its actual working condition. Kundu [
6] predicted the bearing RUL by establishing a Weibull proportional regression model based on the monitored signals of the PRONOSTIA platform. By combining the respective advantages of long- and short-term memory (LSTM) and statistical process analysis, Liu [
7] proposed a new network to predict the bearing RUL using the datasets released by NASA and FEMTO-ST. Huang [
8] introduced the transfer learning method and constructed a transfer depth-wise separable convolution recurrent network to predict the bearing RUL from the same public datasets considering different work conditions.
Various mathematical and physical models were successfully applied to the prediction of bearing RUL. However, an overview of the aforementioned studies indicates that the actual degree of damage to the test bearing was not considered in the studies using the public bearing datasets; only the physical signals, e.g., acceleration and temperature signals, monitored in the bearing life test were studied. Take the FEMTO dataset as an example, and it did not provide damage sizes of the test bearings. The termination criterion was only created by the acceleration signal exceeding the threshold. Similarly, the same criterion was defined as the end of bearing life in many other representative studies [
7,
8,
9,
10]. However, it was found in our experiments that under the same operating conditions, even if the same acceleration threshold is used as the test termination condition, the final damage degree, namely the crack length in the spall area, can be remarkably different.
Figure 1 shows the damage to four test bearings under the same operating condition, for which the tests were stopped based on the same acceleration threshold. It was demonstrated that even bearings with the same damage exhibited utterly different degrees of damage under similar acceleration levels. Therefore, under different application scenarios, the construction of health indicators and the development of life prediction methods need to be explicitly designed.
Among the conventional health indicators [
11,
12], the root mean square (RMS) is the most commonly used, although it has the disadvantage of strong fluctuations. Additionally, the relative root mean square (RRMS) was adopted in the literature to reduce the negative impact of the fluctuations on the calculation results [
9]. In addition to the RMS, Zhong [
13] used a nonparametric health index for machine condition monitoring by combining envelope spectrum and statistic model, and principal component analysis (PCA) was used by Kundu [
6] for dimensionality reduction of feature vectors to extract those features with better monotonicity. Xu [
14] purposed a new health indicator called moving average cross-correlation of the power spectral density (MACPSD) to predict the health state of ball bearing. However, the problem with these health indicators in practice is that their calculated values are not directly related to the actual damage degree of the bearing, as shown by the crack length of the spall area in
Figure 1.
In RUL prediction, researchers have proposed numerous prediction methods, which can be mainly divided into the model-driven method, data-driven method, and hybrid method. Model-driven methods, such as the Paris-Erdogan method [
15], can accurately predict the bearing RUL under different working conditions. Still, the accuracy depends on establishing a practical model based on the physical system. The data-driven method utilizes statistical models or artificial intelligence (AI) models to predict the RUL, where the statistical model is often referred to as an empirical model-based method. For example, Kumar [
16] combined the Kullback–Leibler divergence and Gaussian process regression to predict the RUL. Li [
11] and Qiu [
17] presented a stochastic process to predict the RUL. Xing [
18] and Li [
19] proposed a mixed Gauss model–hidden Markov model (GM-HMM) to predict the RUL of wind turbine bearings. Other methods include AI mathematical models such as the support vector machine [
9] and artificial neural network [
20], which can handle complex systems problems without any prior knowledge. However, data-driven methods can only manage the targeted work condition and exhibit no universal adaptability. On the other hand, the hybrid method combines the advantages of the model-driven method and the data-driven method; techniques such as the Kalman filter [
21] and particle filter [
22] belong to this type. Although many works were proposed to study the problem of bearing RUL prediction, few relevant studies have considered detecting accurate damage occurrence time and end of lifetime, which are crucial for prediction outputs. Antoni [
23] used entropic evidence to detect the initial faults in rotating machinery. Chegini [
24] purposed ensemble empirical model decomposition and wavelet packet decomposition to detect the initial faults in rotating machinery, and the Nirwan [
25] used the acoustic emission to detect faults. Although such mentioned works provided relatively accurate detection results, the detection models or the extracted features originated from extensive calculations. It is not easy to implement such methods under the requirement of real-time response, mainly when the prediction of bearing RUL is an ongoing process.
To address the problem of detecting initial bearing damage, an adaptive envelope analysis method was employed in
Section 2. Subsequently, in
Section 3, an improved phase-space warping (PSW) algorithm was proposed to construct the bearing health indicator, followed by the indicator normalization using the hidden Markov model regression (HMMR) introduced in
Section 4. The test bearing datasets verified the capability of the normalized health indicator to reasonably reflect the actual degree of bearing damage. In
Section 5, the health indicator threshold, indicating the true damage extent, was defined as the end of bearing lifetime, which allows the proposed encoder-decoder long-short term memory (LSTM) model with an attention mechanism to predict the bearing RUL. The results through analyzing the experimental data of bearing life proved the effectiveness of the proposed method, see
Section 6.
3. Construction of Health Indicator
The amplitudes of BPFs in the envelope spectrum can help determine the occurrence time of bearing initial damage. Nevertheless, it is not a good option as a health indicator to predict the RUL because of its poor monotonicity against the bearing degradation process. Therefore, once the initial damage is detected, a new bearing health indicator needs to be established to predict the bearing RUL.
Previous studies often use time-domain statistical features, e.g., RMS, kurtosis, and peak-to-peak value, as health indicators in industrial applications. In contrast, applicational cases related to frequency-domain features, e.g., wavelet transform and Hilbert-Huang Transform, were presented in the references until recent years. However, due to the introduction of pre-defined parameters in the mathematical transformation, its applications in industrial fields still have limitations. Time-domain features are still the most commonly used health indicators. The calculation formulas of RMS and kurtosis are as follows:
where
xi is the raw data and
n is the number of data points used to calculate the feature value.
μx and
σx are respectively the mean value and standard deviation of the raw data.
However, owing to the complexity of actual physical systems, it is challenging to accurately express the service status of the bearing with the above features. Their respective trends in
Figure 6 shows statistical characteristics such as RMS and kurtosis. The RMS exhibits evident fluctuations, while the kurtosis would decrease as the number of defect-induced impacts increases when the bearing damage becomes more severe. These poor monotonic features are likely to cause misjudgment of bearing life in engineering applications. Furthermore, for features extracted based on AI methods, it is challenging to explain the physical meaning of such features. Most importantly, the above-mentioned features are incapable of reflecting the crack length of the bearing spall area shown in
Figure 1. As shown by a group of experimental results listed in
Table 2, when the experiment ended, the RMS and kurtosis after scale standardization (the standardization method is introduced in
Section 4) cannot effectively reflect the crack length in the bearing (+represents multiple spall areas, as shown by
Figure 1). Therefore, to overcome the afore-described challenges, this paper proposes to use the PSW algorithm as the feature extraction method.
3.1. Phase Space Warping (PSW) Theory
PSW is an effective method for evaluating the damage of multi-layer dynamical systems, which was first proposed by Chelidze [
31]. From the perspective of dynamic theory, the damage evolution of the entire rotor system can be regarded as a high-order system composed of a “fast time” scale and a “slow time” scale. Its definition is as follows:
In Equation (7),
is the “fast time” scale variable, which can be measured directly.
is the “slow time” scale variable, which can directly reflect the damage state of the entire system but cannot be measured directly.
and
are the “fast time” and “slow time” scaling functions, respectively.
is a function of the variable
,
t represents time, and
is a constant parameter that defines the damage rate of the system. The response states of the bearing system at the initial time point
t0 and after operating for a certain period
tp can be expressed as
Suppose that the bearing system always maintains its initial state without experiencing any change or damage, then the response of the bearing system can be calculated using Equation (9).
Therefore, with the initial operating state of the bearing system as the reference state, there is
. Then, the damage state of the bearing system (or damage tracking) can be expressed as follows:
In Ref. [
31], after performing the Taylor expansion, the bearing system’s damage state can be finally expressed as
where
represents higher-order infinitesimal. In this paper, the raw acceleration signal of the bearing is regarded as an observable “fast time” scale variable, while the damage state of the bearing system is regarded as a “slow time” scale variable.
Typically, to calculate the damage state of the bearing, the phase space reconstruction theory based on the Takens embedding theorem would be introduced, and the damage state of the bearing would be quantified on this basis. The phase space reconstruction is mathematically expressed as follows:
where
is the reference initial state of the phase space of the bearing system,
n is the number of vectors in the phase space, and
N is the total number of observation data points.
and
D represent the time delay and embedding dimension of the phase space, which can be calculated using the mutual information [
32] method and Cao’s method [
33]. Theoretically, the unknown mapping between the reconstruction vector
in the reference phase space and that of the next step
on the “slow time” scale—that is, under the reference state of the bearing
—can be expressed as
In engineering, linear regression is the most straightforward and generic model for establishing the mapping between reconstruction vectors
and
in the reference phase space, as shown by Equation (14).
where
is the model parameter, and
is the combination of the input phase space and the identity matrix
Based on the simple linear regression model, the parameter matrix
An can be calculated as follows:
where
is the vector combination of the
I nearest neighbors of the reconstruction vector
yn in the reference phase space, and
.
Yn+1 is the target response, that is, the combination of the next step vectors of the
I nearest neighbor vectors. However, in actual scenarios, the “slow time” scale parameter (i.e., the operating state of the bearing) is in a state of continuous degradation. Thus, after degradation for time
tp, the mapping between each phase space point and its next step would be changed, and the true mapping can no longer be expressed by Equation (13) but can be obtained using Equation (17).
Assuming that during the bearing operation, there is no damage to its operating state, and the mapping relationship is not changed after the elapse of time
tp. In this case, the theoretical reconstruction vector of the next step
can be expressed as follows:
Based on the above idea, the damage “trajectory” during bearing operation after the elapse of time
tp can be obtained as follows:
3.2. Improved PSW Algorithm
Section 3.1 introduced the relevant model of the original PSW. However, the problem with this model in actual engineering is that the mapping relationship between the reconstruction vectors
and
does not follow a simple linear mapping relationship but exhibits a certain degree of nonlinearity. Therefore, a PSW mapping model that takes into account actual nonlinear factors is proposed in this section.
Ref. [
34] proposed the random vector functional-link net (FLNet), where the input and output layers are directly connected without an activation function. At the same time, an enhanced pattern is employed as an alternative to the nonlinear activation function. The structure of FLNet used for phase space nonlinear mapping in this paper is illustrated in
Figure 7.
is the randomly input activation function, representing the nonlinear part of the model, such as the sigmoid function and so on.
is a parameter generated randomly from the 0–1 continuous uniform distribution to generate nonlinear effects in combination with the activation function, which is called the enhancement node, and
m is the number of enhancement nodes. Compared with conventional neural network structures with nonlinear relationships, the structure adopted herein not only provides nonlinear characteristics that are not originally available for the mapping between phase space reconstruction vectors but is also more in line with engineering reality. At the same time, under the framework of this structure, the regression parameters can be directly calculated by linear regression formulas, which can eliminate calculations in neural networks that need to use gradient descent methods, thus exhibiting the superior advantage of small calculation amounts. In summary, when applying the phase space reconstruction function, the mapping between
and
can be rewritten as
where
is a random parameter matrix generated from the continuous uniform distribution, which is composed of
;
is the combination of the reconstruction vector and the enhancement node. In summary, by rewriting Equation (16) using Equation (20), the following can be obtained:
By combining Equations (19)–(21),
Then, according to the Ref. [
31], the final quantified damage state of the bearing in the phase space can be calculated using Equation (23).
where
is the weight function.
is the Euclidean distance between the current reconstruction vector
and the one that has the farthest Euclidean distance in the space composed of the
I nearest neighbor vectors.
M is the correlation dimension of the reference phase space. Finally, with the sample Bearing 1-1 as an example, the improved PSW results are shown in
Figure 8. Compared with kurtosis and RMS, it exhibits significantly smaller fluctuations and better monotonicity, so it is more appropriate to serve as the health indicator for the bearing.
In Refs. [
35,
36], several other features shown in
Table 3 are used for the prediction of bearing RUL. However, the prediction task begins after the occurrence of bearing initial damage. Hence, we select the monotonicity of the features after the bearing initial damage as the evaluation criterion. For a comparison purpose, the result from the PSW method is presented together with those from other features, as shown in
Figure 9.
4. Feature Normalization
The improved PSW can better describe the damage state of the bearing from the perspective of the bearing system, but there are still certain problems with PSW for different bearings.
Figure 10a shows the full life PSW data of five identical bearings under the same working conditions. There are relatively large initial fluctuations in the PSW values of different bearings. Similarly, the traditional RMS also faces the same problem. Therefore, it is necessary to develop a numerical standardization method to unify the quantitative standard of the bearing health indicator.
This study utilized a method proposed in the Ref. [
9], which used relative values to unify and quantify health indicators. In this method, the relative PSW value is calculated as follows:
where
h1 and
h2 are the starting and ending point of the steady phase during the full life of the bearing. In Ref. [
9], these two points were selected empirically, which is not a scientifically effective approach. In this paper, the ending point
h2 of the steady phase of the bearing is defined as the point of occurrence of minor faults; that is, the starting point of the fault obtained by the adaptive envelope analysis method described in
Section 2. As for the starting point of the steady phase,
Figure 10 indicates that once the bearing starts to operate, there is a certain “shift” in the bearing system state at the initial phase. However, this de facto “shift” is not damage; therefore, it is unreasonable to use the bearing’s starting point as the starting point of its steady phase. Therefore, this paper proposes to employ HMMR and use the PSW data before the fault point as its good input to perform unsupervised classification, thereby automatically dividing this segment of data into the initial phase and the steady phase.
4.1. Hidden Markov Theory
The HMM model is a model for nested stochastic processes, which contains two states: one is the invisible hidden state
, and the other is the explicit state
based on the hidden state, as shown in
Figure 11.
Q and
K are the number of hidden and explicit states, respectively.
The evolution of this model is generally described by the transition probability matrix
A, which can be expressed as
, where
i and
j, respectively, represent the
ith and
jth hidden state, and
represents the transition probability from hidden state
i to
j. Under the current hidden state
, the probability of showing the explicit state
is expressed by the emission matrix
. The initial probability of the stochastic process is expressed by
. In summary, the parameters contained in a complete HMM model are given as
4.2. HMMR-Based Normalization
In this study, after extracting the healthy data of the bearing, it is assumed that the healthy phase of the bearing can be divided into two phases—namely the initial running-in stage and the steady stage—and regarded as the hidden states of the HMM model [
37]; moreover, this is an irreversible process. Thus, the transition matrix
A can be expressed as
Figure 12 indicates that in the initial running-in stage, the bearing’s PSW value is in the form of an increasing cubic spline, while in the steady phase, it tends to be a stable straight line. Therefore, the explicit state is treated as a continuous Gaussian random distribution based on a polynomial regression model in this paper, rather than a traditional discrete emission matrix. This way, the traditional HMM model is modified into an HMMR model, where the running-in phase is a cubic spline Gaussian regression model, and the steady phase is a linear Gaussian regression model, as shown by Equation (28).
where
Uj is the regression coefficient of the
-dimensional
lth order function under the
jth hidden state.
is the regression input (i.e., time input) under the
jth hidden state.
is the Gaussian distribution with zero mean and one standard deviation,
is the standard deviation of the regression model under the
jth hidden state. Therefore, the parameters for the HMMR model in this paper are rewritten as follows:
Finally, according to the maximum likelihood estimation algorithm, the model parameters can be obtained as
According to the method proposed in the Ref. [
38], Equation (30) can be solved by the expectation–maximization (EM) algorithm. Its results are shown in
Figure 13, where the service state of the healthy bearing is automatically divided into the initial running-in phase and the steady operating phase. The end of the initial running-in phase—that is, the starting point of the steady operation phase—will be finally regarded as the theoretical point
h1 in Equation (25).
Figure 10b shows the standardized result of the PSW data in
Figure 10a obtained by the above method.
Figure 1 illustrates the crack length in the spall area to represent the damage degree of a specific damaged bearing. Multiple spall areas are also shown in
Figure 1 and are represented as the identifier ‘+’ in
Table 2. Damage related standardized PSW value is given in the same column of
Table 2, together with RMS and kurtosis. Compared with the traditional RMS and kurtosis, an obvious positive correlation can be observed between the standardized PSW value and the crack length.
7. Conclusions
Most extant studies on the RUL prediction of rolling bearings have limited themselves to online public data sets, which do not consider the actual damage degree of the bearing and also rarely consider the starting and ending points for the bearing RUL prediction or the damage type of the bearing. Compared with the traditional health indicators used for life prediction, the standardized PSW indicator based on HMMR and improved PSW proposed in this paper can effectively evaluate the actual damage size in the bearing. Moreover, it was experimentally verified that with this indicator as input, the encoder–decoder neural network that combines the attention mechanism (which takes into account the bearing damage type and working conditions) and the BILSTM model can effectively improve the prediction accuracy for bearing RUL, thereby generating certain engineering value.
The main contributions of this paper are summarized as follows:
Instead of using the traditional statistical life distribution to predict the bearing life, the damage starting time point for bearing life prediction is accurately defined from the perspective of the characteristic frequency of the rolling bearing through the adaptive envelope analysis method.
A health indicator that can effectively reflect the actual damage of the bearing is obtained by combining the HMMR model and the PSW model.
Compared with the traditional LSTM model, the encoder–decoder LSTM model proposed in this paper considers the timing signal characteristic of the health indicator. Not only are the BILSTM model and attention mechanism introduced into the model, but also the operating conditions of the bearing are considered as the LSTM hidden state input for the calculation, thereby improving the prediction accuracy for bearing RUL.