1. Introduction
Rolling element bearings are widely used in rotary machines such as drills, electric motors, wind turbines, and turbofan engines. Bearing failure may therefore lead to abrupt shutdowns, costly losses, and even catastrophic accidents [1]. As the precision and complexity of modern machinery and equipment continue to increase, prognostics and health management faces considerable challenges, and fault diagnosis and performance degradation assessment (PDA) are receiving more and more attention [2,3]. PDA focuses on the changes in equipment health status over the entire service process, rather than only discovering whether the equipment has malfunctioned. Effective PDA results are the basis for accurate remaining useful life (RUL) prediction, which helps ensure the safety and reliability of equipment in operation and reduces the cost of maintenance [4,5]. Consequently, bearing PDA has attracted wide research interest [6,7,8].
Feature extraction and selection, as an important procedure in bearing PDA, directly affects the final PDA results. Many common features have been applied, including time-domain features such as root mean square (RMS) and kurtosis; frequency-domain features such as frequency kurtosis; time-frequency domain features obtained by wavelet packet decomposition, empirical mode decomposition, and variational mode decomposition; and other features such as those based on mathematical morphology particles [9]. Meanwhile, with the development of deep learning, many deep network architectures have been introduced for feature extraction and health indicator construction, such as long short-term memory [10] and convolutional neural networks [11]. To address the problem that bearing vibration signals are susceptible to serious interference, He et al. [12] proposed a bearing PDA method based on multi-resolution singular value decomposition and a long short-term memory network. Xu et al. [13] used a moving-window-based stacked auto-encoder with an exponential function to construct a smooth degradation curve. Deep-learning-based methods can realize end-to-end deep feature extraction automatically, without human intervention. However, long computation times and a lack of clear physical meaning are common shortcomings of deep learning.
With the rapid development of signal processing and feature extraction techniques, high-dimensional features containing abundant fault and degradation information are now available. In this case, how to extract or select features with high degradation information and low redundancy becomes the key issue. Typical metrics such as monotonicity, robustness, trendability, and consistency are often used for feature evaluation. Niu et al. [14] utilized the rank mutual information criterion to measure the nonlinear correlation between feature and time. Chen et al. [15] calculated the mixed scores of three evaluation indicators and utilized a variant correlation-based feature selection method to determine the number of optimal features. Methods based on mixed evaluation indicators can evaluate features more comprehensively, but the parameters of the mixed metrics commonly rely on manual, experience-based adjustment, which limits their application. Besides, the whole lifetime of a bearing contains several different degradation stages. Taking the evolution of a bearing wear fault as an example, this characteristic of the degradation process is shown directly in Figure 1, provided by the literature [16]. The surface becomes smoother in the running-in stage, and the steady-state stage is accompanied by a uniform lubricating film and stable contact mechanics. In the third stage, microcracks initiate and open on and under the surface. Then, the microcracks gradually expand to produce secondary cracks or separation. The evolution mechanism of each degradation stage is different, which leads to different vibration characteristics. Therefore, degradation features should be evaluated not only by their overall development trend but also by the local structure of the feature curves. However, in the existing literature, this factor is rarely considered in the evaluation metrics for degradation feature selection.
To solve this problem, a systematic degradation feature selection method based on spectral clustering and rank mutual information is proposed in this article. As a typical clustering algorithm, spectral clustering can find clusters in a sample space of arbitrary shape and converge to the global optimal solution [17]. It is often used in fault diagnosis [18] and fault state recognition [19] but rarely in feature clustering. In this study, spectral clustering was used as a preprocessing step of feature selection to cluster features with similar degradation curves. Then, the optimal feature set was constructed by selecting the optimal feature from each cluster. Feature selection based on these two steps can not only reduce the information redundancy caused by similar feature curves in the feature set but can also ensure the diversity of degradation curves in the feature set. In particular, how to evaluate the sensitivity of different feature curves in each cluster is the key procedure. To better evaluate features, rank mutual information (RMI), which is more suitable for trendability estimation of long time series, was introduced in this study. The two-step feature selection is described in detail in the following sections.
After feature extraction and selection, the final PDA needs to fuse the selected multi-dimensional features to build a health indicator [20]. Tse et al. [21] used principal component analysis (PCA) to construct a health indicator for impeller PDA. Guo et al. [10] fed a feature set formed by integrating similar features and classical time-frequency features into a recurrent neural network (RNN) to construct the RNN-HI. Different from deep learning, the hidden Markov model (HMM) infers hidden state changes from the observations, which makes it more suitable for PDA. HMMs have been widely used in recent years owing to their high accuracy with small samples and their clear physical meaning. Ocak et al. [22] used an HMM for bearing fault detection and diagnosis for the first time. Yu et al. [23] proposed a machine health degradation assessment method based on HMM and contribution analysis. Li et al. [24] used a time-dependent state transition probability matrix with degradation characteristics to obtain the HMM reliability curve and realized the reliability evaluation of wind turbine components based on small-sample data. Li et al. [25] established a hazard model describing the time-varying and conditionally adaptive state transition probability to estimate the wear state of a tool. Yu et al. [26] proposed an adaptive-learning-based method for machine fault detection and health degradation monitoring, which provides a useful guide for developing a condition-based maintenance system. Jiang et al. [27] combined Student’s t-HMM with nuisance attribute projection to construct a robust PDA model, which shows more tolerance to outliers than conventional HMMs. In this study, Student’s t-HMM was utilized to construct a health indicator based on the selected feature set and to assess the degradation process.
The rest of the article is structured as follows. The theoretical background of spectral clustering, rank mutual information, and Student’s t-HMM is introduced in Section 2. In Section 3, the whole procedure of the proposed bearing PDA method is introduced. In Section 4, two experimental data sets are used to verify the proposed method. Finally, conclusions are drawn in Section 5.
2. Theory Background
2.1. Spectral Clustering
Different from traditional clustering algorithms like k-means, the spectral clustering algorithm is based on graph theory [28]. Spectral clustering takes samples as vertices and the similarity between samples as the weight of the edge connecting them, which transforms the clustering problem into the partition problem of a weighted undirected graph. It can find clusters in a sample space of arbitrary shape and converge to the global optimal solution [29], which makes it superior to traditional clustering methods [30]. Therefore, spectral clustering is utilized to cluster different lifetime curves as a pre-procedure of feature selection in PDA.
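For the sample data $X = \{x_1, x_2, \ldots, x_n\}$, each data point $x_i$ can be represented as a vertex $v_i$. Let $G = (V, E)$ be an undirected graph with a vertex set $V = \{v_1, v_2, \ldots, v_n\}$ [22]. Assume that the graph $G$ is weighted. Each edge between two vertices $v_i$ and $v_j$ carries a non-negative weight $w_{ij} \ge 0$. Then, a weighted adjacency matrix of the graph can be obtained as follows:
$$W = (w_{ij})_{i,j = 1, \ldots, n} \tag{1}$$
As $G$ is undirected, $w_{ij} = w_{ji}$. If $w_{ij} = 0$, it means that the vertices $v_i$ and $v_j$ are not connected by any edge. Then, $W(A, B)$ defines the relation between two not necessarily disjoint sets $A, B \subset V$:
$$W(A, B) = \sum_{i \in A,\, j \in B} w_{ij} \tag{2}$$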
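The goal of spectral clustering is to cut the graph $G = (V, E)$ into $k$ subgraphs with no connection, which can be defined as follows:
$$\mathrm{cut}(A_1, \ldots, A_k) = \frac{1}{2} \sum_{i=1}^{k} W(A_i, \bar{A}_i) \tag{3}$$
where $\bar{A}_i$ denotes the complement of $A_i$ in $V$.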
The vertex sets of the subgraphs are denoted $A_1, A_2, \ldots, A_k$, and they satisfy $A_i \cap A_j = \emptyset$ for $i \ne j$ and $A_1 \cup A_2 \cup \cdots \cup A_k = V$. Many techniques have been proposed to solve the graph cutting problem, and the NCut technique was utilized in this study [22]. Based on the NCut technique, the problem described in Equation (3) can be rewritten as
$$\min_{Y} \operatorname{Tr}\left(Y^{T} L_{sym} Y\right) \quad \text{s.t.} \quad Y^{T} Y = I \tag{4}$$
where the degree matrix $D$ is defined as the diagonal matrix with the degrees $d_1, \ldots, d_n$ on the diagonal, and $d_i = \sum_{j=1}^{n} w_{ij}$. $L_{sym}$ is the normalized graph Laplacian matrix defined as $L_{sym} = D^{-1/2} (D - W) D^{-1/2}$. This is the standard trace minimization problem, and the solution $Y$ consists of the eigenvectors of $L_{sym}$ corresponding to its $k$ smallest eigenvalues. All sample points were thereby mapped from the original high-dimensional feature space into a new low-dimensional feature space (i.e., $\mathbb{R}^{k}$), in which points belonging to the same cluster lie close together, which enabled us to obtain the final clustering result with a simple clustering method. In this study, the classical k-means method was adopted.
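The procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this study: the Gaussian similarity with bandwidth `sigma`, the row normalization of the embedding, the farthest-point initialization of k-means, and the names `spectral_embedding` and `kmeans` are all choices made only for this example.

```python
import numpy as np


def spectral_embedding(curves, k, sigma=1.0):
    """Map each row of `curves` to the first k eigenvectors of L_sym."""
    X = np.asarray(curves, dtype=float)
    # Pairwise squared distances, turned into Gaussian weights w_ij (assumed choice).
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)                      # vertex degrees d_i
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # L_sym = I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues of a symmetric matrix in ascending order.
    _, vecs = np.linalg.eigh(L_sym)
    Y = vecs[:, :k]
    # Row normalization, so that k-means works on unit-length rows.
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)


def kmeans(points, k, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    P = np.asarray(points, dtype=float)
    centers = [P[0]]
    for _ in range(1, k):
        dist = np.min([((P - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(P[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = P[labels == j].mean(axis=0)
    return labels


# Toy lifetime curves: three rising and three falling shapes.
t = np.linspace(0.0, 1.0, 20)
rising = [t + offset for offset in (0.0, 0.05, 0.1)]
falling = [1.0 - t + offset for offset in (0.0, 0.05, 0.1)]
labels = kmeans(spectral_embedding(rising + falling, k=2), k=2)
```

In this toy case the three rising curves receive one label and the three falling curves the other. With real feature curves, the similarity function and `sigma` would need to be chosen to suit the data.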
2.2. Rank Mutual Information
The Spearman coefficient, as one of the major statistical correlation coefficients, is often used to measure the correlation between two continuous variables. In PDA, it is commonly utilized as a similarity metric to evaluate the non-linear correlation between a feature and the time series for feature selection. However, the whole degradation process of a bearing usually covers several different degradation stages. The evolution mechanism of each stage is different, resulting in different development patterns of the vibration signal in each stage. For example, during the whole lifetime of a bearing, the normal stage is usually long and stable, while the fault stage usually changes rapidly. Unfortunately, the Spearman coefficient can only evaluate the overall trendability of a time series and cannot reflect the local structure of the data. Therefore, it is not suitable as an evaluation metric for the selection of bearing degradation features. In this study, we introduced a new evaluation metric for degradation feature selection called rank mutual information (RMI).
RMI is a generalized measure based on Shannon entropy, which can be used to measure the correlation between two data sequences. In particular, when RMI is used to measure the non-linear correlation between a data sequence and the time series, we find that it is more strongly affected by the trendability of the data in the later stage. This property is an advantage when selecting bearing degradation feature curves. Therefore, RMI was introduced as the evaluation metric for feature selection in this study.
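Let $X = \{x_1, x_2, \ldots, x_N\}$ and $Y = \{y_1, y_2, \ldots, y_N\}$ be two data sequences, where $N$ is the sequence length. Given $x_n \in X$, $y_n \in Y$, we defined the following sets:
$$[x_n]^{\le}_{X} = \{\, i \mid x_n \le x_i \,\} \tag{5}$$
$$[x_n]^{\ge}_{X} = \{\, i \mid x_n \ge x_i \,\} \tag{6}$$
$$[y_n]^{\le}_{Y} = \{\, i \mid y_n \le y_i \,\} \tag{7}$$
$$[y_n]^{\ge}_{Y} = \{\, i \mid y_n \ge y_i \,\} \tag{8}$$
Then, the ascending rank mutual information (ARMI) and the descending rank mutual information (DRMI) between the sequences $X$ and $Y$ were defined as:
$$\mathrm{ARMI}(X, Y) = -\frac{1}{N} \sum_{n=1}^{N} \log \frac{\left|[x_n]^{\le}_{X}\right| \cdot \left|[y_n]^{\le}_{Y}\right|}{N \cdot \left|[x_n]^{\le}_{X} \cap [y_n]^{\le}_{Y}\right|} \tag{9}$$
$$\mathrm{DRMI}(X, Y) = -\frac{1}{N} \sum_{n=1}^{N} \log \frac{\left|[x_n]^{\ge}_{X}\right| \cdot \left|[y_n]^{\ge}_{Y}\right|}{N \cdot \left|[x_n]^{\ge}_{X} \cap [y_n]^{\ge}_{Y}\right|} \tag{10}$$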
In ARMI, $|\cdot|$ is the cardinality of a set; $\cap$ is the intersection operator; and $r_{ni}$ is the degree to which $x_n$ is worse than $x_i$. We have:
In DRMI, $r_{ni}$ is the degree to which $y_n$ is better than $y_i$. We have:
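When RMI is used to measure the trendability between a feature sequence $X$ and the time series $T = \{t_1, \ldots, t_N\}$ with $t_n = n$, we have $\left|[t_n]^{\le}_{T}\right| = N - n + 1$ and $\left|[t_n]^{\ge}_{T}\right| = n$, and Equations (9) and (10) can be rewritten as:
$$\mathrm{ARMI}(X, T) = -\frac{1}{N} \sum_{n=1}^{N} \log \frac{\left|[x_n]^{\le}_{X}\right| \cdot (N - n + 1)}{N \cdot \left|[x_n]^{\le}_{X} \cap [t_n]^{\le}_{T}\right|} \tag{13}$$
$$\mathrm{DRMI}(X, T) = -\frac{1}{N} \sum_{n=1}^{N} \log \frac{\left|[x_n]^{\ge}_{X}\right| \cdot n}{N \cdot \left|[x_n]^{\ge}_{X} \cap [t_n]^{\ge}_{T}\right|} \tag{14}$$
It can be seen from Equation (13) that the larger $n$ is, the larger $N/(N - n + 1)$ is, so ARMI is more strongly affected by the ranks at later time indices. From Equation (14), we can see that as $n$ increases, $N/n$ decreases, so DRMI is less affected by the ranks at later time indices, which is just the opposite of ARMI. Therefore, we used ARMI to assess the trendability of the degradation feature sequence, aiming to select features with better trendability in the degradation or fault stage. It is easy to deduce that 1 is the upper bound of ARMI, and the derivation process is as follows: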
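One way to see this bound is sketched here. It assumes the natural logarithm and crisp rank sets, so that ARMI between a feature sequence $X$ and the time series $T$ takes the form in the first line; $A_n$ and $B_n$ are shorthand introduced only for this sketch and do not appear elsewhere in the article.

```latex
\begin{align*}
\mathrm{ARMI}(X,T)
  &= -\frac{1}{N}\sum_{n=1}^{N}\ln\frac{|A_n|\,|B_n|}{N\,|A_n\cap B_n|},
  \qquad A_n=[x_n]^{\le}_{X},\quad B_n=[t_n]^{\le}_{T},\quad |B_n|=N-n+1 \\
  &\le \frac{1}{N}\sum_{n=1}^{N}\ln\frac{N}{N-n+1}
  && \text{since } |A_n\cap B_n|\le |A_n| \\
  &= -\frac{1}{N}\sum_{m=1}^{N}\ln\frac{m}{N}
  && \text{with } m=N-n+1 \\
  &\le -\int_{0}^{1}\ln u\,\mathrm{d}u \;=\; 1
  && \text{right Riemann sum of the decreasing function } -\ln u .
\end{align*}
```

The first inequality becomes an equality when $A_n \subseteq B_n$ for every $n$, which holds for a strictly increasing feature sequence, so under these assumptions the bound is approached by a strictly increasing sequence as $N$ grows.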
We used simple simulated data to illustrate the properties of RMI. The mathematical formulas of the two simulated sequences are shown as follows:
where $x$ is a positive integer representing time, and $N(x)$ is a random fluctuation following the standard normal distribution. The plots of the two sequences are shown in Figure 2. The Spearman coefficient and RMI were used to measure the trendability of both sequences. The calculation results are shown in Table 1.
From the Spearman results in Table 1, we can see that the absolute value for one sequence was greater than that for the other, indicating that the former had a greater trendability. However, the latter is more desirable and more consistent with the actual bearing performance degradation process. For the latter, the negative trendability of the earlier stage and the positive trendability of the later stage offset each other, making the value of the Spearman coefficient close to 0, which reflects the “misjudgment” of the Spearman coefficient. In contrast, RMI can reduce the impact of the earlier-stage data to a certain extent and prefers features with good trendability in the later stage. Therefore, RMI is more suitable for degradation feature curve selection. It is worth mentioning that the dividing line between positive and negative trendability is not necessarily 0 for ARMI, whereas it is 0 for the Spearman coefficient. Therefore, the Spearman coefficient can be used for trendability correction before using ARMI if necessary.
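The late-stage preference discussed above can be checked on a toy example. The sketch below assumes the crisp rank sets and the natural logarithm, takes time as the index sequence 0, 1, ..., N − 1, and uses hypothetical helper names (`_rank_mi`, `armi`, `drmi`, `spearman`); it is an illustration, not the code used to produce Table 1.

```python
import math


def _rank_mi(x, y, ascending):
    """Rank mutual information between two equal-length sequences."""
    n_total = len(x)
    total = 0.0
    for n in range(n_total):
        if ascending:   # sets of indices whose value is >= the n-th value
            set_x = {i for i in range(n_total) if x[i] >= x[n]}
            set_y = {i for i in range(n_total) if y[i] >= y[n]}
        else:           # sets of indices whose value is <= the n-th value
            set_x = {i for i in range(n_total) if x[i] <= x[n]}
            set_y = {i for i in range(n_total) if y[i] <= y[n]}
        common = len(set_x & set_y)  # index n is always in both sets
        total += math.log(len(set_x) * len(set_y) / (n_total * common))
    return -total / n_total


def armi(x):
    """Ascending RMI between x and its time index."""
    return _rank_mi(x, list(range(len(x))), ascending=True)


def drmi(x):
    """Descending RMI between x and its time index."""
    return _rank_mi(x, list(range(len(x))), ascending=False)


def spearman(x):
    """Spearman coefficient between x and time; valid only without ties."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    rank = [0] * n
    for r, i in enumerate(order):
        rank[i] = r
    d2 = sum((rank[i] - i) ** 2 for i in range(n))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


late_rise = [2, 1, 3, 4]    # disturbed early, monotone at the end
early_rise = [1, 2, 4, 3]   # monotone early, disturbed at the end

print(spearman(late_rise), spearman(early_rise))
print(armi(late_rise), armi(early_rise))
```

Both toy sequences differ from a perfectly increasing one by a single adjacent swap, so their Spearman coefficients are identical, while ARMI is larger for the sequence whose swap occurs early and whose final part is monotone.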
2.3. Student’s t-HMM
The hidden Markov model is a doubly stochastic model based on time series, which covers two random chains. One is a stochastic process for the observation sequence chain, and the other is a Markov process for the hidden state chain. Based on Bayesian inference, it estimates the hidden state changes from the observation data. Its basic principle is essentially the same as that of performance degradation evaluation, and the evaluation results are highly interpretable. Therefore, HMMs are widely used in the field of mechanical fault diagnosis and PDA [31,32]. In this study, Student’s t-HMM, which has been shown to be highly tolerant to outliers in real-world applications [27,33], was introduced for bearing PDA based on the selected features. The graphical illustration of Student’s t-HMM is displayed in Figure 3.
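The Student’s t-HMM is defined as a finite state-space hidden Markov model in which the observation emission distribution of each hidden state is modeled by a Student’s t-mixture model (SMM). Suppose that one Student’s t-HMM has $I$ hidden states, and the number of components of each SMM is $J$. Then, one Student’s t-HMM can be expressed by the following parameters:
$\pi = \{\pi_i\}$: the initial state probability distribution. $\pi_i = P(q_1 = i)$, with $\pi_i \ge 0$ and $\sum_{i=1}^{I} \pi_i = 1$.
$A = \{a_{ij}\}$: the state transition probability matrix. $a_{ij} = P(q_{t+1} = j \mid q_t = i)$, and $\sum_{j=1}^{I} a_{ij} = 1$ for $1 \le i \le I$.
$B = \{c_{ij}, \mu_{ij}, \Sigma_{ij}, \nu_{ij}\}$: the observation probability distribution parameter sets based on the SMMs. $c_{ij} \ge 0$, and $\sum_{j=1}^{J} c_{ij} = 1$ for $1 \le i \le I$. Then, the probability density of the observation $o_t$ emitted from the $i$-th hidden state can be calculated as
$$b_i(o_t) = \sum_{j=1}^{J} c_{ij}\, \mathrm{St}\left(o_t;\ \mu_{ij}, \Sigma_{ij}, \nu_{ij}\right)$$
where $\mathrm{St}(\cdot)$ denotes the Student’s t density with mean vector $\mu_{ij}$, scale matrix $\Sigma_{ij}$, and $\nu_{ij}$ degrees of freedom.
Then, one Student’s t-HMM can be described by $\pi$, $A$, and $B$. For convenience, the notation $\lambda = (\pi, A, B)$ is used to indicate the complete parameter set of one Student’s t-HMM.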
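To make the emission model concrete, the following sketch evaluates a Student’s t-mixture density for a scalar observation and compares its tail with that of a Gaussian. The restriction to one dimension, the function names, and the parameter values are assumptions made for illustration; the features used in this study are multi-dimensional.

```python
import math


def student_t_pdf(x, mu, sigma, nu):
    """Univariate Student's t density with location mu, scale sigma, dof nu."""
    z = (x - mu) / sigma
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi) - math.log(sigma))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(z * z / nu))


def smm_pdf(x, weights, mus, sigmas, nus):
    """Emission density of one hidden state: weighted sum of t components."""
    return sum(c * student_t_pdf(x, m, s, v)
               for c, m, s, v in zip(weights, mus, sigmas, nus))


def gaussian_pdf(x, mu, sigma):
    """Gaussian density, used here only as a reference for the tails."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# An observation six scale units from the centre: the t density (nu = 3)
# assigns it far more probability mass than the Gaussian does.
outlier = 6.0
print(student_t_pdf(outlier, 0.0, 1.0, 3.0), gaussian_pdf(outlier, 0.0, 1.0))
```

The heavier tail is what keeps a single outlying observation from dominating the likelihood, which is the property referred to as outlier tolerance above.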
3. Bearing PDA Based on SC-RMI and Student’s t-HMM
Raw sensor data contain rich degradation information along with many kinds of disturbances, so it can be quite challenging to obtain effective features with strong trendability that reflect the degradation process in a meaningful way. Moreover, some classical features like RMS do not show a stable trend following the degradation process until shortly before failure occurs. Consequently, degradation-sensitive feature extraction and optimal selection from monitoring signals are important steps in PDA and have a direct impact on the assessment results. Accordingly, a new PDA framework based on SC-RMI and Student’s t-HMM was proposed. Firstly, spectral clustering was utilized to cluster lifetime feature curves. Features with similar shapes over the whole lifetime were clustered together, and features with different shapes were separated from each other. To prevent the loss of effective information, it was necessary to maintain the diversity of the feature space. Meanwhile, information redundancy undoubtedly exists among features in the same cluster. Therefore, as the next procedure, the optimal feature in each cluster was selected based on RMI, and the selected features were put together to construct the final feature set. Feature selection based on these two steps can not only reduce the information redundancy caused by similar feature curves but can also ensure the diversity of degradation curves in the feature set. This feature selection method is called SC-RMI for short. Finally, the selected features were put into Student’s t-HMM for PDA, which covers the training and testing procedures. The main steps are described as follows.
Step 1: feature selection based on SC-RMI. Based on time-frequency domain feature extraction methods, several lifetime feature curves could be obtained from the training lifetime data. Firstly, spectral clustering was utilized to cluster the features with similar shapes and trendabilities over the life cycle. Secondly, in order to fuse the degradation information in different feature clusters while reducing the information redundancy of similar features, the RMI metric was used to evaluate features, and the feature with the largest RMI was selected from each cluster. In this way, a degradation-sensitive feature set with rich degradation information and low redundancy was established.
Step 2: PDA modeling based on Student’s t-HMM. In PDA applications, a normal Student’s t-HMM is usually constructed from a degradation-sensitive feature training set collected in the normal operation state. After the feature selection in Step 1, the selected features extracted from data in the normal stage were utilized as the training data for normal Student’s t-HMM modeling. All the algorithms of ordinary HMMs could be applied. Detailed algorithms are provided in reference [34]. Then, a normal Student’s t-HMM could be obtained.
Step 3: performance degradation assessment. The model structure and parameters of the normal Student’s t-HMM reflect the multi-state time series statistical law of monitoring data under the normal operation state. The degradation process of equipment can be regarded as a deviation from the normal operation state. The testing feature set is therefore put into the normal Student’s t-HMM to calculate the likelihood probability output $P(O \mid \lambda)$. The likelihood probability output of the testing data in the trained model is a measure of the degree to which the state of the testing data belongs to the normal state. The closer the current equipment is to the normal operation state, the greater the likelihood probability of the testing data in the normal Student’s t-HMM. Therefore, in the framework of equipment PDA based on Student’s t-HMM, the output likelihood probability of the testing data in the normal state model is often recorded as the performance indicator (PI).
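The likelihood used as the PI in Step 3 can be computed with the standard scaled forward algorithm, sketched below. The two-state model, the Gaussian emissions (used here only to keep the example short; a Student’s t-mixture density could be passed in the same way), the parameter values, and the function names are assumptions made for illustration.

```python
import math


def forward_log_likelihood(pi, A, emission_pdfs, observations):
    """Scaled forward algorithm returning log P(O | lambda)."""
    n_states = len(pi)
    log_lik = 0.0
    alpha = None
    for o in observations:
        b = [pdf(o) for pdf in emission_pdfs]
        if alpha is None:
            # Initialization with the initial state distribution.
            alpha = [pi[i] * b[i] for i in range(n_states)]
        else:
            # Propagate through the transition matrix, then weight by emissions.
            alpha = [sum(alpha[i] * A[i][j] for i in range(n_states)) * b[j]
                     for j in range(n_states)]
        scale = sum(alpha)   # assumed strictly positive for this sketch
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def gaussian(mu, sigma):
    """Return a scalar Gaussian density function (stand-in emission model)."""
    def pdf(o):
        z = (o - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return pdf


# Hypothetical "normal" model with two hidden states.
pi = [0.6, 0.4]
A = [[0.9, 0.1], [0.2, 0.8]]
emissions = [gaussian(0.0, 1.0), gaussian(1.0, 1.0)]

indicator_normal = forward_log_likelihood(pi, A, emissions, [0.1, 0.8, 0.3, 1.1])
indicator_degraded = forward_log_likelihood(pi, A, emissions, [4.0, 5.5, 6.0, 7.2])
print(indicator_normal, indicator_degraded)
```

Observations that drift away from the range seen in the normal state yield a lower log-likelihood, which is the behaviour the PI relies on. Working in the log domain with per-step scaling avoids numerical underflow on long sequences.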
The overall framework of the proposed method is shown in Figure 4. Sensitive features were selected through SC-RMI, as shown in the blue box, and used in the subsequent steps. The parameters of the trained Student’s t-HMM calculated in the orange box were passed to the green box for assessment.
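The selection rule in Step 1, which keeps the highest-scoring feature of each cluster, can be sketched as follows. The function name `select_per_cluster` and the example labels and scores are hypothetical; in the proposed method the labels would come from spectral clustering and the scores from RMI.

```python
def select_per_cluster(labels, scores):
    """Return the index of the best-scoring feature in each cluster."""
    best = {}
    for idx, (label, score) in enumerate(zip(labels, scores)):
        # Keep the first feature seen, or replace it if the score is larger.
        if label not in best or score > scores[best[label]]:
            best[label] = idx
    return sorted(best.values())


# Five features in two clusters, with made-up trendability scores.
cluster_labels = [0, 0, 1, 1, 1]
trend_scores = [0.2, 0.5, 0.9, 0.1, 0.3]
selected = select_per_cluster(cluster_labels, trend_scores)
print(selected)  # [1, 2]
```

The size of the selected feature set therefore equals the number of clusters, so the choice of cluster number directly controls the dimension of the input to the Student’s t-HMM.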