1. Introduction
With the development of modern industry, motors are ubiquitous in manufacturing applications [1]. Rolling bearings are core components and vulnerable parts of machinery. Their health directly affects the performance, efficiency, stability, and life of the machine. According to statistics from the American Electric Power Research Institute, 41% of motor faults are caused by bearing damage, which is the primary cause of motor faults [2]. In order to maintain the safe operation of equipment, early identification of rolling bearing defects is essential [3].
Traditional bearing fault diagnosis is mainly based on model analysis and signal processing [4]. However, both approaches rely heavily on accumulated expert experience and therefore cannot meet the reliability requirements of modern production equipment. With the development of artificial intelligence technology, fault diagnosis methods based on machine learning have shown good performance in motor health monitoring. Machine learning-based bearing fault diagnosis can be divided into three steps: feature extraction, feature selection, and classifier recognition [5].
Extracting the running data of the bearings is an important step in realizing rolling bearing condition monitoring. Feature extraction is the process of extracting attributes from motor signals [6]. To date, researchers have studied various signal analysis techniques, such as the fast Fourier transform (FFT), envelope analysis (EA), wavelet transform (WT), and Hilbert–Huang transform (HHT) [7]. The HHT is one of the most widely used tools for nonlinear and nonstationary signal analysis; therefore, this paper uses the HHT to extract fault features. However, not all of the time–frequency domain features extracted by the HHT are conducive to fault diagnosis. Irrelevant and redundant features not only reduce the efficiency of model operation but also degrade recognition performance [8]. Therefore, to avoid the interference of redundant and irrelevant features, an effective method is needed to select the optimal feature subset from the multi-dimensional features.
Feature selection is an essential data preprocessing method that can effectively reduce data dimensionality and improve classification performance by removing redundant or irrelevant features [9]. Currently, the two most widely used strategies are filter and wrapper methods [10]. The filter method evaluates the intrinsic relevance of features through univariate statistics. Because it is independent of the learning algorithm, the filter method is computationally cheaper than the wrapper method [11]. The wrapper method uses the classifier as part of the evaluation function, so it usually performs better than filters. Recently, swarm intelligence algorithms such as the genetic algorithm (GA) [12], particle swarm optimization (PSO) [13], grey wolf optimization (GWO) [14], and the crow search algorithm (CSA) [15] have been widely used in the search process of wrapper-based methods. Furthermore, the cuckoo search (CS) algorithm [16], inspired by the brood parasitism of cuckoos, is a promising metaheuristic due to its few adjustment parameters and good search ability. The CS algorithm has a more effective search ability than the GA and PSO [17]. However, the CS algorithm still suffers from slow convergence and random initialization when solving feature selection problems.
Based on the problems mentioned above, a feature selection method based on a clustering hybrid binary cuckoo search (CHBCS) is proposed. The main contributions of this study are briefly presented as follows:
A strategy is proposed to extract the time–frequency domain features of motor signals based on the Hilbert–Huang transform.
In order to reduce redundant features in a population, a clustering hybrid initialization method is presented. The method uses the Louvain algorithm to cluster features and initializes the population according to the clustering information and the number of features, which can effectively remove redundant features.
A mutation strategy based on Levy flights is proposed to improve the update formula. This strategy can effectively utilize the high-quality information of the population by guiding the subsequent search with several high-quality individuals.
A dynamic Pa probability strategy is proposed that adaptively adjusts the Pa probability based on population rankings to preserve the high-quality solutions of the current population.
This article is organized as follows. The related work is discussed in Section 2. Section 3 introduces the feature extraction method. Section 4 describes the proposed algorithm in detail. In Section 5, the effectiveness of the proposed CHBCS algorithm is verified by experiments. Finally, Section 6 provides the conclusion.
2. Related Work
Machine learning-based bearing fault diagnosis has been widely used in rotating machinery health condition monitoring. In machine learning, feature selection plays an important role in improving classifier performance [18].
Some researchers have previously studied filter-based feature selection methods. Cui et al. [19] selected fault features accurately according to approximate entropy and correlation parameters and applied this method to the fault diagnosis of gear reducers. Zheng et al. [20] proposed a bearing fault diagnosis method using Laplacian scores for feature selection. The method uses multi-scale fuzzy entropy to characterize the complexity and irregularity of rolling bearing vibration signals and sorts the feature vectors according to the importance of the features and their correlation with fault information. Tang et al. [21] proposed a feature selection method based on the maximum information coefficient to improve bearing fault diagnosis; the method uses the maximum information coefficient to measure both the correlation between features and the correlation between features and fault categories. Tang et al. [22] proposed the GL-mRMR-SVM feature selection model, which uses maximum relevance and minimum redundancy as feature selection criteria.
Compared to the above filter-based feature selection methods, more research has focused on wrapper-based feature selection methods. Lu et al. [23] proposed a genetic algorithm feature selection method based on a dynamic search strategy and applied it to rotating machinery fault diagnosis. This method uses empirical mode decomposition and variable range coding to establish the feature set, processes the initial feature set with the improved genetic algorithm, and finally classifies with a support vector machine. Rauber et al. [24] studied heterogeneous feature models and feature selection in bearing fault diagnosis. Signal features were extracted via the complex envelope spectrum, statistical time–frequency domain parameters, and wavelet packet analysis; a greedy feature selection method was used to process the feature set; and a k-nearest neighbor classifier, a feedforward network, and a support vector machine were used for fault diagnosis. Shan et al. [25] proposed a rotating machinery fault diagnosis method based on improved variational mode decomposition (IVMD) and the hybrid artificial herd algorithm (HASA). This method uses IVMD to decompose the signal and extract signal characteristics, and the HASA is used for feature selection to improve classifier performance. Nayana et al. [26] first extracted a set of six time-dependent spectral features (TDSF) to diagnose bearing faults. A feature selection algorithm combining particle swarm optimization and wheeled differential evolution was used to select effective features, and the final feature subset contained most of the TDSF. Lee et al. [27] proposed a bearing fault diagnosis model based on a feature selection optimization method. The motor signal is processed using the Hilbert–Huang transform and envelope analysis, and a feature selection method based on improved binary particle swarm optimization is proposed to improve classification accuracy.
Swarm intelligence algorithms have been widely used in feature selection, but some problems remain. The CS algorithm uses Levy flights to search the solution space; due to the heavy-tailed distribution of Levy flights, the large search step size is not conducive to convergence. Some researchers have improved the CS algorithm to achieve better performance on feature selection problems. Rodrigues et al. [28] proposed a feature selection method based on the binary cuckoo search (BCS) algorithm and verified the robustness of the BCS algorithm. Salesi and Cosma [29] proposed embedding a pseudo-binary mutation neighborhood search procedure into the BCS algorithm, but the randomness of this strategy is not conducive to convergence. In order to solve the stability problem of the CS algorithm, Pandey et al. [30] used two analysis techniques to perform a double data transformation on the original features; the processed data eliminates the high-order correlation between features, which benefits the subsequent search. Aziz et al. [31] presented a feature selection method based on rough sets and an improved CS algorithm to deal with high-dimensional data. Kelidari et al. [32] proposed a chaotic cuckoo optimization algorithm (CCOALFDO), which improves performance through chaotic mapping, an elite preservation strategy, and a uniform mutation strategy. Alia and Taweel [33] proposed a new feature selection method combining rough set theory with the binary cuckoo search algorithm; it improves the BCS algorithm through a new initialization and global update mechanism to accelerate convergence on high-dimensional datasets.
Although the above improved algorithms remedy some shortcomings of the CS algorithm, they still face the problems of slow convergence and random initialization. Therefore, based on the binary cuckoo search algorithm, this paper proposes a feature selection method based on a clustering hybrid binary cuckoo search to overcome these shortcomings.
3. Feature Extraction
In this paper, the Hilbert–Huang transform is used to extract the features of bearing signals. This technique first decomposes a time series using the empirical mode decomposition (EMD) algorithm and then obtains the operating characteristics of the time series using the Hilbert transform [34].
The vibration signal obtained from rotating machinery is usually nonstationary, complex, and nonlinear, which does not meet the preconditions of the Hilbert transform [35]. Therefore, it is necessary to use EMD to decompose the nonlinear, nonstationary signal into a set of nearly stationary components. The Hilbert transform is then applied to the intrinsic mode functions (IMFs) obtained by the decomposition, and the resulting complex signal is used for further analysis. For a given signal x(t), the signal decomposition process in the EMD method is shown in Figure 1.
According to the IMFs and the residual component r(t), the original signal x(t) can be reconstructed, as shown in Equation (1):

x(t) = Σ_{i=1}^{n} ci(t) + r(t)  (1)

Each IMF component ci(t) obtained via EMD of the signal is subjected to the Hilbert transform to generate H[ci(t)]. The equation is as follows:

H[ci(t)] = (1/π) ∫_{−∞}^{+∞} ci(τ)/(t − τ) dτ  (2)

where the integral is taken as a Cauchy principal value, and H[ci(t)] and ci(t) form a conjugate complex pair. The analytic signal zi(t) is shown in Equation (3):

zi(t) = ci(t) + jH[ci(t)] = ai(t)e^{jθi(t)}  (3)

where ai(t) and θi(t) are time functions, as shown in Equations (4) and (5):

ai(t) = √(ci(t)² + H[ci(t)]²)  (4)

θi(t) = arctan(H[ci(t)]/ci(t))  (5)
The Hilbert time–frequency spectrum matrix of the motor operation signal is obtained by the above calculation, and the time–frequency domain features of the signal are extracted from this matrix [27]. In the time domain, five characteristic curves are obtained by calculating the maximum (T-max), mean (T-mean), mean square error (T-mse), root mean square (T-rms), and standard deviation (T-std) of each column of the Hilbert spectrum matrix. For each characteristic curve, the maximum, mean, mean square error, root mean square, and standard deviation are then calculated. Thus, a total of 25 statistical features are obtained from the five characteristic curves, recorded as F1–F25. Applying the same procedure in the frequency domain yields the five characteristic curves F-max, F-mean, F-mse, F-rms, and F-std, from which 25 more statistical features are extracted and recorded as F26–F50.
Figure 2 shows the process by which the Hilbert–Huang transform establishes a feature set containing 50 statistical features.
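The two-stage statistics described above (per-column curves, then per-curve statistics) can be sketched as follows. The spectrum matrix here is random toy data, and interpreting "mean square error" as the variance is an assumption:

```python
import numpy as np

def five_stats(v):
    # maximum, mean, mean square error (interpreted here as the variance,
    # an assumption), root mean square, and standard deviation
    return np.array([v.max(), v.mean(), np.mean((v - v.mean()) ** 2),
                     np.sqrt(np.mean(v ** 2)), v.std()])

def time_domain_features(H):
    """F1-F25 from a Hilbert spectrum matrix H whose columns are time steps."""
    curves = [H.max(axis=0), H.mean(axis=0),
              np.mean((H - H.mean(axis=0)) ** 2, axis=0),
              np.sqrt(np.mean(H ** 2, axis=0)), H.std(axis=0)]
    return np.concatenate([five_stats(c) for c in curves])

H = np.abs(np.random.default_rng(0).normal(size=(64, 128)))  # toy spectrum
features = time_domain_features(H)   # F1-F25
```

Repeating the same computation along the frequency axis would yield F26–F50.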
4. Feature Selection of the Clustering Hybrid Binary Cuckoo Search
4.1. Binary Cuckoo Search Algorithm
Feature selection obtains a subset by selecting appropriate features, which is essentially a binary problem. This paper uses a binary vector to define the solution of the feature selection problem. The formulas are as follows:
The new generation of nest locations is produced by a global random search, and its update formula is as follows:
in which x_best is the best contemporary solution, α0 is a constant, α0 = 0.01, μ and ν are two random numbers generated from a normal distribution, and ϕ is a random number drawn from a normal distribution. It can be seen from Equation (8) that the CS algorithm searches for new solutions around the current optimal solution.
In the BCS, we effectively convert each dimension of a position vector in a continuous space into a binary code through a V-shaped transfer function, as shown in Equations (9) and (10).
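The binarization step can be sketched as below. Since Equations (9) and (10) are not reproduced in the text, the specific V-shaped function |tanh(x)| and the bit-flip rule are assumptions chosen from common BCS practice:

```python
import numpy as np

def v_transfer(x):
    # One common V-shaped transfer function; the exact form used in the
    # paper's Equations (9) and (10) is not shown, so |tanh(x)| is an assumption.
    return np.abs(np.tanh(x))

def binarize(step, bits, rng):
    # With a V-shaped function, bit d of the nest is flipped with
    # probability V(step_d); otherwise it is kept.
    flip = rng.random(bits.shape) < v_transfer(step)
    return np.where(flip, 1 - bits, bits)

rng = np.random.default_rng(1)
bits = rng.integers(0, 2, size=50)   # current binary nest over 50 features
step = rng.normal(size=50)           # continuous update value from Equation (8)
new_bits = binarize(step, bits, rng)
```

V-shaped functions reward small continuous updates with stable bits, which tends to preserve good solutions late in the search.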
4.2. Clustering Hybrid Initialization Method
The initial nest positions are an important part of the CS algorithm and have a great influence on the convergence speed and the final solution. The random '0'/'1' assignment in the random initialization method does not guarantee the stability and quality of the initial population. Therefore, this paper proposes a clustering hybrid (CH) initialization method based on feature similarity. Based on the similarity between features, this method uses a community detection algorithm to complete the clustering; selecting features according to the clustering information reduces the randomness of selection. In addition, this method can effectively filter out redundant features during initialization and improve the quality of the initial population. The clustering hybrid initialization includes two steps: feature clustering and hybrid initialization.
4.2.1. Feature Clustering
Based on the commonality of information between features, an undirected weighted feature graph is established with each feature as a vertex. The similarity between features is measured by the symmetric uncertainty (SU) [36], which is used as the weight of the edge. The larger the SU value of two features, the greater their similarity. The calculation formula for SU is shown in Equation (11):

SU(X, Y) = 2 · IG(X, Y) / (H(X) + H(Y))  (11)
in which X and Y are two random variables and H(X) is the entropy of X, calculated by Equation (12):

H(X) = −Σ_i p(xi) log2 p(xi)  (12)

IG(X, Y) is the information gain of X given Y, obtained by Equation (13):

IG(X, Y) = H(X) − H(X|Y)  (13)
where p(xi) represents the probability that x equals xi, H(X|Y) is the conditional entropy of X given Y, and p(xi|yi) represents the conditional probability that x equals xi given that y equals yi.
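The SU computation of Equations (11)–(13) can be sketched directly from these definitions. The sketch assumes discretized feature values, since entropy is defined here over discrete probabilities:

```python
import numpy as np
from collections import Counter

def entropy(values):
    # H(X) = -sum p(x) log2 p(x), Equation (12)
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def conditional_entropy(x, y):
    # H(X|Y) = sum over y of p(y) * H(X | Y = y)
    total = len(y)
    h = 0.0
    for yv, n in Counter(y).items():
        h += (n / total) * entropy([xi for xi, yi in zip(x, y) if yi == yv])
    return h

def symmetric_uncertainty(x, y):
    # SU = 2 * IG(X, Y) / (H(X) + H(Y)), with IG = H(X) - H(X|Y), Eq. (13)
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    return 2.0 * (hx - conditional_entropy(x, y)) / (hx + hy)
```

SU is 1 for identical features and 0 for independent ones, which is what makes it usable as an edge weight in the feature graph.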
For a d-dimensional dataset, a d × d similarity matrix α is obtained by calculating the SU between every pair of features. The similarity matrix is processed by the OTSU algorithm [37] to obtain a reasonable threshold g. For any element αij of the similarity matrix, the corresponding element of the adjacency matrix is set to '1' if αij > g and to '0' otherwise. This implements the transformation from the similarity matrix to the adjacency matrix, in which '1' means that two features are connected in the feature adjacency graph and '0' means that they are not.
The Louvain algorithm [38] clusters the features according to the feature adjacency graph to obtain the feature clustering groups Group = {group1, group2, …, groupS}, where S is the total number of clusters. Figure 3 is a visualization of the clustering process of the Louvain algorithm. Features in the same clustering group carry repetitive information with respect to the final classification task, so a single feature selected from a cluster holds most of the information of the entire cluster.
4.2.2. Hybrid Initialization
The number of selected features is a significant factor affecting population quality. According to the number of features selected from the clustering groups, two schemes are defined: a small-scale initialization method and a large-scale initialization method.
The small-scale initialization method selects a number of groups SN less than or equal to the total number of clusters S and then obtains an initial individual containing SN features by randomly selecting one feature from each selected group. Because it selects only a small number of features, this method effectively reduces redundant features.
The large-scale initialization method selects a number of features SN less than or equal to the total number of features N. If SN ≤ S, the same operation as in the small-scale initialization is performed. If SN > S, one feature is first selected from each cluster, and selection then continues in the same manner until the number of selected features reaches SN. When the optimal feature subset contains relatively many features, the large-scale initialization method is more likely to obtain the optimal solution.
A hybrid initialization method is obtained by combining the two schemes. Most cuckoos use the small-scale initialization method to reduce the number of features effectively, while the remaining cuckoos use the large-scale method to cover feature subsets with larger numbers of features. The hybrid initialization method thus accounts for multiple possibilities and combines the advantages of both schemes. In this paper, 2/3 of the population is initialized with the small-scale method.
Figure 4 shows the clustering hybrid initialization procedure.
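The two initialization schemes and their 2/3–1/3 mix can be sketched as follows. The toy cluster list stands in for the Louvain output; the exact sampling details are assumptions consistent with the description above:

```python
import numpy as np

def small_scale_init(groups, n_features, rng):
    # choose SN <= S clusters, then one feature from each chosen cluster
    k = int(rng.integers(1, len(groups) + 1))
    individual = np.zeros(n_features, dtype=int)
    for g in rng.choice(len(groups), size=k, replace=False):
        individual[rng.choice(groups[g])] = 1
    return individual

def large_scale_init(groups, n_features, rng):
    # choose SN <= N features; if SN > S, cover every cluster first
    sn = int(rng.integers(1, n_features + 1))
    individual = np.zeros(n_features, dtype=int)
    if sn <= len(groups):
        for g in rng.choice(len(groups), size=sn, replace=False):
            individual[rng.choice(groups[g])] = 1
    else:
        for g in groups:
            individual[rng.choice(g)] = 1
        while individual.sum() < sn:       # top up with random features
            individual[int(rng.integers(n_features))] = 1
    return individual

def init_population(groups, n_features, m, rng):
    # 2/3 of the nests use the small-scale method, the rest the large-scale one
    small = [small_scale_init(groups, n_features, rng) for _ in range(2 * m // 3)]
    large = [large_scale_init(groups, n_features, rng) for _ in range(m - 2 * m // 3)]
    return np.array(small + large)

groups = [[0, 1], [2, 3], [4]]             # toy Louvain clusters over 5 features
rng = np.random.default_rng(2)
population = init_population(groups, 5, 9, rng)
```

Because the small-scale individuals take at most one feature per cluster, they can never exceed S features, which is what removes redundant features at initialization.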
4.3. Mutation Strategy Based on Levy Flight
In the CS algorithm, the cuckoo searches for the optimal solution based on its own position and the current optimal nest position. However, the contemporary population contains useful information about other high-quality nests in addition to the current best solution. By using the location information of multiple high-quality nests, cuckoos can identify the globally optimal nest faster. Therefore, this paper introduces three randomly selected high-quality individuals into the update equation. This improvement enables the algorithm to explore more of the search space.
Furthermore, the CS algorithm has strong global exploration ability due to the Levy flight. However, the heavy-tailed distribution of the Levy flight produces large jump steps during the iterative process, which is not conducive to approaching the optimal solution. This article therefore introduces the spawning range ω and specifies that each cuckoo lays between 2 and 5 eggs within this range. The spawning range of each cuckoo is calculated by Equation (15),
where β is a constant, β = 0.25, eggs is the number of eggs laid by each cuckoo, and total_eggs represents the total number of eggs laid. The larger a nest's egg yield, the larger its spawning range.
To effectively utilize useful information from the whole population and accelerate convergence, a new global search formula is proposed, as shown in Equation (16), in which three individuals are randomly selected from the top 20% of the population.
In the new global search formula, the solution is not strongly attracted by the current optimal solution, which reduces the risk of the algorithm falling into a local optimum. The spawning range ω controls the cuckoos so that they take random steps of different amplitudes, and the many offspring nests improve local exploitation. Finally, only the optimal offspring is retained as the next-generation nest.
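The guided Levy move can be sketched as below. Equation (16) is not reproduced in the text, so the averaging of the three top-20% individuals and the Mantegna step generator are assumptions in the spirit of the description:

```python
import math
import numpy as np

def levy_step(shape, rng, lam=1.5):
    # Mantegna's algorithm for Levy-distributed step lengths
    sigma = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
             / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = rng.normal(0.0, sigma, shape)
    v = rng.normal(0.0, 1.0, shape)
    return u / np.abs(v) ** (1 / lam)

def mutate(position, population, fitness, rng, alpha0=0.01):
    # Guide the Levy move with three individuals drawn from the best 20% of
    # nests; the exact combination in Equation (16) is an assumption here.
    top = np.argsort(fitness)[: max(3, len(population) // 5)]
    a, b, c = population[rng.choice(top, size=3, replace=False)]
    guide = (a + b + c) / 3.0
    return position + alpha0 * levy_step(position.shape, rng) * (guide - position)

rng = np.random.default_rng(3)
population = rng.normal(size=(20, 10))     # 20 nests, 10 dimensions (continuous)
fit = rng.random(20)                       # smaller fitness = better nest
child = mutate(population[0], population, fit, rng)
```

Pulling toward an average of several elite nests rather than the single best one is what weakens the attraction to the current optimum mentioned above.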
4.4. Dynamic Pa Probability Strategy
The probability Pa in the standard cuckoo search algorithm is a fixed value. Discarding nests indiscriminately with a fixed probability may cause the loss of a better solution, which is not conducive to convergence. A dynamic Pa probability strategy based on population fitness sorting is therefore proposed, which preserves potentially optimal solutions with high probability. The population size is set to M and the nests are sorted according to fitness. After sorting, each nest is assigned a level, with the optimal solution at level one and the worst solution at level M. The level assignment and the dynamic probability DPa are given in Equations (17) and (18),
where Pamin is the minimum Pa probability, Pamax is the maximum Pa probability, and r is a random decimal. According to Equation (19), when r is less than DPa, a new nest is generated using the clustering hybrid initialization method to replace the original nest; otherwise, the current nest is kept. The lower the DPa value of a nest, the higher its probability of preservation. After poor nests are discarded, the new nests generated by the CH initialization method help prevent the algorithm from falling into a local optimum.
4.5. k-Nearest Neighbors Classifier and Fitness Function
The k-nearest neighbors (KNN) classifier is a machine learning method for multi-class classification [39]. For an unknown sample, the distances between the unknown sample and all existing samples are calculated, the k samples nearest to the unknown sample are selected, and the category of the unknown sample is determined from the categories of these k samples. The basic principle of the KNN classifier is shown in Figure 5.
A key point in the KNN algorithm is the choice of distance function. Common distance functions include the Euclidean, cosine, Hamming, and Manhattan distances. The most widely used is the Euclidean distance, as shown in Equation (20):

d(x, y) = √(Σ_{i=1}^{n} (xi − yi)²)  (20)
In this paper, the classification error rate Error obtained by calling the KNN classifier is used to evaluate a feature subset. With the k parameter set to 5, the error rate is the proportion of misclassified samples among all evaluated samples.
The fitness value is used to assess the quality of a cuckoo nest solution. The classification error rate and the number of selected features are two essential criteria for evaluating classification performance. The fitness function is obtained by weighting them, as shown in Equation (22):

fitness = q · Error + (1 − q) · |x|/d  (22)

in which q ∈ [0, 1] is the weight of the classification error rate, q = 0.9, |x| represents the number of selected features, and d is the total number of features.
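The fitness evaluation can be sketched with a plain leave-one-out Euclidean KNN standing in for the classifier call; the |x|/d normalization of the feature-count term is an assumption, and the two-cluster data below are toy values:

```python
import numpy as np

def knn_error(X, y, k=5):
    # Leave-one-out error rate of a plain Euclidean KNN, a simple stand-in
    # for the classifier call described in the paper.
    wrong = 0
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                       # exclude the sample itself
        nn = np.argsort(d)[:k]
        wrong += int(np.bincount(y[nn]).argmax() != y[i])
    return wrong / len(X)

def fitness(mask, X, y, q=0.9):
    # Weighted sum of error rate and subset size; normalizing |x| by the
    # total feature count d is an assumption.
    cols = np.flatnonzero(mask)
    return q * knn_error(X[:, cols], y) + (1 - q) * len(cols) / X.shape[1]

# Two well-separated classes: the error term vanishes and only the
# feature-count penalty remains.
X = np.vstack([np.zeros((8, 2)), np.full((8, 2), 10.0)])
y = np.array([0] * 8 + [1] * 8)
fit_value = fitness(np.array([1, 1]), X, y)
```

With q = 0.9, a perfect classifier using all features scores 0.1 (the full feature-count penalty), so dropping features without raising the error always lowers the fitness.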
4.6. Feature Selection Based on the Clustering Hybrid Binary Cuckoo Search
Combining the above improved strategies, a feature selection method based on the clustering hybrid binary cuckoo search is proposed. This method introduces the clustering hybrid initialization strategy, the Levy flight-based mutation strategy, and the dynamic Pa probability strategy into the CS algorithm to improve classification performance. A flow diagram of the CHBCS algorithm is shown in Figure 6.