Article

Localization of Rolling Element Faults Using Improved Binary Particle Swarm Optimization Algorithm for Feature Selection Task

Department of Electrical Engineering, Chung Yuan Christian University, No. 200, Zhongbei Road, Zhongli District, Taoyuan City 320, Taiwan
* Author to whom correspondence should be addressed.
Mathematics 2021, 9(18), 2302; https://doi.org/10.3390/math9182302
Submission received: 26 July 2021 / Revised: 15 September 2021 / Accepted: 17 September 2021 / Published: 18 September 2021

Abstract
The accurate localization of rolling element failures is very important to ensure the reliability of rotating machinery. This paper proposes an efficient, noise-robust fault diagnosis model for rolling elements, composed of feature extraction, feature selection and fault classification. Feature extraction consists of signal processing and signal noise reduction: signal processing is carried out by local mean decomposition (LMD), and noise reduction is performed by product function (PF) selection and wavelet packet decomposition (WPD). Through these noise-reduction steps, high-frequency noise can be effectively removed and the fault information hidden beneath it can be extracted. To further improve the effectiveness of the diagnostic model, an improved binary particle swarm optimization (IBPSO) is proposed to find the most important features in the feature space. In IBPSO, a cycling time-varying inertia weight is introduced to balance exploitation and exploration and improve the capability to escape local solutions, and crossover and mutation operations are introduced to improve exploration and exploitation capabilities, respectively. The main contributions of this research are as follows: (1) The feature extraction process applied in this research effectively removes noise and establishes a high-accuracy feature set. (2) The proposed feature selection algorithm achieves higher accuracy than other state-of-the-art feature selection algorithms. (3) In a strong noise environment, the proposed rolling element fault diagnosis model is compared with state-of-the-art fault diagnosis models in terms of classification accuracy. Experimental results show that the model maintains high classification accuracy in a strong noise environment. These results demonstrate that the proposed fault diagnosis model can be effectively applied to the fault diagnosis of rotating machinery.

1. Introduction

With the improvement of industrial automation, rotating machinery has become more precise than ever, so monitoring and fault diagnosis methods for rotating machinery have long been an active field of research [1]. M. Van and H.J. Kang [2] proposed a bearing fault diagnosis model that combines a feature extraction technique based on non-local means denoising and empirical mode decomposition (EMD) with a two-stage feature selection technique based on hybrid distance evaluation technique (DET) and particle swarm optimization (PSO); the model proved its effectiveness in bearing failure experiments. F. Alvarez-Gonzalez et al. [3] proposed an online statistical analysis method based on the Hilbert–Huang transform (HHT) to detect stator short-circuit faults in permanent magnet synchronous motors (PMSM) and demonstrated reliable fault detection through simulation results. S. Haroun et al. [4] proposed multiple feature extraction techniques to detect stator winding faults in induction motors. First, the three-phase stator current is analyzed using the Park transform, the zero-crossing time signal and the envelope. Then, time-domain and frequency-domain statistical features are extracted from the analysis results. Experimental results show that the proposed method can detect stator winding faults and identify the faulty phase under various fault cases and load variations. These results demonstrate that fault diagnosis can greatly improve system reliability, reduce maintenance costs and even avoid major production losses caused by failures. In addition, according to statistics from the Electric Power Research Institute (EPRI), rolling element failures account for 41% of all rotating machinery failures, the highest proportion of any failure type [5]. Moreover, even an early rolling element fault can quickly develop into a serious one [6]. Therefore, this study focuses on constructing an efficient rolling element fault diagnosis model.
In recent years, with the advent of accelerometers, vibration signals have become easy to measure over a wide frequency range, so fault diagnosis models based on vibration signals have been widely proposed. Z. Wang et al. proposed an efficient and robust hybrid model, using wavelet packet decomposition (WPD) and mutual dimensionless indexing to extract the best features and a random forest for classification [7]. Y. Shao et al. proposed a rolling element fault diagnosis model based on the principle of coherent demodulation; by extracting the feature frequencies of different fault types, the fault type can be accurately classified [8]. Z. Huo et al. proposed a rolling element fault diagnosis model that can be effectively applied in multi-speed environments. The model uses particle swarm optimization and a quasi-Newton minimization algorithm to optimize the parameters of the continuous wavelet transform (WT) model; it then performs feature extraction in the 3-D feature space and uses the k-nearest neighbor (k-NN) classifier for fault classification [9]. S. Wei et al. proposed time-varying envelope filtering (TVEF) to extract the features of rolling element faults; using the instantaneous frequency and instantaneous amplitude extracted by this method to reconstruct a high-resolution time-frequency distribution allows the fault features to be extracted more accurately [10].
The fault diagnosis model based on vibration signals is usually divided into three stages: feature extraction, feature selection and fault classification. Among them, feature extraction and fault classification are the key stages. The signal processing technology in feature extraction [9] is an important step in reducing the dimensionality of vibration signals and extracting key fault information. Due to the complex working environment of rotating machinery, the measured vibration signal contains non-stationary components and noise, so signal processing techniques based only on the time domain or frequency domain may not be effective. Therefore, time-frequency analysis techniques such as the fast Fourier transform (FFT), short-time Fourier transform (STFT) and continuous wavelet transform (CWT) are widely used. The classification of the signal analysis results by a neural network (NN) or machine learning (ML) is the final stage. Feature selection [11] is an optional stage of the model; its function is to counter the diagnostic performance degradation caused by redundant or irrelevant fault features.
However, the above-mentioned time-frequency analysis techniques still have their own limitations. For example, the decomposition performance of FFT and STFT suffers from the use of fixed-length windows [12]. CWT solves this problem with an adjustable window size [13]; however, once the decomposition scale of CWT is defined, CWT can only decompose signals in the defined frequency band, which makes it non-adaptive [14]. Based on the above analysis, adaptive signal processing technology may be able to analyze vibration signals more effectively [15]. Local mean decomposition (LMD) [16] is a newer adaptive signal processing technique with several advantages for rolling element fault diagnosis. First, the decomposition process of LMD does not use the Hilbert transform (HT), so it does not encounter negative frequencies [17]. Second, LMD can decompose and demodulate signals at the same time. Third, LMD can decompose the amplitude- and frequency-modulation characteristics of the vibration signal when the rolling element fails [18,19].
However, in the actual working environment, the features of rolling element faults are usually masked by noise or other rotating machinery components’ disturbing vibrations [20,21], causing LMD to decompose redundant or irrelevant product function (PF) components. Therefore, this study used a denoising technique combining PF selection and WPD. First, the PF selection method removes redundant or irrelevant PF components and selects the most valuable PF components for further denoising. Then, WPD denoising technology can effectively remove noise and present fault information in wavelet packet coefficients [22,23]. Finally, the fault features are extracted from these wavelet packet coefficients.
Although the feature extraction stage works hard to extract the fault features from the original signal, redundant or irrelevant features may remain in the feature set, degrading diagnostic performance [11]. Therefore, feature selection is applied to prevent overfitting and improve model performance [11]. Feature selection methods can be divided into filter methods and wrapper methods. Filter methods mainly use the correlation coefficient (CC) or univariate mutual information (MI) to measure the strength of the relationship between each input and the target, rank the features accordingly and remove irrelevant ones [24]. Wrapper methods, which evaluate candidate subsets with a specific classifier, can usually achieve better performance than filter methods [11,24]. Therefore, optimization algorithms such as binary particle swarm optimization (BPSO) [25], the genetic algorithm (GA) [26] and binary chicken swarm optimization (BCSO) [27] are widely used for feature selection. However, these algorithms generally suffer from defects such as premature convergence [28,29] and falling into local optima [28,29,30]. Although no optimization algorithm can guarantee the best feature subset, PSO has successfully solved many nonlinear optimization problems in engineering thanks to its computational efficiency and simple operation [31,32], and it remains an algorithm that many researchers are dedicated to improving [33,34,35]. Therefore, this study proposes an improved binary particle swarm optimization (IBPSO) for the feature selection task of the fault diagnosis model. Three mechanisms are proposed to improve the performance of PSO: first, a cycling time-varying inertia weight is introduced to balance exploration and exploitation and enhance the capability to avoid local optima; second, crossover and mutation mechanisms are introduced to improve the exploration and exploitation capabilities of PSO and solve its premature convergence problem.
Fault classification is another important stage of a rolling element fault diagnosis model. Researchers have widely used NN and ML for the fault diagnosis of rolling elements [36,37]. Traditional NNs such as the multilayer perceptron (MLP) suffer from a complex structure and a difficult training process [38]. ML methods have the advantage of being simple to implement with good classification results; in particular, the support vector machine (SVM) algorithm has repeatedly demonstrated its classification efficiency and anti-noise capability [39,40]. In recent years, a new type of NN, the fully connected neural network (FCNN), has achieved powerful performance through a new way of connecting neurons. FCNN has the following advantages: (1) its complexity is similar to that of a traditional single-hidden-layer NN, but its performance is far stronger [38]; (2) adding too many neurons to a traditional single-hidden-layer NN leads to overfitting and poor generalization [41], whereas FCNN needs fewer neurons to achieve powerful performance and good generalization [42]; (3) in [42], the authors proved that FCNN has excellent anti-noise capability and can classify at a low signal-to-noise ratio (SNR); (4) in [38], the authors showed that the classification performance of FCNN is better than that of SVM. Therefore, in this study, we adjusted the number of layers and neurons and compared the performance of five FCNNs to establish the most robust rolling element fault diagnosis model. The advantages of the above feature extraction, feature selection and fault classification motivate us to propose a robust rolling element diagnosis model with both classification accuracy and anti-noise capability.
The organization of this paper is as follows: Section 2 introduces the basic methods of the proposed model, including feature extraction process, binary particle swarm algorithm and fully connected neural network. Section 3 introduces the detailed description of the improved binary particle swarm algorithm and the flow of the rolling element fault diagnosis model. Section 4 discusses the experimental results of the University of California, Irvine (UCI) feature selection dataset and Case Western Reserve University (CWRU) rolling element failure dataset. Section 5 evaluates the diagnostic model and future work. Finally, Section 6 explains the conclusion.

2. Methodology

In this section, three important stages in fault diagnosis are presented: feature extraction, feature selection and classification.

2.1. Feature Extraction

In actual cases, fault features are usually masked by a lot of noise (e.g., background noise and Gaussian noise). Therefore, the most important steps in the fault diagnosis model are denoising and extraction of fault features. In the feature extraction process of this study, first, local mean decomposition (LMD) decomposes the vibration signal into a set of product function (PF) components. Second, PF selection picks the PFs that contain the most fault information. Then, wavelet packet decomposition (WPD) is used to further analyze the fault information and denoise. Finally, the potential bearing fault features are extracted.

2.1.1. Local Mean Decomposition

LMD is an effective technique commonly adopted to decompose non-stationary signals [16]. LMD can decompose non-stationary signals into simple PFs, each of which is the product of an envelope signal and a pure frequency-modulated (FM) signal; the time-frequency distribution of the original signal can then be easily derived. The process by which LMD decomposes a vibration signal v(t) is the following loop.
Step 1. 
Calculate all local extrema (e_1, e_2, …, e_i, …) of v(t).
Step 2. 
All local mean values m_i and local envelope values a_i can be determined from two successive local extrema e_i and e_{i+1}:

m_i = \frac{e_i + e_{i+1}}{2}

a_i = \frac{|e_i - e_{i+1}|}{2}.
Step 3. 
Apply the moving averaging (MA) method to smooth the local mean function m_{11}(t) and the local envelope function a_{11}(t). m_{11}(t) and a_{11}(t) are the straight lines extending between the successive local extrema of the signal.
Step 4. 
The residue signal r_{11}(t) is obtained by subtracting the local mean function m_{11}(t) from the original signal v(t):

r_{11}(t) = v(t) - m_{11}(t)
Step 5. 
The FM signal f_{11}(t) can be obtained from r_{11}(t) and a_{11}(t):

f_{11}(t) = \frac{r_{11}(t)}{a_{11}(t)}.
Step 6. 
Take f_{11}(t) as a new signal and repeat Steps 1 to 5 until the resulting signal f_{1n}(t) is a purely FM signal. Then, go to Step 7.
Step 7. 
The instantaneous envelope function a_1(t) is obtained from all local envelope functions produced during the iterations:

a_1(t) = a_{11}(t) \, a_{12}(t) \cdots a_{1n}(t) = \prod_{q=1}^{n} a_{1q}(t)

Then, the first product function PF_1 is generated by multiplying the instantaneous envelope function and the final FM signal:

PF_1(t) = a_1(t) \, f_{1n}(t).
Step 8. 
The new signal u_1(t) is obtained by subtracting PF_1(t) from v(t). Then, the above process is repeated from Steps 1 to 7 until u_i(t) = u_{i-1}(t) - PF_i(t) is a monotonic function. Finally, v(t) is decomposed into a set of PF components:

v(t) = \sum_{i=1}^{p} PF_i(t) + u_p(t).
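To make the sifting loop concrete, the following is a minimal Python sketch of one LMD sifting pass (Steps 1 to 7). The extrema detection, linear interpolation, moving-average window and stopping tolerance are illustrative simplifications, not the exact settings used in this paper.

```python
import numpy as np

def lmd_sift(v, max_iter=20, tol=1e-2):
    """One LMD sifting pass (Steps 1-7): returns PF1 and its envelope a1(t)."""
    a1 = np.ones(v.size)                   # accumulated envelope a1(t)
    f = v.astype(float).copy()
    for _ in range(max_iter):
        # Step 1: indices of local extrema of the current signal
        idx = np.flatnonzero(np.diff(np.sign(np.diff(f)))) + 1
        if idx.size < 3:
            break
        e = f[idx]
        # Step 2: local means m_i and local envelope values a_i
        m_pts = (e[:-1] + e[1:]) / 2.0
        a_pts = np.abs(e[:-1] - e[1:]) / 2.0
        # Step 3: interpolate and moving-average to get m11(t), a11(t)
        t = np.arange(f.size)
        m = np.interp(t, idx[:-1], m_pts)
        a = np.interp(t, idx[:-1], a_pts)
        win = max(3, f.size // 64)
        k = np.ones(win) / win
        m = np.convolve(m, k, mode="same")
        a = np.maximum(np.convolve(a, k, mode="same"), 1e-12)
        # Steps 4-5: residue r11(t) = f - m11(t); FM signal f11(t) = r11/a11
        f = (f - m) / a
        a1 *= a
        # Step 6: stop when f is approximately a pure FM signal (|f| ~ 1)
        if np.max(np.abs(np.abs(f[idx]) - 1.0)) < tol:
            break
    return a1 * f, a1                      # Step 7: PF1 = a1(t) * f1n(t)
```

Repeating the pass on the residue v(t) - PF_1(t) (Step 8) yields the remaining PF components.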

2.1.2. Product Function Selection

The second step is to remove redundant components and extract the components that contain most of the fault information. Several parameters are used to identify effective components, such as kurtosis [23], root mean square (RMS) [43] and the correlation coefficient (CC) [44]. Kurtosis is very sensitive to early or weak failures, but when the failure becomes more severe, kurtosis cannot maintain an increasing trend [43]. RMS is not sensitive to early failures [45]. CC is used to evaluate the similarity of components, but it is also not sensitive to early failures [17]. Therefore, no single parameter can effectively select components. This study uses weights to balance the statistical values of the above parameters and effectively select the PF components.
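The following sketch illustrates such a weighted PF selection. The paper does not report the exact weight values, so the weights below (and the min-max normalization) are assumptions for illustration.

```python
import numpy as np
from scipy.stats import kurtosis, pearsonr

def select_pfs(pfs, signal, weights=(0.4, 0.3, 0.3), n_select=2):
    """Score each PF by kurtosis, RMS and correlation with the raw signal,
    then keep the n_select highest weighted scores."""
    scores = np.array([
        [kurtosis(pf),                      # sensitive to early/weak faults
         np.sqrt(np.mean(pf ** 2)),         # RMS: sensitive to severe faults
         abs(pearsonr(pf, signal)[0])]      # CC: similarity to the raw signal
        for pf in pfs
    ])
    # normalize each criterion to [0, 1] so the weights are comparable
    scores = (scores - scores.min(axis=0)) / (np.ptp(scores, axis=0) + 1e-12)
    total = scores @ np.asarray(weights)
    return np.argsort(total)[::-1][:n_select]   # indices of the selected PFs
```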

2.1.3. Wavelet Packet Decomposition

The third step is to use WPD to further analyze and remove high-frequency noise to extract the fault features that are hidden under the noise. WPD is a powerful noise reduction tool due to its high resolution at both high and low frequencies [22].
The wavelet packet coefficients (WPC) of a signal can be described as follows:
WPC_{j,n,k} = \int x(t) \, WPT_{j,k}^{n}(t) \, dt

where x(t) is the signal, WPT_{j,k}^{n}(t) denotes the wavelet packet function, j and k are the scale and translation values, respectively, and n = 1, 2, …, 2^j is the oscillation parameter.
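As an illustration, a two-level wavelet packet decomposition like the one used in Section 3.1 can be computed with the PyWavelets package. The 'db4' mother wavelet chosen here is an assumption, since the paper does not state which wavelet was used.

```python
import numpy as np
import pywt

# Two-level WPD of a (stand-in) PF component into the four terminal-node
# coefficient vectors used in the feature extraction stage.
pf = np.random.randn(4096)                       # placeholder PF component
wp = pywt.WaveletPacket(data=pf, wavelet="db4", mode="symmetric", maxlevel=2)
wpc = [node.data for node in wp.get_level(2, order="freq")]
print(len(wpc))                                  # -> 4 coefficient vectors
```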

2.2. Binary Particle Swarm Optimization for Feature Selection

The traditional particle swarm optimization (PSO) is based on the concept of swarm exploration of the search space to find solutions. Each particle moves in the search space at a certain velocity and updates the velocity according to its own search experience and the group’s search experience in each iteration, as shown in Equation (9) [25]. In Equation (9), each particle will update its velocity according to the best position it has searched (personal best) and the best position searched by the group (global best).
v_{i+1} = w v_i + ac_1 \cdot Rand \cdot (pb_i - p_i) + ac_2 \cdot Rand \cdot (gb - p_i)

where i represents the current iteration; w denotes the inertia weight; ac_1 and ac_2 denote the acceleration coefficients that control the strength of exploration and exploitation; v_i and p_i are the current velocity and position of the particle, respectively; Rand is a uniform random number in [0, 1]; and pb_i and gb represent the personal best and global best positions, respectively.
In binary particle swarm optimization (BPSO) [25] for feature selection, each particle is converted from a continuous position to a position in the binary search space through a transfer function. The converted position is represented as a bit string (i.e., feature subset), which means that the corresponding feature is selected (1) or not selected (0). The sigmoid function is used as a transfer function in this study. The position of each particle is updated based on the velocity of each particle. The steps to transfer from a continuous position to a binary position are described as follows:
sig(v_{i+1}) = \frac{1}{1 + e^{-v_{i+1}}}

p_{i+1} = \begin{cases} 1 & \text{if } Rand < sig(v_{i+1}) \\ 0 & \text{otherwise} \end{cases}
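A minimal sketch of one BPSO update combining Equations (9)-(11) is given below; the inertia weight and acceleration coefficients are illustrative values, not the settings used in the experiments.

```python
import numpy as np

def bpso_step(p, v, pb, gb, w=0.9, ac1=2.0, ac2=2.0):
    """One BPSO update: velocity update (Eq. 9), sigmoid transfer (Eq. 10)
    and stochastic binarization of each bit (Eq. 11)."""
    r1, r2 = np.random.rand(*p.shape), np.random.rand(*p.shape)
    v = w * v + ac1 * r1 * (pb - p) + ac2 * r2 * (gb - p)
    sig = 1.0 / (1.0 + np.exp(-v))                      # transfer function
    p = (np.random.rand(*p.shape) < sig).astype(int)    # 1 = feature selected
    return p, v
```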

3. Proposed Rolling Element Fault Diagnosis Model

This section details the proposed rolling element fault diagnosis model based on three main stages: feature extraction, improved binary particle swarm optimization for feature selection and classifier.

3.1. Feature Extraction Process

In the feature extraction process, first, the motor signal is decomposed into PF components using LMD. Second, PF selection is used to select the effective PF components that contain the most failure information; in this study, two effective PF components are chosen. Third, WPD is used to denoise the effective PF components and further extract failure information; a two-level WPD decomposes each effective PF component into four wavelet packet coefficients. Finally, eight statistical fault features (max value, min value, root mean square, mean square error, standard deviation, kurtosis, crest factor and clearance factor) are extracted from each WPC, for a total of 64 (2 × 4 × 8) features. Figure 1 illustrates the details of the feature extraction process.
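The eight statistical features can be computed as sketched below. The "mean square error" is interpreted here as the mean squared deviation from the mean, which is an assumption since the paper does not give its formula.

```python
import numpy as np

def statistical_features(x):
    """The eight statistical features extracted from each WPC."""
    rms = np.sqrt(np.mean(x ** 2))
    std = np.std(x)
    return np.array([
        np.max(x),                                    # max value
        np.min(x),                                    # min value
        rms,                                          # root mean square
        np.mean((x - np.mean(x)) ** 2),               # mean square error (assumed)
        std,                                          # standard deviation
        np.mean((x - np.mean(x)) ** 4) / std ** 4,    # kurtosis
        np.max(np.abs(x)) / rms,                      # crest factor
        np.max(np.abs(x)) / np.mean(np.sqrt(np.abs(x))) ** 2,  # clearance factor
    ])

# 2 PFs x 4 WPCs x 8 features = 64 features per sample:
# feature_vector = np.concatenate([statistical_features(c) for c in all_wpcs])
```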

3.2. Improved Binary Particle Swarm Optimization

In this subsection, two mechanisms and one parameter are introduced to enhance BPSO. First, the cycling time-varying inertia weight not only balances exploration and exploitation but also enhances the capability to escape local optima. Second, a position update based on the crossover operation can combine high-potential solutions to generate better ones and improve the exploration capability. Finally, a position update based on the mutation operation improves the capability to escape local optima without increasing the computational cost.

3.2.1. Cycling Time-Varying Inertia Weight

In BPSO, the inertia weight is an important parameter responsible for balancing exploitation and exploration. Gradually reducing the inertia weight lets the algorithm transition smoothly from exploration to exploitation. However, BPSO still suffers from premature convergence, and the algorithm cannot continue to find better solutions in the later iterations. In this study, a cycling time-varying inertia weight is applied to not only balance exploitation and exploration but also improve the algorithm's capability to escape local solutions [46]. Over a user-defined number of cycles, the weight linearly decreases from the maximum value (2) to the minimum value (0) and then linearly increases back from 0 to 2, repeating until the set number of cycles is completed. The cycling time-varying inertia weight is defined in Equation (12):

w = \left| 2 - \frac{t \bmod (T/C)}{T/(4C)} \right|

where C is the user-defined number of cycles, and t and T denote the current iteration and the maximum number of iterations, respectively.
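The cycling inertia weight of Equation (12) can be implemented directly, as in this brief sketch:

```python
def cycling_inertia_weight(t, T, C):
    """Triangle-wave inertia weight (Eq. 12): decreases linearly from 2 to 0
    and back to 2, completing C cycles over T iterations."""
    return abs(2 - (t % (T / C)) / (T / (4 * C)))

# Example: with T = 100 and C = 2, w starts at 2, reaches 0 at t = 25,
# returns to 2 at t = 50, and repeats once more.
```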

3.2.2. Improve the Exploration Capability Based on Crossover Operation

In genetic algorithms, crossover operations can explore new areas of the search space, avoid premature convergence and further improve convergence accuracy. In this study, a three-point crossover operation is combined with BPSO. During the iterations, when the best solution of the current population matches the best solution obtained in the previous iteration, the overall evolution may have stalled, and a three-point crossover operation is performed to create a high-potential solution. The three-point crossover operation is shown in Equation (13). First, a pair of solutions is randomly selected from the individual best solutions of the particles. Next, three positions are randomly selected to cut the pair of solutions. Then, the pair of solutions swaps segments to create two new solutions. Finally, one of the new solutions is randomly selected as the current solution.

p_i = \text{Crossover}(pb_{rand1}, pb_{rand2})

where Crossover is the three-point crossover operation applied in this study, and pb_{rand1} and pb_{rand2} are particles randomly selected from the individual best solutions.
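A sketch of the three-point crossover of Equation (13) follows; the choice of which alternating segments are swapped is an assumption of this illustration.

```python
import numpy as np

def three_point_crossover(pb1, pb2):
    """Three-point crossover (Eq. 13): cut two personal-best bit strings at
    three random positions, swap alternating segments to form two children,
    and return one child at random. Assumes len(pb1) >= 4."""
    cuts = np.sort(np.random.choice(np.arange(1, len(pb1)), 3, replace=False))
    c1, c2 = pb1.copy(), pb2.copy()
    for start, end in [(cuts[0], cuts[1]), (cuts[2], len(pb1))]:
        c1[start:end], c2[start:end] = pb2[start:end].copy(), pb1[start:end].copy()
    return c1 if np.random.rand() < 0.5 else c2
```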

3.2.3. Escape the Local Trap Based on Mutation Operation

The mutation operation from the genetic algorithm enables a solution to move slightly in the search space, increasing the capability to escape local solutions and further increasing population diversity. In this study, a mutation probability MR specifies whether to change the position of each particle in one dimension. The mutation operation is shown in Equation (14): if Rand(i) < MR (i = 1, 2, …, n), one dimension of the ith particle is selected randomly and its value is mutated.

p_{i+1} = \begin{cases} 1 - p_i & \text{if } Rand(i) < MR \\ p_i & \text{otherwise} \end{cases}

where p_i is the current position of the ith particle.
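A sketch of the mutation of Equation (14); the MR value shown is a placeholder, not the setting used in the experiments.

```python
import numpy as np

def mutate(p, MR=0.1):
    """Bit-flip mutation (Eq. 14): with probability MR, flip one randomly
    chosen dimension of the particle's binary position."""
    p = p.copy()
    if np.random.rand() < MR:
        d = np.random.randint(len(p))
        p[d] = 1 - p[d]
    return p
```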

3.3. Rolling Element Fault Diagnosis Model

The proposed rolling element fault diagnosis model can be divided into three stages, including feature extraction, feature selection and classification.
In the feature extraction stage, as shown in Section 3.1, the measured signal is decomposed into a set of PF components through LMD. Next, the most important PF components are selected. Then, wavelet packet decomposition is used to further extract the fault information and denoise the most important PF components. Finally, eight statistical fault features are extracted from each wavelet packet coefficient.
In the feature selection stage, as shown in Section 3.2, the improved binary particle swarm optimization (IBPSO) optimizes the fault feature set obtained in the feature extraction stage to remove redundant features. This stage can improve the classification accuracy of the rolling element fault diagnosis model. The process of IBPSO is shown in Figure 2.
In the fault classification stage, the best feature subset obtained in the feature selection stage is classified by the fully connected neural network. In this study, the hyperparameters of the fully connected neural network are adjusted to obtain the best classification accuracy and thus the best rolling element fault diagnosis model.

4. Results

In this study, the results are divided into two experiments, on the University of California, Irvine (UCI) feature selection datasets [47] and the Case Western Reserve University (CWRU) bearing failure dataset [48]. The experiments were run on an Intel(R) Core(TM) i7-3930K CPU @ 3.20 GHz with 24 GB RAM on the MATLAB platform.

4.1. Experiment 1: UCI Feature Selection Datasets

4.1.1. Experiment Setup and Parameter Setting

To evaluate the effectiveness of IBPSO in the field of feature selection, nine UCI feature selection datasets were used in this experiment: BreastCancer, Wine, CongressEW, SpectEW, BreastEW, Ionosphere, krvskp, WaveformEW and Sonar. Table 1 describes the nine datasets. In this experiment, the feature selection results of IBPSO on the nine datasets are shown and compared with three basic feature selection algorithms: BPSO, GA and BCSO. Finally, IBPSO is compared with other state-of-the-art feature selection models.
In this experiment, the k-nearest neighbors (k-NN) classifier is used as the classifier of the wrapper feature selection model. Table 2 lists the parameter settings of this experiment; T is regarded as the convergence criterion. Each model performed 30 independent feature selection runs on every dataset, and the experimental results were collected, including the average classification accuracy, the average number of selected features, the standard deviation of the classification error and the average computational time. Because minimizing the classification error is the primary task of a high-accuracy fault diagnosis model, the classification error is defined as the fitness value in this experiment.
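A minimal sketch of this wrapper fitness is given below, using a k-NN classifier; the k value and the cross-validation folds are assumptions, as the settings actually used are listed in Table 2.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def fitness(bitstring, X, y):
    """Wrapper fitness: classification error of k-NN on the feature subset
    encoded by the bitstring (k=5 and 5-fold CV are assumed here)."""
    cols = np.flatnonzero(bitstring)
    if cols.size == 0:
        return 1.0                        # empty subset: worst possible fitness
    acc = cross_val_score(KNeighborsClassifier(5), X[:, cols], y, cv=5).mean()
    return 1.0 - acc                      # classification error to minimize
```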

4.1.2. Comparison with Basic Feature Selection Algorithm

This subsubsection shows the convergence curves of the feature selection algorithms (IBPSO, BPSO, GA and BCSO) for each dataset in Figure 3. As the convergence curves show, IBPSO reaches a lower classification error than the other feature selection algorithms on all datasets at the defined convergence criterion. In particular, on the BreastCancer, WaveformEW and Sonar datasets, IBPSO starts from a relatively poor initial population but keeps finding better solutions and ultimately achieves the best classification error.
Table 3 shows the average classification accuracy and the average number of selected features. IBPSO has the best average classification accuracy on all datasets, and GA ranks second on all datasets. BPSO is competitive with GA on lower-dimensional datasets such as BreastCancer, Wine and CongressEW. This shows that although BPSO has good exploitation capability, on high-dimensional datasets such as WaveformEW and Sonar, GA, with its better exploration capability, performs better than BPSO. In summary, the proposed hybrid algorithm IBPSO combines good exploitation and exploration capabilities and achieves the best classification accuracy in this experiment.
In addition, because the result of each run differs, this experiment uses the standard deviation to analyze the stability of each algorithm. Table 4 shows the standard deviation of the classification error of each algorithm: IBPSO has the best standard deviation on all datasets, showing stronger stability than the other algorithms. Table 5 shows the average computational time of each algorithm: GA clearly has the highest computational cost, while there is no significant difference among BPSO, BCSO and IBPSO. From this result, IBPSO achieves higher classification accuracy without extra computational cost.
In summary, Experiment 1 uses public feature selection datasets to verify the superiority of IBPSO. The results in Table 3 and Table 4 show that, in terms of average classification accuracy and stability, the proposed method is better than the three comparison algorithms (BPSO, GA, BCSO). This shows that the cycling time-varying inertia weight and crossover operator make IBPSO's exploitation and exploration during the iterations more efficient than those of traditional BPSO. Although IBPSO has no advantage in the average number of selected features, accurate classification of rolling element failures is the priority criterion of this study.

4.1.3. Comparison with State-of-the-Art Models

Table 6 shows classification results from the literature on the same UCI datasets, used to evaluate the effectiveness of the proposed algorithm in the field of feature selection. This study cites four state-of-the-art feature selection models, briefly described as follows: the continuous symbiotic organism search algorithm converted into a binary version via an adaptive S-shaped transfer function, named BSOS [49]; a PSO variant that introduces two dynamic correction coefficients and a spiral-shaped mechanism into the position update formula and uses a logistic map sequence to enhance diversity, named HPSO-SSM [50]; a binary butterfly optimization algorithm based on the sigmoid transfer function that converges better to the optimal solution, named s-bBOA [51]; and the grasshopper optimization algorithm combined with a mutation operator with a linearly decreasing mutation rate to enhance the exploration stage, named BGOA-M [52].
In Table 6, BSOS selects the fewest features on BreastEW and Sonar, significantly fewer than the other algorithms, because its authors emphasize reaching acceptable accuracy with few selected features. HPSO-SSM performs better on low-dimensional datasets such as BreastCancer, Wine and CongressEW; on BreastCancer and Wine in particular, it achieves the highest classification accuracy with the fewest selected features. This is because the spiral-shaped position update mechanism of HPSO-SSM can exploit the search area more intensively, provide more solutions and maximize the exploitation capability of the PSO algorithm. However, on high-dimensional datasets, too much exploitation may lead to local optima. s-bBOA generally performs well on the selected datasets because its adaptive mechanism balances exploration and exploitation, and it achieves the second-highest classification accuracy on SpectEW and BreastEW. However, on datasets with many local solutions, such as CongressEW, Ionosphere, WaveformEW and Sonar, the classification accuracy of s-bBOA is poor, and it cannot escape local solutions well. BGOA-M has better classification accuracy on low-dimensional datasets and achieves the highest classification accuracy on CongressEW. However, on higher-dimensional datasets such as WaveformEW and Sonar, its classification accuracy is poor, showing that the exploration capability of a mutation operator with a linearly decreasing mutation rate is not enough for high-dimensional datasets. The proposed algorithm IBPSO performs best in classification accuracy, achieving the highest accuracy especially on medium- and high-dimensional datasets, including SpectEW, BreastEW, Ionosphere, krvskp, WaveformEW and Sonar. In addition, to evaluate stability, Table 6 also compares the standard deviation of the classification accuracy: BGOA-M and IBPSO perform best, achieving the best or second-best standard deviation on all datasets.

4.2. Experiment 2: CWRU Bearing Dataset

In this experiment, a bearing failure dataset was used to evaluate the robustness of the diagnostic model. Five fully connected neural networks with different hyperparameters were configured to select a robust fault diagnosis model. White Gaussian noise was also added to simulate various noise environments, making the experimental results closer to a real working environment. Finally, we compared the robustness of the proposed bearing fault diagnosis model with state-of-the-art models.

4.2.1. Experiment Setup and Parameter Setting

This experiment used the bearing failure dataset from Case Western Reserve University (CWRU) [48]. The vibration signals come from an accelerometer mounted at the drive end of the test motor, sampled at 12 kHz. The test motor was run under a load of 2 horsepower at a speed of 1750 rpm, and bearing defects were introduced by electrical discharge machining. The experiment used 10 bearing conditions: normal bearings, inner ring faults (0.007-, 0.014- and 0.021-inch fault depth), outer ring faults (0.007-, 0.014- and 0.021-inch fault depth) and ball defects (0.007-, 0.014- and 0.021-inch fault depth). Table 7 describes the dataset, and Table 8 details the fully connected neural networks used in this experiment. The networks were evaluated with 5-fold cross-validation to measure their generalization to new data. Three layer sizes of single-fully-connected-layer networks were set up, while the FCNN-D and FCNN-E networks use two and three fully connected layers, respectively, each with 10 neurons. In addition, a low penalty term mitigates overfitting, and standardizing the data helps the neural network converge.
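For reference, the five network configurations can be sketched with scikit-learn as follows; the solver, activation and iteration budget are assumptions, and Table 8 holds the settings actually used.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# The five fully connected network configurations compared in this experiment.
configs = {
    "FCNN-A": (10,),
    "FCNN-B": (25,),
    "FCNN-C": (100,),
    "FCNN-D": (10, 10),
    "FCNN-E": (10, 10, 10),
}

def evaluate(X, y):
    """5-fold cross-validated accuracy for each configuration."""
    return {name: cross_val_score(
                MLPClassifier(hidden_layer_sizes=h, alpha=1e-4, max_iter=1000),
                X, y, cv=5).mean()
            for name, h in configs.items()}
```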

4.2.2. Comparison in Feature Selection Stage

In this study, not only the UCI feature selection datasets but also the bearing failure dataset was used to evaluate the effectiveness of the proposed feature selection algorithm, again comparing against BPSO, GA and BCSO. Figure 4 shows the convergence curves of the four algorithms: the proposed algorithm converges to the best solution at the 29th iteration, whereas BPSO, GA and BCSO converge at the 59th, 64th and 77th iterations, respectively, without reaching the optimal solution. According to Table 9, the proposed algorithm achieves the best average fitness value, average number of selected features and standard deviation of the fitness value; the improvement in average fitness value over the compared algorithms is especially significant. Table 10 details the best feature subset obtained by each algorithm: the proposed algorithm reaches the optimal feature subset with the fewest selected features. These results show that the proposed algorithm has better exploitation and exploration capabilities and converges better to the best solution.

4.2.3. Comparison in Classification Stage

In the classification stage, five fully connected neural networks were used to classify the optimal feature subsets listed in Table 10 and the original feature set obtained in the feature extraction stage. In addition, this experiment evaluated the performance of the proposed fault diagnosis model in a noisy environment: Gaussian white noise was added to the original vibration signal to simulate the real environment. Equation (15) defines the signal-to-noise ratio (SNR):

SNR = 10 \log_{10} \left( \frac{P_{signal}}{P_{noise}} \right)

where P_{signal} and P_{noise} are the power of the signal and of the noise, respectively. In this experiment, the SNR values were set to 20, 15, 10, 5 and 0 dB. To make the results fair, the average classification accuracy was obtained over 50 training runs. Table 11, Table 12, Table 13, Table 14 and Table 15 show the classification results of each fully connected neural network under different noise levels. Each table shows that once feature selection eliminates redundant features, the performance of the fault diagnosis model improves further, and the feature subset obtained by the proposed feature selection algorithm yields the best classification accuracy. In terms of classification accuracy, the three single-fully-connected-layer networks perform best, FCNN-D ranks fourth, and FCNN-E performs worst; FCNN-A is the worst of the three single-layer networks. FCNN-B and FCNN-C show competing results: their classification accuracies under each noise level are very similar, but FCNN-B is still slightly higher, except at SNR = 0 dB, where the two classifiers achieve the same classification accuracy of 96.56%. In summary, FCNN-B is highly efficient in this experiment and highly robust for bearing fault detection.
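The noisy test signals can be generated by adding white Gaussian noise scaled to a target SNR according to Equation (15), as in this sketch:

```python
import numpy as np

def add_awgn(signal, snr_db):
    """Add white Gaussian noise at a target SNR in dB (Eq. 15)."""
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))          # invert the SNR formula
    noise = np.random.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise

# Example: noisy = add_awgn(vibration_signal, snr_db=0)
```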

4.2.4. Comparison with State-of-the-Art Bearing Fault Diagnosis Models

Table 16 and Table 17 compare the classification accuracy of the proposed fault diagnosis model with state-of-the-art bearing diagnosis models: Table 16 reports the classification accuracy under the normal (noise-free) environment, and Table 17 the classification accuracy under noisy environments. All diagnostic models in the tables use the same dataset (the CWRU bearing fault dataset) for fair comparison, and the classification results are cited from their papers. The state-of-the-art fault diagnosis models are briefly described as follows: C. Grover and N. Turk [53] proposed a fault diagnosis model that combines Hjorth parameters, used to extract features from intrinsic mode functions (IMFs), with rule-based machine learning. M. Zhao et al. [54] proposed a new supervised dimensionality reduction method to extract features and improved the iterative trace ratio method to solve the trace ratio problem in linear discriminant analysis for fault classification. X. Zhang and J. Zhou [55] proposed a bearing fault diagnosis model combining ensemble empirical mode decomposition (EEMD) and a support vector machine optimized by inter-cluster distance (ICD). X. Zhang et al. [41] proposed a hybrid fault diagnosis model that uses permutation entropy (PE) to extract feature vectors from intrinsic mode functions and ICD-optimized support vector machines for classification.
The state-of-the-art fault diagnosis models tested under noisy environments are briefly described as follows: Y. Zhang et al. [56] integrated 15 ensemble deep shrinkage autoencoders (EDCAE) through a combination strategy; the resulting fault diagnosis method can diagnose faults effectively even in a noisy environment. H. Wenyi et al. [57] used filters of different scales to obtain a multi-resolution expression of the signal in the frequency domain and enhance the classification information of the input, proposing an improved CNN called the multi-scale cascaded convolutional neural network (MC-CNN). S. Ma et al. [58] proposed an end-to-end deep learning model based on the wavelet packet transform; their experimental results show that the model is highly efficient and has excellent anti-noise ability. X. Yu et al. [59] combined window marginal spectrum clustering (WMSC) and Hilbert–Huang transform (HHT) feature extraction with SVM, named HHT-WMSC-SVM.
The comparison of classification accuracy under normal conditions is shown in Table 16: Ref. [41] achieves 100% classification accuracy, Ref. [55] achieves 99.33% and ranks second, and the proposed model achieves 98.05% and ranks third. These results show that SVM has excellent classification performance under normal conditions. However, Table 17 shows the classification results of each model in a strong noise environment, where the proposed model shows excellent anti-noise capability. When the SNR value is reduced from 20 dB to 0 dB, the classification accuracy of the proposed model decreases by only 0.97%. In contrast, when the SNR is reduced from 10 dB to 0 dB, the classification accuracy decreases by 31.1% in Ref. [56]; from 6 dB to 0 dB, by 8.28% in Ref. [57]; from 10 dB to 0 dB, by 2.98% in Ref. [58]; and from 15 dB to 0 dB, by 4.64% in Ref. [59]. In addition, at an SNR of 0 dB the proposed model achieves a classification accuracy of 96.56%, better than the 96.35% of the deep learning model proposed in Ref. [57]. In summary, the above results show that the proposed fault diagnosis model achieves both high accuracy and excellent noise immunity.

5. Discussion

Motivated by the fact that rolling elements account for the highest incidence of rotating machinery failures, this research proposes an efficient rolling element fault diagnosis model. To verify its effectiveness, two public datasets were used, the UCI feature selection datasets and the CWRU bearing fault dataset, for fair comparison with other state-of-the-art methods. Based on the experimental results in Section 4, the main contributions of this research can be divided into the following two points.
(1) The proposed fault diagnosis model can be applied in a strong noise environment: local mean decomposition can effectively deal with non-stationary signals and extract time-frequency information from them. However, the method is sensitive to noise and generates redundant PF components. Therefore, this study proposes a feature extraction technique that combines local mean decomposition with PF selection and WPD. Three fault indicators are introduced in PF selection; because each performs well only for early faults or for severe faults, weight values are used to balance the contribution of each indicator. Benefitting from wavelet packet denoising, which maintains high resolution at both low and high frequencies, high-frequency noise is removed and fault information is further extracted.
In addition, feature selection technology can further improve the performance of the fault diagnosis model by removing redundant features, improving classification accuracy and reducing computational cost. On these criteria, the proposed feature selection algorithm IBPSO performs best among the comparison algorithms (BPSO, GA, BCSO). Table 12 shows the classification results of the best feature subsets obtained by each feature selection algorithm: IBPSO removes 75% of the redundant features and achieves the best classification accuracy under every noise level. Moreover, as the noise level increases, the classification accuracy of the diagnostic model decreases. With the original feature set, the classification accuracy drops from 95.18% to 84.11% as the SNR decreases from the noise-free case to 0 dB; with the best feature subset obtained by IBPSO, it drops only from 98.05% to 96.56%. These results show that IBPSO selects the most important features and significantly improves the performance of the fault diagnosis model.
(2) Determining the appropriate number and size of fully connected layers: in neural networks, one of the most difficult tasks is determining the appropriate number of fully connected layers and neurons. Deeper networks may overfit, increase training difficulty and make convergence hard. Too few neurons cause underfitting; conversely, too many neurons also lead to overfitting, because when the training set does not contain enough information to train all the neurons in the fully connected layer, the model overfits. Therefore, selecting the appropriate number of fully connected layers and neurons is important. In this study, five configurations of layer number and size were set: FCNN-A (layer sizes: 10), FCNN-B (layer sizes: 25), FCNN-C (layer sizes: 100), FCNN-D (layer sizes: [10, 10]) and FCNN-E (layer sizes: [10, 10, 10]). Table 11, Table 12, Table 13, Table 14 and Table 15 show the classification results of each classifier. The results in Table 14 and Table 15 show that the classification accuracy of deeper networks is worse than that of a single layer and that deeper models are difficult to converge; therefore, the fault diagnosis model is better suited to a single-fully-connected-layer neural network. Based on the results in Table 11, Table 12 and Table 13, FCNN-B performs better than the narrower network, showing that increasing the number of neurons can improve classification performance. However, adding too many neurons does not help: in Table 12 and Table 13, the classification results of FCNN-B and FCNN-C are almost the same, showing that excess neurons waste computational cost without improving classification performance.
In addition to the above advantages, the proposed model still has the following shortcomings.
(1) The types of features selected by the fault diagnosis model are highly dependent on the knowledge of the engineer, and the quality of the features determines the accuracy of the fault diagnosis model. This also affects the versatility of the fault diagnosis model. Engineers choose appropriate features for different types of faults based on prior knowledge. Therefore, automatic feature extraction technology should be considered in the future.
(2) The computational time of the algorithm: in Table 5, the proposed algorithm performs poorly in computational time on datasets that are low-dimensional or have few local optima, whereas on datasets that are high-dimensional or have many local optima, it performs better than the other algorithms. This shows that although the crossover operator provides powerful exploration capability, low-dimensional datasets or those with few local optima usually require strong exploitation to approach the global optimum. Therefore, further study is needed to reduce the algorithm's computational time.
(3) Optimization of the computational complexity of the fault diagnosis model: this study only discusses the influence of the number of fully connected layers and neurons on the classification accuracy of the fault diagnosis model and does not address the computational complexity of the diagnostic model. Some methods may reduce the computational complexity of a fully connected neural network, such as sparsification [60] or improvements to the network architecture and parameters.

6. Conclusions

This paper proposes an improved binary particle swarm optimization algorithm for the feature selection task of an accurate and robust rolling element fault classification model. The proposed model uses an effective, noise-resistant feature extraction process: the vibration signal is decomposed by LMD and denoised by combining PF selection and wavelet packet decomposition. The improved binary particle swarm optimization removes redundant features and improves classification accuracy. The CWRU bearing fault dataset was used to evaluate the proposed diagnostic model: IBPSO removes 75% of the redundant features from the original feature set (64 features) and selects the 16 most important features. Five fully connected neural networks with different hyperparameters were applied; FCNN-B achieves the best classification results, reaching a classification accuracy of 98.05%. In addition, the anti-noise capability of the proposed diagnostic model was compared, and the result is better than state-of-the-art diagnostic models: when the SNR value is reduced from 20 dB to 0 dB, the classification accuracy of the proposed model decreases by only 0.97%. However, as mentioned in the discussion section, the proposed model still has the following limitations: (1) the selection of feature types depends on prior knowledge, which affects the classification accuracy and generality of the fault diagnosis model, and (2) computational complexity. Therefore, further study of automatic feature extraction and of improvements to the neural network architecture and parameters is needed.

Author Contributions

Methodology, C.-Y.L. and G.-L.Z.; visualization, C.-Y.L. and G.-L.Z.; software, C.-Y.L.; data curation, G.-L.Z.; writing—original draft preparation, G.-L.Z.; writing—review and editing, C.-Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lei, Y.; Yang, B.; Jiang, X.; Jia, F.; Li, N.; Nandi, A.K. Applications of machine learning to machine fault diagnosis: A review and roadmap. Mech. Syst. Signal Process. 2020, 138, 106587.
  2. Van, M.; Kang, H.J. Bearing-fault diagnosis using non-local means algorithm and empirical mode decomposition-based feature extraction and two-stage feature selection. Sci. Meas. Technol. 2015, 9, 671–680.
  3. Alvarez-Gonzalez, F.; Griffo, A.; Wang, B. Permanent magnet synchronous machine stator windings fault detection by Hilbert–Huang transform. J. Eng. 2019, 2019, 3505–3509.
  4. Haroun, S.; Seghir, A.N.; Touati, S. Multiple features extraction and selection for detection and classification of stator winding faults. IET Electr. Power Appl. 2018, 12, 339–346.
  5. Kumar, S.; Mukherjee, D.; Guchhait, P.K.; Banerjee, R.; Srivastava, A.K.; Vishwakarma, D.N.; Saket, R.K. A comprehensive review of condition based prognostic maintenance (CBPM) for induction motor. IEEE Access 2019, 7, 90690–90704.
  6. Hamadache, M.; Lee, D.; Veluvolu, K.C. Rotor speed-based bearing fault diagnosis (RSB-BFD) under variable speed and constant load. IEEE Trans. Ind. Electron. 2015, 62, 6486–6495.
  7. Wang, Z.; Zhang, Q.; Xiong, J.; Xiao, M.; Sun, G.; He, J. Fault diagnosis of a rolling bearing using wavelet packet denoising and random forests. IEEE Sens. J. 2017, 17, 5581–5588.
  8. Shao, Y.; Kang, R.; Liu, J. Rolling Bearing Fault Diagnosis Based on the Coherent Demodulation Model. IEEE Access 2020, 8, 207659–207671.
  9. Huo, Z.; Zhang, Y.; Francq, P.; Shu, L.; Huang, J. Incipient fault diagnosis of roller bearing using optimized wavelet transform based multi-speed vibration signatures. IEEE Access 2017, 5, 19442–19456.
  10. Wei, S.; Wang, D.; Wang, H.; Peng, Z. Time-Varying Envelope Filtering for Exhibiting Space Bearing Cage Fault Features. IEEE Trans. Instrum. Meas. 2021, 70, 3504313.
  11. Kang, M.; Islam, M.R.; Kim, J.; Kim, J.-M.; Pecht, M. A hybrid feature selection scheme for reducing diagnostic performance deterioration caused by outliers in data-driven diagnostics. IEEE Trans. Ind. Electron. 2016, 63, 3299–3310.
  12. Yiyuan, G.; Dejie, Y.; Haojiang, W. Fault diagnosis of rolling bearings using weighted horizontal visibility graph and graph Fourier transform. Measurement 2020, 149, 107036.
  13. Gao, M.; Yu, G.; Wang, T. Impulsive gear fault diagnosis using adaptive Morlet wavelet filter based on alpha-stable distribution and kurtogram. IEEE Access 2019, 7, 72283–72296.
  14. Ye, X.; Hu, Y.; Shen, J.; Feng, R.; Zhai, G. An improved empirical mode decomposition based on adaptive weighted rational quartic spline for rolling bearing fault diagnosis. IEEE Access 2020, 8, 123813–123827.
  15. Minhas, A.S.; Singh, G.; Singh, J.; Kankar, P.K.; Singh, S. A novel method to classify bearing faults by integrating standard deviation to refined composite multi-scale fuzzy entropy. Measurement 2020, 154, 107441.
  16. Smith, J.S. The local mean decomposition and its application to EEG perception data. J. R. Soc. Interface 2005, 2, 443–454.
  17. Yu, J.; Lv, J. Weak fault feature extraction of rolling bearings using local mean decomposition-based multilayer hybrid denoising. IEEE Trans. Instrum. Meas. 2017, 66, 3148–3159.
  18. McFadden, P. Detecting fatigue cracks in gears by amplitude and phase demodulation of the meshing vibration. J. Vib. Acoust. Stress Reliab. Des. 1986, 108, 165–170.
  19. Radcliff, G.A. Condition Monitoring of Rolling Element Bearings Using the Enveloping Technique. In IMechE Paper, Solid Mechanics and Machine Systems Group Seminar; IMechE: London, UK, 1990; pp. 55–65.
  20. Khanam, S.; Dutt, J.K.; Tandon, N. Extracting rolling element bearing faults from noisy vibration signal using Kalman filter. J. Vib. Acoust. Trans. ASME 2014, 136, 11.
  21. Wang, Y.; Liang, M. An adaptive SK technique and its application for fault detection of rolling element bearings. Mech. Syst. Signal Process. 2011, 25, 1750–1764.
  22. Wang, Y.; Xu, G.; Liang, L.; Jiang, K. Detection of weak transient signals based on wavelet packet transform and manifold learning for rolling element bearing fault diagnosis. Mech. Syst. Signal Process. 2015, 54–55, 259–276.
  23. Sun, J.; Xiao, Q.; Wen, J.; Wang, F. Natural gas pipeline small leakage feature extraction and recognition based on LMD envelope spectrum entropy and SVM. Measurement 2014, 55, 434–443.
  24. Souza, F.A.; Araújo, R.; Mendes, J. Review of soft sensor methods for regression applications. Chemom. Intell. Lab. Syst. 2016, 152, 69–79.
  25. Kennedy, J.; Eberhart, R.C. A Discrete Binary Version of the Particle Swarm Algorithm. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Orlando, FL, USA, 12–15 October 1997; pp. 4104–4108.
  26. Holland, J.H. Genetic algorithms. Sci. Am. 1992, 267, 66–72.
  27. Hafez, A.I.; Zawbaa, H.M.; Emary, E.; Mahmoud, H.A.; Hassanien, A.E. An Innovative Approach for Feature Selection Based on Chicken Swarm Optimization. In Proceedings of the 2015 7th International Conference of Soft Computing and Pattern Recognition (SoCPaR), Fukuoka, Japan, 13–15 November 2015; pp. 19–24.
  28. Karim, A.A.; Isa, N.A.M.; Lim, W.H. Modified particle swarm optimization with effective guides. IEEE Access 2020, 8, 188699–188725.
  29. Liang, X.; Kou, D.; Wen, L. An improved chicken swarm optimization algorithm and its application in robot path planning. IEEE Access 2020, 8, 49543–49550.
  30. Ooi, C.S.; Lim, M.H.; Leong, M.S. Self-Tune Linear Adaptive-Genetic Algorithm for Feature Selection. IEEE Access 2019, 7, 138211–138232.
  31. Chang, W.D.; Shih, S.P. PID controller design of nonlinear systems using an improved particle swarm optimization approach. Commun. Nonlinear Sci. Numer. Simul. 2010, 15, 3632–3639.
  32. Cacciola, M.; Calcagno, S.; Morabito, F.C.; Versaci, M. Swarm optimization for imaging of corrosion by impedance measurements in Eddy current test. IEEE Trans. Magn. 2007, 43, 1853–1856.
  33. Xue, B.; Zhang, M.; Browne, W.N.; Yao, X. A Survey on Evolutionary Computation Approaches to Feature Selection. IEEE Trans. Evol. Comput. 2016, 20, 606–626.
  34. Lee, J.H.; Kim, J.W.; Song, J.Y.; Kim, Y.J.; Jung, S.Y. A novel memetic algorithm using modified particle swarm optimization and mesh adaptive direct search for PMSM design. IEEE Trans. Magn. 2016, 52, 7001604.
  35. Lee, C.-Y.; Le, T.-A. Intelligence bearing fault diagnosis model using multiple feature extraction and binary particle swarm optimization with extended memory. IEEE Access 2020, 8, 198343–198356.
  36. Long, J.; Zhang, S.; Li, C. Evolving deep echo state networks for intelligent fault diagnosis. IEEE Trans. Ind. Inform. 2020, 16, 4928–4937.
  37. Lee, C.-Y.; Le, T.-A. An Enhanced Binary Particle Swarm Optimization for Optimal Feature Selection in Bearing Fault Diagnosis of Electrical Machines. IEEE Access 2021, 9, 102671–102686.
  38. Deshpande, G.; Wang, P.; Rangaprakash, D.; Wilamowski, B. Fully Connected Cascade Artificial Neural Network Architecture for Attention Deficit Hyperactivity Disorder Classification from Functional Magnetic Resonance Imaging Data. IEEE Trans. Cybern. 2015, 45, 2668–2679.
  39. Zhang, W.; Li, C.H.; Peng, G.L.; Chen, Y.H.; Zhang, Z.J. A deep convolutional neural network with new training methods for bearing fault diagnosis under noisy environment and different working load. Mech. Syst. Signal Process. 2018, 100, 439–453.
  40. Lee, C.-Y.; Zhuo, G.-L. Effective Rotor Fault Diagnosis Model Using Multilayer Signal Analysis and Hybrid Genetic Binary Chicken Swarm Optimization. Symmetry 2021, 13, 487.
  41. Zhang, X.; Liang, Y.; Zhou, J.; Zang, Y. A novel bearing fault diagnosis model integrated permutation entropy, ensemble empirical mode decomposition and optimized SVM. Measurement 2015, 69, 164–179. [Google Scholar] [CrossRef]
  42. Hussain, S.; Mokhtar, M.; Howe, J.M. Sensor Failure Detection, Identification, and Accommodation Using Fully Connected Cascade Neural Network. IEEE Trans. Ind. Electron. 2014, 62, 1683–1692. [Google Scholar] [CrossRef]
  43. Meng, L.; Xiang, J.; Wang, Y.; Jiang, Y.; Gao, H. A hybrid fault diagnosis method using morphological filter–translation invariant wavelet and improved ensemble empirical mode decomposition. Mech. Syst. Signal. Process. 2015, 50–51, 101–115. [Google Scholar] [CrossRef]
  44. Lei, Y.; Li, N.; Lin, J. A new method based on stochastic process models for machine remaining useful life prediction. IEEE Trans. Instrum. Meas. 2016, 65, 2671–2684. [Google Scholar] [CrossRef]
  45. Xu, F.; Song, X.; Tsui, K.; Yang, F.; Huang, Z. Bearing performance degradation assessment based on ensemble empirical mode decomposition and affinity propagation clustering. IEEE Access 2019, 7, 54623–54637. [Google Scholar] [CrossRef]
  46. Askari, Q.; Saeed, M.; Younas, I. Heap-based optimizer inspired by corporate rank hierarchy for global optimization. Expert Syst. Appl. 2020, 161, 113702. [Google Scholar] [CrossRef]
  47. UCI Machine Learning Repository. Available online: http://archive.ics.uci.edu/ml (accessed on 22 May 2021).
  48. Case Western Reserve University Bearing Data Center. Available online: http://csegroups.case.edu/bearingdatacenter/home (accessed on 12 June 2021).
  49. Han, C.; Zhou, G.; Zhou, Y. Binary Symbiotic Organism Search Algorithm for Feature Selection and Analysis. IEEE Access 2019, 7, 166833–166859. [Google Scholar] [CrossRef]
  50. Chen, K.; Zhou, F.; Yuan, X. Hybrid particle swarm optimization with spiral-shaped mechanism for feature selection. Expert Syst. Appl. 2019, 128, 140–156. [Google Scholar] [CrossRef]
  51. Arora, S.; Anand, P. Binary butterfly optimization approaches for feature selection. Expert Syst. Appl. 2019, 116, 147–160. [Google Scholar] [CrossRef]
  52. Mafarja, M.; Aljarah, I.; Faris, H.; Hammouri, A.I.; Al-Zoubi, A.M.; Mirjalili, S. Binary grasshopper optimisation algorithm approaches for feature selection problems. Expert Syst. Appl. 2019, 117, 267–286. [Google Scholar] [CrossRef]
  53. Grover, C.; Turk, N. Rolling element bearing fault diagnosis using empirical mode decomposition and hjorth parameters. Procedia Comput. Sci. 2020, 167, 1484–1494. [Google Scholar] [CrossRef]
  54. Zhao, M.; Jin, X.; Zhang, Z.; Li, B. Fault diagnosis of rolling element bearings via discriminative subspace learning: Visualization and classification. Expert Syst. Appl. 2014, 41, 3391–3401. [Google Scholar] [CrossRef]
  55. Zhang, X.; Zhou, J. Multi-fault diagnosis for rolling element bearings based on ensemble empirical mode decomposition and optimized support vector machines. Mech. Syst. Signal. Process. 2013, 41, 127–140. [Google Scholar] [CrossRef]
  56. Zhang, Y.; Li, X.; Gao, L.; Chen, W.; Li, P. Ensemble deep contractive auto-encoders for intelligent fault diagnosis of machines under noisy environment. Knowl. Based Syst. 2020, 48, 34–50. [Google Scholar] [CrossRef]
  57. Huang, W.; Cheng, J.; Yang, Y.; Guo, G. An improved deep convolutional neural network with multi-scale information for bearing fault diagnosis. Neurocomputing 2019, 359, 77–92. [Google Scholar] [CrossRef]
  58. Ma, S.; Cai, W.; Liu, W.; Shang, Z.; Liu, G. A Lighted Deep Convolutional Neural Network Based Fault Diagnosis of Rotating Machinery. Sensors 2019, 19, 2381. [Google Scholar] [CrossRef] [Green Version]
  59. Yu, X.; Ding, E.; Chen, C.; Liu, X.; Li, L. A Novel Characteristic Frequency Bands Extraction Method for Automatic Bearing Fault Diagnosis Based on Hilbert Huang Transform. Sensors 2015, 15, 27869–27893. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  60. Wang, Y.K.; Zhang, F.; Zhang, S.W. A new methodology for identifying arc fault by sparse representation and neural network. IEEE Trans. Instrum. Meas. 2018, 67, 2526–2537. [Google Scholar] [CrossRef]
Figure 1. Feature extraction process.
Figure 2. The procedure of IBPSO.
Figure 3. Comparison between IBPSO and the other algorithms based on convergence curves of the UCI benchmark datasets: (a) BreastCancer, (b) Wine, (c) CongressEW, (d) SpectEW, (e) BreastEW, (f) Ionosphere, (g) krvskp, (h) WaveformEW, (i) Sonar.
Figure 4. Comparison between IBPSO and the other algorithms based on convergence curves of the CWRU bearing dataset.
Table 1. Description of UCI benchmark datasets used in this case study.

| Datasets | Features | Instances | Classes |
|---|---|---|---|
| BreastCancer | 10 | 699 | 2 |
| Wine | 13 | 178 | 3 |
| CongressEW | 16 | 435 | 2 |
| SpectEW | 22 | 267 | 2 |
| BreastEW | 30 | 569 | 2 |
| Ionosphere | 34 | 351 | 2 |
| krvskp | 36 | 3196 | 2 |
| WaveformEW | 40 | 5000 | 3 |
| Sonar | 60 | 208 | 2 |
Table 2. Parameter setting for this experiment.

| Parameter | Value |
|---|---|
| Number of nearest neighbors of the k-NN classifier | 1 |
| k-fold cross-validation | 10 |
| Number of solutions | 10 |
| Maximum number of iterations T | 100 |
| Independent runs | 30 |
| Inertia weight w in BPSO | [0.9, 0.4] |
| Acceleration coefficients c1 and c2 in BPSO | 2.05 |
| Crossover rate in GA | 0.8 |
| Mutation rate in GA and IBPSO | 0.01 |
| Rooster parameter in BCSO | 0.15 |
| Hen parameter in BCSO | 0.7 |
| Mother parameter in BCSO | 0.5 |
| C in IBPSO | 4 |
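Two of the settings above define IBPSO's cycling time-varying inertia weight: w sweeps between 0.9 and 0.4, and C = 4 sets the number of cycles over the T = 100 iterations. The following is a minimal sketch, assuming a linear decay from w_max to w_min that restarts in each cycle; the paper's exact schedule may differ.

```python
def cyclic_inertia_weight(t, T=100, C=4, w_max=0.9, w_min=0.4):
    """Illustrative cycling time-varying inertia weight for IBPSO.

    Assumes a linear decay from w_max to w_min repeated in each of the
    C cycles over T iterations (an assumption, not the paper's exact form).
    """
    cycle_len = T / C                     # iterations per cycle (25 here)
    phase = (t % cycle_len) / cycle_len   # position within the current cycle, in [0, 1)
    return w_max - (w_max - w_min) * phase
```

Restoring the weight to 0.9 at the start of each cycle periodically re-injects exploration pressure, which is the rationale for cycling the schedule rather than letting it decay only once.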
Table 3. Comparison between IBPSO and the other algorithms based on classification accuracy and number of selected features in this experiment.

| Datasets | BPSO Avg Acc (%) | BPSO Avg No.F | GA Avg Acc (%) | GA Avg No.F | BCSO Avg Acc (%) | BCSO Avg No.F | IBPSO Avg Acc (%) | IBPSO Avg No.F |
|---|---|---|---|---|---|---|---|---|
| BreastCancer | 97.19 | 5.83 | 97.24 | 6.06 | 97.1 | 5.56 | **97.28** | 6.13 |
| Wine | 98.72 | 6.36 | 98.82 | 6.63 | 98.2 | 6.46 | **99.27** | 6.63 |
| CongressEW | 96.16 | 3.93 | 96.13 | 4.2 | 95.95 | 4.03 | **96.3** | 3.86 |
| SpectEW | 84.49 | 12.4 | 85.39 | 12.53 | 83.79 | 12.76 | **85.73** | 12.56 |
| BreastEW | 96.89 | 14.7 | 97.38 | 15.96 | 96.81 | 16 | **97.49** | 15.13 |
| Ionosphere | 94.93 | 14.63 | 95.21 | 15.8 | 94.2 | 14.93 | **95.43** | 15.1 |
| krvskp | 97.79 | 20.73 | 98.29 | 20.96 | 97.19 | 20.46 | **98.45** | 20.5 |
| WaveformEW | 75.76 | 18.5 | 77.91 | 17.63 | 74.96 | 18.96 | **78.16** | 16.36 |
| Sonar | 91.95 | 30.13 | 93.57 | 31.2 | 91.49 | 31.9 | **94.04** | 32.7 |

Note: the algorithms that achieve better results are bold.
Table 4. Comparison between IBPSO and the other algorithms based on standard deviation.

| Datasets | BPSO Std | GA Std | BCSO Std | IBPSO Std |
|---|---|---|---|---|
| BreastCancer | 1.79 × 10⁻³ | 1.38 × 10⁻³ | 2.12 × 10⁻³ | **1.31 × 10⁻³** |
| Wine | 8.1 × 10⁻³ | 1.08 × 10⁻² | 8.37 × 10⁻³ | **5.89 × 10⁻³** |
| CongressEW | 3.74 × 10⁻³ | 4.14 × 10⁻³ | 3.65 × 10⁻³ | **2.2 × 10⁻³** |
| SpectEW | 8.89 × 10⁻³ | 9.38 × 10⁻³ | 8.34 × 10⁻³ | **7.59 × 10⁻³** |
| BreastEW | 3.38 × 10⁻³ | 3.61 × 10⁻³ | 2.97 × 10⁻³ | **2.7 × 10⁻³** |
| Ionosphere | 7.86 × 10⁻³ | 1.05 × 10⁻² | 6.36 × 10⁻³ | **5.33 × 10⁻³** |
| krvskp | 2.45 × 10⁻³ | 1.81 × 10⁻³ | 4.3 × 10⁻³ | **1.03 × 10⁻³** |
| WaveformEW | 7.56 × 10⁻³ | 8.1 × 10⁻³ | 9.36 × 10⁻³ | **6.7 × 10⁻³** |
| Sonar | 7.78 × 10⁻³ | 9.74 × 10⁻³ | 8.58 × 10⁻³ | **6.17 × 10⁻³** |

Note: the algorithm that achieves better (lower) results is bold.
Table 5. Comparison between IBPSO and the other algorithms based on average computational time (in seconds).

| Datasets | BPSO | GA | BCSO | IBPSO |
|---|---|---|---|---|
| BreastCancer | **108.45** | 176.43 | 109.33 | 110.28 |
| Wine | 98.05 | 152.56 | **97** | 98.37 |
| CongressEW | 140.07 | 216.17 | **138.33** | 140.46 |
| SpectEW | 105.35 | 153.96 | **98.51** | 103.99 |
| BreastEW | 138.88 | 237.47 | 139.17 | **138.08** |
| Ionosphere | 140.7 | 217.08 | **134.29** | 138.78 |
| krvskp | **200.22** | 322.99 | 205.42 | 229.92 |
| WaveformEW | 235.62 | 352.55 | 237.4 | **215.45** |
| Sonar | 153.02 | 213.39 | 135.72 | **132.75** |

Note: the algorithms that achieve better (shorter) results are bold.
Table 6. Comparison between IBPSO and state-of-the-art models.

| Datasets | Algorithms | Avg Acc (%) | Std | Avg No.F |
|---|---|---|---|---|
| BreastCancer | HPSO-SSM [50] | **98.03** | 2.25 × 10⁻³ | 4 |
| | s-bBOA [51] | 96.86 | 6 × 10⁻³ | 5.6 |
| | BGOA_M [52] | 97.43 | 0 | 5 |
| | IBPSO | 97.28 | 1.31 × 10⁻³ | 6.13 |
| Wine | HPSO-SSM [50] | **99.38** | 8.91 × 10⁻³ | 4.43 |
| | s-bBOA [51] | 98.43 | 5.6 × 10⁻³ | 6.2 |
| | BGOA_M [52] | 98.88 | 0 | 4.4 |
| | IBPSO | 99.27 | 5.89 × 10⁻³ | 6.63 |
| CongressEW | HPSO-SSM [50] | 96.64 | 8.13 × 10⁻³ | 2.97 |
| | s-bBOA [51] | 95.93 | 2 × 10⁻² | 6.4 |
| | BGOA_M [52] | **97.64** | 1.6 × 10⁻³ | 5 |
| | IBPSO | 96.3 | 2.2 × 10⁻³ | 3.87 |
| SpectEW | HPSO-SSM [50] | 79.92 | 2.61 × 10⁻² | 8.43 |
| | s-bBOA [51] | 84.63 | 1 × 10⁻² | 10.8 |
| | BGOA_M [52] | 82.61 | 7.6 × 10⁻³ | 9.96 |
| | IBPSO | **85.73** | 7.59 × 10⁻³ | 12.57 |
| BreastEW | BSOS [49] | 94.73 | 9 × 10⁻³ | 5 |
| | HPSO-SSM [50] | 94.89 | 6.87 × 10⁻³ | 6.76 |
| | s-bBOA [51] | 97.09 | 3 × 10⁻³ | 16.8 |
| | BGOA_M [52] | 96.97 | 4 × 10⁻³ | 12.5 |
| | IBPSO | **97.49** | 2.7 × 10⁻³ | 15.13 |
| Ionosphere | BSOS [49] | 90 | 1.2 × 10⁻² | 8 |
| | HPSO-SSM [50] | 92.57 | 1.62 × 10⁻² | 7.1 |
| | s-bBOA [51] | 90.7 | 1 × 10⁻² | 16.2 |
| | BGOA_M [52] | 94.58 | 7.3 × 10⁻³ | 11.46 |
| | IBPSO | **95.43** | 5.33 × 10⁻³ | 15.1 |
| krvskp | HPSO-SSM [50] | 96.37 | 7.11 × 10⁻³ | 18.27 |
| | s-bBOA [51] | 96.6 | 3 × 10⁻³ | 17.6 |
| | BGOA_M [52] | 97.36 | 3 × 10⁻³ | 17.73 |
| | IBPSO | **98.45** | 1.03 × 10⁻³ | 20.5 |
| WaveformEW | s-bBOA [51] | 74.29 | 1 × 10⁻³ | 25 |
| | BGOA_M [52] | 75.11 | 6.4 × 10⁻³ | 20.9 |
| | IBPSO | **78.17** | 6.7 × 10⁻³ | 16.37 |
| Sonar | BSOS [49] | 90.47 | 1.4 × 10⁻² | 19 |
| | s-bBOA [51] | 93.62 | 1 × 10⁻³ | 32.8 |
| | BGOA_M [52] | 91.47 | 1.09 × 10⁻² | 26.8 |
| | IBPSO | **94.04** | 6.17 × 10⁻³ | 32.7 |

Note: the algorithms that achieve better results are bold.
Table 7. Description of the CWRU bearing dataset (2 Hp) used in this case study.

| Fault Type | Fault Depth (Inches) | Category Label |
|---|---|---|
| Healthy | – | 1 |
| Inner | 0.007 | 2 |
| Inner | 0.014 | 3 |
| Inner | 0.021 | 4 |
| Outer | 0.007 | 5 |
| Outer | 0.014 | 6 |
| Outer | 0.021 | 7 |
| Ball | 0.007 | 8 |
| Ball | 0.014 | 9 |
| Ball | 0.021 | 10 |
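For clarity, Table 7 amounts to a ten-class label encoding over fault type and defect depth. A direct transcription as a Python mapping (the dictionary name is ours, chosen for the sketches that follow):

```python
# Ten-class label encoding from Table 7: (fault type, defect depth in inches) -> label.
CWRU_LABELS = {
    ("Healthy", None): 1,
    ("Inner", 0.007): 2, ("Inner", 0.014): 3, ("Inner", 0.021): 4,
    ("Outer", 0.007): 5, ("Outer", 0.014): 6, ("Outer", 0.021): 7,
    ("Ball",  0.007): 8, ("Ball",  0.014): 9, ("Ball",  0.021): 10,
}
```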
Table 8. Parameter setting of five classifiers.

| Parameter | FCNN-A | FCNN-B | FCNN-C | FCNN-D | FCNN-E |
|---|---|---|---|---|---|
| k-fold cross-validation | 5 | 5 | 5 | 5 | 5 |
| Layer sizes | 10 | 25 | 100 | [10, 10] | [10, 10, 10] |
| Activation function | ReLU | ReLU | ReLU | ReLU | ReLU |
| Maximum number of training iterations | 1000 | 1000 | 1000 | 1000 | 1000 |
| Regularization penalty term | 0 | 0 | 0 | 0 | 0 |
| Standardize data | true | true | true | true | true |
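These settings correspond to plain fully connected networks of increasing width and depth, each evaluated with 5-fold cross-validation. A rough scikit-learn equivalent is sketched below; this is our reconstruction, not the authors' implementation, and any hyperparameter not listed in Table 8 (e.g., the solver) is left at the library default:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Hidden-layer layouts from Table 8 (FCNN-A ... FCNN-E).
LAYOUTS = {
    "FCNN-A": (10,),
    "FCNN-B": (25,),
    "FCNN-C": (100,),
    "FCNN-D": (10, 10),
    "FCNN-E": (10, 10, 10),
}

def evaluate_fcnn(name, X, y):
    """Mean 5-fold CV accuracy for one of the five classifiers."""
    clf = make_pipeline(
        StandardScaler(),                            # "Standardize data: true"
        MLPClassifier(hidden_layer_sizes=LAYOUTS[name],
                      activation="relu",             # ReLU activation
                      alpha=0.0,                     # regularization penalty term = 0
                      max_iter=1000),                # training iteration cap
    )
    return cross_val_score(clf, X, y, cv=5).mean()
```

FCNN-A through FCNN-C vary the width of a single hidden layer, while FCNN-D and FCNN-E stack two and three ten-neuron layers; Tables 11–15 report results for these five architectures in turn.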
Table 9. Comparison between IBPSO and the other algorithms based on fitness value and number of selected features in this case study.

| Dataset | Algorithm | Avg Fitness | Avg No.F | Std |
|---|---|---|---|---|
| 2 Hp | BPSO | 0.021 | 25.73 | 6.4 × 10⁻³ |
| | GA | 0.0207 | 29.67 | 1.4 × 10⁻² |
| | BCSO | 0.0339 | 29.9 | 9.7 × 10⁻³ |
| | IBPSO | **0.0067** | 25 | 6.2 × 10⁻³ |

Note: the algorithm that achieves better (lower) results is bold.
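The fitness values above come from a wrapper evaluation that jointly rewards classification accuracy and compactness of the feature subset. A minimal sketch, assuming the weighted-sum objective common in the cited feature-selection literature with weight alpha = 0.99, and reusing the 1-NN classifier with 10-fold cross-validation from Table 2 (the paper's exact weighting is defined in its methodology section):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y, alpha=0.99):
    """Wrapper fitness for a binary feature mask (lower is better).

    Combines the 10-fold CV error of a 1-NN classifier on the selected
    columns with a penalty on subset size; alpha = 0.99 is an assumed
    weighting, not taken from the paper.
    """
    idx = np.flatnonzero(mask)
    if idx.size == 0:              # an empty subset is infeasible
        return 1.0
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=1),
                          X[:, idx], y, cv=10).mean()
    return alpha * (1.0 - acc) + (1.0 - alpha) * idx.size / X.shape[1]
```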
Table 10. Details of obtained optimal feature subset.

| Dataset | Algorithm | No.F | Feature Indicators (F) |
|---|---|---|---|
| 2 Hp | BPSO | 26 | 2, 7, 8, 9, 10, 11, 12, 17, 18, 19, 20, 23, 24, 33, 37, 39, 40, 41, 43, 44, 45, 46, 47, 48, 49, 56 |
| | GA | 25 | 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 14, 16, 17, 18, 19, 21, 36, 37, 38, 40, 43, 45, 49, 52, 56 |
| | BCSO | 21 | 1, 5, 6, 7, 9, 10, 12, 13, 15, 17, 18, 19, 20, 36, 37, 41, 43, 44, 48, 50, 56 |
| | IBPSO | 16 | 1, 2, 3, 5, 7, 10, 14, 15, 16, 18, 19, 20, 33, 41, 45, 47 |
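Applying an obtained subset is then a plain column selection. A sketch using the IBPSO indicators above, with a random matrix standing in for the 64-dimensional extracted feature set:

```python
import numpy as np

# Placeholder for the extracted feature set (n samples x 64 features).
X = np.random.rand(1000, 64)

# 1-based feature indicators from Table 10 (IBPSO row), converted to 0-based columns.
ibpso_features = [1, 2, 3, 5, 7, 10, 14, 15, 16, 18, 19, 20, 33, 41, 45, 47]
X_selected = X[:, np.asarray(ibpso_features) - 1]   # shape: (1000, 16)
```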
Table 11. Classification results (average accuracy, %) using FCNN-A.

| Dataset | Algorithm | No.F | No noise | 20 dB | 15 dB | 10 dB | 5 dB | 0 dB |
|---|---|---|---|---|---|---|---|---|
| 2 Hp | Without FS | 64 | 94.44 | 93.87 | 93.65 | 92.52 | 90.35 | 85.22 |
| | BPSO | 26 | 95.88 | 95.51 | 95.2 | 94.82 | 93.08 | 89.83 |
| | GA | 25 | 97.01 | 96.89 | 96.59 | 96.41 | 95.64 | 93.21 |
| | BCSO | 21 | 96.55 | 96.5 | 96.36 | 96.01 | 95.22 | 92.03 |
| | IBPSO | 16 | **97.4** | **97.24** | **97.18** | **96.93** | **96.29** | **96.08** |

Note: the algorithm that achieves better results is bold.
Table 12. Classification results (average accuracy, %) using FCNN-B.

| Dataset | Algorithm | No.F | No noise | 20 dB | 15 dB | 10 dB | 5 dB | 0 dB |
|---|---|---|---|---|---|---|---|---|
| 2 Hp | Without FS | 64 | 95.18 | 94.16 | 93.55 | 92.3 | 89.76 | 84.11 |
| | BPSO | 26 | 95.88 | 95.75 | 95.56 | 95.1 | 92.98 | 89.65 |
| | GA | 25 | 97.42 | 97.16 | 96.96 | 96.41 | 95.85 | 93.64 |
| | BCSO | 21 | 96.94 | 96.74 | 96.68 | 96.52 | 95.72 | 92.38 |
| | IBPSO | 16 | **98.05** | **97.53** | **97.49** | **97.27** | **96.63** | **96.56** |

Note: the algorithm that achieves better results is bold.
Table 13. Classification results (average accuracy, %) using FCNN-C.

| Dataset | Algorithm | No.F | No noise | 20 dB | 15 dB | 10 dB | 5 dB | 0 dB |
|---|---|---|---|---|---|---|---|---|
| 2 Hp | Without FS | 64 | 94.3 | 93.88 | 93.5 | 92.71 | 89.5 | 82.22 |
| | BPSO | 26 | 96.18 | 96.11 | 95.07 | 94.98 | 92.68 | 89.54 |
| | GA | 25 | 97.3 | 97.13 | 96.85 | 96.72 | 96.03 | 93.5 |
| | BCSO | 21 | 96.89 | 96.75 | 96.59 | 96.49 | 95.48 | 91.78 |
| | IBPSO | 16 | **97.97** | **97.43** | **97.27** | **97.24** | **96.64** | **96.56** |

Note: the algorithm that achieves better results is bold.
Table 14. Classification results (average accuracy, %) using FCNN-D.

| Dataset | Algorithm | No.F | No noise | 20 dB | 15 dB | 10 dB | 5 dB | 0 dB |
|---|---|---|---|---|---|---|---|---|
| 2 Hp | Without FS | 64 | 92.08 | 91.67 | 91.46 | 90.62 | 88.85 | 86.1 |
| | BPSO | 26 | 94.97 | 94.83 | 94.29 | 93.81 | 93.12 | 89.82 |
| | GA | 25 | 95.79 | 95.64 | 95.01 | 94.91 | 94.88 | 92.97 |
| | BCSO | 21 | 95.59 | 95.5 | 95.27 | 94.58 | 93.52 | 91.13 |
| | IBPSO | 16 | **96.75** | **96.31** | **96.01** | **95.9** | **95.42** | **95.37** |

Note: the algorithm that achieves better results is bold.
Table 15. Classification results (average accuracy, %) using FCNN-E.

| Dataset | Algorithm | No.F | No noise | 20 dB | 15 dB | 10 dB | 5 dB | 0 dB |
|---|---|---|---|---|---|---|---|---|
| 2 Hp | Without FS | 64 | 91.61 | 91.56 | 91.02 | 90.36 | 88.85 | 85.07 |
| | BPSO | 26 | 94.65 | 93.73 | 93.65 | 92.8 | 90.71 | 88.38 |
| | GA | 25 | 94.95 | 94.42 | 94.3 | 93.74 | 93.14 | 91.63 |
| | BCSO | 21 | 94.16 | 94.04 | 93.78 | 92.97 | 91.57 | 88.55 |
| | IBPSO | 16 | **95.17** | **95.13** | **94.62** | **94.09** | **93.5** | **92.58** |

Note: the model that achieves better results is bold.
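The dB columns in Tables 11–15 denote the signal-to-noise ratio of additive white Gaussian noise, presumably injected into the vibration signals before feature extraction. A minimal sketch of that corruption step, using the standard SNR definition (our illustration, not the authors' code):

```python
import numpy as np

def add_white_noise(signal, snr_db, rng=None):
    """Corrupt a vibration signal with additive white Gaussian noise at a
    target signal-to-noise ratio in dB (e.g., 0 dB means the noise power
    equals the signal power)."""
    rng = np.random.default_rng() if rng is None else rng
    p_signal = np.mean(signal ** 2)                 # average signal power
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))  # from SNR(dB) = 10*log10(Ps/Pn)
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
```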
Table 16. Comparison with bearing fault diagnosis models published in the literature under the normal (noise-free) condition.

| Diagnosis Model | Avg Acc (%) on Dataset (2 Hp) |
|---|---|
| Ref. [53] | 93.82 |
| Ref. [54] | 95.8 |
| Ref. [55] | 99.33 |
| Ref. [41] | **100** |
| Proposed work | 98.05 |

Note: the model that achieves better results is bold.
Table 17. Comparison with bearing fault diagnosis models published in the literature under noisy conditions (Avg Acc, %).

| Diagnosis Model | 20 dB | 15 dB | 10 dB | 8 dB | 6 dB | 5 dB | 4 dB | 3 dB | 2 dB | 0 dB |
|---|---|---|---|---|---|---|---|---|---|---|
| Ref. [56] | – | – | 95.4 | **85** | 76.8 | – | 68.4 | – | 66.6 | 64.3 |
| Ref. [57] | – | – | – | – | **99.61** | – | **97.33** | – | **95.71** | 91.33 |
| Ref. [58] | – | – | **99.33** | – | – | – | – | **98.63** | – | 96.35 |
| Ref. [59] | – | **99.64** | – | – | – | **98.57** | – | 95 | – | – |
| Proposed work | **97.53** | 97.49 | 97.27 | – | – | 96.63 | – | – | – | **96.56** |

Note: the models that achieve better results are bold.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
