1. Introduction
As industrial automation advances, rotating machinery is becoming more precise than ever. Monitoring and fault diagnosis methods for rotating machinery have therefore long been an active research field [1]. M. Van and H.J. Kang [2] proposed a bearing fault diagnosis model that combines a new feature extraction technique based on non-local mean denoising and empirical mode decomposition (EMD) with a two-stage feature selection technique based on hybrid distance evaluation technology (DET) and particle swarm optimization (PSO). The model proved effective in bearing failure experiments. F. Alvarez-Gonzalez et al. [3] proposed an online statistical analysis method based on the Hilbert–Huang transform (HHT) to detect stator short-circuit faults in permanent magnet synchronous motors (PMSMs) and demonstrated reliable fault detection through simulation results. S. Haroun et al. [4] proposed multiple feature extraction techniques to detect stator winding faults of induction motors. First, the three-phase stator current is analyzed using the Park transform, the zero-crossing time signal and the envelope. Then, time domain and frequency domain statistical features are extracted from the analysis results. Experimental results show that the proposed method can detect stator winding faults and identify the faulty phase under various fault cases and load variations. These results indicate that fault diagnosis can greatly improve system reliability, reduce maintenance costs and even avoid major production losses caused by failures. In addition, according to statistics from the Electric Power Research Institute (EPRI), rolling element failures account for 41% of all rotating machinery failures, the highest proportion of any failure type [5]. Moreover, even an early rolling element fault can quickly develop into a serious fault [6]. Therefore, this study focuses on constructing an efficient rolling element fault diagnosis model.
In recent years, with the advent of accelerometers, vibration signals have become easy to measure and generally cover a wide frequency range, so many fault diagnosis models based on vibration signals have been proposed. Z. Wang et al. proposed an efficient and robust hybrid model that uses wavelet packet decomposition (WPD) and mutual dimensionless indexing to extract the best features, and random forest for classification [7]. Y. Shao et al. proposed a rolling element fault diagnosis model based on the principle of coherent demodulation. By extracting the characteristic frequencies of different fault types, the fault type can be accurately classified [8]. Z. Huo et al. proposed a rolling element fault diagnosis model that can be effectively applied to multi-speed environments. The model uses particle swarm optimization and a quasi-Newton minimization algorithm to optimize the parameters of the continuous wavelet transform (CWT) model. It then performs feature extraction in a 3-D feature space and uses a k-nearest neighbor (k-NN) classifier for fault classification [9]. S. Wei et al. proposed time-varying envelope filtering (TVEF) to extract the features of rolling element faults. Reconstructing the high-resolution time-frequency distribution from the instantaneous frequency and instantaneous amplitude extracted by this method allows the fault features to be extracted more accurately [10].
A fault diagnosis model based on vibration signals is usually divided into three stages: feature extraction, feature selection and fault classification. Among them, feature extraction and fault classification are the key stages. The signal processing technique used in feature extraction [9] is an important step in reducing the dimensionality of vibration signals and extracting key fault information. Because rotating machinery works in a complex environment, the measured vibration signal contains non-stationary components and noise, so signal processing techniques based only on the time domain or the frequency domain may not be effective. Therefore, time-frequency analysis techniques such as the fast Fourier transform (FFT), short-time Fourier transform (STFT) and continuous wavelet transform (CWT) are widely used. Classification of the signal analysis results by a neural network (NN) or a machine learning (ML) method is the final stage. Feature selection [11] is an optional stage of the model; its function is to counter the degradation in diagnostic performance caused by redundant or irrelevant fault features.
However, the above-mentioned time-frequency analysis techniques still have their own limitations. For example, the decomposition performance of FFT and STFT is limited by their use of fixed-length windows [12]. CWT addresses this problem of FFT and STFT with an adjustable window size [13]. However, once the decomposition scale of CWT is defined, CWT can only decompose signals in the defined frequency band, which makes CWT non-adaptive [14]. Based on the above analysis, adaptive signal processing techniques may be able to analyze vibration signals more effectively [15]. Local mean decomposition (LMD) [16] is a new adaptive signal processing technique with several advantages for rolling element fault diagnosis. First, the decomposition process of LMD does not need the Hilbert transform (HT), so it does not encounter negative frequencies [17]. Second, LMD can decompose and demodulate signals at the same time. Third, LMD can extract the amplitude and frequency modulation characteristics of the vibration signal when the rolling element fails [18,19].
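The fixed-window limitation noted above can be made concrete with a small calculation. The sketch below assumes a sampling rate of 12 kHz, which is an illustrative value and is not stated in this section, and the helper name stft_resolution is hypothetical. For a window of N samples, one STFT frame spans N/fs seconds and its frequency bins are fs/N hertz apart, so a single fixed window cannot be fine in both time and frequency.

```python
def stft_resolution(fs_hz, window_len):
    """Return (frame duration in seconds, frequency bin spacing in Hz)."""
    frame_duration_s = window_len / fs_hz
    bin_spacing_hz = fs_hz / window_len
    return frame_duration_s, bin_spacing_hz


FS_HZ = 12_000  # assumed sampling rate, for illustration only

for window_len in (256, 2048):
    dt, df = stft_resolution(FS_HZ, window_len)
    # The product dt * df is always 1: shrinking one enlarges the other.
    print(f"N={window_len}: frame={dt * 1e3:.2f} ms, bin spacing={df:.3f} Hz")
```

A short window localizes impacts in time but blurs neighbouring frequencies, while a long window does the opposite; this is the trade-off that adjustable-window and adaptive methods try to relax.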
However, in a real working environment, the features of rolling element faults are usually masked by noise or by disturbing vibrations from other components of the rotating machinery [20,21], which causes LMD to produce redundant or irrelevant product function (PF) components. Therefore, this study uses a denoising technique that combines PF selection and WPD. First, the PF selection method removes redundant or irrelevant PF components and selects the most valuable PF components for further denoising. Then, WPD denoising effectively removes noise and presents the fault information in wavelet packet coefficients [22,23]. Finally, the fault features are extracted from these wavelet packet coefficients.
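The WPD denoising step can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration rather than the configuration used in this study: the Haar wavelet, a three-level tree, a soft universal threshold with a median-based noise estimate, and the function names. The sketch also reconstructs a time signal to show the effect, whereas the model described here extracts features from the wavelet packet coefficients themselves.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_split(x):
    """One Haar analysis step: (approximation, detail), each half as long."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / SQRT2, (even - odd) / SQRT2


def haar_merge(approx, detail):
    """Inverse of haar_split."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / SQRT2
    x[1::2] = (approx - detail) / SQRT2
    return x


def wpd(x, level):
    """Full wavelet packet tree: returns the 2**level leaf coefficient arrays."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        # Unlike the plain wavelet transform, detail nodes are split as well.
        nodes = [part for node in nodes for part in haar_split(node)]
    return nodes


def iwpd(nodes):
    """Rebuild the signal by merging sibling leaves back up the tree."""
    nodes = list(nodes)
    while len(nodes) > 1:
        nodes = [haar_merge(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]


def wpd_denoise(x, level=3):
    """Soft-threshold every leaf except the lowest band, then reconstruct."""
    x = np.asarray(x, dtype=float)
    nodes = wpd(x, level)
    # Noise level estimated from first-level details (median absolute deviation).
    _, detail1 = haar_split(x)
    sigma = np.median(np.abs(detail1)) / 0.6745
    threshold = sigma * np.sqrt(2.0 * np.log(x.size))
    shrunk = [np.sign(c) * np.maximum(np.abs(c) - threshold, 0.0) for c in nodes[1:]]
    return iwpd([nodes[0]] + shrunk)


# Toy usage: a slow sine buried in white noise (length divisible by 2**level).
rng = np.random.default_rng(0)
t = np.arange(1024)
clean = np.sin(2 * np.pi * 5 * t / 1024)
noisy = clean + 0.3 * rng.standard_normal(t.size)
denoised = wpd_denoise(noisy, level=3)
print(np.mean((noisy - clean) ** 2), np.mean((denoised - clean) ** 2))
```

In practice a smoother wavelet and a per-node threshold would usually be preferred; the Haar filter is used here only because it keeps the decomposition and its inverse short enough to read in full.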
Although the feature extraction stage extracts the fault features from the original signal, the feature set may still contain redundant or irrelevant fault features, which reduces diagnostic performance [11]. Therefore, feature selection is applied to prevent overfitting and improve model performance [11]. Feature selection can be divided into filter methods and wrapper methods. The filter method mainly uses the correlation coefficient (CC) or univariate mutual information (MI) to measure the strength of the dependence between each input and the target, ranks the features by this strength and removes irrelevant features [24]. The wrapper method, which is combined with a specific classifier for accurate evaluation, can usually achieve better performance than the filter method [11,24]. Therefore, optimization algorithms such as binary particle swarm optimization (BPSO) [25], genetic algorithm (GA) [26] and binary chicken swarm optimization (BCSO) [27] are widely used in feature selection. However, these algorithms share common defects, such as premature convergence [28,29] and falling into local optima [28,29,30]. Although no optimization algorithm can guarantee the best feature subset, PSO has successfully solved many nonlinear optimization problems in engineering thanks to its excellent computational efficiency and simple operation [31,32], and it remains an optimization algorithm that many researchers continue to study [33,34,35]. This study therefore proposes an improved binary particle swarm optimization (IBPSO) for the feature selection task of the fault diagnosis model. Three mechanisms are proposed to improve the performance of PSO. First, cycling time-varying inertia weights are introduced to balance exploration and exploitation and to enhance the capability to avoid local optima. Second and third, crossover and mutation mechanisms are considered, which can improve the exploration and exploitation capabilities of PSO and address its premature convergence.
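To make the three mechanisms easier to picture, the following is a minimal sketch of a binary PSO with a cyclic inertia weight, crossover and mutation. It is not the IBPSO of this study, whose details are given in Section 3; the cosine schedule in cyclic_inertia, the crossover with the global best, the bit-flip mutation, all parameter values and the toy fitness function are assumptions chosen only for illustration.

```python
import numpy as np


def cyclic_inertia(t, period=20, w_min=0.4, w_max=0.9):
    """Hypothetical schedule: decays from w_max toward w_min, restarting every period."""
    phase = (t % period) / period
    return w_min + 0.5 * (w_max - w_min) * (1.0 + np.cos(np.pi * phase))


def binary_pso(fitness, n_features, n_particles=20, n_iter=60,
               c1=2.0, c2=2.0, p_cross=0.2, p_mut=0.02, seed=0):
    """Maximize fitness(mask) over binary feature masks; returns (mask, fitness)."""
    rng = np.random.default_rng(seed)
    pos = rng.integers(0, 2, size=(n_particles, n_features))
    vel = rng.uniform(-1.0, 1.0, size=(n_particles, n_features))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    g = int(np.argmax(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for t in range(n_iter):
        w = cyclic_inertia(t)
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -6.0, 6.0)
        # Sigmoid transfer function: velocity becomes the probability of a 1 bit.
        pos = (rng.random(pos.shape) < 1.0 / (1.0 + np.exp(-vel))).astype(int)
        # Assumed crossover: chosen particles copy about half their bits from gbest.
        chosen = rng.random(n_particles) < p_cross
        take = chosen[:, None] & (rng.random(pos.shape) < 0.5)
        pos = np.where(take, gbest, pos)
        # Assumed mutation: each bit flips with a small probability.
        flip = rng.random(pos.shape) < p_mut
        pos = np.where(flip, 1 - pos, pos)

        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = int(np.argmax(pbest_fit))
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest, gbest_fit


def toy_fitness(mask):
    """Toy objective: reward the first 5 features, penalize the other 15."""
    return mask[:5].sum() - 0.25 * mask[5:].sum()


best_mask, best_fit = binary_pso(toy_fitness, n_features=20)
print(best_mask, best_fit)
```

In a wrapper setting, toy_fitness would be replaced by the validation accuracy of a classifier trained on the selected features, usually with a small penalty on the number of features kept.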
Fault classification is the other important stage of a rolling element fault diagnosis model. Researchers have widely used NN and ML in the fault diagnosis of rolling elements [36,37]. Traditional NNs such as the multilayer perceptron (MLP) suffer from a complex structure and a difficult training process [38]. ML has the advantage of being simple and easy to implement while giving good classification results; in particular, many papers have demonstrated the classification efficiency and anti-noise capability of the support vector machine (SVM) algorithm [39,40]. In recent years, a new type of NN, the fully connected neural network (FCNN), has achieved powerful performance through a new way of connecting neurons. FCNN has the following advantages: (1) The complexity of FCNN is similar to that of a traditional single hidden layer NN, but its performance is much stronger [38]. (2) Adding too many neurons to a traditional single hidden layer NN leads to overfitting and poor generalization [41], whereas FCNN needs fewer neurons to achieve powerful performance and good generalization [42]. (3) In [42], the author showed that FCNN has excellent anti-noise capability and can complete classification under a low signal-to-noise ratio (SNR). (4) In [38], the author showed that the classification performance of FCNN is better than that of SVM. Therefore, in this study, we adjusted the number of layers and the number of neurons and compared the performance of five FCNNs to establish the most robust rolling element fault diagnosis model. The advantages of the above feature extraction, feature selection and fault classification methods motivate us to propose a robust rolling element fault diagnosis model with both high classification accuracy and anti-noise capability.
The organization of this paper is as follows: Section 2 introduces the basic methods of the proposed model, including the feature extraction process, the binary particle swarm algorithm and the fully connected neural network. Section 3 describes the improved binary particle swarm algorithm in detail, together with the flow of the rolling element fault diagnosis model. Section 4 discusses the experimental results on the University of California, Irvine (UCI) feature selection dataset and the Case Western Reserve University (CWRU) rolling element failure dataset. Section 5 evaluates the diagnostic model and discusses future work. Finally, Section 6 presents the conclusions.
5. Discussion
Because rolling element failures have the highest incidence among rotating machinery failures, this research proposes an efficient rolling element fault diagnosis model. To verify the effectiveness of the proposed model, two public datasets, the UCI feature selection dataset and the CWRU bearing fault dataset, were used for a fair comparison with other state-of-the-art methods. Based on the experimental results in Section 4, the main contributions of this research can be divided into the following two points.
(1) The proposed fault diagnosis model can be applied to a strong noise environment: Local mean decomposition can effectively deal with non-stationary signals and extract time-frequency domain information from the signal. However, the method is sensitive to noise and generates redundant PF components. Therefore, this study proposes a feature extraction technique that combines local mean decomposition with PF selection and WPD. Three fault features are introduced in PF selection because they perform well for early or severe faults, and a weight value is used to balance the contribution of each fault feature. Because wavelet packet denoising maintains high resolution at both low and high frequencies, high-frequency noise is removed and fault information is further extracted.
In addition, feature selection can further improve the performance of the fault diagnosis model by removing redundant features, improving classification accuracy and reducing computational cost. Judged by these criteria, the proposed feature selection algorithm IBPSO performs best among the compared algorithms (BPSO, GA, BCSO). Table 12 shows the classification results of the best feature subsets obtained by each feature selection algorithm. It can be seen from Table 12 that IBPSO removes 75% of the redundant features and achieves the best classification accuracy under each noise level. Moreover, the classification accuracy of the diagnostic model decreases as the noise level increases. With the original feature set, the classification accuracy drops from 95.18% to 84.11% when the SNR value falls from
dB to 0 dB. With the best feature subset obtained by IBPSO, the classification accuracy drops from 98.05% to 96.56% when the SNR value falls from
dB to 0 dB. Based on the above analysis results, IBPSO can select the most important features and significantly improve the performance of the fault diagnosis model.
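The accuracy figures quoted above from Table 12 can be compared directly. The short sketch below only subtracts the reported percentages; the function name accuracy_drop is a hypothetical helper, and no values beyond those in the text are used.

```python
def accuracy_drop(acc_low_noise, acc_high_noise):
    """Loss in classification accuracy, in percentage points."""
    return round(acc_low_noise - acc_high_noise, 2)


drop_original = accuracy_drop(95.18, 84.11)  # original feature set
drop_ibpso = accuracy_drop(98.05, 96.56)     # best subset selected by IBPSO
print(drop_original, drop_ibpso)
```

On these reported numbers, the subset chosen by IBPSO loses about 1.5 percentage points at 0 dB, against about 11 points for the original feature set.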
(2) Determine the appropriate number and size of fully connected layers: In neural networks, the most difficult task is to determine the appropriate number of fully connected layers and the number of neurons. Too many layers may cause overfitting, increase the difficulty of training and make it hard for the model to converge. Using too few neurons results in underfitting. Conversely, using too many neurons leads to overfitting, which occurs when the information contained in the training set is not enough to train all the neurons in the fully connected layer. Therefore, it is important to select the appropriate number of fully connected layers and neurons. In this study, five configurations of layer number and size are set: FCNN-A (layer sizes: 10), FCNN-B (layer sizes: 25), FCNN-C (layer sizes: 100), FCNN-D (layer sizes: [10, 10]) and FCNN-E (layer sizes: [10, 10, 10]). Table 11, Table 12, Table 13, Table 14 and Table 15 show the classification results of each classifier. The experimental results in Table 14 and Table 15 show that the deeper networks classify less accurately than the single-layer networks and are difficult to converge. Therefore, a neural network with a single fully connected layer is more suitable for the fault diagnosis model. Based on the experimental results in Table 11, Table 12 and Table 13, FCNN-B performs better than the narrower network. This result shows that increasing the number of neurons can improve classification performance. However, adding still more neurons does not help: in Table 12 and Table 13, the classification results of FCNN-B and FCNN-C are almost the same. This shows that an excessive number of neurons wastes computational cost without improving the classification performance.
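The relation between network size and computational cost for the five configurations can be sketched by counting trainable parameters. The example below assumes a plain feed-forward network of dense layers with biases; the input dimension (16 features) and the number of fault classes (10) are placeholder values chosen for illustration, not figures from this study, and count_parameters is a hypothetical helper.

```python
def count_parameters(n_inputs, layer_sizes, n_classes):
    """Weights plus biases of a dense network: inputs -> hidden layers -> classes."""
    widths = [n_inputs, *layer_sizes, n_classes]
    # Each dense layer has fan_in * fan_out weights and fan_out biases.
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(widths[:-1], widths[1:]))


N_INPUTS = 16   # assumed number of selected features (placeholder)
N_CLASSES = 10  # assumed number of fault classes (placeholder)

configs = {
    "FCNN-A": [10],
    "FCNN-B": [25],
    "FCNN-C": [100],
    "FCNN-D": [10, 10],
    "FCNN-E": [10, 10, 10],
}
param_counts = {name: count_parameters(N_INPUTS, sizes, N_CLASSES)
                for name, sizes in configs.items()}
for name, count in param_counts.items():
    print(name, count)
```

Under these placeholder dimensions, FCNN-C carries roughly four times as many parameters as FCNN-B, which is consistent with the observation that the widest layer adds cost without a matching gain in accuracy.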
In addition to the above advantages, the proposed model still has the following shortcomings.
(1) The types of features selected by the fault diagnosis model are highly dependent on the knowledge of the engineer, and the quality of the features determines the accuracy of the fault diagnosis model. This also affects the versatility of the fault diagnosis model. Engineers choose appropriate features for different types of faults based on prior knowledge. Therefore, automatic feature extraction technology should be considered in the future.
(2) The computational time of the algorithm. In Table 5, the proposed algorithm has a poor computational time on low-dimensional datasets or datasets with few local optima. However, on high-dimensional datasets or datasets with many local optima, the proposed algorithm performs better than the other algorithms. This result shows that although the crossover operator has powerful exploration capabilities, low-dimensional datasets or datasets with few local optima usually require strong exploitation capabilities to approach the global optimum. Therefore, the algorithm needs further study to reduce the computational time.
(3) Optimization of the computational complexity of the fault diagnosis model. This study only discusses the influence of the number of fully connected layers and the number of neurons on the classification accuracy of the fault diagnosis model; it does not address the computational complexity of the model. Some methods may reduce the computational complexity of a fully connected neural network, such as sparsity [60] or improvements to the neural network architecture and parameters.