Bearing Fault Diagnosis Method Based on Improved VMD and Parallel Hybrid Neural Network

Chen, Wuyi; Cai, Huafeng; Sun, Qiu

doi:10.3390/app15084430


by Wuyi Chen, Huafeng Cai * and Qiu Sun

School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan 430068, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(8), 4430; https://doi.org/10.3390/app15084430

Submission received: 26 February 2025 / Revised: 7 April 2025 / Accepted: 16 April 2025 / Published: 17 April 2025


Abstract:

To address the difficulty of fault feature extraction and fault recognition in bearing fault diagnosis, a method based on improved variational mode decomposition (VMD) and a parallel hybrid neural network is proposed. The method combines reweighted kurtosis (RK) with VMD and uses RK as the evaluation index for selecting the number of decomposition modes, which removes part of the interference in the fault signal while retaining its impact characteristics. The processed fault data set is then fed into a parallel hybrid neural network model with a global average pooling (GAP) layer for feature extraction, feature fusion, and fault classification. The parallel hybrid neural network extracts fault signal features more comprehensively and improves diagnostic accuracy, while the GAP layer speeds up training and testing. Experiments on the Xi'an Jiaotong University (XJTU) and Case Western Reserve University (CWRU) public bearing data sets show that the diagnostic accuracy reaches 99.72% and 99.73%, respectively, indicating that the method achieves high accuracy and better diagnostic performance than the compared models.

Keywords:

global average pooling; fault diagnosis; variational mode decomposition; reweighted kurtosis; bidirectional gated recurrent unit; convolutional neural network

1. Introduction

Rotating machinery is widely used in industrial production. The rolling bearing is a key component of rotating machinery, and its failure directly affects the normal operation of the whole equipment. Because bearings often work under continuous load, they are prone to failure [1]. A bearing failure can bring large machinery to a halt, cause great losses, and put staff at risk. Therefore, research on rolling bearing fault diagnosis has great practical value.

A bearing vibration signal is non-stationary and non-linear, and the working environment contains many sources of interference, so the signal is difficult to analyze [2]. For traditional signal decomposition methods, high-frequency noise destroys the local characteristics of the signal, so that different frequency components are mixed into the same intrinsic mode function, which reduces accuracy and aggravates mode aliasing. For neural network methods, noise may be mistaken for effective features, so that the model overfits the noise rather than the real signal pattern and the learned features become biased. Traditional signal analysis methods include time domain analysis and frequency domain analysis. Time domain analysis directly extracts statistical indicators of the signal. Frequency domain analysis converts the signal to the frequency domain through the Fourier transform and uses the fault frequency for the analysis. These traditional methods remain useful for simple signal analysis tasks, but with continued advances in signal analysis technology, time–frequency analysis has attracted growing attention [3]. Common time–frequency analysis methods include wavelet transform (WT), short-time Fourier transform (STFT), empirical mode decomposition (EMD), and local mean decomposition (LMD). The wavelet transform is based on the multi-scale decomposition and reconstruction of signals: it represents the signal with a group of special functions called wavelet basis functions and extracts time-domain and frequency-domain information by decomposing and reconstructing the signal at different scales. However, it is difficult to select the wavelet basis function.
Selecting an inappropriate wavelet basis function may lead to inaccurate analysis results or fail to meet specific needs. The wavelet transform also places high demands on the data length, and boundary effects may occur at the ends of the signal. El Khadiri K et al. [4] proposed an empirical wavelet transform and Capon time–frequency analysis method for denoising EMG and EEG signals. The adaptive spectrum segmentation and filter design of the empirical wavelet transform overcome the fixed basis function of the traditional wavelet transform, while the Capon time–frequency analysis method has significant advantages in improving resolution and eliminating interference; the combination of the two achieved a good denoising effect. The STFT is a mathematical tool for the joint analysis of signals in the time and frequency domains. It analyzes the frequency components while preserving time locality by dividing the signal into windowed frames and performing the Fourier transform segment by segment. However, the STFT cannot achieve high time resolution and high frequency resolution at the same time, and, unlike the wavelet transform, it uses a fixed-length window throughout the analysis, which cannot adapt to the signal characteristics. EMD decomposes the signal into a series of intrinsic mode functions (IMFs) that represent the time-varying characteristics of the signal at different scales. Meng DB et al. [5] used EMD for the feature extraction of the vibration signal and used a variety of classifiers to classify the fault state of the rolling bearing, achieving good results. However, EMD suffers from the endpoint effect and mode mixing, resulting in inaccurate decomposition results [6]. On this basis, Torres et al. [7] proposed the ensemble empirical mode decomposition (EEMD) method.
By introducing uniformly distributed white noise in the decomposition process, the noise of the signal itself is masked by repeatedly adding artificial noise, and the decomposition results are averaged. The more realizations are averaged, the lower the impact of the noise on the decomposition, which addresses the mode mixing problem of traditional EMD. To address the computational cost and residual noise of EEMD, complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) adds white noise to the residual and averages the IMF component each time a first-order IMF component is obtained, iterating step by step to achieve better decomposition results. To further optimize the noise control and decomposition accuracy of CEEMDAN, Colominas MA et al. [8] proposed the improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) method, which introduces a more flexible adaptive noise strategy and reduces spurious modes and residual noise. LMD analysis is based on the local smoothness of the signal. Wang ZJ et al. [9] combined minimum entropy deconvolution (MED) with LMD, using MED as a filter to denoise the signal and LMD to decompose the processed signal, which provides a solution for extracting weak characteristic signals under strong interference. Compared with EMD, LMD does not suffer from mode mixing in the decomposition process, but it is affected by the local smoothness of the signal: when the signal is not smooth, the decomposition accuracy declines. VMD overcomes shortcomings of EMD such as the endpoint effect and mode mixing and has high decomposition efficiency. However, in applying VMD, how to choose the number of decomposition modes and the penalty factor remains an open problem.
Liu Z et al. [10] used the whale optimization algorithm to select the parameters adaptively. Although appropriate parameter values can be obtained, many iterative calculations are required, and the computational efficiency is low. Li H et al. [11] used kurtosis as the evaluation index and selected the reconstructed signal with the largest kurtosis value as the optimal decomposition, but the result was affected by strong impacts in some signals, which made the selected number of decomposition modes unstable.

The problem of fault feature extraction also needs to be solved. For subsequent fault diagnosis, representative features must be extracted from the original signal. From the point of view of signal processing, Robert B. Randall et al. [12] analyzed the specific characteristics of rolling element bearing signals. From the point of view of cyclostationarity, the discrete/random separation (DRS) method was used for signal separation, spectral kurtosis was used to identify the most impulsive frequency band, and envelope analysis was used for the final diagnosis. Diletta Sacerdoti et al. [13] further used the cepstrum pre-whitening method for signal denoising and used cyclostationary techniques, including the square envelope spectrum and the improved envelope spectrum, to correctly identify the faults of specific bearing components, achieving good results. From the perspective of deep learning, the features in the signal can be learned automatically by constructing a multi-layer neural network that converts the original signal into a higher-level and more representative feature representation [14]. Zhu DC et al. [15] proposed a method combining the envelope order spectrum and a convolutional neural network (CNN). The envelope order spectrum extracted from the original signal contains a wealth of information about the orders of the fault characteristics. The CNN model is then employed to extract these representative features, which helps identify the type of bearing defect. Shao Y et al. [16] proposed a fault diagnosis method based on a one-dimensional convolutional neural network (1DCNN) and support vector machine (SVM). The improved 1DCNN is employed to extract the features of fault signals, and the SVM is employed as the classifier for fault classification. The advantage is that the 1DCNN has a stronger feature extraction ability for vibration signals. Chen HM et al. [17] combined a multi-scale CNN-LSTM module with a deep residual module, which provides stronger fault feature extraction, and the residual connections improve the feature expression ability of the model.

The research above shows that fault signal analysis and machine learning models can effectively improve the accuracy of fault diagnosis, while deep learning methods for bearing fault diagnosis still face data dependence, noise sensitivity, poor interpretability, and high computational cost. To address these limitations, this paper proposes a fault diagnosis method based on improved VMD and a parallel hybrid neural network. The sliding window overlapping sampling method is used to increase the number of samples, the improved VMD is used to remove noise, the GAP layer is used to reduce the amount of computation in the parallel hybrid neural network, and confusion matrices are used to display the results. The main contributions of this paper are as follows:

(1): An improved VMD algorithm is proposed by combining VMD with reweighted kurtosis. Using the reweighted kurtosis value as the evaluation index, the influence of some strong shocks in the fault signal can be eliminated, and the accuracy of fault diagnosis can be improved.
(2): A parallel hybrid neural network model is proposed, which fuses the features extracted by the two parallel networks to achieve feature enhancement and complementarity, which can improve the classification accuracy of the diagnosis model.
(3): The proposed fault diagnosis method is compared with several recent bearing fault diagnosis models and shows an advantage in diagnostic accuracy, which demonstrates the merit of the method.

2. Model Construction Principle

2.1. Variational Mode Decomposition

As a signal decomposition method, VMD can decompose the signal into multiple local band components and extract the local features of the signal [18]. The theoretical basis of VMD is to express the signal as the superposition of a series of modulated signals, in which each modulated signal has a central frequency and bandwidth range. VMD can decompose the original signal by iterating it to obtain the optimal modulation signal. The constrained variational model expression is as follows:

\begin{cases} \min_{\{u_{k}\}, \{ω_{k}\}} \left\{ \sum_{k} \left\| \partial_{t} \left[ \left( δ(t) + \frac{j}{π t} \right) * u_{k}(t) \right] e^{-j ω_{k} t} \right\|_{2}^{2} \right\} \\ \text{s.t.} \ \sum_{k} u_{k}(t) = f \end{cases}

(1)

where u_{k} denotes the K IMF components obtained by decomposition, ω_{k} is the center frequency of each mode, and f is the input signal.

Constrained variational problems are usually transformed into unconstrained ones and then solved. Here, a quadratic penalty term and a Lagrange multiplier are introduced for this purpose, and the alternating direction method of multipliers (ADMM) is employed to find the saddle point of the resulting augmented Lagrangian:

\begin{matrix} L(\{u_{k}\}, \{ω_{k}\}, λ) = α \sum_{k} \left\| \partial_{t} \left[ \left( δ(t) + \frac{j}{π t} \right) * u_{k}(t) \right] e^{-j ω_{k} t} \right\|_{2}^{2} + \\ \left\| f(t) - \sum_{k} u_{k}(t) \right\|_{2}^{2} + \left\langle λ(t), f(t) - \sum_{k} u_{k}(t) \right\rangle \end{matrix}

(2)

where α is the penalty factor and λ is the Lagrange multiplier.

2.2. Reweighted Kurtosis Method

Kurtosis is a statistic that describes how peaked or flat a data distribution is. It measures the relative size of the peak and tails of the distribution. As rotating machinery gradually wears, the impact pulses in the signal increase, and the kurtosis increases as well [19]. Kurtosis is defined as the normalized fourth-order central moment, as given in Formula (3). Since the signals processed in this paper are fault signals whose fault characteristics are mainly abnormal frequencies and signal pulses, it is reasonable to take kurtosis as the evaluation index of the bearing fault signal. However, the kurtosis value is very sensitive to outliers, so a strong impact in one part of the signal can dominate it and reduce accuracy. Therefore, this paper uses the reweighted kurtosis value instead of the kurtosis value as the evaluation index of the VMD reconstructed signal. Its advantage is that it eliminates part of the interference more effectively while retaining the impact part of the fault signal [20,21]. The specific implementation steps are as follows:

First, the reconstructed signal is divided into n equal parts.

Secondly, the kurtosis value of each segment of the signal is calculated, respectively.

K = \frac{1}{N} \sum_{i = 1}^{N} {(\frac{x_{i} - \bar{x}}{σ})}^{4}

(3)

where K, \bar{x}, and σ represent the kurtosis value, mean, and standard deviation of the signal segment, respectively, and N is the number of data points in the segment.

Third, the calculated kurtosis values are arranged in ascending order and converted into row vectors.

Fourthly, the weight corresponding to the kurtosis value is calculated.

W_{i} = \frac{K_{i}}{\sum_{j = 1}^{n} K_{j}}

(4)

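Fifthly, the calculated n weights are arranged in descending order and converted into a row vector. Finally, the reweighted kurtosis value RK is obtained as the product of the kurtosis row vector K and the transpose of the weight row vector W: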

R K = K \cdot (W)^{T}

(5)
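The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the default of 12 segments is the value used later in Section 3.1, and trailing samples that do not fill a whole segment are simply dropped (an assumption).

```python
import numpy as np

def kurtosis(x):
    """Normalized fourth central moment, as in Formula (3)."""
    x = np.asarray(x, dtype=float)
    sigma = x.std()  # population standard deviation (divides by N)
    return np.mean(((x - x.mean()) / sigma) ** 4)

def reweighted_kurtosis(signal, n_segments=12):
    """Reweighted kurtosis following Formulas (3)-(5)."""
    signal = np.asarray(signal, dtype=float)
    seg_len = len(signal) // n_segments
    # Assumption: samples beyond n_segments * seg_len are discarded.
    segments = signal[: seg_len * n_segments].reshape(n_segments, seg_len)
    k = np.sort([kurtosis(s) for s in segments])  # kurtosis values, ascending
    w = np.sort(k / k.sum())[::-1]                # weights (4), descending
    return float(k @ w)                           # RK = K . W^T, Formula (5)

rng = np.random.default_rng(0)
noisy = rng.normal(size=12000)
noisy[5000] += 40.0  # one isolated strong shock
rk = reweighted_kurtosis(noisy)
```

Because the largest kurtosis value is paired with the smallest weight, RK never exceeds the mean of the segment kurtosis values, which is how a single strongly impulsive segment is prevented from dominating the index.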

2.3. Recurrent Neural Network

The recurrent neural network (RNN) can model and process sequence data [22]. Unlike the traditional feedforward neural network, the RNN has feedback connections between its neurons, so it can remember past inputs and process data with a temporal, sequential, or linguistic structure [23].

As an improved recurrent network, the long short-term memory (LSTM) network processes sequential signals through three gating units: the input gate filters the effective input, the forget gate controls how much historical information is carried over, and the output gate adjusts the strength of the external response [24]. This design effectively alleviates the attenuation of historical information in the traditional recurrent network. However, for long-term continuous monitoring data, the model still faces vanishing gradients and a significant increase in computational load. In contrast, the gated recurrent unit (GRU) adopts a simplified design that merges the forget gate and the input gate into a single update gate [25] while maintaining the core functions, which reduces the number of model parameters. This speeds up the processing of data sets of the same size and saves training and testing time. Its structure is shown in Figure 1.

In the figure, z_{t} represents the update gate, r_{t} is the reset gate, h_{t - 1} represents the value of the hidden unit at the previous time, x_{t} represents the input data at the current time, h_{t} represents the output data of the hidden layer, and {\tilde{h}}_{t} is the candidate state obtained after the compound operation of the state of the previous hidden layer h_{t - 1} and the current input x_{t}. The calculation process of the GRU network is as follows, where σ is the sigmoid activation function, and W_{z}, W_{r}, and W_{h} are the parameters of each gate neuron, respectively.

z_{t} = σ (W_{z} \cdot [h_{t - 1}, x_{t}])

(6)

r_{t} = σ (W_{r} \cdot [h_{t - 1}, x_{t}])

(7)

{\tilde{h}}_{t} = \tanh (W_{h} \cdot [r_{t} * h_{t - 1}, x_{t}])

(8)

h_{t} = (1 - z_{t}) * h_{t - 1} + z_{t} * {\tilde{h}}_{t}

(9)
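As a minimal sketch of Formulas (6)-(9), one GRU update can be written with NumPy as follows. The function names, the random weights, and the toy dimensions are illustrative assumptions; bias terms are omitted because the formulas in the text do not include them, and `*` in the formulas is taken to be element-wise multiplication.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(h_prev, x_t, W_z, W_r, W_h):
    """One GRU update; each weight matrix has shape (hidden, hidden + inputs)."""
    hx = np.concatenate([h_prev, x_t])                             # [h_{t-1}, x_t]
    z_t = sigmoid(W_z @ hx)                                        # update gate, (6)
    r_t = sigmoid(W_r @ hx)                                        # reset gate, (7)
    h_cand = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))    # candidate state, (8)
    return (1.0 - z_t) * h_prev + z_t * h_cand                     # new hidden state, (9)

rng = np.random.default_rng(0)
hidden, inputs = 4, 3  # toy sizes, chosen only for illustration
W_z, W_r, W_h = (rng.normal(size=(hidden, hidden + inputs)) for _ in range(3))
h = np.zeros(hidden)
for x_t in rng.normal(size=(5, inputs)):  # a sequence of five time steps
    h = gru_step(h, x_t, W_z, W_r, W_h)
```

Since Formula (9) is a convex combination of the previous state and a tanh output, a hidden state that starts at zero stays strictly inside (-1, 1).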

Bidirectional long short-term memory (BiLSTM) and the bidirectional gated recurrent unit (BiGRU) introduce a bidirectional recurrent structure based on LSTM and GRU, composed of two unidirectional LSTMs or GRUs running in opposite directions. Their advantage is that they take both historical and future information into account [26]. The bidirectional structure mitigates the limitations of the RNN when processing long sequences and improves the performance and stability of the model. Taking the BiGRU model as an example, the forward GRU computes its hidden state sequence \vec{h_{1}}, \vec{h_{2}}, …, \vec{h_{t}} with the conventional GRU calculation from the beginning of the sequence to the end. The reverse GRU computes its hidden state sequence \overset{\leftarrow}{h_{t}}, \overset{\leftarrow}{h_{t - 1}}, …, \overset{\leftarrow}{h_{1}} from the end of the sequence back to the start. After obtaining the forward and reverse hidden state sequences, BiGRU concatenates the forward and reverse hidden states at each time step to obtain the final hidden state representation. That is, for time step t, the final hidden state h_{t}^{BiGRU} is given by Formula (10). For BiLSTM, the GRU structure is replaced by the LSTM structure and the calculation method remains the same, so it is not repeated here.

h_{t}^{BiGRU} = [\vec{h_{t}}, \overset{\leftarrow}{h_{t}}]

(10)
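The concatenation in Formula (10) can be illustrated with a small NumPy sketch. The helper names and toy dimensions are hypothetical, and the GRU step repeats Formulas (6)-(9) without bias terms; the point of the example is only the reversal and re-alignment of the backward pass.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(h_prev, x_t, W_z, W_r, W_h):
    hx = np.concatenate([h_prev, x_t])
    z_t, r_t = sigmoid(W_z @ hx), sigmoid(W_r @ hx)
    h_cand = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))
    return (1.0 - z_t) * h_prev + z_t * h_cand

def run_gru(xs, params, hidden):
    """Run a unidirectional GRU over xs (T, inputs); returns states (T, hidden)."""
    h, states = np.zeros(hidden), []
    for x_t in xs:
        h = gru_step(h, x_t, *params)
        states.append(h)
    return np.stack(states)

def bigru(xs, fwd_params, bwd_params, hidden):
    forward = run_gru(xs, fwd_params, hidden)
    # The backward pass reads the sequence in reverse; flip it back to time order.
    backward = run_gru(xs[::-1], bwd_params, hidden)[::-1]
    return np.concatenate([forward, backward], axis=1)  # Formula (10)

def make_params(rng, hidden, inputs):
    return tuple(rng.normal(size=(hidden, hidden + inputs)) for _ in range(3))

rng = np.random.default_rng(0)
hidden, inputs = 4, 3
xs = rng.normal(size=(6, inputs))
fwd, bwd = make_params(rng, hidden, inputs), make_params(rng, hidden, inputs)
states = bigru(xs, fwd, bwd, hidden)  # shape (6, 8)
```

Each row of `states` holds the forward state followed by the backward state for the same time step, so the feature dimension doubles.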

2.4. Convolutional Neural Network

The CNN is a widely used network model. Its basic idea is to extract features from image, voice, or text data through the convolution operation [27]. Its basic structure is composed of convolution layers, pooling layers, and fully connected (FC) layers. The convolution layer extracts features: local features are obtained by convolving the convolution kernel with the input data, and multiple convolution kernels extract different features. The pooling layer compresses the features. The fully connected layer classifies or regresses the extracted features [28].

The 1DCNN is a variant of the CNN that is mainly used for processing sequence data. Compared with the traditional CNN, the 1DCNN can better capture the local patterns in sequence data [29]. The FC layer in the traditional CNN may cause overfitting and an excessive number of model parameters, and it cannot retain the spatial information of the feature map. To solve this problem, the GAP layer is employed to replace the last FC layer. The GAP layer averages each feature map extracted by the convolution layer and passes the pooled result directly to the classifier as its input [30]. By compressing the feature map while preserving its spatial information, the GAP layer improves the generalization ability of the model, reduces overfitting during training, and reduces the number of parameters and the amount of computation. Specifically, for channel c of an input feature map of size H × W, the global average pooled output GAP_{c} is given by Formula (11), where x_{i, j, c} is the value of channel c at position (i, j). The 1DCNN-GAP pooling process is shown in Figure 2.

\mathrm{GAP}_{c} = \frac{1}{H \times W} \sum_{i = 1}^{H} \sum_{j = 1}^{W} x_{i, j, c}

(11)
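Formula (11) amounts to one mean per channel. The sketch below shows it for a 2D feature map and, as an assumed 1D analogue for sequence features of shape (T, C), as an average over the time axis; the function names and the toy arrays are illustrative only.

```python
import numpy as np

def global_average_pool_2d(feature_map):
    """Formula (11): feature_map has shape (H, W, C); returns one value per channel."""
    return feature_map.mean(axis=(0, 1))

def global_average_pool_1d(feature_map):
    """1D analogue (assumed): feature_map has shape (T, C); averages over time."""
    return feature_map.mean(axis=0)

x2d = np.arange(24, dtype=float).reshape(2, 3, 4)  # H=2, W=3, C=4
x1d = np.arange(12, dtype=float).reshape(6, 2)     # T=6, C=2
print(global_average_pool_2d(x2d))  # [10. 11. 12. 13.]
print(global_average_pool_1d(x1d))  # [5. 6.]
```

Whatever the spatial or temporal size of the input, the output has exactly C values, which is why a classifier placed after GAP needs far fewer weights than one placed after a flattened feature map.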

3. Materials and Methods

Combining a signal decomposition algorithm with neural network technology, a fault diagnosis method based on improved VMD and a parallel hybrid neural network is proposed. The method consists of two main parts: fault signal processing and the fault diagnosis model.

3.1. Fault Signal Processing Method

The signal decomposition algorithm is employed to reduce part of the noise in the fault signal, retain the impact part of the original signal, and achieve the goal of improving the accuracy of model diagnosis. The fault signal processing flow based on improved VMD is shown in Figure 3.

This paper proposes an improved VMD algorithm that combines the VMD algorithm with the reweighted kurtosis value. First, in the basic parameter setting of the VMD algorithm, the penalty factor α is set to 2000 and the other parameters are left at their default values. The penalty factor α controls the mode bandwidth: the larger α is, the narrower the bandwidth of each mode function and the more concentrated the time–frequency characteristics of the mode, but an overly large value may lead to excessive smoothing and sacrifice part of the reconstruction accuracy. Common VMD implementations (such as MATLAB and Python toolkits) usually set α to 2000. This value has been verified by many experiments to achieve a good balance between mode bandwidth and reconstruction accuracy and is suitable for mechanical vibration signals. The default value of the noise tolerance (tau) is applicable to scenarios where the noise distribution is not clear. The default value of the convergence tolerance (tol) ensures that the algorithm converges in a reasonable time without adjustment. The fault signal is then decomposed and reconstructed with the number of decomposition modes K varied from 2 to 9. Each reconstructed signal is divided into 12 segments, and the kurtosis value and weight of each segment are calculated to obtain the reweighted kurtosis value. The reweighted kurtosis value is used to evaluate each reconstructed signal, and the reconstructed signal with the largest reweighted kurtosis value is selected as the fault signal. Compared with the kurtosis value, the reweighted kurtosis value as the evaluation index reduces the influence of isolated strong shocks in the fault signal and further improves the diagnostic accuracy of the model. After processing by the improved VMD algorithm, part of the interference in the fault signal is removed while the impact part of the original signal is retained.
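The selection loop described above can be sketched as follows. The text does not specify how the reconstructed signal is assembled from the modes, so the decomposition-and-reconstruction step is passed in as a callable; `select_mode_count` and `toy_reconstruct` are hypothetical names, and `toy_reconstruct` is only a moving-average placeholder that keeps the example runnable, not VMD. In practice it would be replaced by a wrapper around a real VMD implementation with α = 2000.

```python
import numpy as np

def kurtosis(x):
    x = np.asarray(x, dtype=float)
    return np.mean(((x - x.mean()) / x.std()) ** 4)

def reweighted_kurtosis(signal, n_segments=12):
    signal = np.asarray(signal, dtype=float)
    seg_len = len(signal) // n_segments
    segments = signal[: seg_len * n_segments].reshape(n_segments, seg_len)
    k = np.sort([kurtosis(s) for s in segments])  # ascending kurtosis values
    w = np.sort(k / k.sum())[::-1]                # descending weights
    return float(k @ w)

def select_mode_count(signal, decompose_and_reconstruct, k_values=range(2, 10)):
    """Try K = 2..9 and keep the reconstruction with the largest reweighted kurtosis."""
    scores, candidates = {}, {}
    for k in k_values:
        candidates[k] = decompose_and_reconstruct(signal, k)
        scores[k] = reweighted_kurtosis(candidates[k])
    best_k = max(scores, key=scores.get)
    return best_k, candidates[best_k], scores

def toy_reconstruct(signal, k):
    """Placeholder ONLY: a moving-average filter of width k, not VMD."""
    return np.convolve(signal, np.ones(k) / k, mode="same")

rng = np.random.default_rng(0)
raw = rng.normal(size=32768)
raw[::2000] += 8.0  # synthetic periodic impacts
best_k, best_signal, scores = select_mode_count(raw, toy_reconstruct)
```

Keeping the decomposition behind a callable makes the selection rule independent of the VMD library in use, so the same loop works with any implementation that returns a reconstructed signal for a given K.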

3.2. Fault Diagnosis Model Design

In this paper, a parallel hybrid neural network is proposed that combines the strength of the recurrent neural network in processing time series data with the powerful feature extraction ability of the convolutional neural network and is employed for rapid diagnosis of fault signals. The model uses two parallel networks to extract features of the fault signals more comprehensively and fuses the extracted features to enrich the feature representation. At the same time, the GAP layer is employed for model parameter compression and data dimension transformation. The fault diagnosis process of the parallel hybrid neural network is shown in Figure 4.

The diagnosis process of the parallel hybrid neural network model is as follows.

Firstly, the processed fault signal is used as the input, and training samples that can be read by the BiLSTM and BiGRU layers are generated through data fusion, data truncation, and normalization.

Secondly, in the parallel neural network, the BiLSTM layer and the BiGRU layer each extract the features of every group of training samples with time-series characteristics, capturing and remembering the relevant features in the samples. The bidirectional recurrent structure can better handle the long-term dependence in time series data [31] and passes the memory information to the subsequent network layers after feature extraction.

Thirdly, the CNN layers are employed to further mine the output features of the BiLSTM and BiGRU layers. By stacking multiple 1DCNN layers, the small differences between different faults can be extracted from the feature data to improve the diagnostic accuracy of the model.

Fourthly, the features extracted by the two neural networks are fused by concatenation, and the complementarity between the features is exploited to improve diagnostic accuracy.

Fifthly, after feature fusion, the GAP layer greatly reduces the number of model parameters and the training and testing time while taking global information into account.

Finally, the output of the GAP layer is input into the SoftMax classifier for a final classification decision, and the diagnosis result is given.
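The last three steps (fusion, GAP, and SoftMax classification) can be sketched with NumPy as below. Several details are assumptions made for illustration: the branch outputs are random stand-ins with hypothetical shapes, the features are concatenated along the channel axis, and the SoftMax classifier is modeled as a single dense layer followed by softmax.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def fuse_and_classify(feat_a, feat_b, W, b):
    """feat_a, feat_b: feature maps of shape (T, C_a) and (T, C_b) from the two branches."""
    fused = np.concatenate([feat_a, feat_b], axis=1)  # step four: concatenate channels
    pooled = fused.mean(axis=0)                       # step five: GAP over the time axis
    return softmax(W @ pooled + b)                    # final step: class probabilities

rng = np.random.default_rng(0)
feat_gru = rng.normal(size=(50, 16))   # stand-in for the BiGRU-1DCNN branch output
feat_lstm = rng.normal(size=(50, 16))  # stand-in for the BiLSTM-1DCNN branch output
n_classes = 6                          # six fault types (Section 3.3)
W, b = rng.normal(size=(n_classes, 32)), np.zeros(n_classes)
probs = fuse_and_classify(feat_gru, feat_lstm, W, b)
predicted = int(np.argmax(probs))
```

After GAP the classifier sees only one value per fused channel (32 here), regardless of the sequence length T.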

3.3. Data Set Processing

The effectiveness of the proposed fault diagnosis method is verified by experiments on the public XJTU data set [32]. The test bearing model is LDK UER204. The experimental platform is mainly composed of a speed controller, digital force display, rotating shaft, motor, and test bearing, as shown in Figure 5. During the experiment, the radial force is adjusted by the hydraulic loading system and the speed is adjusted by the AC motor to set different working conditions. There are three working conditions in the experiment, and data from five test bearings are collected under each working condition. Table 1 lists the details of the bearings selected in this paper, including bearing life and fault location. The experimental data were collected by a DT9837 signal collector with a sampling interval of 60 s, a sampling frequency of 25.6 kHz, and a single sampling duration of 1.28 s.

Because the bearing fault data set is too large to analyze all the fault signals, this paper selects the vibration signals in the horizontal direction of two bearing data sets under each working condition in the experiment [33], and each bearing data set selects the data of the last five CSV files for processing and analysis. Each CSV file contains 32768 sampling points, with a total of six fault types. The experimental data set constructed is shown in Table 2 below.

The original data set needs to be preprocessed before fault signal processing so that the processed data set can be fed directly into the fault diagnosis model as input for learning. Data truncation and sample generation are required for the new fault data set before preprocessing. So that each sample contains effective fault information, this paper selects the sample length with reference to the number of data points in one revolution of the bearing, which ranges from 640 to 731.43 according to the sampling frequency and the rotating speeds of the three working conditions. Considering the need for sufficient data in each sample, the length of each training sample is set to 1024 data points. To enhance the generalization ability of the model, the sliding window overlapping sampling method [34] is employed to generate training samples. The generation formula is as follows:

L = (\frac{l - W}{B} + 1)

(12)

where L is the number of newly generated samples, l is the number of data points of the original fault signal, W is the length of the new training sample, and B is the step size of the sliding window.

In this paper, the step size of the sliding window is 1/4 of the sample length, that is, 256 data points, and the number of newly generated samples is calculated to be 637. To facilitate statistics and analysis, the first 600 samples are retained. Therefore, the size of each type of fault set generated after data truncation is (600, 1024). From each type of fault set, the training set and test set are constructed by randomly selecting data in the ratio of 8:2. To accelerate the convergence of model training, min-max normalization is applied to the experimental data. The newly generated bearing fault data set is shown in Table 3.
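The sample generation can be checked with a short sketch. It assumes that the five CSV files of a bearing are concatenated into one record of 5 × 32768 = 163840 points, which reproduces the stated count of 637 through Formula (12). The function names are hypothetical, the signal is synthetic, and scaling each sample separately is an assumption because the text does not say whether normalization is per sample or global.

```python
import numpy as np

def sliding_window_samples(signal, width=1024, step=256):
    """Overlapping windows; the count follows Formula (12)."""
    signal = np.asarray(signal, dtype=float)
    count = (len(signal) - width) // step + 1
    return np.stack([signal[i * step : i * step + width] for i in range(count)])

def min_max_normalize(samples):
    """Scale each sample to [0, 1] (per-sample scaling is an assumption)."""
    lo = samples.min(axis=1, keepdims=True)
    hi = samples.max(axis=1, keepdims=True)
    return (samples - lo) / (hi - lo)

rng = np.random.default_rng(0)
signal = rng.normal(size=5 * 32768)          # synthetic stand-in for five CSV files
windows = sliding_window_samples(signal)     # shape (637, 1024)
samples = min_max_normalize(windows[:600])   # keep the first 600 samples
order = rng.permutation(len(samples))
train_set = samples[order[:480]]             # 8:2 random split
test_set = samples[order[480:]]
```

With a step of 256 and a width of 1024, consecutive samples overlap by 75%, which is what multiplies the number of samples obtained from a fixed-length record.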

4. Results

4.1. Fault Signal Processing

After the bearing fault data set is obtained, the envelope spectrum analysis method is used to analyze the fault data. Because the impacts generated by a faulty rolling bearing are very prominent, envelope spectrum analysis can accurately locate the sideband frequencies and quickly recover the modulation signal. The square envelope spectrum algorithm builds on the envelope spectrum algorithm. When the analytic signal of the vibration signal is computed, its complex conjugate is computed as well, and the square envelope signal of the vibration signal is obtained as the product of the analytic signal and its conjugate. Compared with the envelope spectrum, the square envelope spectrum suppresses noise and other unrelated components, making the spectrum clearer. Because of the large amount of fault data, it is impractical to analyze all of it. Therefore, one of the CSV files of Bearing1_4 with a fault tag of 1 is selected as a typical example. The rotating frequency of Bearing1_4 under its working condition is 35 Hz, and substituting it into the bearing fault characteristic frequency Formula (13) gives the cage fault characteristic frequency FTF = 13.49 Hz for Bearing1_4. Since the third harmonic of the cage fault characteristic frequency is within 100 Hz, only the square envelope spectrum within 100 Hz is shown, as in Figure 6:

\mathrm{FTF} = \frac{1}{2} N [1 - \frac{d \cos a}{D}]

(13)

where N represents the bearing rotation frequency (unit: Hz), d represents the diameter of the rolling element, D represents the bearing pitch diameter, and a represents the contact angle of the rolling element.
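As a worked check of Formula (13), the sketch below reproduces the stated FTF value. The bearing geometry is not given in this section; the ball diameter of 7.92 mm, pitch diameter of 34.55 mm, and contact angle of 0° are taken from the published description of the XJTU-SY data set and should be treated as assumptions here. The function name is hypothetical.

```python
import math

def cage_fault_frequency(rotation_hz, ball_diameter, pitch_diameter, contact_angle_deg=0.0):
    """Cage (fundamental train) fault characteristic frequency, Formula (13)."""
    ratio = ball_diameter * math.cos(math.radians(contact_angle_deg)) / pitch_diameter
    return 0.5 * rotation_hz * (1.0 - ratio)

# Assumed geometry (mm), not stated in this section.
ftf = cage_fault_frequency(35.0, ball_diameter=7.92, pitch_diameter=34.55)
print(round(ftf, 2))      # 13.49
print(round(3 * ftf, 2))  # 40.47, the third harmonic
```

The third harmonic at about 40.47 Hz lies well inside the 100 Hz range shown in the square envelope spectrum.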

It can be seen from the figure that although the fault characteristic frequency and its harmonics are evident, some interference remains. The improved VMD method is therefore applied to the fault signal to denoise and filter the original signal, removing noise and masking components unrelated to the fault so that the fault features are easier to detect. The reweighted kurtosis values corresponding to the different reconstructed signals are shown in Figure 7.

Figure 7 shows that the reweighted kurtosis of the reconstructed signal is largest when the number of decomposition modes K is set to 6. The signal reconstructed by the improved VMD algorithm removes some of the interference and retains the main impact components, thus enhancing the fault characteristics of the original signal. The reconstructed signal and the original signal are compared in Figure 8.

The squared envelope spectrum of the reconstructed signal is shown in Figure 9. After processing by the improved VMD algorithm, the amplitude of the squared envelope spectrum at the fault characteristic frequency increases significantly, which makes the fault easier to identify. This reflects the advantage of the improved VMD algorithm in retaining the impact characteristics of the fault signal and lays the foundation for feature extraction with the deep learning network in the following section.
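The squared envelope spectrum used throughout this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the authors' implementation: the analytic signal is built with the standard FFT-based Hilbert construction, the test signal is a synthetic amplitude-modulated tone instead of a bearing record, and the sampling rate, carrier frequency, and function names are all hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT-based Hilbert construction."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                       # keep the DC bin
    if n % 2 == 0:
        weights[n // 2] = 1.0              # keep the Nyquist bin
        weights[1:n // 2] = 2.0            # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def squared_envelope_spectrum(x, fs):
    """Return (frequencies, amplitudes) of the squared envelope spectrum."""
    z = analytic_signal(x)
    squared_envelope = (z * np.conj(z)).real      # analytic signal times its conjugate
    squared_envelope -= squared_envelope.mean()   # remove the DC term
    amplitudes = np.abs(np.fft.rfft(squared_envelope)) / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs, amplitudes

# Synthetic example (assumed values): a 1 kHz carrier modulated at 13.5 Hz.
fs, duration, f_mod = 10_000, 2.0, 13.5
t = np.arange(int(fs * duration)) / fs
x = (1.0 + 0.5 * np.cos(2 * np.pi * f_mod * t)) * np.cos(2 * np.pi * 1000 * t)

freqs, amps = squared_envelope_spectrum(x, fs)
band = freqs <= 100.0                      # same 0 to 100 Hz view as in the text
peak_hz = freqs[band][np.argmax(amps[band])]
print(f"dominant envelope frequency: {peak_hz:.1f} Hz")
```

In this toy case the dominant line in the 0 to 100 Hz band sits at the modulation frequency, with a weaker line at twice that frequency, which mirrors how the fault characteristic frequency and its harmonics appear in Figures 6 and 9.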

4.2. Fault Diagnosis Model Based on Parallel Hybrid Neural Network

During the design of the fault diagnosis model, the choice of hyperparameters affects the diagnosis results, and factors such as the complexity of the problem, the size of the data set, and the computational cost need to be considered. In this paper, the number of network layers, the number of neurons, and the number of training epochs of the diagnostic model are adjusted repeatedly. Specifically, the number of network layers is determined first, then the number of neurons, and finally the number of training epochs according to the loss value.

The resulting parallel hybrid neural network fault diagnosis model is composed of two parallel neural networks, as shown in Figure 4. One branch is a BiGRU-1DCNN structure with two BiGRU layers and two 1DCNN layers. The numbers of neurons in BiGRU layer 1-1 and BiGRU layer 1-2 are set to 32 and 16, respectively; 1DCNN layer 1-3 has 32 filters with a convolution kernel size of 64, and 1DCNN layer 1-4 has 16 filters with a convolution kernel size of 16 (Table 4). The other branch is a BiLSTM-1DCNN structure with two BiLSTM layers and two 1DCNN layers. The numbers of neurons in BiLSTM layer 2-1 and BiLSTM layer 2-2 are set to 32 and 16, respectively; 1DCNN layer 2-3 has 32 filters with a convolution kernel size of 64, and 1DCNN layer 2-4 has 16 filters with a convolution kernel size of 16 (Table 4).

The model uses the two BiLSTM layers and the two BiGRU layers to extract temporal features from the training samples, capturing and remembering the relevant features, and relies on the bidirectional recurrent structure to better handle long-term dependence in the sequence data. The extracted features are then fed into the two 1DCNN layers of each branch for deeper feature mining, so that small differences between fault types are extracted from the data and the diagnosis accuracy of the model increases.

The features extracted by the two branches are then fused by concatenation to obtain more comprehensive fault features. After the CNN layers, a GAP layer replaces the traditional FC layer, which significantly reduces the number of model parameters, helps prevent overfitting, and speeds up training. Finally, fault classification is performed by the SoftMax layer. The hyperparameters of the model are listed in Table 4.
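The output shapes listed in Table 4 can be checked with a few lines of arithmetic. The sketch below assumes, as the shapes in the table suggest, that each bidirectional layer returns the full sequence with forward and backward outputs concatenated, that the convolutions use stride 1 without padding, and that the two branch outputs are concatenated along the time axis. The helper names are illustrative.

```python
def bidirectional_shape(length, units):
    """A bidirectional recurrent layer returning sequences doubles the unit count."""
    return (length, 2 * units)

def conv1d_valid_shape(length, filters, kernel_size):
    """A stride-1 convolution without padding shortens the sequence by kernel_size - 1."""
    return (length - kernel_size + 1, filters)

def branch_shapes(input_length=1024):
    """Shapes after each layer of one branch (BiGRU-1DCNN or BiLSTM-1DCNN)."""
    rnn1 = bidirectional_shape(input_length, units=32)
    rnn2 = bidirectional_shape(rnn1[0], units=16)
    cnn1 = conv1d_valid_shape(rnn2[0], filters=32, kernel_size=64)
    cnn2 = conv1d_valid_shape(cnn1[0], filters=16, kernel_size=16)
    return [rnn1, rnn2, cnn1, cnn2]

gru_branch = branch_shapes()
lstm_branch = branch_shapes()

# Concatenate the two branch outputs along the time axis, then average over time.
concat = (gru_branch[-1][0] + lstm_branch[-1][0], gru_branch[-1][1])
gap = (concat[1],)

for name, shape in zip(["recurrent 1", "recurrent 2", "conv 3", "conv 4"], gru_branch):
    print(name, shape)
print("concatenate", concat)
print("GAP", gap)
```

Under these assumptions the sketch reproduces the per-sample shapes of Table 4, including the 961 and 946 time steps after the two convolutions and the 16 features that reach the SoftMax layer.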

To improve training, the model is trained with mini-batches and the Adam optimizer. Mini-batch training divides the training data set into multiple mini-batches, and each iteration uses one batch to compute the gradient and update the model parameters, which can improve the convergence speed and generalization ability of the model. The Adam optimizer adapts the learning rate of each parameter automatically, without manual tuning, and is suitable for non-stationary objectives. Training is run on a Dell G15-5520 laptop (Dell, Round Rock, TX, USA) with an Intel i7-12650H CPU and an RTX 3060 graphics card with 6 GB of video memory. The batch size, that is, the number of samples fed to the model in each iteration, is set to 32, and the number of training epochs is set to 20. The training error curve is shown in Figure 10. The model has essentially converged and is stable after 20 epochs, with no sign of overfitting or underfitting.
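To make the training procedure concrete, the following NumPy sketch runs mini-batch training with the Adam update rule on a toy least-squares problem. It uses the batch size of 32 and the 20 epochs given in the text and, purely for illustration, 2880 samples (6 classes with 480 training samples each, as in Table 3), so that every epoch has 90 iterations. The toy linear model, the learning rate of 0.01, and the remaining Adam constants are assumptions, not settings reported by the authors.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 2880, 3           # 2880 = 6 classes x 480 samples (Table 3)
true_w = np.array([1.0, -2.0, 0.5])       # hypothetical target weights
X = rng.standard_normal((n_samples, n_features))
y = X @ true_w

batch_size, epochs = 32, 20               # values stated in the text
lr, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8   # assumed Adam settings

w = np.zeros(n_features)
m = np.zeros(n_features)                  # first-moment estimate
v = np.zeros(n_features)                  # second-moment estimate
step = 0

for epoch in range(epochs):
    order = rng.permutation(n_samples)    # reshuffle the samples once per epoch
    for start in range(0, n_samples, batch_size):
        idx = order[start:start + batch_size]
        residual = X[idx] @ w - y[idx]
        grad = 2.0 * X[idx].T @ residual / len(idx)   # gradient of the batch MSE
        step += 1
        m = beta1 * m + (1.0 - beta1) * grad
        v = beta2 * v + (1.0 - beta2) * grad ** 2
        m_hat = m / (1.0 - beta1 ** step)             # bias correction
        v_hat = v / (1.0 - beta2 ** step)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)

print("iterations per epoch:", n_samples // batch_size)
print("total updates:", step)
print("learned weights:", np.round(w, 3))
```

Each parameter gets its own effective step size through the ratio of the two moment estimates, which is the sense in which Adam adapts the learning rate without manual tuning.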

After the trained model is evaluated on the test data set, the diagnosis results are presented as a confusion matrix in Figure 11. The confusion matrix shows a test accuracy of 99.72%. The experimental results verify the effectiveness of the proposed fault diagnosis method and show that it diagnoses faults with high accuracy.

5. Discussion

5.1. Comparison of Denoising Experiments

To demonstrate the advantages of the improved VMD method, Gaussian noise is added to the fault signal at signal-to-noise ratios (SNRs) of 5 dB, 10 dB, and 15 dB, and the fast Fourier transform (FFT), WT, WOA-VMD, Kurtosis–VMD, EWT, and the method proposed in this paper are each used for denoising. The SNR and the Pearson correlation coefficient between the denoised signal and the fault signal are used as evaluation indexes. The average values over ten experiments are shown in Table 5.

SNR is the ratio of the strength or energy of the signal to that of the noise. It is a common indicator for evaluating signal quality and noise level, and is defined as

\mathrm{SNR} = 10 \log_{10} \left( \frac{P_{\mathrm{signal}}}{P_{\mathrm{noise}}} \right)

(14)

where $P_{\mathrm{signal}}$ and $P_{\mathrm{noise}}$ represent the power of the signal and of the noise, respectively.

The Pearson correlation coefficient (denoted $r$ below) expresses the degree of linear correlation between two variables, that is, the strength and direction of their linear relationship, and is given by

r = \frac{\sum_{i = 1}^{n} (X_{i} - \bar{X}) (Y_{i} - \bar{Y})}{\sqrt{\sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}} \sqrt{\sum_{i = 1}^{n} (Y_{i} - \bar{Y})^{2}}}

(15)

where $r$ is the Pearson correlation coefficient, $X$ and $Y$ are the two variables with $n$ samples each, and $\bar{X}$ and $\bar{Y}$ represent the averages of $X$ and $Y$, respectively.
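Equations (14) and (15) translate directly into code. The sketch below is illustrative only: it assumes that the noise in Equation (14) is the difference between the evaluated signal and a clean reference, which this section does not state explicitly, and it uses a synthetic sine wave in place of a bearing signal. All function names are hypothetical.

```python
import numpy as np

def snr_db(reference, estimate):
    """Equation (14), with the noise assumed to be estimate - reference."""
    noise = estimate - reference
    return 10.0 * np.log10(np.mean(reference ** 2) / np.mean(noise ** 2))

def pearson_r(x, y):
    """Equation (15)."""
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / (np.sqrt(np.sum(dx ** 2)) * np.sqrt(np.sum(dy ** 2)))

def add_gaussian_noise(signal, target_snr_db, rng):
    """Add white Gaussian noise scaled so that the SNR equals target_snr_db."""
    noise = rng.standard_normal(len(signal))
    noise_power = np.mean(signal ** 2) / 10.0 ** (target_snr_db / 10.0)
    noise *= np.sqrt(noise_power / np.mean(noise ** 2))
    return signal + noise

rng = np.random.default_rng(0)
t = np.arange(20_000) / 10_000.0
clean = np.sin(2 * np.pi * 50.0 * t)       # synthetic stand-in signal

for level in (5, 10, 15):
    noisy = add_gaussian_noise(clean, level, rng)
    print(level, round(snr_db(clean, noisy), 2), round(100 * pearson_r(clean, noisy), 2))
```

For zero-mean noise that is uncorrelated with the signal, $r$ is approximately $1 / \sqrt{1 + 10^{-\mathrm{SNR}/10}}$, which gives about 87.2%, 95.3%, and 98.5% at 5, 10, and 15 dB. These values are close to the "Noise Signal" row of Table 5.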

Table 5 compares the denoising results of the various methods in terms of SNR and the Pearson correlation coefficient. A larger SNR means that the signal is stronger relative to the noise, so a larger SNR usually indicates a better denoising effect. Likewise, $r$ measures the linear correlation between two variables, so a larger $r$ means a higher similarity between the denoised signal and the original signal and thus a better denoising effect. The denoising effect of FFT is not ideal: its SNR and $r$ are barely improved compared with the values before denoising. WT and WOA-VMD denoise noticeably better, improving both SNR and $r$ under 5 dB, 10 dB, and 15 dB Gaussian noise. WOA-VMD also outperforms WT at 15 dB, indicating that VMD has certain advantages over other denoising methods in a low-noise environment. Kurtosis–VMD differs little from WOA-VMD at 5 dB and 10 dB, but at 15 dB its SNR and Pearson correlation coefficient fall below the values before denoising; this degradation may be caused by strong impulsive components in the noisy signal that lead to an improper decomposition. The EWT algorithm achieves a good denoising effect at all three noise levels, performing better than WT and only slightly worse than the method in this paper, and its computation is fast and suitable for real-time processing. The RK-VMD method proposed in this paper attains the highest SNR and Pearson correlation coefficient at all three noise levels, which represents the best denoising effect: it removes noise while retaining the original characteristics of the signal as far as possible.

5.2. Experimental Results and Comparative Analysis

To test the feasibility and superiority of the proposed method, it is compared with its own ablated variants and with recent fault diagnosis models. For reliability, each of the following experiments is repeated 10 times and the results are averaged. First, ablation experiments were carried out: the proposed method was compared with the GRU model, the GRU-CNN model, the BiGRU-CNN model, the parallel hybrid neural network–FC model, and the parallel hybrid neural network–GAP model to verify its feasibility. The GRU model uses two GRU layers with 32 and 16 neurons, respectively. The GRU-CNN model adds two 1DCNN layers to the GRU model, with 32 and 16 output channels and convolution kernel sizes of 64 and 16, respectively. The BiGRU-CNN model replaces the GRU layers with BiGRU layers. The parallel hybrid neural network–FC model fuses the features extracted by the two networks and uses a flatten layer and two FC layers, with 64 and 32 neurons, respectively, to reduce the data dimension. The parallel hybrid neural network–GAP model replaces the flatten layer and the two FC layers of that model with a GAP layer.
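The saving obtained by replacing the flatten and FC layers with a GAP layer can be estimated by counting the weights of the classification head. The sketch below assumes that the head receives the concatenated feature map of shape (1892, 16) listed in Table 4, that the FC head is flatten, FC(64), FC(32), and a 6-way SoftMax layer with biases, and that the GAP head feeds the 6-way SoftMax layer directly. These wiring details are assumptions, since the exact head layout is not spelled out in the text.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

time_steps, channels, n_classes = 1892, 16, 6    # feature map and classes from Table 4

# Head 1: flatten -> FC(64) -> FC(32) -> SoftMax(6)
flat = time_steps * channels
fc_head = dense_params(flat, 64) + dense_params(64, 32) + dense_params(32, n_classes)

# Head 2: global average pooling over time -> SoftMax(6)
gap_head = dense_params(channels, n_classes)      # the GAP layer itself has no weights

print("flattened features:", flat)
print("FC head parameters:", fc_head)
print("GAP head parameters:", gap_head)
```

Under these assumptions the FC head has roughly 1.94 million trainable parameters against about one hundred for the GAP head, which is in line with the reduction in parameters and computation that the GAP layer is meant to provide.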

The diagnostic results are evaluated with time indexes and accuracy indexes. For time, the training time and testing time of the diagnostic model are recorded. The accuracy indexes are the precision ($P$) and the $F_{1}$ score, which is the harmonic mean of precision and recall. They are defined as

P = \frac{TP}{TP + FP} \times 100\%

(16)

F_{1} = \frac{2 TP}{2 TP + FP + FN} \times 100\%

(17)

where $TP$ is the number of true positives, $FP$ is the number of false positives, and $FN$ is the number of false negatives. The ablation results are shown in Table 6.
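For a multi-class problem such as this one, Equations (16) and (17) are typically evaluated per class from the confusion matrix and then averaged. The sketch below shows one way to do this; the macro-averaging step and the small 2 × 2 confusion matrix are assumptions for illustration, as the averaging scheme is not specified here.

```python
import numpy as np

def precision_and_f1(confusion):
    """Per-class P and F1 in percent; rows are true classes, columns are predictions."""
    confusion = np.asarray(confusion, dtype=float)
    tp = np.diag(confusion)
    fp = confusion.sum(axis=0) - tp    # predicted as the class but belonging to another
    fn = confusion.sum(axis=1) - tp    # belonging to the class but predicted as another
    precision = 100.0 * tp / (tp + fp)
    f1 = 100.0 * 2.0 * tp / (2.0 * tp + fp + fn)
    return precision, f1

# Hypothetical confusion matrix for two classes.
cm = [[5, 1],
      [2, 4]]
p, f1 = precision_and_f1(cm)

print("per-class P (%):", np.round(p, 2))
print("per-class F1 (%):", np.round(f1, 2))
print("macro P, macro F1 (%):", round(p.mean(), 2), round(f1.mean(), 2))
```

The same function applies unchanged to the 6 × 6 confusion matrix of the XJTU experiment or the 10 × 10 matrix of the CWRU experiment.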

Table 6 reports the time and accuracy indexes. The GRU-only model performs poorly, with average $P$ and $F_{1}$ values of 93.95% and 93.87%, respectively. Adding the CNN layers makes the GRU-CNN model more complex and increases the training and testing time, but the average $P$ and $F_{1}$ values rise to 95.91% and 95.77%, respectively, reflecting the ability of the CNN layers to extract deeper information from the signal. The BiGRU-CNN model uses a bidirectional recurrent structure, and its average $P$ and $F_{1}$ values reach 97.03% and 96.76%, respectively, reflecting the stronger feature extraction ability of BiGRU. The parallel hybrid neural network–FC model adds a parallel BiLSTM-CNN network and fuses the features extracted by the two networks, exploiting the complementarity between features to improve diagnostic accuracy; its average $P$ and $F_{1}$ values are 98.47% and 98.42%, respectively. Replacing the FC layers with a GAP layer gives the parallel hybrid neural network–GAP model, which reduces the training and testing time by 88.4 s and 0.99 s, respectively, indicating that the GAP layer reduces model parameters and computation. Finally, in the method proposed in this paper, the fault signal is processed by the improved VMD algorithm and then input into the parallel hybrid neural network–GAP model. The average $P$ and $F_{1}$ values are 99.73% and 99.72%, respectively, reflecting the role of the improved VMD in removing part of the interference in the fault signal, retaining its impact components, and improving the recognition accuracy of the model.

To further demonstrate the advantages of the proposed method, it is compared experimentally with existing fault diagnosis methods: the improved residual dense network (IRDN) proposed by Sun et al. [35], the AO-LSTM (ACM-GIF-AOLSTM) network proposed by Ma et al. [36], the LSTM-Cascade CatBoost network proposed by Yang et al. [37], and the deep-learning-based hybrid multimodal fusion network (HMF-DL) proposed by Che et al. [38]. IRDN uses the XJTU data set with accuracy, standard deviation, and loss as evaluation indexes; ACM-GIF-AOLSTM uses the XJTU data set with accuracy and standard deviation as evaluation indexes; LSTM-Cascade CatBoost uses the CWRU and XJTU data sets with accuracy as the evaluation index; and HMF-DL uses the XJTU data set and derives the accuracy from a confusion matrix. For a uniform comparison, this paper uses the diagnostic accuracy of the different models as the comparison index, as shown in Table 7.

According to the diagnosis results, the bearing fault diagnosis method proposed in this paper achieves the highest accuracy among the compared models, reaching 99.72%. These experiments indicate that the fault diagnosis method based on improved VMD and the parallel hybrid neural network has certain advantages.

5.3. Generality Analysis

To demonstrate the generality of the proposed method, a further test was carried out on the fan-end fault data of the Case Western Reserve University (CWRU) rolling bearing data set, with the model parameters unchanged. The data set contains nine fault types and one normal type. The nine fault types combine three fault locations, namely rolling element fault (B), inner ring fault (IR), and outer ring fault (OR), with three single-point fault diameters of 0.18, 0.36, and 0.53 mm. The experimental platform is equipped with unidirectional acceleration sensors at the drive end (DE), fan end (FE), and base (BA) of the motor, and covers four load conditions: load 1 (1796 r/min), load 2 (1772 r/min), load 3 (1748 r/min), and load 4 (1722 r/min). The sampling frequency is 12 kHz and the sampling time is 10 s, so a (3, 120000) tensor is obtained for each fault type, where 3 is the number of measurement positions and 120000 is the total length of each signal.

In this article, the drive-end signals under the four load conditions are used as experimental data, and the length of each training sample is 1024 data points, consistent with the XJTU experiments above. Overlapping sampling is used to expand the training samples: the step of the sliding window is 1/5 of the sample length, that is, 204 data points, which yields 583 samples. To simplify the subsequent statistics and analysis, the first 500 samples are taken as the data set, so each fault type yields a set of size (500, 1024) after truncation. With 10 types, the experimental data under each working condition have size (5000, 1024). After improved VMD processing, the data are split into a training set and a test set in the proportion 8:2 and normalized, and the training set is fed into the parallel hybrid network. The network parameters and training parameters are unchanged. The test set is then fed into the model for diagnosis, and the results are shown in Table 8.
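The overlapping sampling described above can be sketched as follows. The window length (1024), the step (204), and the truncation to the first 500 samples come from the text; the 120,000-point random signal merely stands in for one 10 s record sampled at 12 kHz, and the function name is hypothetical.

```python
import numpy as np

def overlapping_windows(signal, window=1024, step=204):
    """Cut a 1-D signal into overlapping windows of fixed length."""
    starts = range(0, len(signal) - window + 1, step)
    return np.stack([signal[s:s + window] for s in starts])

rng = np.random.default_rng(0)
record = rng.standard_normal(120_000)       # stand-in for a 10 s record at 12 kHz

segments = overlapping_windows(record)
dataset = segments[:500]                    # keep the first 500 samples, as in the text

print("windows generated:", segments.shape[0])
print("data set shape:", dataset.shape)
```

With exactly 120,000 points this windowing gives 584 segments, one more than the 583 reported above. The small difference depends on the actual record length and on how the final window is counted, and it does not affect the first 500 samples that are kept.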

The results show that the improved VMD and parallel hybrid neural network method also achieves satisfactory results on the CWRU data set, with an average accuracy of 99.73% across the load conditions, which shows that the proposed fault diagnosis method can be applied to different bearing data sets and has a degree of generality.

6. Conclusions

To address fault identification for rolling bearings, this paper proposes a fault diagnosis method based on improved VMD and a parallel hybrid neural network. Combining the two improves the accuracy and applicability of bearing fault diagnosis by optimizing signal decomposition and enhancing feature learning. For signal processing, the bearing data set consists of vibration signals that contain noise interference. This paper uses the improved VMD algorithm with reweighted kurtosis as the evaluation index, which decomposes the non-stationary vibration signal more accurately, effectively removes noise, and retains the impact characteristics of the fault signal. For fault feature extraction, two parallel networks, a BiGRU-CNN network and a BiLSTM-CNN network, are used, and their bidirectional recurrent structure lets the model consider both past and future information. The features are fused by concatenation. A GAP layer replaces the traditional fully connected layer for dimensionality reduction, and the reduced features are fed into the SoftMax classifier for classification. Experiments on the public XJTU and CWRU data sets give fault identification accuracies of 99.72% and 99.73%, respectively. Comparison with other recent models shows that the proposed fault diagnosis method has certain advantages.

Author Contributions

Methodology, H.C.; Investigation, Q.S.; Writing—original draft, W.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the China Higher Education Institution Industry-University-Research Innovation Fund, grant number 2024HY031, and the APC was funded by the China Higher Education Institution Industry-University-Research Innovation Fund.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: https://biaowang.tech/xjtu-sy-bearing-datasets.

Acknowledgments

Sincere gratitude is extended to Liang Tian for his valuable suggestions and professional guidance during the revision of this manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

VMD	Variational mode decomposition
RK	Reweighted kurtosis
GAP	Global Average Pooling Layer
EMD	Empirical mode decomposition
LMD	Local means decomposition
IMF	Intrinsic mode functions
RNN	Recurrent neural network
BiLSTM	Bidirectional long short-term memory
BiGRU	Bidirectional gated recurrent unit
1DCNN	One-dimensional Convolutional Neural Network
FC	Full connection

References

1. Frosini, L. Novel diagnostic techniques for rotating electrical machines—A review. Energies 2020, 13, 5066.
2. Yang, C.; Wang, H.; Gao, Z.; Cui, X. Improving rolling bearing online fault diagnostic performance based on multi-dimensional characteristics. R. Soc. Open Sci. 2018, 5, 180066.
3. Ma, Z.; Ruan, W.; Chen, M.; Li, X. An Improved Time-Frequency Analysis Method for Instantaneous Frequency Estimation of Rolling Bearing. Shock Vib. 2018, 2018, 8710190.
4. El Khadiri, K.; Elouaham, S.; Nassiri, B.; El Melhaoui, O.; Said, S.; El Kamoun, N.; Zougagh, H. A Comparison of the Denoising Performance Using Capon Time-Frequency and Empirical Wavelet Transform Applied on Biomedical Signal. Int. J. Eng. Appl. 2023, 11, 358–365.
5. Meng, D.; Wang, H.; Yang, S.; Lv, Z.; Hu, Z.; Wang, Z. Fault analysis of wind power rolling bearing based on EMD feature extraction. CMES-Comput. Model. Eng. Sci. 2021, 130, 543–558.
6. Xu, Y.; Zhang, K.; Ma, C.; Li, S.; Zhang, H. Optimized LMD method and its applications in rolling bearing fault diagnosis. Meas. Sci. Technol. 2019, 30, 125017.
7. Torres, M.E.; Colominas, M.A.; Schlotthauer, G.; Flandrin, P. A complete ensemble empirical mode decomposition with adaptive noise. In Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22–27 May 2011; pp. 4144–4147.
8. Colominas, M.A.; Schlotthauer, G.; Torres, M.E. Improved complete ensemble EMD: A suitable tool for biomedical signal processing. Biomed. Signal Process. Control 2014, 14, 19–29.
9. Wang, Z.; Wang, J.; Kou, Y.; Zhang, J.; Ning, S.; Zhao, Z. Weak fault diagnosis of wind turbine gearboxes based on MED-LMD. Entropy 2017, 19, 277.
10. Liu, Z.; Peng, Y. Study on denoising method of vibration signal induced by tunnel portal blasting based on WOA-VMD algorithm. Appl. Sci. 2023, 13, 3322.
11. Li, H.; Wu, X.; Liu, T.; Li, S.; Zhang, B.; Zhou, G.; Huang, T. Composite fault diagnosis for rolling bearing based on parameter-optimized VMD. Measurement 2022, 201, 111637.
12. Randall, R.B.; Antoni, J. Rolling element bearing diagnostics—A tutorial. Mech. Syst. Signal Process. 2011, 25, 485–520.
13. Sacerdoti, D.; Strozzi, M.; Secchi, C. A comparison of signal analysis techniques for the diagnostics of the IMS rolling element bearing dataset. Appl. Sci. 2023, 13, 5977.
14. Shao, S.; Yan, R.; Lu, Y.; Wang, P.; Gao, R.X. DCNN-based multi-signal induction motor fault diagnosis. IEEE Trans. Instrum. Meas. 2019, 69, 2658–2669.
15. Zhu, D.; Zhang, Y.; Zhao, L. Fault diagnosis method for rolling element bearing with variable rotating speed using envelope order spectrum and convolutional neural network. J. Intell. Fuzzy Syst. 2019, 37, 3027–3040.
16. Shao, Y.; Yuan, X.; Zhang, C.; Song, Y.; Xu, Q. A novel fault diagnosis algorithm for rolling bearings based on one-dimensional convolutional neural network and INPSO-SVM. Appl. Sci. 2020, 10, 4303.
17. Chen, H.; Meng, W.; Li, Y.; Xiong, Q. An anti-noise fault diagnosis approach for rolling bearings based on multiscale CNN-LSTM and a deep residual learning model. Meas. Sci. Technol. 2023, 34, 045013.
18. Sharma, V.; Parey, A. Extraction of weak fault transients using variational mode decomposition for fault diagnosis of gearbox under varying speed. Eng. Fail. Anal. 2020, 107, 104204.
19. Feng, W.; Zhu, Q.; Zhuang, J.; Yu, S. An expert recommendation algorithm based on Pearson correlation coefficient and FP-growth. Clust. Comput. 2019, 22, 7401–7412.
20. Pan, H.; Yin, X.; Cheng, J.; Zheng, J.; Tong, J.; Liu, T. Periodic component pursuit-based kurtosis deconvolution and its application in roller bearing compound fault diagnosis. Mech. Mach. Theory 2023, 185, 105337.
21. Zhang, X.; Zhang, Z.; Wang, J.; Liu, Z.; Wang, L. Reweighted-Kurtogram with sub-bands rearranged and ensemble dual-tree complex wavelet packet transform for bearing fault diagnosis. Struct. Health Monit. 2022, 21, 2951–2967.
22. Chen, L.; Xu, G.; Zhang, S.; Yan, W.; Wu, Q. Health indicator construction of machinery based on end-to-end trainable convolution recurrent neural networks. J. Manuf. Syst. 2020, 54, 1–11.
23. Székelyhidi, L. Convolution Type Functional Equations on Topological Abelian Groups; World Scientific: Singapore, 1991; Volume 3.
24. Zhang, B.; Zhang, S.; Li, W. Bearing performance degradation assessment using long short-term memory recurrent network. Comput. Ind. 2019, 106, 14–29.
25. Liu, Z. Bearing Fault Diagnosis of End-to-End Model Design Based on 1DCNN-GRU Network. Discret. Dyn. Nat. Soc. 2022, 2022, 7167821.
26. Guo, Y.; Mao, J.; Zhao, M. Rolling bearing fault diagnosis method based on attention CNN and BiLSTM network. Neural Process. Lett. 2023, 55, 3377–3410.
27. Durairaj, D.M.; Mohan, B.H.K. A convolutional neural network based approach to financial time series prediction. Neural Comput. Appl. 2022, 34, 13319–13337.
28. Cecotti, H.; Belaid, A. Rejection strategy for convolutional neural network by adaptive topology applied to handwritten digits recognition. In Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR’05), Seoul, Republic of Korea, 31 August–1 September 2005; pp. 765–769.
29. Sun, H.; Zhao, S. Fault diagnosis for bearing based on 1DCNN and LSTM. Shock Vib. 2021, 2021, 1221462.
30. Li, J.; Han, Y.; Zhang, M.; Li, G.; Zhang, B. Multi-scale residual network model combined with Global Average Pooling for action recognition. Multimed. Tools Appl. 2022, 81, 1375–1393.
31. Ni, G.; Zhang, X.; Ni, X.; Cheng, X.; Meng, X. A WOA-CNN-BiLSTM-based multi-feature classification prediction model for smart grid financial markets. Front. Energy Res. 2023, 11, 1198855.
32. Wang, B.; Lei, Y.; Li, N.; Li, N. A hybrid prognostics approach for estimating remaining useful life of rolling element bearings. IEEE Trans. Reliab. 2018, 69, 401–412.
33. Xue, T.; Wang, H.; Wu, D. MobileNetV2 combined with fast spectral kurtosis analysis for bearing fault diagnosis. Electronics 2022, 11, 3176.
34. Zhao, J.; Shi, Y.; Tan, F.; Wang, X.; Zhang, Y.; Liao, J.; Yang, F.; Guo, Z. Research on an intelligent diagnosis method of mechanical faults for small sample data sets. Sci. Rep. 2022, 12, 21996.
35. Sun, J.; Wen, J.; Yuan, C.; Liu, Z.; Xiao, Q. Bearing fault diagnosis based on multiple transformation domain fusion and improved residual dense networks. IEEE Sens. J. 2021, 22, 1541–1551.
36. Ma, J.; Wang, X. Compound fault diagnosis of rolling bearing based on ACMD, Gini index fusion and AO-LSTM. Symmetry 2021, 13, 2386.
37. Yang, M.; Liu, W.; Zhang, W.; Wang, M.; Fang, X. Bearing vibration signal fault diagnosis based on LSTM-cascade CatBoost. J. Internet Technol. 2022, 23, 1155–1161.
38. Che, C.; Wang, H.; Ni, X.; Lin, R. Hybrid multimodal fusion with deep learning for rolling bearing fault diagnosis. Measurement 2021, 173, 108655.

Figure 1. GRU network structure diagram.

Figure 2. 1DCNN-GAP pooling process.

Figure 3. Flow chart of fault signal processing.

Figure 4. Flow chart of parallel hybrid neural network fault diagnosis.

Figure 5. Experimental platform.

Figure 6. Square envelope spectral image of original signal.

Figure 7. Reweighted kurtosis corresponding to different reconstructed signals.

Figure 8. Image of reconstructed signal and original signal.

Figure 9. Square envelope spectral image of reconstructed signal.

Figure 10. Training error curve.

Figure 11. Fault diagnosis results.

Table 1. XJTU bearing datasets.

Bearing Data Set	Speed/Radial Force	Number of Files	Lifetime	Fault Element
Bearing 1_1	35 Hz/12 kN	123	2 h 3 min	Outer race
Bearing 1_4	35 Hz/12 kN	122	2 h 2 min	Cage
Bearing 2_1	37.5 Hz/11 kN	491	8 h 11 min	Inner race
Bearing 2_2	37.5 Hz/11 kN	161	2 h 41 min	Outer race
Bearing 3_3	40 Hz/10 kN	371	6 h 11 min	Inner race
Bearing 3_5	40 Hz/10 kN	114	1 h 54 min	Outer race

Table 2. Experimental data set.

Bearing Data Set	Data Size	Fault Element
Bearing 1_1	(1, 163840)	Outer
Bearing 1_4	(1, 163840)	Cage
Bearing 2_1	(1, 163840)	Inner
Bearing 2_2	(1, 163840)	Outer
Bearing 3_3	(1, 163840)	Inner
Bearing 3_5	(1, 163840)	Outer

Table 3. New bearing failure data set.

Bearing Data Set	Training Set	Test Set	Label
Bearing 1_1	(480, 1024)	(120, 1024)	0
Bearing 1_4	(480, 1024)	(120, 1024)	1
Bearing 2_1	(480, 1024)	(120, 1024)	2
Bearing 2_2	(480, 1024)	(120, 1024)	3
Bearing 3_3	(480, 1024)	(120, 1024)	4
Bearing 3_5	(480, 1024)	(120, 1024)	5

Table 4. Hyperparameters of model.

Network Layer Name	Parameter Setting	Output Shape
Input Layer	/	[batch, 1024, 1]
BiGRU Layer 1-1	Units = 32	[batch, 1024, 64]
BiGRU Layer 1-2	Units = 16	[batch, 1024, 32]
1D-CNN Layer 1-3	Filters = 32, Kernel_size = 64	[batch, 961, 32]
1D-CNN Layer 1-4	Filters = 16, Kernel_size = 16	[batch, 946, 16]
BiLSTM Layer 2-1	Units = 32	[batch, 1024, 64]
BiLSTM Layer 2-2	Units = 16	[batch, 1024, 32]
1D-CNN Layer 2-3	Filters = 32, Kernel_size = 64	[batch, 961, 32]
1D-CNN Layer 2-4	Filters = 16, Kernel_size = 16	[batch, 946, 16]
Concatenate Layer	/	[batch, 1892, 16]
1D-GAP Layer	/	[batch, 16]
Output Layer	SoftMax classifier	[batch, 6]

Table 5. Denoising results of various methods.

Method	SNR (dB), 5 dB noise	$r$ (%), 5 dB noise	SNR (dB), 10 dB noise	$r$ (%), 10 dB noise	SNR (dB), 15 dB noise	$r$ (%), 15 dB noise
Noise Signal	5	86.84	10	95.21	15	98.43
FFT	6.21	87.85	10.41	95.24	15.25	98.45
WT	8.93	93.26	12.65	97.17	16.65	98.88
WOA-VMD	8.58	92.93	13.74	97.86	18.20	98.96
Kurtosis–VMD	8.6	92.80	13.23	97.70	13.59	98.06
EWT	9.04	93.58	13.71	97.88	17.57	98.92
Proposed method	9.13	93.64	14.5	98.20	18.89	99.35

Table 6. Comparison of ablation results.

Fault Diagnosis Model	Train Time (s)	Test Time (s)	$P$ (%)	$F_{1}$ (%)
GRU	674.93	3.89	93.95	93.87
GRU-CNN	940.97	3.95	95.91	95.77
BiGRU-CNN	1120.87	4.12	97.03	96.76
The parallel hybrid neural network–FC	1953.28	5.17	98.47	98.42
The parallel hybrid neural network–GAP	1864.88	4.18	98.50	98.46
Proposed method	1867.56	4.19	99.73	99.72

Table 7. Accuracy of different models.

Fault Diagnosis Model	Accuracy (%)
IRDN	97.37
ACM-GIF-AOLSTM	98.67
LSTM-Cascade CatBoost	99.33
HMF-DL	99.57
Proposed method	99.72

Table 8. Diagnosis results under different load conditions.

Load Condition	$P$ (%)	$F_{1}$ (%)
Load 1	99.71	99.70
Load 2	100.00	100.00
Load 3	99.40	99.40
Load 4	99.80	99.80
Average value	99.73	99.73

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Chen, W.; Cai, H.; Sun, Q. Bearing Fault Diagnosis Method Based on Improved VMD and Parallel Hybrid Neural Network. Appl. Sci. 2025, 15, 4430. https://doi.org/10.3390/app15084430

