1. Introduction
Gearboxes are mainly composed of gears, shafts, bearings and casings. With the continuous development of modern science and technology, their application is becoming ever more extensive [1]. They are currently used in mechanical transmission systems in fields such as aero-engines, wind power, petrochemicals and metallurgy [2,3]. The working environment of a gearbox is usually very harsh. Under high-speed and heavy-load operating conditions, the internal parts of a gearbox are easily damaged and can even stop working [4,5], leading to economic losses and casualties. Fault diagnosis is therefore of great significance for gearboxes [6].
A large number of scholars have carried out research on gearbox fault diagnosis. Liu et al. [7] used a combination of empirical mode decomposition and the Hilbert spectrum for gearbox fault diagnosis. Cheng et al. [8] proposed a fault diagnosis method that uses singular value decomposition and empirical mode decomposition to extract the characteristics of gear and roller-bearing vibration signals, and then uses support vector machines for pattern recognition and classification. Wang et al. [9] proposed a gearbox fault diagnosis method based on a combination of recurrence plots and 2DCNN: the vibration signal was first converted into a recurrence plot, which was then input into the 2DCNN model for gearbox fault pattern recognition and classification. Chen et al. [10] proposed a gearbox fault diagnosis method based on CNN–SVM, in which a CNN model extracts features that are then input into an SVM for pattern recognition and classification, with good results. Many researchers have also analyzed gear failures through dynamic modeling and verified the models experimentally [11,12], discussing how response curves and spectra relate to actual failure modes [13,14]. With the continuous development of computer science and the gradual expansion of its scope of application, fault diagnosis technology has begun to develop from traditional fault diagnosis into intelligent fault diagnosis [15].
Traditional fault diagnosis is mainly divided into two stages: feature extraction and pattern recognition [16]. Feature extraction and selection are mainly manual, and the shallow structure of the classifier limits its ability to handle large-scale data, making it difficult to adapt to the trend toward intelligent fault diagnosis [17]. Researchers are therefore greatly interested in methods that improve the accuracy, intelligence, versatility and practical applicability of gearbox fault diagnosis and that realize intelligent gearbox diagnosis based on big data [18,19].
In recent years, against the background of artificial intelligence, deep learning technology has seen continuous innovation and development [20]. It is widely used by scholars at home and abroad in fields such as automatic driving, big data processing and image recognition [21,22]. The most important aspect of deep learning is its feature learning ability: it can adaptively learn the mapping between input and output in data and mine the nonlinear information hidden in the deep layers of data [23]. Deep learning technology has natural advantages in the intelligent diagnosis of gearboxes. It can unify the two stages of feature extraction and pattern recognition [24] and realize end-to-end fault diagnosis without manual intervention. As a consequence, scholars are increasingly using convolutional neural network (CNN) [25], dynamic Bayesian network (DBN) [26], recurrent neural network (RNN) [27] and other deep learning methods to study intelligent fault diagnosis for gearboxes, based on the characteristics of the industrial scenarios in which gearboxes operate.
Although the application of deep learning in fault diagnosis has achieved some initial results, most of the research has used one-dimensional vibration signal data as the input. Such signals carry a large amount of feature information that is complicated and difficult to screen, and the approach suffers from problems such as incomplete feature information, a heavy extraction workload and susceptibility to signal interference. To a certain extent, this limits further improvement of gearbox fault diagnosis accuracy and diagnosis efficiency [28].
Deep learning techniques have a natural advantage in processing images. It is therefore worth exploring the potential advantages of a bearing fault diagnosis method that converts the vibration signal into two-dimensional image data and then uses these data as the input to the deep learning model [29]. Visualizing the vibration signal can not only capture high-quality information from the signal, but also optimize the preprocessing of the vibration signal data. In addition, deep learning models are well suited to recognizing and processing the converted two-dimensional image data [30].
CNN is a typical deep learning model, and 1DCNN and 2DCNN are the variants in common use. The model extracts signal features layer by layer through convolution, pooling and nonlinear activation function mapping [31]. Compared with fully connected deep learning models, CNN is more robust and generalizes better. It can improve network performance and reduce training costs through weight sharing and pooling [32], and it is less prone to overfitting.
Manual feature extraction in the fault diagnosis of railway vehicle gearboxes is complex and incomplete, the extraction workload is heavy, and the signal is susceptible to noise interference. To address these problems, reduce the impact of external environmental noise on fault recognition and retain more complete feature information, this article proposes a gearbox fault diagnosis method based on multidomain information fusion CNN. The contributions of this paper are as follows:
- (1)
An experimental platform for gearbox fault diagnosis was built, and a gearbox fault diagnosis method based on multidomain information fusion CNN was proposed. The method was verified as having high robustness and feasibility.
- (2)
The SVD algorithm was used to preprocess and denoise the original signal of the gearbox. In terms of SVD signal reconstruction, a singular value energy difference spectrum was introduced. This method determines the effective order of the reconstruction matrix after singular value decomposition based on the contribution of noise signals and useful signals to singular values.
- (3)
The one-dimensional gearbox vibration signal and the two-dimensional STFT time–frequency map were combined using CNN. CNN multifeature fusion was used to enrich the features of the two different dimensions, which reduced the loss of gearbox information during adaptive feature extraction.
This paper is composed as follows:
Section 2 introduces the principles of the algorithms used: SVD, STFT, 1DCNN, 2DCNN and SVM;
Section 3 describes the construction of the relevant fault diagnosis models;
Section 4 covers the building of the gearbox fault diagnosis experimental platform and the data collection;
Section 5 sets out the experimental analysis and verification; and
Section 6 presents the conclusions of this study.
2. Principle Introduction
2.1. SVD
The singular value decomposition (SVD) method is currently used in data dimensionality reduction, image processing, signal processing and other fields [33]. In signal processing, SVD has been successfully used for noise reduction. In practical applications of SVD, determining the effective order of the reconstructed matrix after decomposition is a challenge. Some scholars propose using the singular entropy increment or a threshold method to select the reconstruction order, but these methods often rely on user experience, and the resulting noise reduction is not ideal. To solve this problem, this paper introduces the singular value energy difference spectrum [34] and determines the reconstruction order according to the energy contribution of the signal and noise to the singular values, thereby reducing the noise of the vibration signal. The main principles of SVD are as follows:
For the gearbox vibration signal $X = [x(1), x(2), \ldots, x(N)]$, SVD cannot be performed directly because the vibration signal is usually one-dimensional, and a two-dimensional matrix needs to be constructed first. There are many ways to construct a two-dimensional matrix from a one-dimensional signal, such as via the circular, Toeplitz and Hankel matrices. The Hankel matrix is the most widely used because of its zero-phase-shift and wavelet-like characteristics, and so we first construct the Hankel matrix $H$ for $X$:

$$H = \begin{bmatrix} x(1) & x(2) & \cdots & x(n) \\ x(2) & x(3) & \cdots & x(n+1) \\ \vdots & \vdots & \ddots & \vdots \\ x(m) & x(m+1) & \cdots & x(N) \end{bmatrix} = D + W \quad (1)$$

where $H$ is the $m \times n$ Hankel matrix constructed from $X$, $m = N - n + 1$, $D$ is the useful signal space and $W$ is the noise signal space. When the matrix is as close to square as possible, that is, when $m \approx N/2$, the Hankel matrix has a good noise reduction effect.
In terms of signal reconstruction, determining the effective order of the singular values is particularly important. If too many singular values are selected for reconstruction, part of the noise remains in the signal and the noise reduction is incomplete. If too few are selected, useful signal components are deleted, and the information in the original vibration signal becomes incomplete [35]. In this paper, the singular value energy difference spectrum is introduced, and the effective order of the reconstruction matrix after singular value decomposition is determined according to the contribution of the noise signal and the useful signal to the singular values. The signal energy is shown in Formula (2):

$$E = \sum_{i=1}^{q} \sigma_i^2 \quad (2)$$

In Formula (2), $E$ represents the signal energy, $\sigma_1$, $\sigma_2$, …, $\sigma_q$ are the singular values of the Hankel matrix, and $q$ represents the total order, that is, up to $\min(m, n)$ for an $m \times n$ Hankel matrix. The singular value energy difference spectrum is then defined and normalized:

$$p_i = \frac{\sigma_i^2 - \sigma_{i+1}^2}{E}, \quad i = 1, 2, \ldots, q - 1 \quad (3)$$

Here, the sequence formed by $p_i$ is called the singular value energy difference spectrum, and Formula (3) gives the energy change between adjacent singular values. Because the useful signal accounts for a much larger share of the singular value energy than the noise, the spectrum shows a pronounced peak at the boundary between the useful signal and the noise. The singular values after this peak are mainly generated by noise, so the order corresponding to the peak can be read from the singular value energy difference spectrum. We take this order as the order of the reconstructed signal, which separates the noise from the useful signal and reduces the noise of the gearbox vibration signal.
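The denoising procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the implementation used in this paper: the function names are hypothetical, the number of rows defaults to half the signal length, the reconstruction order is taken at the largest peak of the energy difference spectrum, and the one-dimensional signal is recovered by averaging the anti-diagonals of the truncated matrix.

```python
import numpy as np


def hankel_indices(N, m):
    """Index grid of an m x n Hankel matrix, with n = N - m + 1."""
    n = N - m + 1
    return np.arange(m)[:, None] + np.arange(n)[None, :]


def energy_difference_spectrum(s):
    """Normalized energy drop between adjacent singular values."""
    energy = s ** 2
    return (energy[:-1] - energy[1:]) / energy.sum()


def svd_denoise(x, m=None, order=None):
    """Denoise a 1-D signal by truncated SVD of its Hankel matrix."""
    x = np.asarray(x, dtype=float)
    N = x.size
    m = N // 2 if m is None else m  # near-square matrix (assumption)
    idx = hankel_indices(N, m)
    H = x[idx]
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    if order is None:
        # Assumed rule: keep singular values up to the largest peak.
        order = int(np.argmax(energy_difference_spectrum(s))) + 1
    H_r = (U[:, :order] * s[:order]) @ Vt[:order]
    # Average along anti-diagonals to return to a 1-D signal.
    total = np.zeros(N)
    count = np.zeros(N)
    np.add.at(total, idx, H_r)
    np.add.at(count, idx, 1.0)
    return total / count, order


# Synthetic check: one sinusoid (two dominant singular values) plus noise.
rng = np.random.default_rng(0)
t = np.arange(256)
clean = np.sin(2 * np.pi * 0.05 * t)
noisy = clean + 0.2 * rng.standard_normal(t.size)
denoised, order = svd_denoise(noisy)
print("selected order:", order)
```

A signal with several components of very different amplitude can produce more than one peak, in which case the largest peak may underestimate the order and a different peak-selection rule would be needed.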
2.2. STFT
The short-time Fourier transform (STFT) is also referred to as the windowed Fourier transform. Because the Fourier transform is only suitable for stationary signal analysis, and nonstationary signals are very common in mechanical equipment, the STFT was developed to analyze nonstationary signals [36]. It can transform the one-dimensional gearbox vibration signal into a two-dimensional matrix containing feature information in the time–frequency domain, which can then be input into the 2DCNN. The main principle is to divide a nonstationary signal into short frames and to regard the signal inside each frame as stationary. Each frame is equivalent to a window, and so the operation is also called windowing. The window function is multiplied with the signal and the Fourier transform is then performed to obtain the spectrum of that frame. Moving the window function along the signal yields a series of spectra, and splicing these together shows how the frequency content changes with time [37]. The short-time Fourier transform is expressed as follows:
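$$\mathrm{STFT}(t, f) = \int_{-\infty}^{+\infty} x(\tau)\, h(\tau - t)\, e^{-j 2 \pi f \tau}\, d\tau \quad (4)$$

In Formula (4), $f$ represents the frequency, $t$ represents the starting time of the current window and $\mathrm{STFT}(t, f)$ represents the contribution of the signal component with frequency $f$ in the window at time $t$. $\tau$ is the period of time, $x(\tau)$ is the unsteady signal and $h(\tau - t)$ is the window function. The schematic diagram of STFT is shown in Figure 1.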
Figure 1 shows the window function and three time periods. When the window function is constant over the entire signal, that is, $h(t) \equiv 1$, the short-time Fourier transform reduces to the Fourier transform. The choice of window function and window width are important factors affecting the effect of STFT. A suitable window function can effectively reduce the spectrum leakage caused by truncating the original nonstationary signal. The window width affects the resolution in the time and frequency domains. If the window is too narrow, the signal in the window will be too short and the frequency resolution will be low. If the window is too wide, the time domain will not be sufficiently fine and the time resolution will be low.
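As an illustration of the windowing procedure and of the trade-off between time and frequency resolution, the following NumPy sketch computes a discrete STFT. The Hann window, the window length of 128 samples, the hop of 32 samples and the 750 Hz test tone are all assumptions chosen for the example. The 6 kHz sampling rate assumes that the frequency quoted in Section 4 is the sampling frequency, and the 1024-point length matches the sample length given there.

```python
import numpy as np


def stft(x, fs, win_len=128, hop=32):
    """Short-time Fourier transform with a Hann window.

    Returns (freqs, times, Z), where Z[k, i] is the spectrum value at
    frequency freqs[k] for the window starting at time times[i].
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(win_len)
    starts = np.arange(0, x.size - win_len + 1, hop)
    # Multiply each frame by the window, then take its Fourier transform.
    frames = np.stack([x[s:s + win_len] * window for s in starts])
    Z = np.fft.rfft(frames, axis=1).T
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = starts / fs
    return freqs, times, Z


fs = 6000.0                      # assumed sampling frequency in Hz
n = np.arange(1024)              # one 1024-point sample
x = np.sin(2 * np.pi * 750.0 * n / fs)
freqs, times, Z = stft(x, fs)

# The magnitude |Z| is the two-dimensional time-frequency map.
tf_map = np.abs(Z)
print("map shape (frequency bins, time frames):", tf_map.shape)
print("frequency resolution in Hz:", fs / 128)
```

With these settings, doubling `win_len` halves the spacing between frequency bins but makes each frame cover twice as much time, which is the trade-off described above.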
2.3. CNN
Convolutional neural network (CNN) is a feedforward neural network that has become one of the most commonly used algorithms in deep learning in recent years, particularly for pattern classification [38]. The network can avoid image preprocessing in the early stage, and the original image can be input directly. A typical convolutional neural network mainly consists of an input layer, a convolutional layer, a pooling layer, a fully connected layer and an output layer.
- (1)
Input layer. The CNN input layer can preprocess the input data, for example by standardization or normalization.
- (2)
Convolutional layer. The convolutional layer is the core component of CNN. Its most important feature is weight sharing, which is realized through the convolution kernel. The convolutional layer uses the convolution kernel to operate locally on the input data and extract the features of each local region. As the number of convolutional layers increases, more parameters are required, but deeper features can be extracted. The convolution operation is expressed as follows:

$$x_j^{l} = f\left( \sum_{i \in M_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l} \right) \quad (5)$$

In Formula (5), $l$ is the index of the convolutional layer, the output of the $l$ layer is $x_j^{l}$, the input of the $l$ layer is $x_i^{l-1}$, the weight matrix is $k_{ij}^{l}$, the bias is $b_j^{l}$, the activation function used by the convolutional layer is $f$ and $M_j$ is the $j$th convolution area of the feature map of the $l-1$ layer.
- (3)
Pooling layer. Pooling layers are also called downsampling layers. This layer mainly performs feature extraction and dimensionality reduction during the running of the CNN, which reduces the amount of calculation required. To a certain extent, it can also reduce the possibility of overfitting. The maximum pooling formula is:

$$P_i^{l+1}(j) = \max_{(j-1)W + 1 \le t \le jW} \left\{ q_i^{l}(t) \right\} \quad (6)$$

In Formula (6), $\max$ means maximum pooling, $l$ is the index of the pooling layer, $q$ represents the activation value, $W$ represents the pooling width, the minimum value of $t$ is $(j-1)W + 1$ and its maximum value is $jW$. The $t$-th activation value of the $i$-th feature map of the $l$ layer is $q_i^{l}(t)$.
- (4)
Fully connected layer and output layer. After the preceding rounds of convolution and pooling, the fully connected layer connects every input to every output. It mainly summarizes the features extracted by the convolutional and pooling layers to achieve global optimization [39]. The Softmax function is generally used as the classifier of the output layer. However, as the Softmax classifier leads to insufficient generalization ability of the model and is not well suited to image classification, here we instead use SVM.
2DCNN is widely used in the field of image recognition and is effective for image feature extraction and classification [40]. It differs from 1DCNN in the dimensions of the input data. The input of 2DCNN comprises two-dimensional or three-dimensional data. In our study, the two-dimensional time–frequency image generated by the short-time Fourier transform is input into the 2DCNN.
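The convolution and maximum pooling operations described in this section can be made concrete with a small one-dimensional example. The sketch below is illustrative only: the function names are hypothetical, ReLU is assumed as the activation function, the pooling windows are assumed not to overlap, and, as in most deep learning frameworks, the kernel is applied without flipping.

```python
import numpy as np


def conv1d_layer(x, kernels, bias):
    """Valid 1-D convolution layer followed by ReLU.

    x:       (channels_in, length)
    kernels: (channels_out, channels_in, k)
    bias:    (channels_out,)
    """
    c_out, _, k = kernels.shape
    length = x.shape[1] - k + 1
    out = np.zeros((c_out, length))
    for j in range(c_out):
        for t in range(length):
            # Local operation: the same kernel weights are shared
            # across every position t of the input.
            out[j, t] = np.sum(x[:, t:t + k] * kernels[j]) + bias[j]
    return np.maximum(out, 0.0)  # ReLU as the activation (assumption)


def max_pool1d(a, width):
    """Non-overlapping max pooling over windows of `width` samples."""
    c, length = a.shape
    n = length // width
    return a[:, :n * width].reshape(c, n, width).max(axis=2)


x = np.array([[0.0, 1.0, 3.0, 2.0, 5.0, 4.0]])
kernels = np.array([[[-1.0, 0.0, 1.0]]])   # one difference-like kernel
bias = np.array([0.0])

features = conv1d_layer(x, kernels, bias)  # [[3., 1., 2., 2.]]
pooled = max_pool1d(features, width=2)     # [[3., 2.]]
print(features, pooled)
```

The pooled output is half the length of the feature map, which shows how pooling reduces the amount of calculation in later layers.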
2.4. SVM
Support vector machine (SVM) is a data analysis method developed on the basis of statistical learning theory. Its basic principle is to map a nonlinear problem in the original low-dimensional input space onto a high-dimensional feature space, where it can be solved, and it is often used in classification and regression analysis [41].
In SVM nonlinear data classification, the input data are mapped onto the high-dimensional space primarily via the kernel function. The selection of different kernel functions has an impact on the classification effect. Kernel functions include polynomial, Laplacian and radial basis function kernels. We selected the radial basis function kernel for our study because it has a wider range of applications. The main principle of SVM is shown in Formula (7):
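$$\min_{w, b, \xi} \ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i \left( w^{T} x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \ldots, n \quad (7)$$

In Formula (7), the given data set is $\{(x_i, y_i)\}_{i=1}^{n}$. The input feature vector is $x_i$, for which the label is $y_i$. The number of classification samples is $n$, the weight is $w$, the penalty factor is $C$, and $w$ and $b$ are the parameters to be optimized. $\xi_i$ is the slack variable of the $i$-th sample, and the data are linearly separable when $\xi_i = 0$. The hyperplane used for classification is obtained using the above formula. Then, the appropriate kernel function and parameters are selected, and the classification discriminant function is used to judge the category of $x$.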
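To make the role of the radial basis function kernel and of the discriminant function concrete, the sketch below evaluates a binary kernel discriminant in NumPy. It is an illustration under stated assumptions, not the classifier trained in this paper: the support vectors, the products of dual coefficient and label, the bias and the kernel width `gamma` are hypothetical values chosen by hand rather than obtained by solving the optimization problem, and a four-class problem would additionally need a multi-class scheme such as one-vs-one.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    A = np.atleast_2d(A).astype(float)
    B = np.atleast_2d(B).astype(float)
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def decision_function(X_new, support_vectors, dual_coef, b, gamma):
    """f(x) = sum_i (alpha_i * y_i) * K(x_i, x) + b; the sign is the class."""
    return rbf_kernel(X_new, support_vectors, gamma) @ dual_coef + b


# Hypothetical, hand-picked values (not the result of training).
support_vectors = np.array([[0.0, 0.0], [2.0, 0.0]])
dual_coef = np.array([1.0, -1.0])    # alpha_i * y_i for each support vector
b, gamma = 0.0, 0.5

X_new = np.array([[0.1, 0.0], [1.9, 0.0]])
scores = decision_function(X_new, support_vectors, dual_coef, b, gamma)
predicted = np.sign(scores)
print(predicted)
```

Each new point is assigned to the class of the support vector it lies closest to, because the kernel value decays with squared distance at a rate set by `gamma`.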
4. Fault Diagnosis Experimental Setup and Data Collection
In order to verify the actual effect of the method proposed in this paper in gearbox fault diagnosis, this experiment used the JZQ250 fixed-axis gearbox for fault diagnosis research. The experimental platform is shown in Figure 4.
It can be seen from Figure 4 and Figure 5 that the platform was mainly composed of a PC, a data acquisition card (model YE6231), a piezoelectric acceleration sensor (model CAYD051V), a gearbox, a magnetic powder brake, a three-phase asynchronous motor (model YE2-100L2-4) and an inverter (model G7R5/P011-T4). The specific operation steps were as follows:
- (1)
An air switch was added between the inverter and the power plug to ensure that the experimental process was carried out under safe conditions;
- (2)
The motor was connected to the inverter, and the gearbox and the motor were then connected by a belt. The magnetic powder brake and the gearbox were connected via a coupling;
- (3)
A piezoelectric acceleration sensor was installed at the axial position of the high-speed shaft end cover of the gearbox and was connected to a PC via an acquisition card.
This was a no-load gearbox experiment, meaning that the magnetic powder brake was switched off. In terms of fault diagnosis experiment design, system variability and limited fault coverage affect the accuracy and reliability of fault diagnosis techniques. As most internal failures in gearboxes occur in the gears, we primarily focused on the gears. The type of gear measured in the experiment is shown in Figure 3, the motor speed was 900 r/min, and the sampling frequency was 6 kHz. The specific data are shown in Table 2.
It can be seen from Table 2 that the gearbox fault diagnosis experiment was divided into four states: pitting, broken teeth, wear and normal. The length of each group of data was 1024 points.
The numbers of samples in the training, validation and test sets are shown in Table 3.
As can be seen from Table 3, this study used a total of 4000 samples: 1000 each for pitted gears, broken teeth, worn gears and normal gears, with the corresponding labels 0, 1, 2 and 3. These were divided into 2800 training samples, 800 validation samples and 400 test samples.
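The split described above can be reproduced with a short NumPy sketch. The text does not state how samples were assigned to the three sets, so the per-class (stratified) random split below is an assumption, as are the function name and the random seed. It only yields the stated totals of 2800, 800 and 400 samples by taking 70%, 20% and 10% of each class.

```python
import numpy as np


def stratified_split(labels, fractions=(0.7, 0.2, 0.1), seed=0):
    """Split sample indices per class into train, validation and test."""
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_train = int(round(fractions[0] * idx.size))
        n_val = int(round(fractions[1] * idx.size))
        train.append(idx[:n_train])
        val.append(idx[n_train:n_train + n_val])
        test.append(idx[n_train + n_val:])   # remainder goes to the test set
    return tuple(np.concatenate(part) for part in (train, val, test))


# Labels 0, 1, 2, 3: pitting, broken teeth, wear and normal (1000 each).
labels = np.repeat(np.arange(4), 1000)
train_idx, val_idx, test_idx = stratified_split(labels)
print(train_idx.size, val_idx.size, test_idx.size)
```

Splitting per class keeps the four gear states equally represented in every set, which makes the accuracy on each set easier to compare.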