1. Introduction
In recent years, prognostics and health management (PHM) has become an important field in the intelligent manufacturing industry [1]. The drive motor is an additional external excitation source in an electric vehicle transmission. It introduces tangential and axial electromagnetic forces as well as electromagnetic torque fluctuations during transmission. These excitations act on the bearings and gears through the rotor and significantly affect the working conditions and vibration characteristics of the bearings [2]. The working speed of the motor is significantly higher than that of a traditional internal combustion engine. In addition, the rapid power response of the motor increases the impact excitation of the gears, and braking energy recovery causes frequent changes in the direction of the load borne by the gears. These multiple factors introduced by the drive motor accelerate fatigue damage and fault evolution in the gears and bearings. This places higher demands on the early fault identification and classification of bearings.
The vibration characteristics of transmission bearings exhibit complex features, such as nonstationarity and nonlinearity, and their vibration signals are often disturbed by the vibration or noise of other components, making it difficult to accurately identify and extract fault features directly from the original vibration signals.
Guo et al. [3] proposed a modulation signal bispectrum analysis method based on non-Gaussian noise suppression. It uses autoregressive filters as a preprocessing unit to suppress non-Gaussian noise while retaining the ability to suppress Gaussian noise, and it extracts fault features efficiently and accurately. Ziani et al. [4] detected faults in bevel gears under variable load conditions by integrating empirical mode decomposition (EMD), the Teager–Kaiser energy operator, and an impact detector. This method effectively extracts fault features from complex vibration signals and offers a useful reference for rolling bearing fault feature extraction. In their research on rolling bearing fault diagnosis, Zhang et al. [5] adopted ensemble empirical mode decomposition and used singular value entropy as the key criterion. Through in-depth analysis of the vibration signals, this enabled accurate identification and classification of different rolling bearing fault characteristics. Zhen et al. [6] introduced a fault detection technique for nonstationary vibration signals. It combines weighted average ensemble empirical mode decomposition with modulation signal bispectrum analysis to reduce Gaussian noise and decompose the intrinsic modulation components of the vibration signal. This enables the detection of both inner-race and outer-race faults in rolling bearings.
Diagnostic methods based on signal-processing feature extraction rely heavily on the expertise of engineers or on advanced manual signal processing. Driven by rapid advances in machine learning for computer vision and speech recognition, intelligent fault diagnosis techniques for PHM have been widely adopted in mechanical fault diagnosis. These methods are favored for their adaptive learning, automated feature extraction, and strong nonlinear regression capabilities.
Existing fault diagnosis approaches based on traditional machine learning are usually combined with signal processing methods. The diagnostic process has two steps: first, fault features are extracted and enhanced; then, a traditional machine learning algorithm is used to identify bearing faults.
In their research on rolling bearing fault diagnosis, Amar et al. [7] converted the vibration signals of rolling bearings into spectral images and applied a two-dimensional average filter to enhance the key features in the images. An artificial neural network (ANN) then classified the spectral images of bearings with different fault types. Lei et al. [8] proposed an improved distance evaluation technique to select six sensitive features from the time-domain and frequency-domain characteristics of bearing signals. These features served as the input to an adaptive neuro-fuzzy inference system, achieving bearing fault classification. Liu et al. [9] used the multi-scale entropy of rolling bearings as a feature index and performed fault detection with a back-propagation neural network (BPNN). Muruganatham et al. [10] used singular values of the bearing condition as characteristic indicators and diagnosed bearing faults with a BPNN. Khazaee et al. [11] collected vibration and sound signals from planetary gearboxes and used wavelet analysis to extract features in the time and time–frequency domains. The processed data from each sensor were fed to an ANN classifier for primary fault diagnosis. The classifier outputs were then fused using the Dempster–Shafer rule, which achieved high final classification accuracy. Li et al. [12] extracted 10 time-domain features from bearing vibration signals and diagnosed bearing faults using an ANN optimized with the firefly algorithm. In their study of early rotor fault diagnosis, Bin et al. [13] first processed the original vibration signal with a wavelet packet transform (WPT). They then reconstructed the WPT coefficients using EMD to obtain the energy characteristics of each component. Finally, they used a BPNN to diagnose early rotor faults efficiently and accurately.
Combining signal processing with traditional machine learning has achieved significant results in bearing fault diagnosis. It has reduced, to some extent, the dependence on fault mechanisms and expert knowledge and has improved diagnostic accuracy. However, these approaches also have limitations, mainly the following.
- (1) The precision of fault diagnosis depends largely on the choice of feature indicators and the design of feature components. However, under strong background noise and weak fault features, selecting sensitive feature indicators is a challenging research task [14].
- (2) Early fault signals are weak, signal-to-noise ratios are low, and transmission bearings operate under varying conditions. As a result, the mapping between sensitive features and fault severity is complex. The nonlinear feature learning ability of traditional machine learning is limited, making it difficult to fully exploit the fault information contained in the vibration signals of transmission bearings [15].
To address these limitations, deep learning methods, especially convolutional neural networks (CNNs), have been applied to fault diagnosis. Cheng et al. [16] proposed a data-driven intelligent fault diagnosis method for rotating machinery based on a new continuous wavelet transform local binary CNN, establishing an end-to-end diagnostic mechanism. Li et al. [17] proposed a feature fusion algorithm for bearing fault diagnosis based on an integrated deep CNN and an improved Dempster–Shafer theory. Their algorithm used the root mean square of the spectral features from two sensors as input data and achieved good results on the open-source CWRU bearing dataset. Miao et al. [18] converted the vibration signal into the angle domain. The angle-domain signal was then converted into envelope and squared envelope spectral features, which were fused into a red–green–blue color image to enhance the sample features and enlarge the differences between health states. Finally, a CNN was constructed to identify the faults. Pang et al. [19] proposed an intelligent diagnosis method for planetary gear faults based on a deep CNN and the vibration bispectrum. Raouf et al. [20] introduced a feature aggregation network into a two-dimensional CNN and used scalogram images to detect faults in the servo motor bearings of industrial robots.
Transfer learning has shown great potential for dealing with data scarcity and has become a new research hotspot. Pan et al. [21] proposed a remaining useful life prediction method that combines a multi-head attention network with adaptive meta-transfer learning. It accurately predicted the remaining useful life of low-temperature bearings in rocket engines during the steady-state stage. Chen et al. [22] proposed an online unsupervised anomaly detection framework that does not rely on expert knowledge or labeled historical data. To address data scarcity, they proposed an adaptive self-transfer learning algorithm based on Gaussian processes, which models monitoring data using uncertainty information, and applied it to the fault diagnosis and monitoring of steam turbines. Fang et al. [23] proposed a method based on transfer learning and deep transfer clustering that achieved high-precision diagnosis of unknown faults. These applications of transfer learning in various fields offer new ideas for solving problems such as rocket engine fault diagnosis. Li et al. [24] proposed an extreme learning machine based on transfer learning to align distribution differences in turbofan engine data. They verified the effectiveness and feasibility of the method through fault diagnosis experiments on turbofan engines. Jamil et al. [25] proposed an instance-based weighted deep transfer learning method that updates source-machine and target-machine training samples separately, achieving high-precision fault detection in wind turbine gearboxes.
However, in service, environmental noise, electromagnetic excitation, and other transmission components of the gearbox can significantly interfere with the vibration signals, making it difficult to extract clear fault features. Traditional fault diagnosis methods rely heavily on domain expertise, prior models, and signal preprocessing. Their accuracy depends on how well the preprocessing extracts fault-sensitive features. Here, given sufficient sample data, a novel intelligent diagnosis approach is proposed. It performs end-to-end fault diagnosis directly on the original time-domain vibration signal, using a convolutional neural network (CNN) as the underlying model framework. This study constructs a two-dimensional CNN with strong feature extraction ability and optimizes its hyperparameters to achieve high-precision fault diagnosis and classification of transmission bearings.
This study proposes RVDCNN, an intelligent fault diagnosis model based on raw vibration data. The time-domain vibration signals of transmission bearings are converted into continuous two-dimensional numerical matrices. A two-dimensional CNN with an optimized network structure is trained and tested on these matrices. It extracts and learns abstract features of different fault types and thereby classifies bearing faults. To verify its generalization capability, the RVDCNN model is applied to diagnose and identify rolling bearing faults in a two-speed automated mechanical transmission of an electric vehicle. It achieves high-precision recognition of multiple fault types and reduces the reliance on advanced signal preprocessing techniques and expert diagnostic experience.
3. Analysis and Selection of CNN Model Parameters
Based on the advantages of CNNs in adaptive feature learning, a two-dimensional deep CNN intelligent diagnostic model is constructed using the framework of “convolutional layer–pooling layer–convolutional layer–pooling layer–fully connected layer”.
We optimize the CNN’s hyperparameters using faulty-bearing data from the open-source Case Western Reserve University (CWRU) Bearing Data Center. The CWRU sampling frequency is fs = 12 kHz, and the shaft speed in the experiments ranges from 1730 r/min to 1797 r/min. The number of data points acquired in one revolution, Mbearing = 60fs/n (where n is the shaft speed in r/min), is therefore approximately 400 to 416. The input speed of an electric vehicle transmission is much higher, so fewer data points are collected per revolution of the gears and bearings. This reduces the amount of data required for a single training sample in the end-to-end intelligent diagnostic model. Each sample should cover at least one complete bearing rotation cycle so that it contains the periodic characteristics of the bearing vibration. Therefore, the input numerical matrix size of the model is set to 24 × 24, which means that each sample contains 576 data points. This is more than the roughly 416 points of one revolution at the lowest speed.
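The sample-length argument above can be checked with a few lines of arithmetic. The sketch below assumes that one bearing rotation cycle equals one shaft revolution at the stated speed, so the number of points per revolution is 60 fs / n. The variable names are illustrative only.

```python
# Points per shaft revolution at the CWRU test speeds, and a check that a
# 24 x 24 sample spans at least one full revolution (assumption: one bearing
# rotation cycle = one shaft revolution).
FS_HZ = 12_000               # CWRU sampling frequency (12 kHz)
SPEEDS_RPM = (1730, 1797)    # minimum and maximum shaft speeds (r/min)
SAMPLE_SIDE = 24             # input matrix is SAMPLE_SIDE x SAMPLE_SIDE


def points_per_revolution(fs_hz: float, speed_rpm: float) -> float:
    """Data points recorded during one revolution: 60 * fs / n."""
    return 60.0 * fs_hz / speed_rpm


per_rev = [points_per_revolution(FS_HZ, n) for n in SPEEDS_RPM]
sample_points = SAMPLE_SIDE ** 2

print([round(p, 1) for p in per_rev])                 # -> [416.2, 400.7]
print(sample_points, sample_points >= max(per_rev))   # -> 576 True
```

The quoted range of 400 to 416 corresponds to these values truncated to whole points. A 24 × 24 matrix is the smallest square matrix that exceeds both of them, because 20 × 20 = 400 points would not cover a full revolution at the lowest speed.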
To exploit the potential of deep learning, improve diagnostic accuracy, and obtain better CNN models, the selection of the model’s structural parameters is crucial. Previous studies have shown that the layer structure and hyperparameters of CNN models affect diagnostic accuracy in complex ways [28]. If the network is too shallow, the convolutional layers cannot extract sufficient feature information, and classification accuracy is low. If the network is too deep, the number of convolutional kernel weights increases, which raises the computational time and may also cause overfitting. If the convolutional kernel is too small, the features become fragmented after pooling, and the sensitivity of feature recognition decreases. If the convolutional kernel is too large, the number of weights, the computational time, and the probability of overfitting all increase. Therefore, to obtain a better deep CNN structure, a reasonable range of hyperparameters must be defined.
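Larger kernels and more kernels both increase the number of weights. This follows from the parameter count of a 2-D convolutional layer, (k × k × C_in + 1) × N, for N square kernels of side k reading C_in input maps, with one bias per kernel. The sketch below evaluates this count for a hypothetical second layer of 24 kernels reading eight input maps. It illustrates the scaling only and is not taken from the study.

```python
def conv_params(kernel_size: int, in_channels: int, num_kernels: int,
                bias: bool = True) -> int:
    """Trainable parameters of a 2-D conv layer with square kernels."""
    weights = kernel_size * kernel_size * in_channels * num_kernels
    return weights + (num_kernels if bias else 0)


# Hypothetical second layer: 24 kernels reading 8 feature maps.
for k in (3, 5, 7, 9):
    print(f"{k}x{k}: {conv_params(k, in_channels=8, num_kernels=24)} parameters")

# Doubling the number of kernels doubles the weights (bias terms excluded).
print(conv_params(5, 8, 48) - 48 == 2 * (conv_params(5, 8, 24) - 24))  # -> True
```

For 3 × 3, 5 × 5, 7 × 7, and 9 × 9 kernels, this gives 1,752, 4,824, 9,432, and 15,576 parameters. The weight count therefore grows quadratically with the kernel side length and linearly with the number of kernels.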
Using the CWRU rolling bearing fault dataset as the training and test set, we explore the influence of the convolutional kernel number and size and the pooling function on the diagnostic results.
The datasets used for training and testing are presented in Table 1. To challenge the deep feature extraction ability of the CNN model, data from a weak outer-race fault with a diameter of 0.18 mm are selected. These data are divided into four labels corresponding to different working conditions. For each label, 9.6 s of the original time-domain signal is used. This gives 115,200 data points (9.6 s × 12 kHz), which form 200 samples of 576 data points each. All four labels share the same fault type and differ only in working condition. To distinguish them, the model’s feature recognition must be refined and sensitive to small differences.
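The conversion of each label’s recording into 24 × 24 samples can be sketched as follows. The sketch assumes consecutive, non-overlapping segments filled row by row. The text does not state the ordering or the overlap, although 115,200 / 576 = 200 is consistent with non-overlapping segments. The function name `segment` and the placeholder signal are hypothetical.

```python
FS_HZ = 12_000      # sampling frequency (Hz)
DURATION_S = 9.6    # recording length per label (s)
SIDE = 24           # each sample is SIDE x SIDE = 576 points


def segment(signal, side=SIDE):
    """Cut a 1-D signal into consecutive, non-overlapping side x side matrices.

    Each matrix is filled row by row; trailing points that cannot fill a
    whole matrix are discarded.
    """
    n = side * side
    samples = []
    for start in range(0, len(signal) - n + 1, n):
        chunk = signal[start:start + n]
        samples.append([chunk[r * side:(r + 1) * side] for r in range(side)])
    return samples


total_points = round(FS_HZ * DURATION_S)            # 115,200 points
signal = [float(i) for i in range(total_points)]    # placeholder signal
samples = segment(signal)
print(total_points, len(samples), len(samples[0]), len(samples[0][0]))
# -> 115200 200 24 24
```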
3.1. Determining the Quantity of Convolutional Kernels
A two-dimensional deep CNN model is constructed with the following structure: a convolutional layer, a pooling layer, a second convolutional layer, a second pooling layer, and a fully connected layer. The activation function is ReLU, and the pooling function is average pooling. The fully connected layer has 4 neurons, one per label. The stride is 1, the learning rate is 0.01, the weight decay is 0.005, the momentum is 0.9, and the dropout rate is 0.8. The mini-batch size is 10, and 50 iterations are performed. The first convolutional layer uses 5 × 5 kernels, and the number of kernels is set to 6, 8, or 10. The second convolutional layer has 24 kernels of size 5 × 5. Of the samples, 80% are used for training and 20% for testing. Each configuration is run five times, and the average accuracy is reported. The diagnostic results are displayed in Table 2.
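The training setup above can be made concrete with a short sketch. It assumes that the 80/20 split is applied to all 800 samples (four labels × 200 samples). It also assumes that the optimizer is stochastic gradient descent with momentum and L2 weight decay in its common form, v ← μv − η(g + λw), w ← w + v. The text lists the hyperparameter values but not the exact update rule, and `sgdm_step` is a hypothetical helper.

```python
LABELS = 4
SAMPLES_PER_LABEL = 200
TRAIN_FRACTION = 0.8
BATCH_SIZE = 10

total = LABELS * SAMPLES_PER_LABEL          # 800 samples
n_train = round(total * TRAIN_FRACTION)     # 640 for training
n_test = total - n_train                    # 160 for testing
batches_per_pass = n_train // BATCH_SIZE    # 64 mini-batches per pass


def sgdm_step(w, g, v, lr=0.01, momentum=0.9, weight_decay=0.005):
    """One SGD-with-momentum update with L2 weight decay (assumed form).

    v <- momentum * v - lr * (g + weight_decay * w)
    w <- w + v
    """
    new_v = [momentum * vi - lr * (gi + weight_decay * wi)
             for wi, gi, vi in zip(w, g, v)]
    new_w = [wi + vi for wi, vi in zip(w, new_v)]
    return new_w, new_v


print(n_train, n_test, batches_per_pass)   # -> 640 160 64
w, v = sgdm_step([1.0], [0.5], [0.0])      # one step on a single weight
print(w, v)
```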
With six convolutional kernels, the diagnostic accuracy is 87.00%, which is lower than the 94.20% achieved with eight kernels. However, increasing the number further to 10 does not improve the diagnostic results. Therefore, eight kernels is the best choice for the first layer.
After the optimal number of convolutional kernels for the first layer is determined, five experiments are carried out to assess how the number of kernels in the second layer affects the diagnostic results. The number of second-layer kernels is set to 10, 16, 20, 24, and 30, respectively, while all other parameters remain constant. The diagnostic accuracy results are summarized in Table 3 and show a trend similar to that observed for the first layer.
As the number of convolutional kernels increases, the fault recognition rate improves up to a point. Specifically, the diagnostic accuracy increases gradually as the number of kernels rises from 10 to 24, suggesting that more kernels help extract fault features. However, at 30 kernels, the recognition accuracy declines to 89.60%. Consequently, 24 kernels is the more suitable choice for the second layer.
3.2. Selection of Convolutional Kernel Size
To investigate the influence of the convolutional kernel size on the model’s diagnostic results, experiments were conducted based on a configuration with eight convolutional kernels in the first layer and 24 in the second layer. The size of the first convolutional kernel was varied as follows: 5 × 5, 7 × 7, 9 × 9, 11 × 11, and 13 × 13. Corresponding to each of the first convolutional kernel sizes, the second convolutional kernel size was also varied.
The second-layer kernel sizes tested for each first-layer kernel size were as follows:
- First kernel 5 × 5: second kernel 3 × 3, 5 × 5, 7 × 7, and 9 × 9.
- First kernel 7 × 7: second kernel 2 × 2, 4 × 4, 6 × 6, and 8 × 8.
- First kernel 9 × 9: second kernel 3 × 3, 5 × 5, and 7 × 7.
- First kernel 11 × 11: second kernel 2 × 2, 4 × 4, and 6 × 6.
- First kernel 13 × 13: second kernel 3 × 3 and 5 × 5.
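The tested grid has a regular pattern. One reading consistent with it assumes stride-1 convolutions without padding and 2 × 2 average pooling with stride 2. Neither the padding nor the pooling window is stated here. Under that reading, each second-layer kernel size yields a second convolution output that is an even-sized map of at least 2 × 2, which the pooling can tile exactly. The sketch below enumerates such pairs for the five first-layer sizes and numbers them in order. Under these assumptions, it yields exactly the 16 configurations listed above. The numbering also appears consistent with the model numbers compared in the following discussion; for example, models 1, 9, and 15 all use a 3 × 3 second kernel. This is an illustrative check, not the authors’ stated rule.

```python
INPUT_SIDE = 24                    # 24 x 24 input matrix
POOL = 2                           # assumed 2 x 2 pooling, stride 2
FIRST_SIZES = (5, 7, 9, 11, 13)    # first-layer kernel sizes tested


def conv_out(side: int, k: int) -> int:
    """Side length after a stride-1 convolution without padding."""
    return side - k + 1


def pools_exactly(side: int) -> bool:
    """True if 2 x 2, stride-2 pooling tiles the map with nothing left over."""
    return side >= POOL and side % POOL == 0


def candidate_models():
    """(first, second) kernel sizes whose feature maps pool exactly."""
    models = []
    for k1 in FIRST_SIZES:
        c1 = conv_out(INPUT_SIDE, k1)
        if not pools_exactly(c1):
            continue
        p1 = c1 // POOL
        for k2 in range(2, p1 + 1):
            if pools_exactly(conv_out(p1, k2)):
                models.append((k1, k2))
    return models


for number, (k1, k2) in enumerate(candidate_models(), start=1):
    print(f"model {number:2d}: first {k1}x{k1}, second {k2}x{k2}")
```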
A total of 16 deep convolutional neural network models were evaluated, and their fault diagnosis results are presented in Table 4. Models 1, 9, and 15 all use a second-layer kernel size of 3 × 3, with first-layer kernel sizes of 5 × 5, 9 × 9, and 13 × 13, respectively. They achieved diagnostic accuracies of 90.30%, 93.80%, and 95.80%, showing that accuracy improves as the first-layer kernel size increases. Likewise, comparisons of models 3 and 11, models 5 and 12, and models 7 and 14 show the same pattern. When the second-layer kernel size is held constant, increasing the first-layer kernel size improves the model’s diagnostic accuracy.
With a second-layer kernel size of 5 × 5, a comparison of models 2, 10, and 16 shows that the diagnostic results improve as the first-layer kernel size increases from 5 × 5 to 9 × 9. However, the accuracy decreases by 0.40% when the first-layer kernel size is increased to 13 × 13. Similarly, with a second-layer kernel size of 4 × 4, a comparison of models 6 and 13 shows that the diagnostic results decrease by 1.20% as the first-layer kernel size increases from 7 × 7 to 11 × 11.
Next, consider models with the same first-layer kernel size. For a first-layer kernel size of 5 × 5, a comparison of models 1 to 4 shows that increasing the second-layer kernel size from 3 × 3 to 5 × 5 improves the diagnostic result from 90.30% to 94.20%. Further increases to 7 × 7 and 9 × 9, however, reduce the diagnostic accuracy. Similarly, for a first-layer kernel size of 7 × 7, a comparison of models 5 to 8 shows a non-monotonic pattern as the second-layer kernel size increases from 2 × 2 to 8 × 8. The diagnostic results first improve, then decrease, and then improve again. In contrast, for models 12, 13, and 14, with a first-layer kernel size of 11 × 11, the accuracy increases with the second-layer kernel size and reaches the maximum fault recognition accuracy of 96.40%. Lastly, with a first-layer kernel size of 13 × 13, changing the second-layer kernel size does not significantly improve the diagnostic results.
This comparison indicates that a larger convolutional kernel expands the receptive field and aids feature learning and extraction. However, the relationship between kernel size and accuracy is not monotonic, and blindly increasing the kernel size can reduce diagnostic accuracy. The network structure parameters must therefore be chosen carefully. Among the models compared, recognition accuracy is highest when the first-layer kernel size is 11 × 11 and the second-layer kernel size is 6 × 6.
3.3. RVDCNN Model Structure
The structure of the proposed two-dimensional deep convolutional neural network is shown in Figure 3. The network consists of two convolutional layers, two pooling layers, and one fully connected layer. The input layer is a 24 × 24 numerical matrix of the original vibration signal, and the first convolutional layer consists of eight large 11 × 11 kernels. The pooling layers use average pooling, which smooths the extracted features by averaging over each pooling region. The second convolutional layer consists of 24 small 6 × 6 kernels, three times as many kernels as the first layer. Large first-layer kernels combined with a larger number of small second-layer kernels help extract features that reflect the different health conditions of rotating machinery. After the final pooling layer, the fully connected layer has as many neurons as the diagnostic object has fault labels. This number is an adaptive parameter that ranges from 4 to 10 in this study, and the layer outputs the fault-type classification.
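The feature-map sizes of this structure can be traced layer by layer. The sketch assumes stride-1 convolutions without padding and 2 × 2 average pooling with stride 2. The text does not spell out these assumptions, but they reproduce the eight 7 × 7 feature maps reported after the first convolution and pooling. The helper `trace_rvdcnn` is hypothetical.

```python
def conv_out(side: int, k: int, stride: int = 1) -> int:
    """Side length after a convolution without padding."""
    return (side - k) // stride + 1


def pool_out(side: int, window: int = 2, stride: int = 2) -> int:
    """Side length after pooling (assumed 2 x 2 window, stride 2)."""
    return (side - window) // stride + 1


def trace_rvdcnn(input_side=24, k1=11, n1=8, k2=6, n2=24, n_classes=4):
    """Return (layer, channels, side) per stage and the flattened FC input size."""
    stages = [("input", 1, input_side)]
    side = conv_out(input_side, k1)
    stages.append(("conv1", n1, side))
    side = pool_out(side)
    stages.append(("pool1", n1, side))
    side = conv_out(side, k2)
    stages.append(("conv2", n2, side))
    side = pool_out(side)
    stages.append(("pool2", n2, side))
    flattened = n2 * side * side          # values fed to the fully connected layer
    return stages, flattened


stages, flattened = trace_rvdcnn()
for name, channels, side in stages:
    print(f"{name:6s} {channels:2d} x {side} x {side}")
print(f"fully connected: {flattened} -> 4 to 10 outputs")
```

Under these assumptions, the maps shrink from 24 × 24 to 14 × 14 after the first convolution and to 7 × 7 after pooling. They then shrink to 2 × 2 after the second convolution and to 1 × 1 after the second pooling. The fully connected layer therefore maps a 24-value vector to the 4 to 10 fault labels.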
The model is trained with the cross-entropy loss function and the mini-batch gradient descent algorithm. The input data are time-domain vibration acceleration signals collected on the surface of the gearbox or the gearbox housing. Each sample first passes through a layer of large-kernel convolution and average pooling, which produces eight 7 × 7 abstract feature matrices. The second layer of convolution and average pooling then generates 24 abstract features. Finally, the fully connected layer produces a one-dimensional vector of 4 to 10 values.
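Cross-entropy training on the final output vector of 4 to 10 values can be illustrated as follows. The sketch assumes that a softmax is applied to the fully connected outputs before the cross-entropy loss. This is the common pairing, but the text does not state it explicitly. The loss is averaged over a mini-batch, as in mini-batch gradient descent. All function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert fully connected outputs into class probabilities."""
    m = max(logits)                              # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy loss for one sample whose true class index is `label`."""
    return -math.log(softmax(logits)[label])


def batch_loss(batch_logits, labels):
    """Mean cross-entropy over a mini-batch."""
    losses = [cross_entropy(z, y) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


def predict(logits):
    """Predicted fault label: index of the largest output."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical outputs of a 4-neuron fully connected layer for two samples.
batch = [[2.0, 0.5, -1.0, 0.1], [0.0, 0.0, 0.0, 0.0]]
labels = [0, 2]
print(batch_loss(batch, labels), [predict(z) for z in batch])
```

For an uninformative output (all values equal), the loss is log 4 for four labels. The loss shrinks as the output for the true label grows relative to the others.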