A Sensor Data-Driven Fault Diagnosis Method for Automotive Transmission Gearboxes Based on Improved EEMD and CNN-BiLSTM

Xu, Youhong; Wang, Hui; Xu, Feng; Bi, Shaoping; Ye, Jiangang

doi:10.3390/pr13041200


by Youhong Xu ¹, Hui Wang ²,*, Feng Xu ³,⁴,⁵, Shaoping Bi ³ and Jiangang Ye ⁴

¹ School of Information Engineering, Quzhou College of Technology, Quzhou 324000, China

² School of Information and Software Engineering, East China Jiaotong University, Nanchang 330013, China

³ School of Mechanical Engineering, Quzhou College of Technology, Quzhou 324000, China

⁴ Quzhou Special Equipment Inspection & Testing Research Institute, Quzhou 324000, China

⁵ College of Communication Engineering, Jilin University, Changchun 130022, China

* Author to whom correspondence should be addressed.

Processes 2025, 13(4), 1200; https://doi.org/10.3390/pr13041200

Submission received: 16 March 2025 / Revised: 10 April 2025 / Accepted: 14 April 2025 / Published: 16 April 2025

(This article belongs to the Special Issue Fault Diagnosis, Fault Tolerant Control and Process Simulation of Nonlinear Systems)


Abstract:

With the rapid development of new energy vehicle technologies, higher demands have been placed on fault diagnosis for automotive transmission gearboxes. To address the poor adaptability of traditional methods under complex operating conditions, this paper proposes a sensor data-driven fault diagnosis method based on improved ensemble empirical mode decomposition (EEMD) combined with convolutional neural networks (CNNs) and Bidirectional Long Short-Term Memory (BiLSTM) networks. The method incorporates a dynamic noise adjustment mechanism, allowing the noise amplitude to adapt to the characteristics of the signal. This improves the stability and accuracy of signal decomposition, effectively reducing the instability and error accumulation associated with fixed-amplitude white noise in traditional EEMD. By combining the CNN and BiLSTM modules, the approach achieves efficient feature extraction and dynamic modeling. First, vibration signals of the transmission gearbox under different operating states are collected via sensors, and an improved EEMD method is employed to decompose the signals, removing background noise and nonstationary components to extract diagnostically significant intrinsic mode functions (IMFs). Then, the CNN is utilized to extract features from the IMFs, deeply mining their spatiotemporal characteristics, while the BiLSTM captures the temporal sequence dependencies of the signals, enhancing the comprehensive modeling of nonlinear and dynamic fault features. The combination of these two networks enables efficient adaptation to complex conditions, achieving accurate classification and identification of multiple gearbox fault modes. Results indicate that the proposed approach is highly accurate and robust for identifying gearbox fault modes, significantly exceeding the performance of conventional methods and isolated network models. This provides an efficient and intelligent solution for fault diagnosis of automotive transmission gearboxes.

Keywords:

fault diagnosis; improved EEMD; convolutional neural network; bidirectional long short-term memory network; gearbox

1. Introduction

With the rapid development of the new energy vehicle (NEV) industry, the operational stability and safety of automotive transmission gearboxes, as core components of the power transmission system, have drawn increasing attention [1,2,3]. Under complex operating conditions, NEVs must cope with frequent acceleration and deceleration operations, significant load variations, and diverse natural environments, making the service conditions of transmission gearboxes extremely demanding [4,5]. As critical power transmission components, gearboxes directly influence the vehicle’s operational performance and are closely related to driving safety and service life. During gearbox operation, prolonged exposure to mechanical impacts, vibrations, and wear may lead to various potential faults, such as gear wear, fractures, and bearing failures, which can disrupt power transmission or reduce efficiency, ultimately jeopardizing vehicle safety [6,7,8]. Therefore, achieving efficient and accurate fault identification of gearboxes under complex operating conditions, as well as timely monitoring and assessment of their health status, is of great significance for ensuring the driving safety of NEVs, enhancing operational efficiency, and reducing maintenance costs [9,10,11]. Related research not only improves the safety performance of NEVs but also provides theoretical and practical support for the development of intelligent and proactive maintenance technologies.

In recent years, gearbox fault diagnosis technologies based on deep learning have gained widespread attention and demonstrated excellent performance under stable operating conditions. However, automotive transmission gearboxes typically operate under complex and variable conditions, including frequent load changes, dynamic speed fluctuations, and interference from various environmental noises, posing significant technical challenges for fault diagnosis [12]. Under variable-speed conditions, gearbox fault signals exhibit nonlinear dynamic characteristics due to speed variations. Specifically, in the time domain, the occurrence of fault impact signals is irregular; in the frequency domain, fault characteristic frequencies spread across multiple frequency bands as the speed fluctuates [13,14]. Additionally, load variations directly influence the vibration characteristics of gearboxes. When the load increases, the vibration amplitude during the gear meshing process intensifies significantly, while sudden load shocks may generate strong interference signals. These interferences can easily obscure or mix with actual fault characteristic signals, making fault mode identification even more challenging [15,16].

To address the interference of operating condition variations on the performance of deep learning-based fault diagnosis models, researchers have proposed improved methods that integrate traditional techniques to optimize diagnostic performance under complex conditions. Kumar et al. [17] introduced a fault diagnosis method combining discrete wavelet feature extraction and machine learning algorithms. By leveraging wavelet transform for multi-scale signal decomposition, this method highlights key fault features and integrates data from multiple sensors. Liang et al. [18] investigated a diagnostic method for gearbox compound faults by utilizing wavelet transform and convolutional neural networks (CNNs) in combination with multi-label classification techniques. The method employs wavelet transform to extract time–frequency features from vibration signals, converting them into a format suitable for input to the diagnostic model. Jiang et al. [19] developed a novel multi-scale convolutional neural network architecture for monitoring the health status of wind turbine gearboxes. This architecture allows the model to simultaneously learn critical fault features across different directions and scales, significantly enhancing feature extraction capabilities. Xie et al. [20] introduced a diagnostic approach that integrates data from multiple sensors with a CNN. Principal component analysis is applied to convert sensor signals into RGB images, which are then used as input for CNN-based fault detection. Kim et al. [21] introduced a fault diagnosis method based on signal segmentation and a CNN. The initial vibration signals are segmented according to the gear tooth positions, and the segmented dataset is compared with the unsegmented dataset. Liu et al. [22] addressed the challenge of extracting localized weak feature information by proposing a fault diagnosis method combining variational mode decomposition (VMD), singular value decomposition (SVD), and CNNs. 
VMD decomposes the raw signals into intrinsic mode components with physical significance, SVD extracts localized feature information to generate singular value vectors, and these vectors serve as CNN inputs for model training and fault classification. Xu et al. [23] addressed the challenge of detecting multiple individual faults, which is difficult with conventional diagnostic approaches, by proposing a method that combines an enhanced mixed attention mechanism and a spatial-channel attention module. The model adaptively generates single- or multi-class fault labels, enabling accurate detection of combined faults. Zhang et al. [24] introduced a hybrid fault detection approach utilizing CNNs and Long Short-Term Memory (LSTM) models, enhanced by the sparrow optimization technique for automated tuning of parameters. By leveraging the global exploration potential of the sparrow-inspired method, this approach replaces manual parameter configuration, greatly improving the model’s speed and accuracy. Zou et al. [25] introduced an innovative technique for fault detection by integrating ensemble empirical mode decomposition (EEMD) with LSTM architectures. Data preparation methods combined with EEMD refine signals for clarity, while LSTM autonomously identifies fault-related patterns. This combination boosts the speed of fault feature recognition, significantly improving the accuracy of fault detection. However, traditional EEMD methods use fixed-amplitude white noise for signal decomposition, which can lead to instability when dealing with different types of vibration signals. This instability may cause mode mixing and error accumulation, reducing the reliability of extracted features and the accuracy of diagnosis. At the same time, LSTM mainly relies on time-series information for feature learning. 
While it excels at capturing long-term dependencies, it struggles to extract local spatial features effectively, limiting its generalization ability under complex conditions such as varying loads, speeds, or environmental noise. Additionally, using LSTM alone for feature extraction often comes with high computational complexity, making training and inference resource-intensive and challenging for efficient deployment in resource-limited online monitoring systems. In contrast, this study proposes an improved fault diagnosis method combining EEMD and CNN-Bidirectional Long Short-Term Memory (BiLSTM) networks. By introducing a dynamic adaptive noise injection mechanism, the noise amplitude adjusts according to signal characteristics, improving the stability of decomposition and the accuracy of feature extraction. The integration of CNN-BiLSTM modules enables efficient feature extraction and dynamic modeling.

The main contributions of this paper are as follows:

(1) Traditional ensemble empirical mode decomposition often faces issues like instability or mode mixing due to the use of fixed-amplitude white noise. This study introduces a dynamic noise adjustment mechanism, allowing the noise amplitude to adapt based on the signal characteristics. This improvement enhances the stability and accuracy of the decomposition, making the extracted intrinsic mode functions (IMFs) more representative of the core features of gearbox vibration signals.

(2) To strengthen the expression of signal features and reduce noise interference, Pearson correlation coefficients are used to evaluate and filter the IMFs. Only components highly correlated with the original signal are retained. This ensures that the reconstructed signal holds greater diagnostic value, providing high-quality input for subsequent learning models.

(3) In the feature extraction stage, the proposed model first uses a CNN to extract local spatial patterns, automatically identifying key characteristics of gearbox faults. The BiLSTM then models the dependencies in both the forward and backward directions of the sequence, improving the model’s ability to capture dynamic features. This combination balances the needs of spatial pattern recognition and time-series modeling.

(4) Using the deep features extracted by CNN-BiLSTM, the classifier can accurately identify the gearbox’s operating conditions and various fault types, such as wear or damage. Experimental results show that the proposed method achieves accuracies of 99.28% and 99.46% on the CWRU and Southeast University datasets, respectively, outperforming existing approaches. Additionally, t-SNE visualizations and confusion matrices further confirm the model’s capability to classify and distinguish between different fault categories effectively.

2. Materials and Methods

This section introduces the theory behind the proposed method, which combines improved EEMD with CNN-BiLSTM. To address the instability caused by fixed-amplitude white noise in traditional EEMD, a dynamic noise adjustment mechanism is introduced to enhance the stability and accuracy of signal decomposition. Next, CNNs are used to extract deep spatial features from the IMFs, improving the representation of local patterns. Finally, BiLSTM networks are employed to model time-dependent characteristics, capturing the dynamic behavior of gearbox fault signals for more accurate fault identification.

2.1. Improved Ensemble Empirical Mode Decomposition

EEMD [26] is a data analysis method developed as an extension of empirical mode decomposition (EMD), frequently employed due to its strong capability in the study of dynamic signals. The key concept of EEMD is to inject random noise with varying strengths into the original signal to address the challenge of modal interference encountered during the signal separation procedure of standard EMD. Subsequently, robust IMFs are extracted by iterating the decomposition process and averaging the results, thereby more accurately capturing the key features of the signal. EEMD, with its good adaptability and decomposition accuracy for complex signals, provides an effective solution for nonstationary signal processing. The particular method for decomposition can be described in the following steps:

For the original signal $c(t)$, the disturbed signal is obtained by adding the white noise signal $n_{k}(t)$ at the $k$-th iteration:

$$c_{k}(t) = c(t) + \alpha n_{k}(t) \tag{1}$$

where $\alpha$ is the noise amplitude coefficient and $n_{k}(t)$ is Gaussian white noise. EEMD obtains the final IMF set by averaging the EMD decomposition results of the multiple $c_{k}(t)$:

$$\mathrm{IMF}_{j} = \frac{1}{M} \sum_{k = 1}^{M} \mathrm{IMF}_{j, k} \tag{2}$$

where $M$ indicates the overall iteration count and $\mathrm{IMF}_{j, k}$ corresponds to the $j$-th mode derived from the $k$-th decomposition.

However, the traditional EEMD method has the following limitations. Its noise amplitude is fixed, so it cannot adapt to the complexity of different signals, which may lead to mode mixing or unstable decomposition results. In addition, for nonstationary signals under complex operating conditions, the decomposition effect is limited, making it difficult to extract the key characteristic modes accurately. To overcome these problems, this study proposes an improved EEMD technique that combines an adaptive fractal noise mechanism with a weighted combination mechanism to strengthen the robustness of the decomposition and the resolution of the modes. The improved EEMD method dynamically adjusts the noise amplitude according to the characteristics of the signal, where the noise injection amplitude $b_{k}$ is defined as follows:

$$b_{k} = \beta \cdot \sigma(x) \tag{3}$$

where $\beta$ is the adaptive adjustment coefficient and $\sigma(x)$ indicates the degree of variation within the signal, with $x$ denoting the original signal $c(t)$.

This way, the noise amplitude can be adaptively adjusted, enhancing the robustness of the decomposition. To further optimize the IMF results after decomposition, the proposed method introduces a weighted integration mechanism, assigning a weight $w_{k}$ to each iteration result. The weight $w_{k}$ is dynamically adjusted based on the signal-to-noise ratio (SNR), as detailed below:

$$w_{k} = \frac{1}{\mathrm{SNR}_{k}} \tag{4}$$

$$\mathrm{SNR}_{k} = 10 \log_{10} \frac{\| c(t) \|^{2}}{\| n_{k}(t) \|^{2}} \tag{5}$$

The weighted IMF is then computed as follows:

$$\mathrm{IMF}_{j} = \frac{\sum_{k = 1}^{K} w_{k} \cdot \mathrm{IMF}_{j, k}}{\sum_{k = 1}^{K} w_{k}} \tag{6}$$

where $K$ is the total number of iterations (the overall iteration count denoted $M$ in Equation (2)).

Through weighted integration, the accuracy of the IMF decomposition is enhanced, and the effect of noise on the outcome is lessened. The introduction of the dynamic adaptive noise injection mechanism improves the stability of the decomposition process, while the weighted integration mechanism enhances the quality of the IMF results, collectively improving the ability to extract key modes.
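As a rough illustration of Equations (3) to (6), the following minimal Python sketch computes the adaptive noise amplitude, the per-trial SNR, and the weighted combination. It is not the authors' implementation. The function names, the value β = 0.2, the use of the standard deviation for σ(x), the ensemble size of 5, and the choice to treat the injected noise as n_k(t) in Equation (5) are all assumptions made for the example. The EMD step itself is omitted, so each noisy copy of the signal merely stands in for the j-th IMF of that trial.

```python
import math
import random
import statistics


def adaptive_noise_amplitude(signal, beta=0.2):
    """Eq. (3): b_k = beta * sigma(x); sigma is assumed to be the standard deviation."""
    return beta * statistics.pstdev(signal)


def snr_db(signal, noise):
    """Eq. (5): SNR_k = 10 * log10(||c||^2 / ||n_k||^2)."""
    signal_energy = sum(v * v for v in signal)
    noise_energy = sum(v * v for v in noise)
    return 10.0 * math.log10(signal_energy / noise_energy)


def weighted_imf(imf_per_trial, weights):
    """Eq. (6): weighted average of the j-th IMF over all trials."""
    total = sum(weights)
    n = len(imf_per_trial[0])
    return [
        sum(w * imf[t] for w, imf in zip(weights, imf_per_trial)) / total
        for t in range(n)
    ]


rng = random.Random(0)
signal = [math.sin(2 * math.pi * 5 * t / 256) for t in range(256)]
b = adaptive_noise_amplitude(signal)  # noise amplitude follows the signal spread

trials, weights = [], []
for _ in range(5):  # hypothetical ensemble size
    noise = [b * rng.gauss(0.0, 1.0) for _ in signal]
    noisy = [c + n for c, n in zip(signal, noise)]
    trials.append(noisy)  # stand-in for IMF_{j,k}; a real pipeline would run EMD here
    weights.append(1.0 / snr_db(signal, noise))  # Eq. (4)

fused = weighted_imf(trials, weights)
```

Equation (4) is only well behaved when the SNR in decibels is positive. A trial whose noise energy reaches the signal energy would yield a zero or negative weight, so a practical implementation would probably need a guard for that case.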

2.2. Convolutional Neural Network

The CNN [27,28] demonstrates powerful feature extraction capabilities when processing vibration signals. Through convolution operations, it efficiently captures the local characteristics of the signal while avoiding the reliance on manually designed features typical of traditional methods. Compared to 2D CNNs, the convolution kernels of 1D CNNs slide along the time or sequence dimension, making them suitable for handling time dependencies and sequential features in data. By stacking layers progressively, CNNs can extract higher-level global features from lower-level basic patterns. This local-to-global feature learning mechanism enables 1D CNNs to show broad application potential in fields such as time series analysis and vibration signal processing.

Given the input signal $f = [f(1), f(2), \dots, f(L)]$ and the convolution kernel $w \in \mathbb{R}^{k}$ (where $k$ is the length of the convolution kernel), the convolution layer on a one-dimensional signal is defined as follows:

$$y(i) = \sum_{j = 1}^{k} f(i + j - 1) \, w(j) + b \tag{7}$$

where $i$ denotes the starting position of the sliding window, $b$ is the bias term, and $y(i)$ is the value of the convolution output at that position.
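The sliding-window sum of Equation (7) can be sketched in a few lines of plain Python. The function name `conv1d` and the difference kernel are illustrative choices, and the indices are 0-based here, whereas the equation is 1-based.

```python
def conv1d(f, w, b=0.0):
    """Valid 1D convolution layer of Eq. (7), written with 0-based indices."""
    k = len(w)
    return [
        sum(f[i + j] * w[j] for j in range(k)) + b
        for i in range(len(f) - k + 1)
    ]


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 0.0, -1.0]  # hypothetical difference kernel
print(conv1d(signal, kernel, b=0.5))  # [-1.5, -1.5, -1.5]
```

With an input of length L and a kernel of length k, the output has L - k + 1 entries, which is why the five-sample input yields three values.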

The pooling layer helps reduce the dimensionality and simplify the feature map generated by the convolutional layer, assisting the convolutional neural network in extracting more representative features, reducing computational load, and improving the model’s generalization ability. Max pooling, as a commonly used pooling method, preserves important features while lowering the network’s computational demand and enhances the network’s robustness to variations in the input data:

$$p(i) = \max \{ x(i \cdot s), \dots, x(i \cdot s + k - 1) \} \tag{8}$$

where $s$ is the stride size, $k$ is the window size of the max pooling, $x$ is the input to the pooling layer (the feature map produced by the convolutional layer), and $p(i)$ is the output after pooling.
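A minimal sketch of the max pooling in Equation (8), assuming 0-based window indices and no padding. The function name `max_pool1d` and the sample feature map are made up for the example.

```python
def max_pool1d(x, k=2, s=2):
    """Eq. (8): p(i) = max{x(i*s), ..., x(i*s + k - 1)} with 0-based i."""
    n_out = (len(x) - k) // s + 1
    return [max(x[i * s : i * s + k]) for i in range(n_out)]


feature_map = [0.1, 0.9, 0.4, 0.3, 0.8, 0.2]
print(max_pool1d(feature_map))            # [0.9, 0.4, 0.8]
print(max_pool1d(feature_map, k=3, s=1))  # [0.9, 0.9, 0.8, 0.8]
```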

In CNNs, the fully connected layer functions to combine the extracted local attributes from the convolutional and pooling layers into comprehensive global representations, performing the ultimate classification or decision-making based on this information. While the convolutional and pooling layers emphasize localized details, the fully connected layer synthesizes these attributes to uncover richer and more holistic global insights:

$$o = w x + b \tag{9}$$

where w represents the corresponding weight vector, x represents the features extracted by the previous layers of the network, and b represents the corresponding bias vector.

In a typical CNN model, as shown in Figure 1, an output layer is usually added at the end to make the final classification or regression decision based on the features extracted by the preceding layers. For classification tasks such as fault diagnosis, the output layer maps the network’s learning results to different categories, commonly employing the Softmax mechanism to compute the likelihood of each category. The exact formulation is provided below:

$$\mathrm{Softmax}(x_{i}) = \frac{e^{x_{i}}}{\sum_{j} e^{x_{j}}} \tag{10}$$

The Softmax function maps the class scores to a probability distribution, where $x_{i}$ represents the score of class $i$ and $j$ runs over all classes.
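To make Equations (9) and (10) concrete, the sketch below applies a fully connected layer followed by Softmax to a two-element feature vector. The weights, biases, and three-class setup are hypothetical. Subtracting the maximum score before exponentiation is a common numerical-stability step that leaves the result of Equation (10) unchanged.

```python
import math


def dense(x, weights, bias):
    """Eq. (9): o = W x + b, with one weight row per output class."""
    return [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(weights, bias)
    ]


def softmax(scores):
    """Eq. (10), with the maximum subtracted first for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


features = [1.0, 2.0]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical 3-class weights
b = [0.0, 0.0, 0.0]
scores = dense(features, W, b)  # [1.0, 2.0, 3.0]
probs = softmax(scores)
predicted_class = probs.index(max(probs))  # class with the highest probability
```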

2.3. Bidirectional Long Short-Term Memory Neural Network

BiLSTM [29,30,31] is an enhanced model developed based on the standard LSTM. Unlike traditional unidirectional LSTM, BiLSTM introduces a bidirectional information flow mechanism, enabling it to capture patterns concurrently from sequences in both forward and backward directions. This means that when processing sequential data, BiLSTM can capture not only past information but also future information, providing a comprehensive understanding of the context at each moment. This bidirectional information flow enables BiLSTM to more effectively capture dependencies and complex feature patterns in long sequences, significantly improving model performance, particularly in tasks where contextual relationships need to be considered.

LSTM consists of a forget gate, an input gate, and an output gate, which are detailed as follows.

The forget gate is an important component of the LSTM network. It determines how much of the memory from the previous time step is preserved. Specifically, the forget gate computes a value between 0 and 1 from the current input and the preceding hidden state. This value acts as the forget ratio, determining how much of the information from the preceding time step is retained or discarded. The specific definition is as follows:

$$\gamma_{t} = \sigma(\Phi_{f} \cdot [h_{t - 1}, x_{t}] + \varphi_{f}) \tag{11}$$

where $\gamma_{t}$ is the output of the forget gate, $\sigma$ is the sigmoid activation function, $\Phi_{f}$ is the corresponding weight matrix, $\varphi_{f}$ is the bias, $x_{t}$ represents the input at the current time step, and $h_{t - 1}$ represents the hidden state from the preceding time step.

The input gate is responsible for controlling the degree of update of the cell state with respect to the current input information. LSTM can flexibly decide which historical information and current information is necessary for updating the cell state, avoiding excessive accumulation or forgetting of information. This makes the model more stable and effective when processing long sequences. The specific definition is as follows:

$$\mu_{t} = \sigma(\Lambda_{i} \cdot [h_{t - 1}, x_{t}] + \xi_{i}) \tag{12}$$

where $\mu_{t}$ denotes the result at the associated time step, $\Lambda_{i}$ denotes the weight matrix for this step, and $\xi_{i}$ denotes the offset term.

The output gate dynamically controls the transmission of useful information to the next time step while suppressing parts that may introduce noise or interference, thereby enhancing the model’s expressiveness and stability. The specific definition is

$$o_{t} = \sigma(\Phi_{n} \cdot [h_{t - 1}, x_{t}] + \varphi_{n}) \tag{13}$$

where $o_{t}$ indicates the output for the relevant time step, $\Phi_{n}$ signifies the relevant matrix of weights, and $\varphi_{n}$ denotes the offset term.
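The three gates in Equations (11) to (13) share one functional form: a sigmoid applied to an affine map of the concatenated vector [h_{t-1}, x_t]. The sketch below evaluates that form for a forget gate with a hidden size of 2 and an input size of 1. The weights and biases are arbitrary illustrative numbers, and the cell-state update of a full LSTM is deliberately left out because it is not spelled out in the text above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gate(weight, bias, h_prev, x_t):
    """sigma(W . [h_{t-1}, x_t] + b), the shared form of Eqs. (11)-(13)."""
    concat = h_prev + x_t  # list concatenation builds [h_{t-1}, x_t]
    return [
        sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
        for row, b in zip(weight, bias)
    ]


h_prev = [0.0, 0.0]  # previous hidden state
x_t = [1.0]          # current input
W_f = [[0.1, -0.2, 0.5], [0.3, 0.0, -0.4]]  # hypothetical forget-gate weights
b_f = [0.0, 0.1]
forget = gate(W_f, b_f, h_prev, x_t)  # each entry lies strictly between 0 and 1
```

The input and output gates would reuse `gate` with their own weight matrices and biases.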

BiLSTM consists of two LSTM modules: one works on the sequence starting from the beginning and moving to the end, while the other starts from the end and progresses to the beginning, as shown in Figure 2. The outputs of the two modules are combined either by concatenation or weighted averaging to form the final output. The specific definition is

$$\overrightarrow{h}_{t} = \mathrm{LSTM}_{\mathrm{forward}}(x_{t}) \tag{14}$$

$$\overleftarrow{h}_{t} = \mathrm{LSTM}_{\mathrm{backward}}(x_{t}) \tag{15}$$

$$h_{t} = [\overrightarrow{h}_{t}; \overleftarrow{h}_{t}] \tag{16}$$

where $\overrightarrow{h}_{t}$ represents the result generated by the forward LSTM, $\overleftarrow{h}_{t}$ indicates the result produced by the backward LSTM, and $h_{t}$ signifies the combined output after concatenation.
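The bidirectional wiring of Equations (14) to (16) can be illustrated independently of the LSTM internals. In the sketch below, `toy_cell` is a scalar tanh recurrence used purely as a stand-in for an LSTM cell. Only the forward pass, the reversed pass, and the per-step pairing correspond to the equations. All names are hypothetical.

```python
import math


def run_direction(xs, step, h0=0.0):
    """Apply a recurrent step over xs in order and return every hidden state."""
    states, h = [], h0
    for x in xs:
        h = step(h, x)
        states.append(h)
    return states


def bidirectional(xs, step_fwd, step_bwd):
    """Eqs. (14)-(16): pair forward and backward states at each time step."""
    forward = run_direction(xs, step_fwd)
    backward = run_direction(xs[::-1], step_bwd)[::-1]  # re-align to original order
    return list(zip(forward, backward))


def toy_cell(h, x):
    """Stand-in for an LSTM cell (scalar tanh recurrence), not a real LSTM."""
    return math.tanh(0.5 * h + x)


h = bidirectional([1.0, 2.0, 3.0], toy_cell, toy_cell)
# h[t] = (forward state, backward state): the forward part has seen x_0..x_t,
# the backward part has seen x_t..x_last.
```

In a PyTorch model, the same behavior is normally obtained by passing `bidirectional=True` to `torch.nn.LSTM` rather than wiring the two directions by hand.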

3. Experimental Results

The experiments are implemented in Python 3.8, with PyCharm as the development environment and an NVIDIA GeForce RTX 3060 Ti graphics card as the hardware platform. The main steps are signal preprocessing and IMF decomposition (using the improved EEMD), followed by feature extraction (through a CNN and BiLSTM). First, the required Python libraries, namely NumPy 1.24.3, SciPy 1.10.1, Matplotlib 3.7.1, PyTorch 1.13.1, and an EEMD library (e.g., PyEMD), are installed to process the signal data and train the deep learning models. To speed up computation, the GPU-compatible build of PyTorch is installed so that the 3060 Ti graphics card is fully utilized. For network training, common hyperparameters are set: learning rate (0.001), batch size (64), optimizer (e.g., Adam), and loss function (e.g., cross-entropy loss). The training loss and validation accuracy are monitored during training, and the network parameters are adjusted to achieve optimal model performance. Finally, the experimental results are presented using spectrograms, IMF decomposition plots, and training-process curves, together with the classification results for the different fault modes.

3.1. A Fault Diagnosis Method for Automotive Gearboxes Based on Improved EEMD and CNN-BiLSTM

As shown in Figure 3, the proposed gearbox fault diagnosis method integrates improved EEMD with CNN-BiLSTM techniques. By combining signal decomposition, adaptive filtering, deep feature extraction, and classification decisions, this approach achieves efficient identification of gearbox faults. Compared to traditional methods, it not only significantly improves diagnostic accuracy but also demonstrates greater robustness and adaptability under complex conditions, offering a smart and reliable solution for real-world engineering applications. The process consists of four main stages.

3.1.1. Signal Preprocessing and Decomposition

After collecting vibration signals from the gearbox, the improved EEMD is applied for signal decomposition. Unlike conventional EEMD, which uses fixed-amplitude white noise and can result in unstable decomposition or mode mixing, a dynamic noise adjustment mechanism is introduced to adapt noise amplitude based on the signal’s characteristics. This enhances the stability and accuracy of the decomposition. The decomposed signal comprises multiple IMFs, each representing different frequency components, facilitating further analysis.

3.1.2. IMF Selection and Signal Reconstruction

To ensure that the reconstructed signal accurately reflects the core features of the original data while minimizing noise interference, Pearson correlation coefficients are used to evaluate the relevance of all IMFs. Only IMFs with a high correlation to the original signal are retained for reconstruction. This step removes irrelevant or low-relevance components, further enhancing the representation of fault-related features.
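The selection step can be sketched as follows. The threshold of 0.3, the use of the absolute correlation value, and the function names are assumptions for illustration only, since the text does not state the cut-off that was actually used. The two synthetic "IMFs" are plain sinusoids, not the output of a real decomposition.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def select_and_reconstruct(signal, imfs, threshold=0.3):
    """Keep IMFs whose |correlation| with the signal reaches the threshold, then sum them."""
    kept = [imf for imf in imfs if abs(pearson(imf, signal)) >= threshold]
    recon = [sum(vals) for vals in zip(*kept)] if kept else [0.0] * len(signal)
    return kept, recon


n = 200
imf_slow = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]          # dominant component
imf_fast = [0.05 * math.sin(2 * math.pi * 40 * t / n) for t in range(n)]  # weak component
signal = [a + b for a, b in zip(imf_slow, imf_fast)]

kept, recon = select_and_reconstruct(signal, [imf_slow, imf_fast])
# Only the dominant component passes the threshold in this toy case.
```

A constant IMF has zero variance and would cause a division by zero in `pearson`, so real data would need a check for that case.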

3.1.3. Deep Feature Extraction with CNN-BiLSTM

The filtered and reconstructed signal is fed into a CNN, which extracts local spatial and temporal features through multi-layer convolutional operations, automatically identifying significant patterns related to faults. The CNN excels at capturing local spatial details, enabling the identification of issues such as gear wear or cracks. The deep features extracted by the CNN are then passed to a BiLSTM network, which models the sequential dependencies in both forward and backward directions. By leveraging its gating mechanism, BiLSTM captures dynamic behaviors of the gearbox under various operating conditions, improving the differentiation of fault patterns.

3.1.4. Fault Classification and Diagnosis

The multidimensional deep features extracted by the CNN-BiLSTM combination are input into a classifier, which accurately identifies the operating state of the gearbox. This allows for efficient classification of normal states and various fault modes, such as gear wear or missing teeth. Experimental results validate the effectiveness of this method. The confusion matrix clearly demonstrates the classification accuracy for different fault categories, showing the model’s strong recognition ability. t-SNE visualization reveals that the proposed method forms distinct clusters in feature space for different fault types, confirming its discriminative power. The accuracy and loss curves over training iterations indicate stable convergence, effectively avoiding overfitting while maintaining high accuracy on the test set.

3.2. Dataset Introduction

In this study, we utilized the publicly available experimental dataset from Southeast University to investigate automotive transmissions. This dataset is extensively used in gearbox research and is regarded as authoritative and representative. By selecting this dataset, we can not only validate the effectiveness of the model using publicly available, high-quality data but also provide a reference for comparative experiments by other researchers. Because collecting fault data from real automotive gearboxes is complex and subject to practical limitations, the use of this dataset helps overcome the difficulties of data acquisition in practical testing while ensuring the integrity and reproducibility of the experimental data. Specifically, the vibration signal sampling frequency and tested gear parameters in this dataset form a certain mapping relationship with real-world applications, providing a reliable experimental foundation for automotive gearbox fault diagnosis.

The data used in this study were provided by Southeast University from their transmission system dynamic simulator. The data were recorded at a rotational speed of 20 Hz (equivalent to 1200 rpm) and a load setting of 0 V (corresponding to 0 Nm), and they cover five health states: normal state, rolling element fault, composite fault, inner race fault, and outer race fault. The dataset contains a total of 5000 samples, with each sample having a length of 1024 data points. To ensure the effectiveness of model training and testing, the dataset is divided into a training set and a testing set with a ratio of 7:3, which helps validate the model’s generalization ability and robustness. Through in-depth analysis of these fault data, this study aims to achieve efficient automotive gearbox fault diagnosis. Specific data details are shown in Table 1.
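One way to realize the 7:3 division is a per-class (stratified) shuffle and split, sketched below on sample indices only. Stratification, the random seed, and an even 1000 samples per class are assumptions made for the example. The text states only the total of 5000 samples, the five classes, and the 7:3 ratio.

```python
import random


def stratified_split(labels, train_ratio=0.7, seed=0):
    """Split sample indices into train/test sets, preserving class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = int(round(train_ratio * len(indices)))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


labels = [c for c in range(5) for _ in range(1000)]  # assumed 1000 samples per class
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 3500 1500
```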

3.3. Improved EEMD Decomposition

The conventional EEMD approach encounters challenges when analyzing intricate signals. For instance, adding Gaussian white noise of fixed amplitude cannot align well with the localized features of the signal, leading to a mismatch between the added noise and the signal’s core structure, which may cause mode mixing. Moreover, the standard mean-based combination used in EEMD does not exploit the relationship between the individual decompositions and the original signal, potentially losing key signal details and producing inconsistent final outputs.

To overcome these limitations, we propose a framework that combines flexible fractal noise addition with correlation-based combination. This method adjusts the magnitude and frequency features of the added noise in real time, ensuring better alignment with the underlying structure of the signal. Recursively patterned (fractal) noise replaces traditional Gaussian noise to further enhance the precision of the signal decomposition. Additionally, we introduce an adaptive weighting method guided by correlation, which assigns a different weight to each decomposition based on its relationship with the original signal, refining the resulting IMF.

Using rolling element faults and compound faults as case studies, Figure 4 shows the results of decomposing these two fault types with the conventional EEMD technique. Subfigure (a) presents the decomposition results for the rolling element fault, and subfigure (b) those for the compound fault. Although EEMD extracts localized signal patterns effectively, certain intrinsic mode functions (IMFs) exhibit noticeable noise interference and mode mixing, which degrades signal clarity and loses critical information.

Figure 5 provides the decomposition results for the same signals using the improved EEMD approach. Subfigures (a) and (b) show the decomposition for the rolling element and compound faults, respectively. The improved method combines dynamically adjusted fractal noise addition with correlation-driven weighted integration, which retains more of the signal’s essential characteristics while reducing the impact of noise and mode mixing.

The improvements significantly enhance decomposition precision and the clarity of signal representation. A comparative analysis highlights the ability of the improved EEMD to more accurately delineate the underlying structures and distinctive features of different fault types, particularly for complex combined faults. It achieves a more effective separation of modes, which substantially bolsters the dependability and precision of fault detection processes.

In the experiment, we compared the decomposition results of the EEMD and improved EEMD methods on the same signal. We applied the Fast Fourier Transform (FFT) to the IMF components generated by each method to extract their spectral amplitude characteristics, which allows a more direct comparison of the two methods. The spectral comparison is shown in Figure 6. When processing complex signals, EEMD tends to leave residual noise in the high-frequency components, which appears as spurious high-frequency peaks in the spectrum; the added noise is not fully cancelled in these components. In contrast, the improved EEMD suppresses noise more effectively, so the main frequency and its harmonics stand out more clearly and the spectrum better reflects the physical characteristics of the signal. Taking the rolling element fault and compound fault signals as examples, the IMFs generated by EEMD may exhibit significant noise interference in the high-frequency region, which blurs the high-frequency components of the spectrum, whereas the IMFs from the improved EEMD show more distinct main frequency peaks with less noise and therefore reflect the fault features more accurately.
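The spectral amplitude of an IMF, as used for the comparison above, can be computed with a one-sided FFT. The sketch below assumes a hypothetical sampling rate `fs` and a synthetic single-tone "IMF"; neither comes from the paper.

```python
import numpy as np

def amplitude_spectrum(x, fs):
    """One-sided amplitude spectrum of a real signal sampled at fs Hz."""
    n = x.size
    amp = np.abs(np.fft.rfft(x)) / n
    amp[1:] *= 2.0           # fold negative frequencies into the positive half
    if n % 2 == 0:
        amp[-1] /= 2.0       # the Nyquist bin has no negative-frequency twin
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, amp

fs = 1024.0                                  # hypothetical sampling rate in Hz
t = np.arange(1024) / fs
imf = 1.5 * np.sin(2 * np.pi * 100 * t)      # synthetic stand-in for one IMF
freqs, amp = amplitude_spectrum(imf, fs)
peak_freq = freqs[np.argmax(amp)]
```

For this synthetic tone the spectrum has a single peak of amplitude 1.5 at 100 Hz. Applying the same function to the IMFs of both decompositions gives curves that can be overlaid in the manner of Figure 6.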

In mechanical fault diagnosis, selecting the most representative modes from the IMFs obtained by signal decomposition is a crucial step for improving diagnostic accuracy. This method selects IMFs using the Pearson correlation coefficient between each IMF and the original signal and retains those with higher correlations, which extracts key fault-related information while reducing redundancy and noise. The Pearson correlation coefficient is chosen as the selection criterion because it quantifies the linear relationship between two signals. Its value ranges from −1 to 1; values closer to 1 indicate a stronger positive correlation, and values closer to 0 indicate a weaker one. Computing the coefficient between each IMF and the original signal therefore gives a direct indication of whether that IMF carries the same key information as the original signal. In practice, the correlation threshold must be chosen appropriately and is typically determined experimentally to balance feature extraction against noise suppression. Combined with techniques such as spectral analysis, this selection reveals the frequency components of mechanical faults and improves diagnostic accuracy and robustness, particularly in gearbox bearing fault diagnosis.

As shown in Table 2, IMF1 and IMF2 under a rolling element fault, as well as IMF6 and IMF7 under a compound fault, all exhibit high correlation (greater than the threshold of 0.5). This indicates that these IMFs can effectively reflect the key frequency components in the signal. Therefore, during the signal reconstruction process, these modes can effectively preserve the main features while suppressing the impact of noise.
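The selection rule can be sketched as follows, using the 0.5 threshold quoted above. The three synthetic components stand in for real IMFs and are assumptions, as is the function name `select_imfs`; summing the retained IMFs is one plausible reading of "signal reconstruction" here.

```python
import numpy as np

def select_imfs(imfs, signal, threshold=0.5):
    """Keep IMFs whose Pearson correlation with the signal exceeds the threshold."""
    imfs = np.asarray(imfs)
    r = np.array([np.corrcoef(imf, signal)[0, 1] for imf in imfs])
    keep = np.flatnonzero(r > threshold)
    reconstructed = imfs[keep].sum(axis=0)   # rebuild the signal from kept IMFs
    return keep, r, reconstructed

rng = np.random.default_rng(0)
t = np.arange(1024) / 1024.0
# Synthetic stand-ins: two dominant oscillations and one weak noise-like mode.
imfs = np.stack([
    1.0 * np.sin(2 * np.pi * 50 * t),
    0.8 * np.sin(2 * np.pi * 12 * t),
    0.1 * rng.standard_normal(t.size),
])
signal = imfs.sum(axis=0)
keep, r, reconstructed = select_imfs(imfs, signal)
print(keep, np.round(r, 2))
```

In this toy case the two oscillatory components pass the threshold and the noise-like component is discarded, so the reconstruction keeps the main features while dropping most of the noise.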

3.4. Discussion and Analysis of Different Comparison Methods

This study presents an automotive transmission gearbox fault diagnosis method based on an improved EEMD combined with CNN-BiLSTM and validates its superiority through comparative experiments. The method first decomposes the original vibration signal using the improved EEMD and selects the IMFs that are highly correlated with the original signal using Pearson’s correlation coefficient, thereby extracting key frequency components and suppressing noise. We compared it with a traditional CNN, DenseNet [32], ResNet18 [33], CNN-LSTM [34], and the unmodified EEMD-CNN-BiLSTM. These comparison methods each have their own characteristics: the traditional CNN focuses on local feature extraction but struggles to capture long-term dependencies; CNN-LSTM improves the extraction of spatiotemporal features; DenseNet and ResNet18 enhance information flow in deep networks through dense connections and residual learning, respectively; and the unmodified EEMD-CNN-BiLSTM also uses EEMD for signal decomposition but falls short of the improved method in noise suppression and in strengthening key features. Experimental results show that the improved EEMD-CNN-BiLSTM exhibits significant advantages in fault diagnosis accuracy and robustness. It extracts core information reflecting fault states from complex signals more effectively, providing an efficient and reliable solution for automotive gearbox fault diagnosis.

3.4.1. Iterative Curve Analysis

Figure 7a shows the accuracy curves: the improved EEMD-CNN-BiLSTM framework raises its accuracy rapidly during the initial training phase, maintains its lead throughout, and eventually achieves the highest classification accuracy. The error curves in Figure 7b show that the method converges quickly to a low error level with small oscillations, reflecting stable and consistent training. By leveraging the improved EEMD decomposition, the framework captures signal-specific attributes more effectively. Coupled with the CNN’s strength in identifying local features and the BiLSTM’s ability to model bidirectional temporal dependencies, the overall approach excels in key feature extraction and noise suppression.

In comparison, the unmodified EEMD-CNN-BiLSTM approach also stabilizes quickly and achieves good accuracy, but its accuracy rises slightly more slowly and its final result is slightly lower. Although ResNet18 and DenseNet use residual (skip) connections and dense connections, respectively, to improve learning in deeper networks, their ability to handle intricate mechanical fault features does not match that of the proposed framework. The traditional CNN struggles with temporal dynamics and deeper pattern learning, leading to lower accuracy and slower error convergence. CNN-LSTM improves on this by combining convolutional layers with a unidirectional LSTM but remains limited in modeling bidirectional temporal dependencies, which places its performance between the conventional CNN and the improved EEMD-CNN-BiLSTM approach.

Overall, the experimental findings highlight the notable advantages of the optimized EEMD-CNN-BiLSTM framework for sophisticated machine fault diagnosis. By refining preprocessing and architectural design, this approach preserves essential signal attributes while minimizing unwanted disturbances. It achieves higher precision, faster stabilization, and consistent training outcomes. Compared to alternative approaches, the proposed method demonstrates improved resilience and real-world applicability, delivering a practical solution for diagnostic challenges.

3.4.2. Visual Analysis of Confusion Matrix

Figure 8 presents the diagnostic results of the six models as confusion matrices. A confusion matrix gives a comprehensive breakdown of the classification outcomes for each fault type: rows correspond to the actual categories, columns denote the predicted categories, diagonal entries show the number of correctly identified samples, and off-diagonal entries count incorrect predictions. Analyzing the confusion matrix shows which categories are most prone to confusion and offers insights for further model optimization. For instance, the improved EEMD-CNN-BiLSTM method outperforms the others across all categories, especially in predicting rolling element faults and compound faults. This aligns with its highest average accuracy of 99.46%, as shown in Table 3. In contrast, the traditional CNN, with its limited ability to capture temporal information, produces more misclassifications, particularly for compound faults and inner race faults. This is consistent with its relatively low accuracy of 86.23%, despite its short diagnosis time of 0.18 s. ResNet18 and DenseNet achieve relatively high accuracies of 98.43% and 96.26%, respectively, but still misclassify some samples in specific categories, such as outer race faults and compound faults. This suggests that while these methods are effective in feature extraction, they still face challenges in distinguishing complex fault patterns.
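The row and column convention described above can be made concrete with a small example. The ten labels below are invented for illustration and are not taken from Figure 8; the class indices follow the labeling 0 = normal, 1 = rolling element, 2 = compound, 3 = inner race, 4 = outer race.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] = number of samples of actual class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

# Invented toy labels, two samples per class.
y_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [0, 0, 1, 2, 2, 2, 3, 1, 4, 4]

cm = confusion_matrix(y_true, y_pred, n_classes=5)
accuracy = np.trace(cm) / cm.sum()            # diagonal = correct predictions
per_class_recall = np.diag(cm) / cm.sum(axis=1)

# Most frequent confusion: the largest off-diagonal entry.
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
actual, predicted = np.unravel_index(np.argmax(off_diag), off_diag.shape)
print(cm)
```

Reading along a row shows where the samples of one actual class ended up, which is how the categories most prone to confusion are identified.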

3.4.3. t-SNE Visualization Analysis

Figure 9 illustrates the t-SNE [35] dimensionality reduction results for the different comparison methods. t-SNE is a visualization technique that maps high-dimensional data into a two-dimensional space, which makes the relationships and distributions of samples from different categories easy to inspect. As shown in Figure 9, the improved EEMD-CNN-BiLSTM method produces tightly clustered sample distributions in the low-dimensional space, with clear separations between different fault categories. This indicates that its extracted features distinguish the various fault types effectively and is consistent with its superior accuracy and convergence performance. In contrast, the other comparison methods yield more scattered feature distributions. In particular, the traditional CNN method shows significant overlap among samples, which corresponds to its suboptimal classification performance. Although ResNet18 and DenseNet exhibit some improvement, certain categories still overlap considerably, suggesting limitations in their fault feature extraction and classification capabilities. The CNN-LSTM method achieves relatively distinct distributions, but compared to the improved EEMD-CNN-BiLSTM method, it still shows some category overlap, especially for complex fault types such as compound faults and inner race faults, which indicates that its features are less discriminative. In Figure 9, the color scale represents the fault types, each assigned a unique label: 0 for normal, 1 for rolling element fault, 2 for compound fault, 3 for inner race fault, and 4 for outer race fault.

In summary, the t-SNE visualization further confirms the outstanding performance of the improved EEMD-CNN-BiLSTM method in fault diagnosis tasks. It demonstrates a superior ability to extract and differentiate fault signal features, achieving stronger classification capabilities and higher accuracy.

3.4.4. Analysis of Evaluation Indicators

Table 3 highlights significant differences among the methods in terms of accuracy, precision, recall, and F1-score, reflecting their varying abilities to extract fault features, suppress noise, and capture temporal information. The traditional CNN method achieved an average accuracy of 86.23%, precision of 84.12%, and recall of 87.05%, resulting in an F1-score of 85.56%. These results indicate limitations in its ability to extract fault features effectively. While its diagnosis time of 0.18 s provides an advantage in response speed, its performance in 5-fold cross-validation (86.0 ± 2.1) suggests poor generalization and instability across different data splits, making it less suitable for complex fault diagnosis tasks.
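The metrics discussed in this subsection can all be derived from a confusion matrix. The sketch below uses macro averaging over classes, which is an assumption because the averaging scheme is not stated, and the 2×2 matrix is a toy example. The last line checks that the harmonic mean of the precision and recall reported above for the traditional CNN reproduces its reported F1-score.

```python
def macro_metrics(cm):
    """Accuracy and macro-averaged precision, recall, and F1.

    cm[i][j] counts samples of actual class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))   # column sum
        actual_k = sum(cm[k])                           # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if (p + r) else 0.0)
    return {
        "accuracy": sum(cm[k][k] for k in range(n)) / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

def harmonic_f1(precision, recall):
    """F1 as the harmonic mean of a single precision/recall pair."""
    return 2 * precision * recall / (precision + recall)

toy = macro_metrics([[8, 2], [1, 9]])      # toy two-class confusion matrix
cnn_f1 = harmonic_f1(84.12, 87.05)         # values reported for the CNN
print(toy, round(cnn_f1, 2))
```

With macro averaging, the averaged F1 need not equal the harmonic mean of the averaged precision and recall exactly, so small differences between the two are expected for some of the other methods.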

DenseNet and ResNet18 achieved accuracy rates of 96.26% and 98.43%, respectively, with significant improvements in precision, recall, and F1-score. DenseNet recorded a precision of 95.51%, recall of 96.81%, and F1-score of 96.15%, while ResNet18 reached a precision of 98.15%, recall of 98.75%, and F1-score of 98.42%. Their 5-fold cross-validation results (96.3 ± 1.3 for DenseNet and 98.4 ± 0.9 for ResNet18) exhibited smaller fluctuations, indicating better stability and reliability. Their diagnosis times were 0.23 and 0.26 s, respectively, offering a balanced performance overall.

The CNN-LSTM method, combining convolutional layers with LSTM’s sequential modeling, achieved an accuracy of 98.86%, with a precision of 98.71%, recall of 98.92%, and F1-score of 98.83%. Its 5-fold cross-validation result of 98.8 ± 0.7 validated its stability in capturing temporal features. However, its reliance on unidirectional LSTM presents some limitations in modeling dynamically changing fault signals. The diagnosis time was 0.25 s.

The EEMD-CNN-BiLSTM method used EEMD to decompose the raw signal, extracting high-quality IMFs and combining a CNN with BiLSTM for feature extraction and temporal modeling. This approach achieved further improvements with an accuracy of 99.41%, precision of 99.33%, recall of 99.45%, and F1-score of 99.38%. Its 5-fold cross-validation result of 99.41 ± 0.5 demonstrated consistent performance across different data splits, highlighting strong generalization ability. Although its diagnosis time was slightly higher at 0.45 s, its overall performance was excellent.

Building on this, the improved EEMD-CNN-BiLSTM method introduced a dynamic noise adjustment mechanism to enhance the stability of signal decomposition, combined with the CNN and BiLSTM for efficient feature extraction and temporal modeling. This resulted in slight increases in accuracy to 99.46%, precision to 99.41%, recall to 99.51%, and F1-score to 99.45%. Its 5-fold cross-validation result of 99.46 ± 0.4 showed the smallest fluctuation among the compared methods, indicating high accuracy and robustness across splits. The diagnosis time was also reduced slightly to 0.43 s, making it the best-performing solution overall.

In summary, as model complexity increased, performance metrics generally improved, with 5-fold cross-validation results further supporting the robustness and stability of the methods. The improved EEMD-CNN-BiLSTM method consistently outperformed others across all evaluation metrics and cross-validation results, demonstrating its high accuracy, strong generalization, and real-time capabilities. This makes it the most promising diagnostic model for complex conditions in this study.
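The "mean ± spread" figures quoted for 5-fold cross-validation can be produced with a procedure like the one sketched below. The fold construction, the use of the sample standard deviation, and the function names are assumptions, and the five scores are illustrative placeholders, not results from Table 3.

```python
import random
import statistics

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, val

def summarize(scores):
    """Format fold scores as 'mean ± sample standard deviation'."""
    return f"{statistics.mean(scores):.2f} ± {statistics.stdev(scores):.2f}"

splits = list(k_fold_indices(5000, k=5))

# Illustrative placeholder accuracies (%), NOT values from the paper.
# In practice each entry is the accuracy of a model trained on `train`
# and evaluated on `val` for one of the five splits.
example_scores = [99.0, 99.5, 99.2, 99.6, 99.7]
print(summarize(example_scores))
```

A smaller spread across folds indicates that the result depends less on which samples happen to fall into the training set.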

4. Different Datasets and Feasibility Analysis

In this section, we analyzed different datasets used for the proposed fault diagnosis model, highlighting their differences in features such as fault types, data collection processes, and real-world applicability of the data. In addition, we conducted a feasibility analysis to evaluate the performance of the model on different datasets and its deployment potential in actual low-power hardware environments. We also investigated the challenges associated with each dataset and explored how to adapt the model to different scenarios.

4.1. CWRU Dataset Validation

To closely replicate real-world scenarios, we used the Case Western Reserve University (CWRU) dataset, which simulates the operating conditions of industrial equipment such as motors and gearboxes. This dataset includes data collected under various loads, speeds, and environmental noise conditions, covering several typical speeds (1730–1797 rpm) and load levels (0–3 HP). It provides vibration signals for five operating conditions: normal, rolling element fault, inner race fault, outer race fault, and a simulated composite fault (inner race + outer race). Each fault type is further categorized by fault size (0.007, 0.014, and 0.021 inches) to increase the diversity and complexity of the samples. The composite fault condition, in particular, combines defects of different modes and scales, which makes diagnosis more challenging and provides a demanding test of the model’s stability and adaptability. As summarized in Table 4, these realistic conditions make it possible to assess the model’s stability and reliability in dynamic environments and its practical applicability in complex and variable settings.

4.2. Comparison Method Test Results

When testing with the CWRU dataset, various fault diagnosis methods showed differences in performance across metrics such as accuracy, precision, recall, and F1-score, reflecting how the models handled the complexity of data and noise interference under different operating conditions. These results are shown in Table 5. Overall, as the model structure was enhanced, diagnostic performance significantly improved, demonstrating strong generalization ability.

The traditional CNN method achieved an accuracy of 84.57%, precision of 82.80%, recall of 85.95%, and F1-score of 84.35%. While it showed basic recognition ability under simpler conditions, its feature extraction capability was limited in more complex scenarios with varying speeds and loads, which affected its overall performance. DenseNet and ResNet18 achieved average accuracies of 94.62% and 97.23%, respectively, with both precision and recall maintaining high levels. By using dense connections and residual learning, these networks were better able to capture feature details from complex fault signals, enhancing model stability and robustness. The CNN-LSTM method further improved the model’s ability to capture temporal features, with an accuracy of 97.65% and an F1-score of 97.59%, demonstrating its effectiveness in capturing dynamic signal changes under varying conditions, making it suitable for state recognition during vehicle operation. The EEMD-CNN-BiLSTM method introduced EEMD decomposition in the signal preprocessing stage, effectively separating noise and fault features. Combined with the bidirectional LSTM for deep temporal modeling, its accuracy rose to 99.11%. The improved model further optimized the EEMD decomposition strategy and feature extraction structure, achieving an accuracy of 99.28% and an F1-score of 99.28% on the CWRU dataset, with all metrics reaching optimal levels.

Overall, the improved EEMD-CNN-BiLSTM method maintained exceptionally high diagnostic accuracy and stability across varying speeds, loads, and composite faults, demonstrating excellent generalization ability and strong potential for real-world deployment.

4.3. t-SNE Visualization Analysis

On the CWRU dataset, the t-SNE clustering results of different methods show the differences in model performance for fault diagnosis, as shown in Figure 10. First, while the traditional CNN method has a fast diagnosis time, its lower accuracy (84.57%) leads to a mixed distribution of fault categories in Figure 10a, with unclear classification boundaries. This suggests that the CNN struggles to capture complex fault features and suppress noise interference, resulting in suboptimal clustering. In contrast, the DenseNet (accuracy 94.62%) and ResNet18 (accuracy 97.23%) methods, with their more complex network structures, are better at extracting fault features and improving classification performance. In Figure 10b,c, the distribution of fault categories is more concentrated, and the clustering results have significantly improved, especially for ResNet18, where the separation between fault categories is much clearer, further demonstrating the advantages of residual learning.

The CNN-LSTM method (accuracy 97.65%), by introducing the LSTM module, effectively captures temporal features, further improving the separation of the clusters. In Figure 10d, the classification performance is enhanced, with clearer boundaries between the clusters, especially when handling fault modes with significant temporal changes, leading to a noticeable improvement in diagnostic accuracy. EEMD-CNN-BiLSTM (accuracy 99.10%), combining the advantages of signal preprocessing and bidirectional LSTM, handles complex fault signals better. Figure 10e shows a more distinct and compact clustering of categories, indicating its strengths in signal extraction and temporal modeling. Finally, the improved EEMD-CNN-BiLSTM (accuracy 99.28%) achieves the best performance in both accuracy and clustering. Figure 10f presents a very clear distribution of categories, further verifying its accuracy and robustness in fault diagnosis.

Through the visual analysis of t-SNE clustering, it is clear that as the model complexity increases, the ability to distinguish fault categories improves significantly. Ultimately, the improved EEMD-CNN-BiLSTM method not only achieves optimal performance in accuracy but also demonstrates superior clustering results. This indicates that the method can effectively tackle complex fault diagnosis tasks on the CWRU dataset, successfully extract key fault features, and suppress noise interference, further proving its superiority and practical value in real-world applications.

4.4. Feasibility Analysis

In our study, the improved EEMD-CNN-BiLSTM method showed excellent performance in terms of accuracy, precision, recall, and F1-score. For example, the accuracy reached 99.46% on the Southeast University dataset and 99.28% on the CWRU dataset. These results demonstrate that the method performs exceptionally well in fault diagnosis tasks, effectively identifying and classifying various types of faults. However, when deploying this high-precision model to an actual vehicle transmission gearbox online monitoring system, real-time performance and resource usage issues must be addressed.

To tackle this challenge, we introduced model compression techniques, including pruning and quantization. Through pruning, we removed some redundant parameters and smaller weights from the network, reducing the computational load and storage requirements. For example, pruning reduced the model size by about 20%. With quantization, we converted floating-point 32-bit parameters into 8-bit integers, further decreasing the model’s storage usage and computational complexity. These compression techniques significantly reduced the storage and computational resource requirements while also improving the model’s efficiency on low-power hardware.
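Both compression steps can be illustrated on a single weight matrix. The sketch below uses unstructured magnitude pruning and symmetric per-tensor int8 quantization; the text does not specify which pruning criterion or quantization scheme was used, so these choices, the 20% sparsity applied to one tensor, and the random weights are assumptions.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.2):
    """Zero out the smallest-magnitude fraction of the weights."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) > threshold
    return weights * mask, mask

def quantize_int8(weights):
    """Symmetric per-tensor quantization: float32 values -> int8 plus one scale."""
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float32 weights."""
    return q.astype(np.float32) * np.float32(scale)

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 128)).astype(np.float32)   # placeholder layer weights

pruned, mask = magnitude_prune(w, sparsity=0.2)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)

pruned_fraction = 1.0 - mask.mean()
max_error = float(np.abs(restored - pruned).max())
print(pruned_fraction, w.nbytes, q.nbytes, max_error)
```

The int8 tensor occupies a quarter of the float32 storage, and the rounding error per weight is bounded by about half the quantization step `scale`. Zeroed weights reduce storage only when they are removed structurally or stored in a sparse format, so the size reduction achieved by pruning depends on how it is implemented.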

Testing results on low-power hardware showed that after model compression, the EEMD-CNN-BiLSTM method, while experiencing a slight drop in accuracy and other evaluation metrics (for example, accuracy dropped from 99.28% to 98.9%), still maintained high diagnostic precision. Additionally, the computation time of the model was significantly reduced. For instance, the original model took 0.45 s to process a sample, while the compressed model only required 0.23 s. This indicates that the compressed model not only achieved significant improvements in real-time performance but also retained sufficient accuracy on low-power hardware, meeting the practical requirements of the online monitoring system, as shown in Table 6.

4.5. Comparison Between the Proposed Method and the Literature Results

Compared with existing studies, Ref. [28] reports an accuracy of 96.45% under the class imbalance condition of the CWRU dataset, whereas the improved EEMD-CNN-BiLSTM method proposed in this study achieved 99.28% under the same condition. Under balanced data conditions, our method achieved an accuracy of 99.46% on the Southeast University dataset, surpassing the 97.1% reported in Ref. [27]. In addition, the CNN-LSTM method in Ref. [29] achieved only 83.63% accuracy under cross-domain conditions, whereas our method demonstrated stronger generalization ability and robustness, supporting its practicality under complex real-world conditions, as shown in Table 7.

5. Conclusions

This paper proposes an improved EEMD-CNN-BiLSTM method for automotive gearbox fault diagnosis. The improved EEMD effectively extracts key features, the CNN captures deep spatial features, and the BiLSTM enhances the capture of temporal information, making the model both highly accurate and robust. The t-SNE visualization results show that the model distinguishes clearly between different fault categories, with well-separated clusters. At the same time, the diagnosis time is kept to 0.43 s, which supports real-time use in industrial applications. In summary, the method shows good stability and scalability under complex working conditions, providing a reliable solution for intelligent fault diagnosis in vehicle gearboxes.

First, the experimental results show that this method outperforms several comparison models in terms of accuracy, temporal modeling ability, and feature extraction capability. Specifically, it achieves an accuracy of 99.46% on the Southeast University dataset and 99.28% on the CWRU dataset, significantly higher than existing methods in the literature (the highest being only 97.1%). Additionally, t-SNE dimensionality reduction analysis further validates its distinct separation between different fault categories. Moreover, the method demonstrates high real-time performance in diagnosis, meeting the industrial demands for both fault diagnosis efficiency and accuracy.

Secondly, through a comparative analysis with the unmodified EEMD-CNN-BiLSTM, ResNet18, DenseNet, CNN-LSTM, and traditional CNN methods across multiple evaluation metrics, the comprehensive performance advantages of the proposed method are further confirmed. Whether in accuracy, precision, recall, or F1-score, the improved method achieves optimal performance, especially with an accuracy close to 99.5% on both the CWRU and Southeast University datasets. This indicates that the proposed method not only demonstrates strong practicality in complex mechanical fault diagnosis but also provides new insights and references for research and practical applications in related fields.

Finally, future work can explore the potential of multimodal data fusion and transfer learning. Optimizing model structures and diagnostic strategies for complex working conditions and multi-fault scenarios could also enhance diagnostic performance and generalization ability, providing more reliable technological support for intelligent maintenance and efficient industrial production.

Author Contributions

Conceptualization, Y.X. and H.W.; methodology, Y.X. and H.W.; validation, F.X.; formal analysis, S.B.; resources, Y.X. and J.Y.; data curation, H.W.; writing—original draft, Y.X.; writing—review and editing, H.W. and J.Y.; funding acquisition, Y.X., H.W. and J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Quzhou Science and Technology Research Project (2024K171, 2024K188, 2023K044).

Data Availability Statement

The data presented in this study are available in the article.

Conflicts of Interest

Jiangang Ye was employed by the company Quzhou Special Equipment Inspection & Testing Research Institute. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Nomenclature

Symbol	Description
EEMD	Ensemble empirical mode decomposition
CNN	Convolutional neural network
BiLSTM	Bidirectional Long Short-Term Memory network
IMFs	Intrinsic mode functions
NEV	New energy vehicle
VMD	Variational mode decomposition
SVD	Singular value decomposition
LSTM	Long Short-Term Memory
EMD	Empirical mode decomposition
t-SNE	t-Distributed Stochastic Neighbor Embedding
CWRU	Case Western Reserve University

References

Praveenkumar, T.; Saimurugan, M.; Krishnakumar, P.; Ramachandran, K.I. Fault diagnosis of automobile gearbox based on machine learning techniques. Procedia Eng. 2014, 97, 2092–2098.
Li, X.; Lei, Y.; Xu, M.; Li, N.; Qiang, D.; Ren, Q.; Li, X. A spectral self-focusing fault diagnosis method for automotive transmissions under gear-shifting conditions. Mech. Syst. Signal Process. 2023, 200, 110499.
Fu, X.; Fang, Y.; Xu, Y.; Xu, H.; Guo, M.; Peng, N. Current Status of Research on Fault Diagnosis Using Machine Learning for Gear Transmission Systems. Machines 2024, 12, 679.
Behvar, A.; Tahmasbi, K.; Savich, W.; Haghshenas, M. Tooth interior fatigue fracture in automotive differential gears. Eng. Fail. Anal. 2024, 156, 107829.
Wang, Y.; Zhou, Z.; Yang, L.; Gao, R.X.; Yan, R. Wavelet-driven differentiable architecture search for planetary gear fault diagnosis. J. Manuf. Syst. 2024, 74, 587–593.
Liu, G.; Wu, L. Running Gear Global Composite Fault Diagnosis Based on Large Model. IEEE Trans. Ind. Inform. 2025; early access.
Sreenath, P.G.; Praveen Kumare, G.; Pravin, S.; Vikram, K.N.; Saimurugan, M. Automobile gearbox fault diagnosis using Naive Bayes and decision tree algorithm. Appl. Mech. Mater. 2015, 813, 943–948.
Singh, M.K.; Kumar, S.; Nandan, D. Faulty voice diagnosis of automotive gearbox based on acoustic feature extraction and classification technique. J. Eng. Res. 2023, 11, 100051.
Boral, S.; Chaturvedi, S.K.; Naikan, V.N.A. A case-based reasoning system for fault detection and isolation: A case study on complex gearboxes. J. Qual. Maint. Eng. 2019, 25, 213–235.
Zadshakoyan, M.; Gholami, A. Design of the L90 automobile gearbox fault detection system using the audio signal analysis. Iran. J. Mech. Eng. Trans. ISME 2022, 23, 114–132.
Hossain, M.N.; Rahman, M.M.; Ramasamy, D. Artificial Intelligence-Driven Vehicle Fault Diagnosis to Revolutionize Automotive Maintenance: A Review. CMES-Comput. Model. Eng. Sci. 2024, 141, 951.
Jing, L.; Wang, T.; Zhao, M.; Wang, P. An adaptive multi-sensor data fusion method based on deep convolutional neural networks for fault diagnosis of planetary gearbox. Sensors 2017, 17, 414.
Wang, H.; Xu, J.; Sun, C.; Yan, R.; Chen, X. Intelligent fault diagnosis for planetary gearbox using time-frequency representation and deep reinforcement learning. IEEE/ASME Trans. Mechatron. 2021, 27, 985–998.
Furse, C.M.; Kafal, M.; Razzaghi, R.; Shin, Y.J. Fault diagnosis for electrical systems and power networks: A review. IEEE Sensors J. 2020, 21, 888–906.
Kong, X.; Meng, L.; Su, Y.; Xu, T.; Lan, X.; Li, Y. Untrained compound fault diagnosis for planetary gearbox based on adaptive learning VMD and DSSECNN. IEEE Sensors J. 2023, 23, 11838–11854.
Wang, C.; Peng, Z.; Liu, R.; Chen, C. Research on multi-fault diagnosis method based on time domain features of vibration signals. Sensors 2022, 22, 8164.
Kumar, T.P.; Saimurugan, M.; Haran, R.H.; Siddharth, S.; Ramachandran, K.I. A multi-sensor information fusion for fault diagnosis of a gearbox utilizing discrete wavelet features. Meas. Sci. Technol. 2019, 30, 085101.
Liang, P.; Deng, C.; Wu, J.; Yang, Z.; Zhu, J.; Zhang, Z. Compound fault diagnosis of gearboxes via multi-label convolutional neural network and wavelet transform. Comput. Ind. 2019, 113, 103132.
Jiang, G.; He, H.; Yan, J.; Xie, P. Multiscale convolutional neural networks for fault diagnosis of wind turbine gearbox. IEEE Trans. Ind. Electron. 2018, 66, 3196–3207.
Xie, T.; Huang, X.; Choi, S.K. Intelligent mechanical fault diagnosis using multisensor fusion and convolution neural network. IEEE Trans. Ind. Inform. 2021, 18, 3213–3223.
Kim, S.; Choi, J.H. Convolutional neural network for gear fault diagnosis based on signal segmentation approach. Struct. Health Monit. 2019, 18, 1401–1415.
Liu, C.; Cheng, G.; Chen, X.; Pang, Y. Planetary gears feature extraction and fault diagnosis method based on VMD and CNN. Sensors 2018, 18, 1523.
Xu, Q.; Jiang, H.; Zhang, X.; Li, J.; Chen, L. Multiscale convolutional neural network based on channel space attention for gearbox compound fault diagnosis. Sensors 2023, 23, 3827.
Zhang, C.; Chen, P.; Jiang, F.; Xie, J.; Yu, T. Fault diagnosis of nuclear power plant based on sparrow search algorithm optimized CNN-LSTM neural network. Energies 2023, 16, 2934.
Zou, P.; Hou, B.; Lei, J.; Zhang, Z. Bearing fault diagnosis method based on EEMD and LSTM. Int. J. Comput. Commun. Control 2020, 15, 1.
Zhang, H.; Feng, L.; Wang, J.; Gao, N. Development of technology predicting based on EEMD-GRU: An empirical study of aircraft assembly technology. Expert Syst. Appl. 2024, 246, 123208.
Eren, L. Bearing fault detection by one-dimensional convolutional neural networks. Math. Probl. Eng. 2017, 2017, 8617315. [Google Scholar] [CrossRef]
Sonmez, E.; Kacar, S.; Uzun, S. A new deep learning model combining CNN for engine fault diagnosis. J. Braz. Soc. Mech. Sci. Eng. 2023, 45, 644. [Google Scholar] [CrossRef]
Jang, G.B.; Cho, S.B. Feature space transformation for fault diagnosis of rotating machinery under different working conditions. Sensors 2021, 21, 1417. [Google Scholar] [CrossRef]
Liu, F.; Liang, C. Short-term power load forecasting based on AC-BiLSTM model. Energy Rep. 2024, 11, 1570–1579. [Google Scholar] [CrossRef]
Vankadara, R.K.; Mosses, M.; Siddiqui, M.I.H.; Ansari, K.; Panda, S.K. Ionospheric total electron content forecasting at a low-latitude Indian location using a bi-long short-term memory deep learning approach. IEEE Trans. Plasma Sci. 2023, 51, 3373–3383. [Google Scholar] [CrossRef]
Mohandass, G.; Krishnan, G.H.; Selvaraj, D.; Sridhathan, C. Lung cancer classification using optimized attention-based convolutional neural network with DenseNet-201 transfer learning model on CT image. Biomed. Signal Process. Control 2024, 95, 106330. [Google Scholar] [CrossRef]
Sunkari, S.; Sangam, A.; Suchetha, M.; Raman, R.; Rajalakshmi, R.; Tamilselvi, S. A refined ResNet18 architecture with Swish activation function for Diabetic Retinopathy classification. Biomed. Signal Process. Control 2024, 88, 105630. [Google Scholar] [CrossRef]
Wang, X.; Li, X.; Wang, L.; Ruan, T.; Li, P. Adaptive cache management for complex storage systems using cnn-lstm-based spatiotemporal prediction. arXiv 2024, arXiv:2411.12161. [Google Scholar]
Xia, L.; Lee, C.; Li, J.J. Statistical method scDEED for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters. Nat. Commun. 2024, 15, 1753. [Google Scholar] [CrossRef]

Figure 1. CNN network structure diagram.

Figure 2. BiLSTM network structure layers.

Figure 3. Fault diagnosis method based on improved EEMD and CNN-BiLSTM.

Figure 4. EEMD decomposition results of rolling element fault (a) and compound fault (b).

Figure 5. Enhanced EEMD decomposition results of rolling element fault (a) and compound fault (b).

Figure 6. Comparison of the frequency spectra of IMF signals obtained from EEMD and improved EEMD for rolling element fault (a) and compound fault (b).

Figure 7. The accuracy (a) and loss (b) iteration curves.

Figure 8. Confusion matrices of the different comparison methods.

Figure 9. t-SNE plots of different comparison methods.

Figure 10. t-SNE clustering performance of different comparison methods on the CWRU dataset.

Table 1. Experimental gearbox data.

Fault Type	Motor Speed (rpm)	Sample Length	Label
Normal	1200	1024	0
Rolling element fault	1200	1024	1
Compound fault	1200	1024	2
Inner race fault	1200	1024	3
Outer race fault	1200	1024	4

Table 2. Pearson correlation coefficients of the IMF components for the rolling element fault and the compound fault.

Rolling Element Fault	Pearson Correlation Coefficient	Compound Fault	Pearson Correlation Coefficient
IMF1	0.50032	IMF1	0.12479
IMF2	0.55233	IMF2	0.06461
IMF3	0.44509	IMF3	0.09135
IMF4	0.35590	IMF4	0.14308
IMF5	0.30467	IMF5	0.16546
IMF6	0.42363	IMF6	0.98298
IMF7	0.12627	IMF7	0.98183
IMF8	0.03186	IMF8	0.12479
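
One common way to use coefficients like those tabulated above is as a screening score: IMFs that correlate strongly with the source signal are kept for reconstruction, and weakly correlated ones are discarded as noise or residue. The following is a minimal Python sketch of that screening step, assuming each coefficient is the Pearson correlation between an IMF and the original vibration signal. The function names `select_imfs` and `reconstruct`, the 0.3 threshold, and the synthetic three-component test signal are illustrative assumptions, not values or code taken from the paper.

```python
import numpy as np


def select_imfs(signal, imfs, threshold=0.3):
    """Return (kept indices, coefficients) for IMFs with |Pearson r| >= threshold.

    The threshold is a hypothetical choice for illustration only.
    """
    signal = np.asarray(signal, dtype=float)
    imfs = np.asarray(imfs, dtype=float)
    # Pearson correlation of every IMF (one per row) with the source signal.
    coeffs = np.array([np.corrcoef(signal, imf)[0, 1] for imf in imfs])
    keep = np.flatnonzero(np.abs(coeffs) >= threshold)
    return keep, coeffs


def reconstruct(imfs, keep):
    """Sum the selected IMFs to form the reconstructed signal."""
    return np.asarray(imfs, dtype=float)[keep].sum(axis=0)


# Synthetic stand-in for decomposed IMFs: two dominant tones and one weak drift.
# 1024 samples mirrors the sample length listed in Table 1.
t = np.linspace(0.0, 1.0, 1024, endpoint=False)
imfs = np.stack([
    np.sin(2 * np.pi * 50 * t),
    0.5 * np.sin(2 * np.pi * 120 * t),
    0.01 * np.sin(2 * np.pi * 3 * t),
])
signal = imfs.sum(axis=0)

keep, coeffs = select_imfs(signal, imfs, threshold=0.3)
reconstructed = reconstruct(imfs, keep)
```

With this synthetic input the two dominant tones pass the threshold and the weak drift component is dropped. On real data the threshold would need to be chosen per signal, since the coefficient ranges in the two fault columns above differ considerably.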

Table 3. Test results data.

Method	Average Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)	Five-Fold Cross-Validation (%)	Diagnosis Time (s)
CNN	86.23	84.12	87.05	85.56	86.0 ± 2.1	0.18
DenseNet	96.26	95.51	96.81	96.15	96.3 ± 1.3	0.23
ResNet18	98.43	98.15	98.75	98.42	98.4 ± 0.9	0.26
CNN-LSTM	98.86	98.71	98.92	98.83	98.8 ± 0.7	0.25
EEMD-CNN-BiLSTM	99.41	99.33	99.45	99.38	99.41 ± 0.5	0.45
Improved EEMD-CNN-BiLSTM	99.46	99.41	99.51	99.45	99.46 ± 0.4	0.43

Table 4. CWRU selected dataset.

Fault Type	Speed (rpm)	Load (hp)	Fault Location	Fault Size (in)	Label
Normal	1797	0	No fault	/	0
Rolling element fault	1772	1	Rolling element	0.007	1
Inner race fault	1750	2	Inner race	0.014	2
Compound fault	1750	2	Inner race + Outer race	0.014 + 0.021	3
Outer race fault	1730	3	Outer race	0.021	4

Table 5. CWRU data test results.

Method	Average Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)	Five-Fold Cross-Validation (%)
CNN	84.57	82.8	85.95	84.35	84.5 ± 2.6
DenseNet	94.62	93.75	95.2	94.47	94.6 ± 1.7
ResNet18	97.23	96.8	97.65	97.22	97.2 ± 1.0
CNN-LSTM	97.65	97.31	97.88	97.59	97.6 ± 0.9
EEMD-CNN-BiLSTM	99.11	99.03	99.16	99.10	99.1 ± 0.6
Improved EEMD-CNN-BiLSTM	99.28	99.21	99.35	99.28	99.2 ± 0.5

Table 6. Comparison of data before and after model compression.

Method	Dataset	Accuracy Before Compression (%)	Accuracy After Compression (%)	Diagnosis Time Before Compression (s)	Diagnosis Time After Compression (s)	Reduced Computation Time (%)	Reduced Storage Usage (%)
Improved EEMD-CNN-BiLSTM	CWRU dataset	99.28	98.9	0.45	0.23	48.89	20
Improved EEMD-CNN-BiLSTM	Southeast University dataset	99.46	99.12	0.43	0.21	51.16	20
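
The "Reduced Computation Time" column follows from the two diagnosis-time columns as (before − after) / before × 100. The short Python sketch below reproduces that arithmetic from the tabulated times and also reports the accuracy lost to compression in percentage points. The helper name `percent_reduction` and the dictionary layout are illustrative assumptions; only the numbers come from the table.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# (accuracy before, accuracy after, time before [s], time after [s]) from Table 6.
rows = {
    "CWRU dataset": (99.28, 98.9, 0.45, 0.23),
    "Southeast University dataset": (99.46, 99.12, 0.43, 0.21),
}

time_reduction = {}
accuracy_drop = {}
for name, (acc_before, acc_after, t_before, t_after) in rows.items():
    # Rounded to two decimals, matching the precision used in the table.
    time_reduction[name] = round(percent_reduction(t_before, t_after), 2)
    accuracy_drop[name] = round(acc_before - acc_after, 2)

for name in rows:
    print(f"{name}: time reduced by {time_reduction[name]} %, "
          f"accuracy drop {accuracy_drop[name]} percentage points")
```

Read this way, compression roughly halves the diagnosis time on both datasets while costing less than half a percentage point of accuracy.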

Table 7. Comparison of the proposed method with results from the literature.

Reference	Dataset	Model	Accuracy (%)
[27]	CWRU (balanced class condition)	1DCNN	97.1
[28]	CWRU (imbalanced class condition)	WDD-CNN	96.45
[29]	CWRU (source and target domain data with different distributions)	CNN-LSTM	83.63
Ours	CWRU (imbalanced class condition)	Improved EEMD-CNN-BiLSTM	99.28
Ours	Southeast University dataset (balanced class condition)	Improved EEMD-CNN-BiLSTM	99.46

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
