1. Introduction
With the rapid development of the new energy vehicle (NEV) industry, the operational stability and safety of automotive transmission gearboxes, as core components of the power transmission system, have drawn increasing attention [1,2,3]. Under complex operating conditions, NEVs must cope with frequent acceleration and deceleration operations, significant load variations, and diverse natural environments, making the service conditions of transmission gearboxes extremely demanding [4,5]. As critical power transmission components, gearboxes directly influence the vehicle’s operational performance and are closely related to driving safety and service life. During gearbox operation, prolonged exposure to mechanical impacts, vibrations, and wear may lead to various potential faults, such as gear wear, fractures, and bearing failures, which can disrupt power transmission or reduce efficiency, ultimately jeopardizing vehicle safety [6,7,8]. Therefore, achieving efficient and accurate fault identification of gearboxes under complex operating conditions, as well as timely monitoring and assessment of their health status, is of great significance for ensuring the driving safety of NEVs, enhancing operational efficiency, and reducing maintenance costs [9,10,11]. Related research not only improves the safety performance of NEVs but also provides theoretical and practical support for the development of intelligent and proactive maintenance technologies.
In recent years, gearbox fault diagnosis technologies based on deep learning have gained widespread attention and demonstrated excellent performance under stable operating conditions. However, automotive transmission gearboxes typically operate under complex and variable conditions, including frequent load changes, dynamic speed fluctuations, and interference from various environmental noises, posing significant technical challenges for fault diagnosis [12]. Under variable-speed conditions, gearbox fault signals exhibit nonlinear dynamic characteristics due to speed variations. Specifically, in the time domain, the occurrence of fault impact signals is irregular; in the frequency domain, fault characteristic frequencies spread across multiple frequency bands as the speed fluctuates [13,14]. Additionally, load variations directly influence the vibration characteristics of gearboxes. When the load increases, the vibration amplitude during the gear meshing process intensifies significantly, while sudden load shocks may generate strong interference signals. These interferences can easily obscure or mix with actual fault characteristic signals, making fault mode identification even more challenging [15,16].
To address the interference of operating condition variations on the performance of deep learning-based fault diagnosis models, researchers have proposed improved methods that integrate traditional techniques to optimize diagnostic performance under complex conditions. Kumar et al. [17] introduced a fault diagnosis method combining discrete wavelet feature extraction and machine learning algorithms. By leveraging wavelet transform for multi-scale signal decomposition, this method highlights key fault features and integrates data from multiple sensors. Liang et al. [18] investigated a diagnostic method for gearbox compound faults by utilizing wavelet transform and convolutional neural networks (CNNs) in combination with multi-label classification techniques. The method employs wavelet transform to extract time–frequency features from vibration signals, converting them into a format suitable for input to the diagnostic model. Jiang et al. [19] developed a novel multi-scale convolutional neural network architecture for monitoring the health status of wind turbine gearboxes. This architecture allows the model to simultaneously learn critical fault features across different directions and scales, significantly enhancing feature extraction capabilities. Xie et al. [20] introduced a diagnostic approach that integrates data from multiple sensors with a CNN. Principal component analysis is applied to convert sensor signals into RGB images, which are then used as input for CNN-based fault detection. Kim et al. [21] introduced a fault diagnosis method based on signal segmentation and a CNN. The initial vibration signals are segmented according to the gear tooth positions, and the segmented dataset is compared with the unsegmented dataset. Liu et al. [22] addressed the challenge of extracting localized weak feature information by proposing a fault diagnosis method combining variational mode decomposition (VMD), singular value decomposition (SVD), and CNNs. VMD decomposes the raw signals into intrinsic mode components with physical significance, SVD extracts localized feature information to generate singular value vectors, and these vectors serve as CNN inputs for model training and fault classification. Xu et al. [23] addressed the challenge of detecting multiple individual faults, which is difficult with conventional diagnostic approaches, by proposing a method that combines an enhanced mixed attention mechanism and a spatial-channel attention module. The model adaptively generates single- or multi-class fault labels, enabling accurate detection of combined faults. Zhang et al. [24] introduced a hybrid fault detection approach utilizing CNNs and Long Short-Term Memory (LSTM) models, enhanced by the sparrow optimization technique for automated tuning of parameters. By leveraging the global exploration potential of the sparrow-inspired method, this approach replaces manual parameter configuration, greatly improving the model’s speed and accuracy. Zou et al. [25] introduced an innovative technique for fault detection by integrating ensemble empirical mode decomposition (EEMD) with LSTM architectures. Data preparation methods combined with EEMD refine signals for clarity, while LSTM autonomously identifies fault-related patterns. This combination boosts the speed of fault feature recognition, significantly improving the accuracy of fault detection.
However, traditional EEMD methods use fixed-amplitude white noise for signal decomposition, which can lead to instability when dealing with different types of vibration signals. This instability may cause mode mixing and error accumulation, reducing the reliability of extracted features and the accuracy of diagnosis. At the same time, LSTM mainly relies on time-series information for feature learning. While it excels at capturing long-term dependencies, it struggles to extract local spatial features effectively, limiting its generalization ability under complex conditions such as varying loads, speeds, or environmental noise. Additionally, using LSTM alone for feature extraction often comes with high computational complexity, making training and inference resource-intensive and challenging for efficient deployment in resource-limited online monitoring systems. In contrast, this study proposes an improved fault diagnosis method combining EEMD and CNN-Bidirectional Long Short-Term Memory (BiLSTM) networks. By introducing a dynamic adaptive noise injection mechanism, the noise amplitude adjusts according to signal characteristics, improving the stability of decomposition and the accuracy of feature extraction. The integration of CNN-BiLSTM modules enables efficient feature extraction and dynamic modeling.
The main contributions of this paper are as follows:
(1) Traditional ensemble empirical mode decomposition often faces issues like instability or mode mixing due to the use of fixed-amplitude white noise. This study introduces a dynamic noise adjustment mechanism, allowing the noise amplitude to adapt based on the signal characteristics. This improvement enhances the stability and accuracy of the decomposition, making the extracted intrinsic mode functions (IMFs) more representative of the core features of gearbox vibration signals.
(2) To strengthen the expression of signal features and reduce noise interference, Pearson correlation coefficients are used to evaluate and filter the IMFs. Only components highly correlated with the original signal are retained. This ensures that the reconstructed signal holds greater diagnostic value, providing high-quality input for subsequent learning models.
(3) In the feature extraction stage, the proposed model first uses a CNN to extract local spatial patterns, automatically identifying key characteristics of gearbox faults. Then, the BiLSTM models the dependencies of the sequence in both the forward and backward directions, improving the model’s ability to capture dynamic features. This combination effectively balances the needs of spatial pattern recognition and time-series modeling.
(4) Using the deep features extracted by CNN-BiLSTM, the classifier can accurately identify the gearbox’s operating conditions and various fault types, such as wear or damage. Experimental results show that the proposed method achieves accuracies of 99.28% and 99.46% on the CWRU and Southeast University datasets, respectively, outperforming existing approaches. Additionally, t-SNE visualizations and confusion matrices further confirm the model’s capability to classify and distinguish between different fault categories effectively.
3. Experimental Results
The experiments are implemented in Python 3.8, with PyCharm as the development tool, and run on an NVIDIA GeForce RTX 3060 Ti graphics card. The main steps are signal preprocessing and IMF decomposition (using the improved EEMD), followed by feature extraction (through a CNN and BiLSTM). First, the necessary Python libraries, such as NumPy 1.24.3, SciPy 1.10.1, Matplotlib 3.7.1, PyTorch 1.13.1, and an EEMD library (e.g., PyEMD), are installed to process the signal data and train the deep learning models. To speed up computation, the GPU-compatible version of PyTorch is installed so that training can make full use of the 3060 Ti. For network training, the learning rate is set to 0.001 and the batch size to 64, together with an optimizer (e.g., Adam) and a loss function (e.g., cross-entropy loss). During training, the training loss and validation accuracy are monitored, and the network parameters are adjusted to achieve the best model performance. Finally, the experimental results are presented as spectrograms, IMF decomposition plots, and training curves, together with the classification results for the different fault modes.
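The training settings listed above can be sketched as a short PyTorch loop. This is only an illustration under stated assumptions: the random tensors, the linear placeholder model, and the helper name `train_one_epoch` are hypothetical and stand in for the real data and the CNN-BiLSTM network; only the learning rate (0.001), batch size (64), Adam optimizer, and cross-entropy loss come from the text.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, optimizer, loss_fn, device):
    """Run one pass over the loader and return the mean training loss."""
    model.train()
    total_loss = 0.0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * x.size(0)
    return total_loss / len(loader.dataset)


# Use the GPU when a CUDA build of PyTorch and a device are available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder data (assumption): 256 random "signals" of 1024 points, 5 classes.
torch.manual_seed(0)
x = torch.randn(256, 1024)
y = torch.randint(0, 5, (256,))
loader = DataLoader(TensorDataset(x, y), batch_size=64, shuffle=True)

# Placeholder linear classifier (assumption) standing in for the real network.
model = nn.Linear(1024, 5).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()

losses = [train_one_epoch(model, loader, optimizer, loss_fn, device)
          for _ in range(3)]
print(losses)
```

In practice a held-out validation set would be evaluated after each epoch so that the validation accuracy can be monitored alongside the training loss.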
3.1. A Fault Diagnosis Method for Automotive Gearboxes Based on Improved EEMD and CNN-BiLSTM
As shown in Figure 3, the proposed gearbox fault diagnosis method integrates improved EEMD with CNN-BiLSTM techniques. By combining signal decomposition, adaptive filtering, deep feature extraction, and classification decisions, this approach achieves efficient identification of gearbox faults. Compared to traditional methods, it not only significantly improves diagnostic accuracy but also demonstrates greater robustness and adaptability under complex conditions, offering a smart and reliable solution for real-world engineering applications. The process consists of four main stages.
3.1.1. Signal Preprocessing and Decomposition
After collecting vibration signals from the gearbox, the improved EEMD is applied for signal decomposition. Unlike conventional EEMD, which uses fixed-amplitude white noise and can result in unstable decomposition or mode mixing, a dynamic noise adjustment mechanism is introduced to adapt noise amplitude based on the signal’s characteristics. This enhances the stability and accuracy of the decomposition. The decomposed signal comprises multiple IMFs, each representing different frequency components, facilitating further analysis.
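The difference between fixed-amplitude and signal-adaptive noise can be sketched as follows. The exact adaptation rule is not given in this section, so the rule used here (noise amplitude proportional to a sliding-window RMS of the signal) is purely an assumption for illustration, as are the names `local_rms` and `noisy_ensemble`, the window length, and the noise ratio. Each noisy copy would then be passed to an EMD routine and the resulting IMFs combined across trials.

```python
import numpy as np


def local_rms(signal, window=64):
    """Sliding-window RMS, same length as the input signal."""
    kernel = np.ones(window) / window
    return np.sqrt(np.convolve(signal ** 2, kernel, mode="same"))


def noisy_ensemble(signal, trials=50, ratio=0.2, adaptive=True, seed=0):
    """Return `trials` noisy copies of `signal`, shape (trials, len(signal)).

    adaptive=False: one fixed amplitude, ratio * std(signal), as in
    conventional EEMD.
    adaptive=True: amplitude follows the local RMS (hypothetical rule).
    """
    rng = np.random.default_rng(seed)
    if adaptive:
        amplitude = ratio * local_rms(signal)
    else:
        amplitude = np.full(signal.shape, ratio * np.std(signal))
    noise = rng.standard_normal((trials, signal.size)) * amplitude
    return signal + noise


# Toy signal: a weak 50 Hz tone with a strong 200 Hz burst in the middle.
t = np.arange(1024) / 1024.0
signal = 0.1 * np.sin(2 * np.pi * 50 * t)
signal[400:600] += np.sin(2 * np.pi * 200 * t[400:600])

ensemble = noisy_ensemble(signal, adaptive=True)
noise = ensemble - signal
quiet_std = noise[:, :300].std()     # noise level outside the burst
burst_std = noise[:, 450:550].std()  # noise level inside the burst
print(quiet_std, burst_std)
```

With the adaptive rule, less noise is injected into the quiet part of the record than into the burst, whereas the fixed rule injects the same level everywhere.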
3.1.2. IMF Selection and Signal Reconstruction
To ensure that the reconstructed signal accurately reflects the core features of the original data while minimizing noise interference, Pearson correlation coefficients are used to evaluate the relevance of all IMFs. Only IMFs with a high correlation to the original signal are retained for reconstruction. This step removes irrelevant or low-relevance components, further enhancing the representation of fault-related features.
3.1.3. Deep Feature Extraction with CNN-BiLSTM
The filtered and reconstructed signal is fed into a CNN, which extracts local spatial and temporal features through multi-layer convolutional operations, automatically identifying significant patterns related to faults. The CNN excels at capturing local spatial details, enabling the identification of issues such as gear wear or cracks. The deep features extracted by the CNN are then passed to a BiLSTM network, which models the sequential dependencies in both forward and backward directions. By leveraging its gating mechanism, BiLSTM captures dynamic behaviors of the gearbox under various operating conditions, improving the differentiation of fault patterns.
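A minimal PyTorch sketch of a CNN followed by a BiLSTM is shown below. The layer counts, channel widths, kernel sizes, hidden size, and the mean pooling over time are all assumptions chosen for illustration; the section does not specify the actual architecture. The input length of 1024 and the five output classes follow the dataset description.

```python
import torch
from torch import nn


class CNNBiLSTM(nn.Module):
    """Illustrative 1D CNN + BiLSTM classifier (hyperparameters are assumed)."""

    def __init__(self, n_classes=5, hidden=64):
        super().__init__()
        # Two convolutional stages extract local patterns and shorten
        # the sequence from 1024 to 64 time steps.
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3),
            nn.BatchNorm1d(16),
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=5, padding=2),
            nn.BatchNorm1d(32),
            nn.ReLU(),
            nn.MaxPool1d(4),
        )
        # The bidirectional LSTM reads the feature sequence in both directions.
        self.bilstm = nn.LSTM(32, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):
        # x: (batch, 1024) reconstructed vibration segments
        features = self.cnn(x.unsqueeze(1))   # (batch, 32, 64)
        features = features.transpose(1, 2)   # (batch, 64, 32)
        out, _ = self.bilstm(features)        # (batch, 64, 2 * hidden)
        return self.fc(out.mean(dim=1))       # (batch, n_classes) logits


model = CNNBiLSTM()
logits = model(torch.randn(8, 1024))
print(logits.shape)
```

The logits can be passed directly to a cross-entropy loss, which applies the softmax internally.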
3.1.4. Fault Classification and Diagnosis
The multidimensional deep features extracted by the CNN-BiLSTM combination are input into a classifier, which accurately identifies the operating state of the gearbox. This allows for efficient classification of normal states and various fault modes, such as gear wear or missing teeth. Experimental results validate the effectiveness of this method. The confusion matrix clearly demonstrates the classification accuracy for different fault categories, showing the model’s strong recognition ability. t-SNE visualization reveals that the proposed method forms distinct clusters in feature space for different fault types, confirming its discriminative power. The accuracy and loss curves over training iterations indicate stable convergence, effectively avoiding overfitting while maintaining high accuracy on the test set.
3.2. Dataset Introduction
In this study, we used the publicly available experimental dataset from Southeast University to investigate automotive transmission fault diagnosis. This dataset is extensively used in gearbox research and is regarded as authoritative and representative. By selecting this dataset, we can not only validate the effectiveness of the model using publicly available, high-quality data but also provide a reference for comparative experiments by other researchers. Because collecting fault data from real automotive gearboxes is complex and subject to practical limitations, the use of this dataset helps overcome the difficulties of data acquisition in practical testing while ensuring the integrity and reproducibility of the experimental data. Specifically, the vibration signal sampling frequency and tested gear parameters in this dataset form a certain mapping relationship with real-world applications, providing a reliable experimental foundation for automotive gearbox fault diagnosis.
The data used in this study were provided by Southeast University from their transmission system dynamic simulator. The data were recorded at a rotational speed of 20 Hz (equivalent to 1200 rpm) and a load setting of 0 V (corresponding to 0 Nm), and they cover five conditions: the normal state, rolling element fault, composite fault, inner race fault, and outer race fault. The dataset contains a total of 5000 samples, with each sample having a length of 1024 data points. To ensure the effectiveness of model training and testing, the dataset is divided into a training set and a testing set with a ratio of 7:3, which helps validate the model’s generalization ability and robustness. Through in-depth analysis of these fault data, this study aims to achieve efficient automotive gearbox fault diagnosis. Specific data details are shown in Table 1.
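The sample construction and the 7:3 split can be sketched as below. Random arrays stand in for the real recordings, and the equal count of 1000 windows per condition, the non-overlapping windowing, and the simple random (non-stratified) split are assumptions made only so that the totals match the 5000 samples of 1024 points stated above.

```python
import numpy as np


def segment(record, length=1024):
    """Cut a 1D record into non-overlapping windows of `length` points."""
    n = record.size // length
    return record[: n * length].reshape(n, length)


def split_train_test(x, y, train_fraction=0.7, seed=0):
    """Shuffle the samples and split them into training and testing sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    cut = int(round(train_fraction * len(x)))
    train, test = order[:cut], order[cut:]
    return x[train], y[train], x[test], y[test]


# Placeholder recordings (assumption): one random record per condition.
rng = np.random.default_rng(0)
n_classes, windows_per_class = 5, 1000
x_parts, y_parts = [], []
for label in range(n_classes):
    record = rng.standard_normal(windows_per_class * 1024, dtype=np.float32)
    windows = segment(record)
    x_parts.append(windows)
    y_parts.append(np.full(len(windows), label))

x = np.concatenate(x_parts)
y = np.concatenate(y_parts)
x_train, y_train, x_test, y_test = split_train_test(x, y)
print(x.shape, x_train.shape, x_test.shape)
```

A stratified split, which keeps the class proportions identical in both sets, would be a reasonable alternative when the classes are imbalanced.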
3.3. Improved EEMD Decomposition
The conventional EEMD approach encounters challenges when analyzing intricate signals. For instance, adding Gaussian white noise of fixed amplitude does not align well with the localized features of the signal, leading to a mismatch between the added noise and the signal’s underlying structure, which may cause mode mixing among the intrinsic signal components. Moreover, the standard ensemble-averaging step used in EEMD does not exploit the relationship between the individual decompositions and the original signal, potentially leading to a loss of key signal details and inconsistent final outputs.
To overcome these limitations, we propose a flexible fractal noise addition and correlation-based combination framework. This method adjusts the magnitude and frequency features of the added noise in real time, ensuring better alignment with the underlying framework of the signal. Recursive-patterned noise replaces traditional Gaussian noise to further enhance the precision of signal breakdown. Additionally, we introduce an adaptive weighting method guided by correlation, which allocates varying significance to each decomposition based on its relationship with the original signal, refining the resulting IMF.
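The correlation-guided weighting idea can be illustrated with a small NumPy sketch. The weighting rule used here (weights proportional to the absolute Pearson correlation between each trial's mode estimate and the original signal) and the function names are assumptions, since the section does not give the exact formula. The toy data replace real EMD outputs with noisy copies of a known tone.

```python
import numpy as np


def correlation_weights(trial_modes, signal):
    """One non-negative weight per trial, proportional to |Pearson r|."""
    r = np.array([abs(np.corrcoef(m, signal)[0, 1]) for m in trial_modes])
    return r / r.sum()


def combine(trial_modes, signal, weighted=True):
    """Combine per-trial mode estimates by weighted or plain averaging."""
    if weighted:
        w = correlation_weights(trial_modes, signal)
    else:
        w = np.full(len(trial_modes), 1.0 / len(trial_modes))
    return w @ trial_modes


# Toy ensemble (assumption): 20 estimates of the same mode, each corrupted
# by a different amount of noise.
rng = np.random.default_rng(0)
t = np.arange(1024) / 1024.0
clean = np.sin(2 * np.pi * 50 * t)
noise_levels = np.linspace(0.1, 3.0, 20)
trial_modes = clean + noise_levels[:, None] * rng.standard_normal((20, 1024))

err_mean = np.mean((combine(trial_modes, clean, weighted=False) - clean) ** 2)
err_weighted = np.mean((combine(trial_modes, clean, weighted=True) - clean) ** 2)
print(err_mean, err_weighted)
```

Because cleaner trials correlate more strongly with the signal, they receive larger weights, so the weighted combination is less affected by the noisiest trials than the plain mean.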
Using rolling bearing faults and combined faults as case studies, Figure 4 illustrates the outcomes of decomposing these two fault types with the conventional EEMD technique. Subfigure (a) presents the decomposition results for rolling bearing faults, while subfigure (b) displays those for combined faults. As shown in Figure 4, although EEMD can extract localized signal patterns effectively, certain intrinsic mode functions (IMFs) exhibit noticeable noise interference and component mixing, leading to the degradation of signal clarity and the loss of critical information.
Figure 5 provides the decomposition results for the same dataset using the enhanced EEMD approach. Subfigures (a) and (b) show the refined decomposition for rolling bearing and combined faults, respectively. The enhanced method employs a dynamically adjusted fractal noise addition mechanism combined with correlation-driven weighted integration. This enables better retention of the signal’s essential characteristics while minimizing the impact of noise and mode overlapping.
The improvements significantly enhance decomposition precision and the clarity of signal representation. A comparative analysis highlights the ability of the improved EEMD to more accurately delineate the underlying structures and distinctive features of different fault types, particularly for complex combined faults. It achieves a more effective separation of modes, which substantially bolsters the dependability and precision of fault detection processes.
In the experiment, we compared the decomposition results of the EEMD and improved EEMD methods on the same signal. Specifically, we applied the Fast Fourier Transform (FFT) to the IMF components generated by each method to extract their spectral amplitude characteristics, allowing a more direct comparison of the two methods. The spectral comparison is shown in Figure 6, which illustrates the differences in the spectra of the IMF components generated by the two methods. When processing complex signals, particularly their high-frequency components, EEMD tends to introduce noise interference, which appears as spurious high-frequency peaks in the spectrum. This is because the noise added by EEMD is not fully eliminated from the high-frequency components. In contrast, the improved EEMD method suppresses noise more effectively, so the main frequency and its harmonics stand out more clearly in the spectrum and better reflect the physical characteristics of the signal. Taking the rolling element fault and compound fault signals as examples, the IMFs generated by EEMD may exhibit significant noise interference in the high-frequency region, which blurs the high-frequency components of the spectrum, whereas the improved EEMD produces more distinct main frequency peaks with less noise interference and therefore reflects the fault features more accurately.
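The spectral amplitude of an IMF can be obtained with a few lines of NumPy. The sketch below is an illustration only: the sampling frequency of 1024 Hz and the synthetic 50 Hz component are placeholders, not values from the dataset, and `amplitude_spectrum` is a hypothetical helper name.

```python
import numpy as np


def amplitude_spectrum(x, fs):
    """Single-sided amplitude spectrum of a real signal sampled at fs Hz.

    The 2/n scaling makes a sinusoid of amplitude A show a peak of height A
    (the DC and Nyquist bins are over-scaled by this simple rule).
    """
    n = x.size
    amplitude = np.abs(np.fft.rfft(x)) * 2.0 / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, amplitude


fs = 1024.0                      # placeholder sampling frequency in Hz
t = np.arange(1024) / fs
imf = 0.8 * np.sin(2 * np.pi * 50 * t)   # stand-in for one IMF

freqs, amplitude = amplitude_spectrum(imf, fs)
peak = int(np.argmax(amplitude))
print(freqs[peak], amplitude[peak])
```

Applying the same function to the corresponding IMFs from both decompositions and overlaying the results gives the kind of side-by-side spectral comparison described above.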
In mechanical fault diagnosis tasks, selecting the most representative modes from the multiple IMFs obtained from signal decomposition is a crucial step to improve diagnostic accuracy. To select IMFs, this method uses the Pearson correlation coefficient, which measures the relationship between each IMF and the original signal, and retains those with higher correlations. This effectively extracts key fault-related information while reducing redundancy and noise. The main reason for selecting the Pearson correlation coefficient as the filtering criterion is its ability to quantify the linear relationship between two signals. Its value ranges from −1 to 1: the closer its absolute value is to 1, the stronger the linear correlation, and the closer it is to 0, the weaker the correlation. By calculating the Pearson correlation coefficient between each IMF and the original signal, it becomes straightforward to determine whether an IMF contains key information similar to the original signal. In practical applications, selecting an appropriate correlation threshold is essential; it is typically determined through experimentation to balance feature extraction and noise suppression. When combined with techniques such as spectral analysis, this method can effectively reveal the frequency components of mechanical faults, enhancing diagnostic accuracy and robustness, particularly in gearbox bearing fault diagnosis.
As shown in Table 2, IMF1 and IMF2 under a rolling element fault, as well as IMF6 and IMF7 under a compound fault, all exhibit high correlation (greater than the threshold of 0.5). This indicates that these IMFs can effectively reflect the key frequency components in the signal. Therefore, during the signal reconstruction process, these modes can effectively preserve the main features while suppressing the impact of noise.
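The selection and reconstruction step can be sketched with NumPy as follows. The threshold of 0.5 comes from the text; the synthetic components standing in for IMFs and the function names `select_imfs` and `reconstruct` are assumptions for illustration.

```python
import numpy as np


def select_imfs(imfs, signal, threshold=0.5):
    """Return a boolean mask of IMFs whose Pearson r exceeds the threshold."""
    r = np.array([np.corrcoef(imf, signal)[0, 1] for imf in imfs])
    return r > threshold, r


def reconstruct(imfs, keep):
    """Sum the retained IMFs to form the reconstructed signal."""
    return imfs[keep].sum(axis=0)


# Synthetic stand-ins for IMFs (assumption): two strong tones and weak noise.
rng = np.random.default_rng(0)
t = np.arange(1024) / 1024.0
imfs = np.stack([
    1.0 * np.sin(2 * np.pi * 50 * t),
    0.8 * np.sin(2 * np.pi * 120 * t),
    0.1 * rng.standard_normal(t.size),
])
signal = imfs.sum(axis=0)

keep, r = select_imfs(imfs, signal)
reconstructed = reconstruct(imfs, keep)
print(np.round(r, 2), keep)
```

In this toy case the two tonal components pass the threshold and the weak noise component is discarded, so the reconstructed signal is the sum of the two tones.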
3.4. Discussion and Analysis of Different Comparison Methods
This study presents an automotive transmission fault diagnosis method based on an enhanced EEMD combined with CNN-BiLSTM and validates its superiority through comparative experiments. The method first decomposes the original vibration signal using improved EEMD and selects the IMFs highly correlated with the original signal using Pearson’s correlation coefficient, thereby achieving precise extraction of key frequency components and noise suppression. We compared it with a traditional CNN, DenseNet [32], ResNet18 [33], CNN-LSTM [34], and the unmodified EEMD-CNN-BiLSTM. These comparison methods each have their own characteristics: the traditional CNN focuses on local feature extraction but struggles to capture long-term dependencies; CNN-LSTM improves the extraction of spatiotemporal features; DenseNet and ResNet18 enhance information flow in deep networks through dense connections and residual learning, respectively; and the unmodified EEMD-CNN-BiLSTM also uses EEMD for signal decomposition but falls short in noise suppression and strengthening key features compared to the improved method. Experimental results show that the improved EEMD-CNN-BiLSTM exhibits significant advantages in fault diagnosis accuracy and robustness. It can more effectively extract core information reflecting fault states from complex signals, providing an efficient and reliable solution for automotive gearbox fault diagnosis.
3.4.1. Iterative Curve Analysis
Figure 7a shows the accuracy curves, illustrating that the improved EEMD-CNN-BiLSTM framework rapidly increases its accuracy during the initial training phase and maintains its lead throughout, eventually achieving the highest classification accuracy. Meanwhile, the loss curves in Figure 7b reveal that the method converges rapidly to a low loss with reduced oscillations, reflecting stable and consistent training. By leveraging the improved EEMD decomposition, the framework captures signal-specific attributes more effectively. Coupled with the CNN’s strength in identifying local characteristics and the BiLSTM’s ability to model bidirectional temporal dependencies, the overall approach excels in key feature extraction and interference reduction.
In comparison, the unmodified EEMD-CNN-BiLSTM approach, while quick to stabilize and achieving good accuracy, improves slightly more slowly and reaches a slightly lower final accuracy. Although ResNet18 and DenseNet use residual (skip) connections and dense connections, respectively, to improve learning in deeper networks, their ability to handle intricate mechanical fault features does not match the proposed framework. The traditional CNN struggles with temporal dynamics and deeper pattern learning, leading to lower accuracy and slower loss convergence. CNN-LSTM, though offering some improvement by combining convolutional layers with a unidirectional LSTM, remains limited in modeling bidirectional temporal dependencies, placing its performance between the conventional CNN and the improved EEMD-CNN-BiLSTM approach.
Overall, the experimental findings highlight the notable advantages of the optimized EEMD-CNN-BiLSTM framework for sophisticated machine fault diagnosis. By refining preprocessing and architectural design, this approach preserves essential signal attributes while minimizing unwanted disturbances. It achieves higher precision, faster stabilization, and consistent training outcomes. Compared to alternative approaches, the proposed method demonstrates improved resilience and real-world applicability, delivering a practical solution for diagnostic challenges.
3.4.2. Visual Analysis of Confusion Matrix
As shown in Figure 8, the specific diagnostic results of six different models are presented. The confusion matrix offers a comprehensive breakdown of the classification outcomes for each fault type: rows correspond to the actual categories, columns denote the predicted categories, diagonal entries show the count of correctly identified samples, and off-diagonal entries indicate incorrect predictions. By analyzing the confusion matrix, it becomes clear which categories are more prone to confusion, offering insights for further model optimization. For instance, the improved EEMD-CNN-BiLSTM method outperforms the others across all categories, especially in predicting rolling element faults and compound faults, with significantly higher accuracy than the other methods. This aligns with its highest average accuracy of 99.46%, as shown in Table 3. In contrast, the traditional CNN method, due to its limited ability to capture temporal information, exhibits a higher number of misclassifications, particularly in predicting compound faults and inner race faults. This is consistent with its relatively low accuracy of 86.23%, despite its short diagnosis time of 0.18 s. Additionally, ResNet18 and DenseNet, despite achieving relatively high accuracies of 98.43% and 96.26%, respectively, show certain misclassification cases in specific categories, such as outer race faults and compound faults. This suggests that while these methods are effective in feature extraction, they still face challenges in distinguishing complex fault patterns.
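The layout of the confusion matrix described above (rows for actual classes, columns for predicted classes) can be reproduced with a short NumPy sketch. The label vectors below are made-up toy values, not experimental results, and the label coding 0 to 4 follows the five conditions of the dataset.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix with rows = actual class and columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


# Toy labels (assumption): ten samples, two of them misclassified.
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
y_pred = np.array([0, 0, 1, 2, 2, 2, 3, 3, 4, 3])

cm = confusion_matrix(y_true, y_pred, n_classes=5)
accuracy = np.trace(cm) / cm.sum()
per_class_recall = np.diag(cm) / cm.sum(axis=1)
print(cm)
print(accuracy, per_class_recall)
```

The off-diagonal entries show which pairs of classes are confused; here one sample of class 1 is predicted as class 2 and one sample of class 4 as class 3.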
3.4.3. t-SNE Visualization Analysis
Figure 9 illustrates the t-SNE [35] dimensionality reduction results for the different comparison methods. t-SNE is an effective high-dimensional data visualization technique that maps data into a two-dimensional space, helping us observe the relationships and distributions of samples from different categories intuitively. As shown in Figure 9, the improved EEMD-CNN-BiLSTM method demonstrates tightly clustered sample distributions in the low-dimensional space, with clear separations between different fault categories. This indicates its excellent performance in feature extraction and its ability to effectively distinguish between various fault types. This result aligns with its superior accuracy and convergence performance, further validating its advantages in complex fault diagnosis tasks. In contrast, the results of the other comparison methods reveal more scattered feature distributions. In particular, the traditional CNN method shows significant overlap among samples, leading to suboptimal classification performance. Although ResNet18 and DenseNet exhibit some improvements, certain categories still display considerable overlap, suggesting limitations in these methods’ fault feature extraction and classification capabilities. The CNN-LSTM method achieves relatively distinct distributions, but compared to the improved EEMD-CNN-BiLSTM method, it still shows some degree of category overlap, especially for complex fault types such as compound faults and inner race faults. This indicates its relative inadequacy in capturing discriminative features. In Figure 9, the color scale represents the different fault types, with each type assigned a unique label: 0 for normal, 1 for rolling element fault, 2 for compound fault, 3 for inner race fault, and 4 for outer race fault. These labels are visually distinguished by corresponding colors in the figure.
In summary, the t-SNE visualization further confirms the outstanding performance of the improved EEMD-CNN-BiLSTM method in fault diagnosis tasks. It demonstrates a superior ability to extract and differentiate fault signal features, achieving stronger classification capabilities and higher accuracy.
3.4.4. Analysis of Evaluation Indicators
Table 3 highlights significant differences among the methods in terms of accuracy, precision, recall, and F1-score, reflecting their varying abilities to extract fault features, suppress noise, and capture temporal information. The traditional CNN method achieved an average accuracy of 86.23%, precision of 84.12%, and recall of 87.05%, resulting in an F1-score of 85.56%. These results indicate limitations in its ability to extract fault features effectively. While its diagnosis time of 0.18 s provides an advantage in response speed, its performance in 5-fold cross-validation (86.0 ± 2.1) suggests poor generalization and instability across different data splits, making it less suitable for complex fault diagnosis tasks.
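The relationship between these metrics can be made concrete with a short sketch. The 3-class confusion matrix is a made-up example, and macro averaging over classes is an assumption, since the averaging scheme behind Table 3 is not stated here. The last lines check that the harmonic mean of the reported CNN precision and recall reproduces the reported F1-score; small deviations for other rows are possible if F1 was averaged per class.

```python
import numpy as np


def f1_from(precision, recall):
    """F1-score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_metrics(cm):
    """Accuracy and macro-averaged precision, recall, and F1.

    Rows of cm are actual classes, columns are predicted classes.
    Assumes every class is predicted at least once (no zero column sums).
    """
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)   # column sums = predicted counts
    recall = tp / cm.sum(axis=1)      # row sums = actual counts
    f1 = f1_from(precision, recall)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision.mean(), recall.mean(), f1.mean()


# Made-up confusion matrix for illustration only.
cm = np.array([[9, 1, 0],
               [0, 8, 2],
               [1, 0, 9]])
accuracy, precision, recall, f1 = macro_metrics(cm)
print(accuracy, precision, recall, f1)

# Consistency check on the reported CNN figures (in percent).
cnn_f1 = f1_from(84.12, 87.05)
print(round(cnn_f1, 2))
```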
DenseNet and ResNet18 achieved accuracy rates of 96.26% and 98.43%, respectively, with significant improvements in precision, recall, and F1-score. DenseNet recorded a precision of 95.51%, recall of 96.81%, and F1-score of 96.15%, while ResNet18 reached a precision of 98.15%, recall of 98.75%, and F1-score of 98.42%. Their 5-fold cross-validation results (96.3 ± 1.3 for DenseNet and 98.4 ± 0.9 for ResNet18) exhibited smaller fluctuations, indicating better stability and reliability. Their diagnosis times were 0.23 and 0.26 s, respectively, offering a balanced performance overall.
The CNN-LSTM method, combining convolutional layers with LSTM’s sequential modeling, achieved an accuracy of 98.86%, with a precision of 98.71%, recall of 98.92%, and F1-score of 98.83%. Its 5-fold cross-validation result of 98.8 ± 0.7 validated its stability in capturing temporal features. However, its reliance on unidirectional LSTM presents some limitations in modeling dynamically changing fault signals. The diagnosis time was 0.25 s.
The EEMD-CNN-BiLSTM method used EEMD to decompose the raw signal, extracting high-quality IMFs and combining a CNN with BiLSTM for feature extraction and temporal modeling. This approach achieved further improvements with an accuracy of 99.41%, precision of 99.33%, recall of 99.45%, and F1-score of 99.38%. Its 5-fold cross-validation result of 99.41 ± 0.5 demonstrated consistent performance across different data splits, highlighting strong generalization ability. Although its diagnosis time was slightly higher at 0.45 s, its overall performance was excellent.
Building on this, the improved EEMD-CNN-BiLSTM method introduced a dynamic noise adjustment mechanism to enhance the stability of signal decomposition, combined with CNN and BiLSTM for efficient feature extraction and temporal modeling. This resulted in slight increases in accuracy to 99.46%, precision to 99.41%, recall to 99.51%, and F1-score to 99.45%. Its 5-fold cross-validation result of 99.46 ± 0.4 showed the smallest fluctuation among the compared methods, indicating high accuracy and robustness across splits. The diagnosis time also fell slightly, from 0.45 s to 0.43 s, making it the best-performing method overall in terms of diagnostic accuracy.
In summary, performance metrics generally improved as model complexity increased, and the 5-fold cross-validation results support the robustness and stability of the methods. The improved EEMD-CNN-BiLSTM method outperformed the others on accuracy, precision, recall, F1-score, and cross-validation spread, demonstrating high accuracy and strong generalization. Its diagnosis time of 0.43 s is longer than that of the simpler networks (0.18–0.26 s) but shorter than that of the unmodified EEMD-CNN-BiLSTM (0.45 s). This makes it the most promising diagnostic model for complex conditions in this study.
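The "mean ± deviation" figures quoted for 5-fold cross-validation can be produced along the following lines. This is a minimal sketch under stated assumptions: the helper names `k_fold_indices` and `summarize` are hypothetical, the folds are contiguous (in practice the samples would normally be shuffled or stratified first), the sample standard deviation is used because the text does not say which estimator was applied, and the per-fold accuracies are made-up placeholders rather than results from this study.

```python
import statistics


def k_fold_indices(n_samples: int, k: int = 5):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def summarize(fold_accuracies):
    """Return (mean, sample standard deviation) of per-fold accuracies."""
    return statistics.mean(fold_accuracies), statistics.stdev(fold_accuracies)


# Placeholder per-fold accuracies in percent (hypothetical, not measured).
fold_acc = [99.1, 99.9, 99.3, 99.8, 99.2]
mean_acc, std_acc = summarize(fold_acc)
print(f"{mean_acc:.2f} ± {std_acc:.2f}")
```

Each sample appears in exactly one test fold, so every reported accuracy is measured on data the corresponding model did not see during training.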
4. Different Datasets and Feasibility Analysis
In this section, we analyzed different datasets used for the proposed fault diagnosis model, highlighting their differences in features such as fault types, data collection processes, and real-world applicability of the data. In addition, we conducted a feasibility analysis to evaluate the performance of the model on different datasets and its deployment potential in actual low-power hardware environments. We also investigated the challenges associated with each dataset and explored how to adapt the model to different scenarios.
4.1. CWRU Dataset Validation
To closely replicate real-world scenarios, we used the Case Western Reserve University (CWRU) dataset, which simulates the operating conditions of industrial equipment such as motors and gearboxes. This dataset includes data collected under various loads, speeds, and environmental noise conditions, covering several typical speeds (1730–1797 rpm) and load levels (0–3 HP). It provides vibration signals for five operating conditions: normal, rolling element fault, inner race fault, outer race fault, and a simulated composite fault (inner race + outer race). Each fault type is further categorized by fault size (0.007, 0.014, and 0.021 inches) to increase the diversity and complexity of the samples. The composite fault condition, in particular, combines defects of different modes and scales, presenting a greater challenge for fault diagnosis and offering a demanding test of the stability and adaptability of the model. The dataset configuration is summarized in Table 4; testing under these conditions allows the model's stability and reliability in dynamic environments to be assessed and its practical applicability in complex and variable settings to be validated.
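Vibration recordings of this kind are usually cut into fixed-length samples before being fed to a network. The sketch below illustrates one common way to do this and lists the five operating conditions named above. The window length of 1024 points and the stride of 512 are illustrative assumptions, not values taken from this study, and the names `CONDITIONS` and `segment_signal` are hypothetical.

```python
# The five operating conditions described for the CWRU experiments.
CONDITIONS = [
    "normal",
    "rolling_element_fault",
    "inner_race_fault",
    "outer_race_fault",
    "composite_inner_outer_fault",
]


def segment_signal(signal, window=1024, stride=512):
    """Split a 1-D sequence into overlapping fixed-length windows.

    window and stride are assumed values chosen for illustration only.
    Trailing points that do not fill a whole window are dropped.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    last_start = len(signal) - window
    return [signal[s:s + window] for s in range(0, last_start + 1, stride)]


# Toy example: 10 points, window of 4, stride of 2 gives 4 windows.
windows = segment_signal(list(range(10)), window=4, stride=2)
print(len(windows), windows[-1])  # 4 [6, 7, 8, 9]
```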
4.2. Comparison Method Test Results
When tested on the CWRU dataset, the fault diagnosis methods differed in accuracy, precision, recall, and F1-score, reflecting how each model handled data complexity and noise interference under different operating conditions. These results are shown in Table 5. Overall, diagnostic performance improved markedly as the model structure was enhanced, demonstrating strong generalization ability.
The traditional CNN method achieved an accuracy of 84.57%, precision of 82.80%, recall of 85.95%, and F1-score of 84.35%. While it showed basic recognition ability under simpler conditions, its feature extraction capability was limited in more complex scenarios with varying speeds and loads, which affected its overall performance. DenseNet and ResNet18 achieved average accuracies of 94.62% and 97.23%, respectively, with both precision and recall maintaining high levels. By using dense connections and residual learning, these networks were better able to capture feature details from complex fault signals, enhancing model stability and robustness. The CNN-LSTM method further improved the model’s ability to capture temporal features, with an accuracy of 97.65% and an F1-score of 97.59%, demonstrating its effectiveness in capturing dynamic signal changes under varying conditions, making it suitable for state recognition during vehicle operation. The EEMD-CNN-BiLSTM method introduced EEMD decomposition in the signal preprocessing stage, effectively separating noise and fault features. Combined with the bidirectional LSTM for deep temporal modeling, its accuracy rose to 99.11%. The improved model further optimized the EEMD decomposition strategy and feature extraction structure, achieving an accuracy of 99.28% and an F1-score of 99.28% on the CWRU dataset, with all metrics reaching optimal levels.
Overall, the improved EEMD-CNN-BiLSTM method maintained exceptionally high diagnostic accuracy and stability across varying speeds, loads, and composite faults, demonstrating excellent generalization ability and strong potential for real-world deployment.
4.3. t-SNE Visualization Analysis
On the CWRU dataset, the t-SNE clustering results of different methods show the differences in model performance for fault diagnosis, as shown in Figure 10. First, while the traditional CNN method has a fast diagnosis time, its lower accuracy (84.57%) leads to a mixed distribution of fault categories in Figure 10a, with unclear classification boundaries. This suggests that the CNN struggles to capture complex fault features and suppress noise interference, resulting in suboptimal clustering. In contrast, the DenseNet (accuracy 94.62%) and ResNet18 (accuracy 97.23%) methods, with their more complex network structures, are better at extracting fault features and improving classification performance. In Figure 10b,c, the distribution of fault categories is more concentrated, and the clustering results have significantly improved, especially for ResNet18, where the separation between fault categories is much clearer, further demonstrating the advantages of residual learning.
The CNN-LSTM method (accuracy 97.65%), by introducing the LSTM module, effectively captures temporal features, further improving the separation of the clusters. In Figure 10d, the classification performance is enhanced, with clearer boundaries between the clusters, especially when handling fault modes with significant temporal changes, leading to a noticeable improvement in diagnostic accuracy. EEMD-CNN-BiLSTM (accuracy 99.10%), combining the advantages of signal preprocessing and bidirectional LSTM, handles complex fault signals better. Figure 10e shows a more distinct and compact clustering of categories, indicating its strengths in signal extraction and temporal modeling. Finally, the improved EEMD-CNN-BiLSTM (accuracy 99.28%) achieves the best performance in both accuracy and clustering. Figure 10f presents a very clear distribution of categories, further verifying its accuracy and robustness in fault diagnosis.
The t-SNE visualizations show that the ability to distinguish fault categories improves markedly as model complexity increases. The improved EEMD-CNN-BiLSTM method achieves both the highest accuracy and the clearest cluster separation. This indicates that the method can handle complex fault diagnosis tasks on the CWRU dataset, extracting key fault features while suppressing noise interference, which supports its practical value in real-world applications.
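A t-SNE projection of this kind can be produced with scikit-learn roughly as follows. This is only a sketch: the feature vectors here are synthetic stand-ins for the high-dimensional features a trained network would produce, and the class count, feature dimension, and perplexity are arbitrary illustrative choices rather than the settings used for Figure 10.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-in for learned features: 5 classes, 20 samples each, 32-D.
n_classes, per_class, dim = 5, 20, 32
centers = rng.normal(scale=5.0, size=(n_classes, dim))
features = np.vstack([c + rng.normal(size=(per_class, dim)) for c in centers])
labels = np.repeat(np.arange(n_classes), per_class)

# Project to 2-D; perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=15, init="pca", random_state=0)
embedding = tsne.fit_transform(features)

print(embedding.shape)  # (100, 2)
```

Plotting `embedding[:, 0]` against `embedding[:, 1]`, colored by `labels`, gives a scatter plot of the same type as the panels discussed above. Because t-SNE preserves local neighborhoods rather than global distances, such plots are best read qualitatively.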
4.4. Feasibility Analysis
In our study, the improved EEMD-CNN-BiLSTM method showed excellent performance in terms of accuracy, precision, recall, and F1-score. For example, the accuracy reached 99.46% on the Southeast University dataset and 99.28% on the CWRU dataset. These results demonstrate that the method performs exceptionally well in fault diagnosis tasks, effectively identifying and classifying various types of faults. However, when deploying this high-precision model to an actual vehicle transmission gearbox online monitoring system, real-time performance and resource usage issues must be addressed.
To tackle this challenge, we introduced model compression techniques, including pruning and quantization. Through pruning, we removed redundant parameters and small-magnitude weights from the network, reducing the computational load and storage requirements. For example, pruning reduced the model size by about 20%. With quantization, we converted 32-bit floating-point parameters into 8-bit integers, further decreasing the model's storage usage and computational complexity. These compression techniques significantly reduced the storage and computational resource requirements while also improving the model's efficiency on low-power hardware.
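The two compression steps can be illustrated on a flat list of weights. The sketch below is an assumption-laden toy, not the procedure used in this study: it assumes unstructured magnitude pruning and symmetric per-tensor 8-bit quantization, neither of which is specified in the text, and the helper names are hypothetical.

```python
def magnitude_prune(weights, fraction=0.2):
    """Zero out the given fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * fraction)
    pruned = list(weights)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric per-tensor quantization; returns (int8 values, scale)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate floating-point weights."""
    return [v * scale for v in q]


# Toy weights (hypothetical values).
w = [0.5, -0.01, 0.2, -1.27, 0.03, 0.8, -0.6, 0.02, 0.9, -0.4]
w_pruned = magnitude_prune(w, fraction=0.2)
q, scale = quantize_int8(w_pruned)
w_restored = dequantize(q, scale)
print(w_pruned.count(0.0), min(q), max(q))
```

Storing each parameter in 8 bits instead of 32 cuts per-parameter storage by a factor of four. Zeroed weights only reduce model size when they are stored sparsely or removed structurally, so the roughly 20% figure above depends on how the pruned model is saved.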
Testing on low-power hardware showed that, after compression, the EEMD-CNN-BiLSTM method experienced a slight drop in accuracy and other evaluation metrics (for example, accuracy dropped from 99.28% to 98.9%) but still maintained high diagnostic precision. The computation time was also significantly reduced: the original model took 0.45 s to process a sample, while the compressed model required only 0.23 s. This indicates that the compressed model achieved a substantial improvement in real-time performance while retaining sufficient accuracy on low-power hardware, meeting the practical requirements of the online monitoring system, as shown in Table 6.
4.5. Comparison Between the Proposed Method and the Literature Results
Among existing studies, the method in [28] reports an accuracy of 96.45% under the class imbalance condition of the CWRU dataset, while the improved EEMD-CNN-BiLSTM method proposed in this study achieved 99.28% under the same condition. Under balanced data conditions, our method achieved an accuracy of 99.46% on the Southeast University dataset, surpassing the 97.1% reported in [27]. Additionally, compared with the CNN-LSTM method in [29], which achieved only 83.63% accuracy under cross-domain conditions, our method demonstrated stronger generalization ability and robustness, supporting its practicality under complex real-world conditions, as shown in Table 7.
5. Conclusions
This paper proposes an improved EEMD-CNN-BiLSTM method for automotive gearbox fault diagnosis. The improved EEMD effectively extracts key features, the CNN captures deep spatial features, and BiLSTM enhances the capture of temporal information, making the model both highly accurate and robust. The t-SNE visualization results show that the model distinguishes clearly between different fault categories, with well-separated clusters. At the same time, the diagnosis time is kept to about 0.43 s, which supports real-time use in industrial applications. Overall, the method shows good stability and scalability under complex working conditions, providing a reliable solution for intelligent fault diagnosis in vehicle gearboxes.
First, the experimental results show that this method outperforms several comparison models in terms of accuracy, temporal modeling ability, and feature extraction capability. Specifically, it achieves an accuracy of 99.46% on the Southeast University dataset and 99.28% on the CWRU dataset, significantly higher than existing methods in the literature (the highest being only 97.1%). Additionally, t-SNE dimensionality reduction analysis further validates its distinct separation between different fault categories. Moreover, the method demonstrates high real-time performance in diagnosis, meeting the industrial demands for both fault diagnosis efficiency and accuracy.
Secondly, a comparative analysis with the unmodified EEMD-CNN-BiLSTM, ResNet18, DenseNet, CNN-LSTM, and traditional CNN methods across multiple evaluation metrics further confirms the comprehensive performance advantages of the proposed method. In accuracy, precision, recall, and F1-score alike, the improved method achieves the best performance, with accuracies of 99.28% on the CWRU dataset and 99.46% on the Southeast University dataset. This indicates that the proposed method is practical for complex mechanical fault diagnosis and provides new insights and references for research and applications in related fields.
Finally, future work can explore the potential of multimodal data fusion and transfer learning. Moreover, optimizing model structures and diagnostic strategies for complex working conditions and multi-fault scenarios could further enhance diagnostic performance and generalization ability, providing more reliable technological support for intelligent maintenance and efficient industrial production.