Gearbox Fault Diagnosis Based on Compressed Sensing and Multi-Scale Residual Network with Lightweight Attention Mechanism

Zhou, Shihua; Yu, Xinhai; Li, Xuan; Wang, Yue; Ji, Kaibo; Ren, Zhaohui

doi:10.3390/math13091393

by Shihua Zhou ^1,2,*, Xinhai Yu ^1, Xuan Li ^1, Yue Wang ^1, Kaibo Ji ^1 and Zhaohui Ren ^1,*

^1 School of Mechanical Engineering & Automation, Northeastern University, Shenyang 110819, China

^2 Key Laboratory of Vibration and Control of Aero-Propulsion Systems Ministry of Education of China, Northeastern University, Shenyang 110819, China

^* Authors to whom correspondence should be addressed.

Mathematics 2025, 13(9), 1393; https://doi.org/10.3390/math13091393

Submission received: 25 February 2025 / Revised: 13 April 2025 / Accepted: 21 April 2025 / Published: 24 April 2025

Abstract

As a core component of mechanical transmission systems, gear damage status significantly impacts the safety and efficiency of an overall mechanical system. However, existing fault diagnosis methods often struggle to extract features effectively in complex application scenarios characterized by conditions such as high temperature, high humidity, and high-level vibrations. Consequently, they exhibit poor adaptability and limited anti-noise capabilities. To address these limitations and enhance the adaptability and precision of gear fault diagnosis (GFD), a novel compressive sensing lightweight attention multi-scale residual network (CS-LAMRNet) method is proposed. Initially, compressive sensing technology was employed to remove noise and redundant information from the vibration signal, and the reconstructed 1D gear vibration signal was then converted into a 2D image. Subsequently, a multi-scale feature extraction (MSFE) module was designed based on multi-scale learning, with the aim of improving the feature extraction ability of the signal in noisy environments. Finally, an improved depth residual attention (IDRA) module was established and connected to the MSFE module, further enhancing the exactitude and generalization ability of the diagnosis method. The performance of the proposed CS-LAMRNet was evaluated using the NEU dataset and the SEU dataset, and it was compared with seven other fault diagnosis methods. The experimental results demonstrate that the accuracies of the CS-LAMRNet reached 99.80% and 100%, respectively, thus proving that the proposed method has a higher fault identification capability for gears under noisy environments.

Keywords:

CS-LAMRNet; fault diagnosis; multi-scale feature extraction; attention mechanism; Gramian angular difference field

MSC:

74H45

1. Introduction

Gears are the core units of power transmission in rotating machinery and are widely used in the aerospace, rail transit, wind energy, and ship industries, as well as in other fields. Their reliability directly affects the performance and life of equipment [1,2]. Affected by complex working conditions and load impact, gears are prone to various faults, such as cracks, pitting, wear, spalling, and broken teeth, which can result in mechanical structure failure, equipment shutdown, casualties, and other consequences [3,4,5]. Therefore, research on gear fault diagnosis (GFD) technology in complex environments is of great significance for equipment condition monitoring, maintenance reduction, and reliability improvement [6,7].

Over the past few decades, vibration-based fault diagnosis has become a widely applied approach to gear fault detection and has been increasingly favored by scholars, and several feasible GFD methods have been put forward. Cheng et al. [8] built a Ramanujan Fourier decomposition method with excellent anti-noise properties and precise feature extraction capability. Zheng [9] put forward a novel empirical reconstruction Gaussian decomposition method for GFD. Based on a comparison of two fault feature detection models, namely variational mode decomposition (VMD) and empirical mode decomposition (EMD), Zhang [10] presented a novel bearing fault diagnosis method based on VMD. However, these methods often require significant human intervention and expert experience, and are not suitable for analyzing and processing large datasets. As vibration signals grow more complex, efficient extraction of gear fault characteristics becomes the key to understanding gear damage status, which has made machine learning the mainstream approach to gear fault diagnosis. To extract gear fault features from multi-channel compressed signals under ultra-low compression conditions, Liu [11] proposed a new weighted and distributed empirical compressed sensing (CS) method. Integrating VMD and particle swarm optimization (PSO), Liu [12] presented a bearing fault diagnosis approach to overcome the difficulty of extracting fault features under strong noise.

With the rapid development of modern industrialization and artificial intelligence, deep neural networks are being applied in the field of intelligent fault diagnosis [13]. To better extract gear fault features, Refs. [14,15] encoded raw 1D signals into 2D images and introduced a corresponding machinery fault diagnosis and classification approach by inputting a time series of vibration signals. Xing [16] proposed a MLDGCNN network model in order to obtain cutting top line information. To accurately extract fault features with insufficient samples, a GFD method combining a Gramian angular summation field (GASF) and the multi-scale channel attention mechanism DenseNet was proposed by Shi [17]. Combining a recurrence plot and a convolutional neural network (CNN), Wang [18] and Liu [19] presented an intelligent fault diagnosis and classification scheme for planetary gears and bearings, respectively. To achieve an end-to-end diagnostic mechanism, Chen [20] presented a rotating machinery fault diagnosis method involving novel continuous wavelet transform. Combining deep-transfer learning and CNNs, Liu and his team [21,22] developed different fault diagnosis methods for different types of rotating parts (such as bearings, gears, and blades). To enhance the adaptability and precision of GFD, Lin [23] put forward an intelligent gear diagnosis approach based upon a CS-improved VMD and a probabilistic neural network. Chen [24] proposed a bearing fault diagnosis approach on the basis of the improved empirical wavelet transform and compressive sensing joint denoising and lead convolution neural network. To upgrade the precision of the fault diagnosis approach with noise for rotating machinery, Weng [25] put forward a multi-scale kernel-based network that considered the attention mechanism, which additionally improved the precision and efficiency. For bearings, Jin [26] presented a novel anti-noise multi-scale CNN to enable coupling fault diagnosis at different levels of noise. 
Zhao [27] proposed a new multi-scale inverted residual CNN approach for variable load bearing fault diagnosis. In Ref. [28], a bearing fault diagnosis approach based on the multi-scale feature fusion of a parallel CNN was described. To break through the limitations of small samples and strong noise interference, Wang [29] designed a lightweight multi-scale CNN with a fault diagnosis approach. A novel multiscale wavelet prototypical network was developed by Yue [30], which enabled cross-component fault diagnosis. Wang [31] proposed a multi-layer fusion CNN and a multi-layer fusion module–relational knowledge distillation module with an attention mechanism to improve robustness in different noisy environments. To extract fault characteristics effectively from the vibration data, Zhang [32] and Zhong [33] presented different mechanical fault diagnosis methods, which were able to achieve synergistic effects between networks to improve deep learning capability. To overcome the limitation of complex background noise and the number of fault samples, Refs. [34,35,36] each put forward different fault diagnosis methods with an improved attention mechanism, all of which had excellent robustness and generalization properties when compared to other GFD approaches in noise conditions.

Through research and analysis, it was found that existing gear fault diagnosis methods have poor anti-noise performance and insufficient generalization ability. In order to meet the above challenges, a novel compressive sensing lightweight attention multi-scale residual network method is proposed to improve the adaptability and accuracy of GFD. Firstly, a MSFE module with a strong anti-noise ability was created, which can capture image features at different scales. Then, the lightweight attention (LATT) module was designed, which can allocate weights adaptively and reduce calculation parameters. Finally, the constructed IDRA module can effectively extract global and local fault information.

The contributions and highlights of this paper are as follows:

(1): The CS method was used to reconstruct vibration data and transform them into images through GADF so as to extract more comprehensive feature information.
(2): A MSFE module with strong anti-noise ability was constructed, image features of different scales were captured by parallel convolution layers with different convolution kernel sizes, and multi-scale features were extracted by feature fusion.
(3): An IDRA module was created. The IDRA module was embedded with the designed lightweight attention module, which ensures the full extraction of fault features and improves calculation efficiency.
(4): The effectiveness of the proposed method was verified by NEU dataset and SEU dataset, and the advantages of CS-LAMRNet and the effectiveness of the proposed modules were verified by noise contrast experiments and ablation experiments.

The layout of this research is as follows: Section 2 introduces the related theories of CS, GADF, and DSC. In Section 3, the architecture of CS-LAMRNet model is outlined and the overall structure of the network is discussed in detail. Section 4 demonstrates the verified effectiveness and superiority of the presented CS-LAMRNet compared with other fault diagnosis methods through the NEU dataset and the SEU dataset. Finally, the main conclusions of this research are summarized in Section 5.

2. Methodology

2.1. Compressed Sensing

CS enables the reconstruction of a sparse signal from samples acquired at a low sampling rate without distortion, and it plays a vital role in signal processing, speech processing, image processing, and wireless communication. Exploiting signal sparsity, CS projects the relevant information into a low-dimensional space through a measurement matrix that is independent of the sparse transform basis, and then reconstructs the original signal with high probability from a small number of observations, which also achieves signal denoising. Hence, the comprehensive gear fault feature information can be stored and transmitted using the low-dimensional compressed signals of CS [11].

Assuming that x (x ∈ R^N) is a one-dimensional signal of length N containing K nonzero terms and that the number of non-zero terms in the one-dimensional signal is much smaller than the vector length (namely, K << N), signal x is regarded as sparse, which is given as:

x = \sum_{k = 1}^{N} ψ_{k} s_{k} = Ψ s .

(1)

where Ψ ∈ R^{N×N} and s ∈ R^{N×1} represent the sparse matrix and sparse vector, respectively.

The measurement matrix takes a finite number of linear measurements of the sparse signal, which realizes effective sampling and compression and helps improve the accuracy of the reconstructed sparse signal. The observed value y is obtained by projecting the sparse discrete signal x through the measurement matrix, and the operation can be described by Equation (2):

y = Φ x = Φ Ψ s = Θ s .

(2)

in which Φ ∈ R^{M×N} is an M × N measurement matrix with M < N, Θ = ΦΨ denotes the sensing matrix, and y ∈ R^{M×1} represents the measured values, whose dimension is smaller than that of the original signal. To ensure the accurate reconstruction of the original signal x, the sensing matrix Θ should satisfy the restricted isometry property (RIP), namely:

(1 - δ_{k}) {‖s‖}_{2}^{2} \leq {‖Θ s‖}_{2}^{2} \leq (1 + δ_{k}) {‖s‖}_{2}^{2} .

(3)

where δ_k ∈ (0, 1) denotes the restricted isometry constant. Because M < N, Equation (2) is an underdetermined system. To obtain the optimal solution of s, the s, y, and Θ need to meet the following conditions:

\hat{s} = \arg \min_{s} ‖s‖ \quad \text{s.t.} \quad y = Θ s .

(4)
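As an illustration of Equations (2) and (4), the following sketch measures a sparse vector and recovers it again. It is a sketch under stated assumptions rather than the reconstruction used in CS-LAMRNet: this section names neither a solver nor a specific sensing matrix, so orthogonal matching pursuit (OMP) stands in for the minimization in Equation (4), and the deterministic matrix Θ = [I | H/√M] (identity columns plus scaled Hadamard columns) is a hypothetical choice made only because its low column coherence makes exact recovery of a 2-sparse vector easy to check. All helper names (`hadamard`, `build_sensing_matrix`, `measure`, `solve`, `omp`) are illustrative.

```python
import math


def hadamard(m):
    """Sylvester Hadamard matrix of order m (m must be a power of two)."""
    h = [[1.0]]
    while len(h) < m:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def build_sensing_matrix(m):
    """Hypothetical M x 2M sensing matrix Theta = [I | H / sqrt(M)], unit-norm columns."""
    h = hadamard(m)
    scale = 1.0 / math.sqrt(m)
    return [[1.0 if i == j else 0.0 for j in range(m)] + [h[i][j] * scale for j in range(m)]
            for i in range(m)]


def measure(theta, s):
    """Equation (2): y = Theta s."""
    return [sum(t * v for t, v in zip(row, s)) for row in theta]


def solve(a, b):
    """Solve the small square system a z = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def omp(theta, y, k):
    """Orthogonal matching pursuit: greedily select k columns of theta to explain y."""
    m, n = len(theta), len(theta[0])
    cols = [[theta[i][j] for i in range(m)] for j in range(n)]
    support, coef, residual = [], [], list(y)
    for _ in range(k):
        # Pick the unused column most correlated with the residual (columns are unit norm).
        j = max((c for c in range(n) if c not in support),
                key=lambda c: abs(sum(a * r for a, r in zip(cols[c], residual))))
        support.append(j)
        # Least-squares fit on the selected columns via the normal equations.
        gram = [[sum(a * b for a, b in zip(cols[p], cols[q])) for q in support]
                for p in support]
        rhs = [sum(a * b for a, b in zip(cols[p], y)) for p in support]
        coef = solve(gram, rhs)
        residual = [y[i] - sum(cf * cols[p][i] for cf, p in zip(coef, support))
                    for i in range(m)]
    s_hat = [0.0] * n
    for cf, p in zip(coef, support):
        s_hat[p] = cf
    return s_hat


theta = build_sensing_matrix(16)          # M = 16 measurements, N = 32 unknowns
s_true = [0.0] * 32
s_true[3], s_true[20] = 1.5, -2.0         # K = 2 nonzero coefficients
y = measure(theta, s_true)
s_hat = omp(theta, y, k=2)
max_error = max(abs(a - b) for a, b in zip(s_true, s_hat))
```

For this particular matrix, any two distinct columns have an inner product of magnitude at most 0.25, which is why the greedy selection is expected to find the correct support for K = 2. A random Gaussian measurement matrix and a different solver would be equally legitimate choices.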

2.2. Gramian Angular Difference Fields

GADF is a data dimension transformation method based on a Gram matrix in polar coordinates, which converts 1D time-series gear vibration information into image texture features while retaining it, thereby realizing feature visualization [37]. Its coding process is shown in Figure 1. The Gram matrix is composed of inner products between vectors, which preserves the temporal dependence of the time series. Every element in the matrix encodes a trigonometric function of the angles of two time points (for GADF, the sine of their angular difference). Assuming that X = (x_i, i = 1, 2, …, N) is a time series with N samples, the transformation of X by GADF proceeds as follows:

(1) The original 1D time series X is normalized to the range [−1, 1]:

{\tilde{x}}_{- 1}^{i} = \frac{(x_{i} - \max (X)) + (x_{i} - \min (X))}{\max (X) - \min (X)} .

(5)

(2) The scaled sequence is converted from Cartesian to polar coordinates: the normalized value {\tilde{x}}_{- 1}^{i} (written {\tilde{x}}_{i} below) and the time stamp t_i are encoded as the angle φ_i and the radius r_i, respectively. The process is described as follows:

\begin{array}{l} φ_{i} = \arccos ({\tilde{x}}_{i}), \; - 1 \leq {\tilde{x}}_{i} \leq 1, \; {\tilde{x}}_{i} \in X, \\ r_{i} = \frac{t_{i}}{N}, \; t_{i} \in \mathbb{N} . \end{array}

(6)

(3) Since {\tilde{x}}_{- 1}^{i} ∈ [−1, 1], φ_i varies monotonically within 0 ≤ φ_i ≤ π, so encoding the time series in polar coordinates yields a unique mapping. In the encoded matrix, time progresses from top left to bottom right. The main diagonal carries the original time information of the gear signal, and the other regions show the correlations between points at different times, where φ_i represents the angle of the ith sequence point. The GADF can be expressed as follows:

GADF = \begin{bmatrix} \sin (φ_{1} - φ_{1}) & \cdots & \sin (φ_{1} - φ_{N}) \\ \sin (φ_{2} - φ_{1}) & \cdots & \sin (φ_{2} - φ_{N}) \\ \vdots & \ddots & \vdots \\ \sin (φ_{N} - φ_{1}) & \cdots & \sin (φ_{N} - φ_{N}) \end{bmatrix} .

(7)
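The three steps in Equations (5)-(7) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `gadf` is hypothetical, the radius r_i is omitted because it does not enter the matrix in Equation (7), and resizing the result to a fixed image size is not shown.

```python
import math


def gadf(series):
    """Gramian angular difference field of a 1D series, following Equations (5)-(7)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled to [-1, 1]")
    # Equation (5): rescale to [-1, 1].
    scaled = [((x - hi) + (x - lo)) / (hi - lo) for x in series]
    # Equation (6): angular encoding; the clamp guards against rounding outside [-1, 1].
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in scaled]
    # Equation (7): entry (i, j) is sin(phi_i - phi_j).
    return [[math.sin(a - b) for b in phi] for a in phi]


image = gadf([0.0, 1.0, 2.0, 3.0])
```

The resulting matrix has a zero main diagonal and is antisymmetric, since sin(φ_i − φ_j) = −sin(φ_j − φ_i).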

2.3. Depthwise Separable Convolution

Depthwise separable convolution (DSC), a common convolutional operation in CNNs, is extensively utilized in network models such as MobileNet [38], ConvNeXt [39], and Xception [40] because it maintains good performance while greatly reducing computing time and model parameters, which improves the efficiency and speed of the diagnostic model. DSC reduces computational and parametric complexity by performing separate convolutions in different data channels. The process consists of two parts: depthwise convolution (DW) for extracting spatial features and pointwise convolution (PW) for extracting channel features. The schematic diagram of DSC is displayed in Figure 2.

Assume that the input feature map has size H × W × M, the numbers of input and output channels are M and N, respectively, and the convolution kernel has size D_K × D_K. In addition, the stride is 1 and the padding is 0. The computational cost of the standard convolution is as follows:

Q_{1} = D_{K} \times D_{K} \times M \times N \times H \times W .

(8)

The computational costs of DW and PW are given as follows:

Q_{D} = D_{K} \times D_{K} \times M \times H \times W .

(9)

Q_{P} = M \times N \times H \times W .

(10)

Adding Equations (9) and (10), the total computational cost of depthwise separable convolution is expressed as follows:

\begin{aligned} Q_{2} &= Q_{D} + Q_{P} \\ &= D_{K} \times D_{K} \times M \times H \times W + M \times N \times H \times W . \end{aligned}

(11)

The ratio of the cost of DSC to that of the standard convolution can be obtained from Equations (8) and (11):

\begin{aligned} \frac{Q_{2}}{Q_{1}} &= \frac{D_{K} \times D_{K} \times M \times H \times W + M \times N \times H \times W}{D_{K} \times D_{K} \times M \times N \times H \times W} \\ &= \frac{1}{N} + \frac{1}{D_{K}^{2}} . \end{aligned}

(12)

It can be seen from Equation (12) that the computational cost of DSC is (1/N + 1/D_K^2) times that of the standard convolution. Because the 1/D_K^2 term shrinks as the convolution kernel grows, the saving becomes more pronounced with increasing kernel size, so the use of DSC can effectively save computing costs and improve efficiency.
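A quick numerical check of Equations (8), (11), and (12) is given below. The layer dimensions (96 input channels, 192 output channels, a 56 × 56 feature map) are hypothetical values chosen for illustration only, and the function names are not taken from the paper.

```python
def standard_conv_cost(dk, m, n, h, w):
    """Equation (8): cost of a standard convolution."""
    return dk * dk * m * n * h * w


def dsc_cost(dk, m, n, h, w):
    """Equation (11): depthwise cost (Equation (9)) plus pointwise cost (Equation (10))."""
    return dk * dk * m * h * w + m * n * h * w


def cost_ratio(dk, m, n, h, w):
    """Equation (12): cost of DSC relative to the standard convolution."""
    return dsc_cost(dk, m, n, h, w) / standard_conv_cost(dk, m, n, h, w)


# Hypothetical layer: 96 input channels, 192 output channels, 56 x 56 feature map.
ratios = {dk: cost_ratio(dk, m=96, n=192, h=56, w=56) for dk in (3, 5, 7)}
```

With these values the ratio is roughly 0.116 for a 3 × 3 kernel, 0.045 for 5 × 5, and 0.026 for 7 × 7, matching 1/N + 1/D_K^2 and showing the larger saving for larger kernels.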

3. CS-LAMRNet

Since gears mostly operate in hostile environments, collected gear vibration data are easily contaminated by strong, non-stationary noise, which makes the collected data markedly nonlinear and can even cause weak gear fault characteristics to be overlooked. Thus, extracting effective features from noisy gear data is still a challenge in intelligent gear fault diagnosis. To overcome the above shortcomings, a novel compressive sensing lightweight attention multi-scale residual network method for gear fault diagnosis is proposed. Its network is divided into three parts: a multi-scale feature extraction (MSFE) module, a lightweight attention module, and an improved depth residual attention (IDRA) module.

3.1. Multi-Scale Feature Extraction Module

The limitations of relying on a single small-scale convolution kernel for global information extraction and the inability of large convolution kernels to precisely capture local features can result in critical information loss and an elevated risk of overfitting, thereby constraining a model’s capacity for feature representation. Moreover, the presence of noise and fault-related features distributed across various frequency bands in vibration data imparts a multi-scale characteristic to collected gear fault datasets. To effectively address this multi-scale nature, this section introduces a multi-scale feature extraction (MSFE) module predicated on multi-scale feature learning. This module leverages diverse convolution kernel sizes to extract features at multiple scales and levels, with the aim of improving the comprehensiveness and precision of feature extraction.

Combined with depthwise separable convolution, a high-efficiency multi-scale feature extraction architecture is constructed, as shown in Figure 3. The MSFE module has three parallel branches whose convolution kernel sizes are 3 × 3, 5 × 5, and 7 × 7, respectively. The corresponding padding values are set to 1, 2, and 3, which ensures that the features of different scales fed into the fusion layer have the same dimensions. Finally, a batch normalization (BN) layer is added at the end of each branch to reduce the risk of over-fitting. The BN layer normalizes each batch of input data, which makes network training more stable and faster. The output of each branch is given by:

S_{i} = B (\mathrm{DwConv2d}_{i} (I)) .

(13)

where I represents the input; DwConv2d_i (i = 1, 2, 3) is the ith depthwise separable convolution layer; B is the BN layer; and S_i (i = 1, 2, 3) denotes the output of the ith branch. Finally, the information extracted from the three branches is fused to obtain comprehensive features, which are input into the neural network:

F = Concatenate [S_{1}, S_{2}, S_{3}, \dim = 1] .

(14)

where F represents the output of MSFE and dim = 1 indicates concatenation along the channel dimension.
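The choice of padding values 1, 2, and 3 for the 3 × 3, 5 × 5, and 7 × 7 kernels can be checked with the usual convolution output-size formula. The sketch below assumes a stride of 1 and uses a hypothetical per-branch channel count; neither value is stated in this section.

```python
def conv_output_size(size, kernel, padding, stride=1):
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


branches = [(3, 1), (5, 2), (7, 3)]   # (kernel size, padding) of the three MSFE branches
input_size = 224                       # side length of the GADF input image
sizes = [conv_output_size(input_size, k, p) for k, p in branches]

# Hypothetical number of output channels per branch. Concatenation along dim = 1
# (Equation (14)) adds the channel counts while the spatial size stays the same.
channels_per_branch = 32
fused_channels = channels_per_branch * len(branches)
```

All three branches keep the 224 × 224 spatial size, which is what allows their outputs to be concatenated along the channel dimension.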

3.2. Lightweight Attention Module

To overcome the noise pollution in the collected gear fault dataset and enhance the noise resistance of the model, a lightweight attention (LATT) module was designed. It adaptively assigns weights so that the model focuses on the important features related to gear failure, reduces the impact of redundant information, extracts useful fault features, and effectively improves the robustness of GFD. During the calculation of LATT, for a given input X ∈ R^{n×d}, the queries, keys, and values are Q ∈ R^{n×d_q}, K ∈ R^{n×d_k}, and V ∈ R^{n×d_v}, respectively. Specifically, n is the number of patches; d represents the dimension of the input tensor; and d_q, d_k, and d_v denote the feature dimensions of the query, key, and value vectors. Then, in order to pool attention effectively, a faster, lighter scaled dot-product attention is adopted, which builds on the following expression:

Attention (Q, K, V) = softmax (\frac{Q K^{T}}{\sqrt{d_{k}}}) V .

(15)

To reduce the computational cost of attention and decrease the spatial size of K and V, two 3 × 3 depthwise separable convolution layers were included, giving K' = DWConv(K) ∈ R^{(n/3^4)×d_k} and V' = DWConv(V) ∈ R^{(n/3^4)×d_v}. The framework of the proposed LATT module is shown in Figure 4. In addition, a relative position bias B is added to each self-attention module, which yields a lightweight multi-head attention mechanism, and the formula is as follows:

LAttention (Q, K, V) = softmax (\frac{Q {K^{'}}^{T}}{\sqrt{d_{k}}} + B) V^{'} .

(16)

Each head of the LATT module outputs a sequence of size n × d_h. Subsequently, the output sequences of all heads are concatenated to form a combined sequence of size n × d.
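The attention formula with reduced keys and values and an additive bias can be written out for a single head as follows. This is a minimal sketch under stated assumptions: the reduced matrices K' and V' are passed in directly (the depthwise convolutions that produce them are not implemented), the toy sizes are hypothetical, and the function names are illustrative.

```python
import math


def softmax(row):
    """Numerically stable softmax of a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def lightweight_attention(q, k_red, v_red, bias):
    """Single-head form of Equation (16): softmax(Q K'^T / sqrt(d_k) + B) V'.

    q: n x d_k, k_red: m x d_k, v_red: m x d_v, bias: n x m, with m < n after reduction.
    """
    d_k = len(q[0])
    out = []
    for qi, bi in zip(q, bias):
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) + bij
                  for kj, bij in zip(k_red, bi)]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v_red))
                    for c in range(len(v_red[0]))])
    return out


# Toy sizes (hypothetical): n = 4 query tokens attend to m = 2 reduced key/value tokens.
q = [[0.0, 0.0]] * 4                  # zero queries give uniform attention weights
k_red = [[1.0, 0.0], [0.0, 1.0]]
v_red = [[2.0, 4.0], [6.0, 8.0]]
bias = [[0.0, 0.0]] * 4
out = lightweight_attention(q, k_red, v_red, bias)
```

Because the score matrix has shape n × m instead of n × n, the cost of the softmax and of the product with V' falls in proportion to the reduction of the key and value sequences.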

3.3. Improved Depth Residual Attention Module

In order to effectively improve gear fault feature extraction and accurately capture local and global feature information, an improved depth residual attention (IDRA) module was developed and is outlined in this section. It consists of an improved residual module and a LATT module. The IDRA module uses a depthwise separable convolution layer with a 5 × 5 convolution kernel to improve the ability and efficiency of gear feature extraction, and it adopts the inverted bottleneck design used in the Transformer, moving the depthwise convolution layer up one layer in order to accommodate a larger convolution kernel. In addition, GELU was selected as the activation function. It combines the advantages of ReLU and Dropout, its core features are nonlinearity and smoothness, and it has a certain regularization effect, which helps to avoid vanishing and exploding gradients. The approximate formula of GELU is expressed as follows:

GELU (x) = 0.5 x [1 + \tanh (\sqrt{\frac{2}{π}} (x + 0.044715 x^{3}))] .

(17)
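The tanh approximation can be compared against the exact definition of GELU, x·Φ(x) with Φ the standard normal cumulative distribution function. The sketch below uses the standard coefficient 0.044715 of the tanh approximation; the function names and the evaluation grid are illustrative.

```python
import math


def gelu_tanh(x):
    """Tanh approximation of GELU, as in Equation (17)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def gelu_exact(x):
    """Exact GELU: x times the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


grid = [i / 10.0 for i in range(-50, 51)]
max_gap = max(abs(gelu_tanh(x) - gelu_exact(x)) for x in grid)
```

On this grid the two forms agree to within 1e-3, which is why the cheaper tanh form is commonly used in practice.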

Figure 5 shows the framework of the IDRA module. To dynamically adjust the strength of the residual connection, a layer scale was introduced after the improved depth residual attention module. It can be observed that the input gear features enter the lightweight attention module for further feature extraction after the residual block treatment.

3.4. Architecture of CS-LAMRNet

The architecture of the proposed compressive sensing lightweight attention multi-scale residual network is displayed in Figure 6. The input of CS-LAMRNet is a two-dimensional image obtained by converting the 1D vibration signal with GADF. Firstly, the feature information is fully extracted by the MSFE branches with three different convolution kernels and BN layers, and then the extracted features are fused as the input of the backbone network. The backbone network consists of four stages that stack 1, 2, 2, and 1 IDRA modules, respectively. After repeated tests, the channel dimensions (dim) of the four stages were set to 96, 192, 384, and 768 to further process the feature information. A downsampling module is placed between adjacent stages; it consists of a BN layer and a convolution with a 2 × 2 kernel and a stride of 2, which reduces the spatial size of the feature maps and thus the complexity of the model. Finally, the learned features are fed into a global average pooling (GAP) layer, which computes the average value of each feature map and significantly reduces the number of parameters in the fully connected layer, and the classification of the fault types is completed by Softmax. The specific parameters of each network layer of CS-LAMRNet are shown in Table 1.

3.5. Flow Chart of the Proposed Fault Diagnosis Method

Figure 7 presents the flowchart of the proposed CS-LAMRNet-based gear fault diagnosis method for complex operating conditions, which consists of four steps.

Step 1: Data acquisition. The vibration data of gears under various working conditions are acquired by an acceleration sensor mounted on the gearbox and the data collection system.

Step 2: Signal processing. The denoised vibration signal is converted into 2D images by Gramian angular difference fields; the images are then divided proportionally into a training set and a test set.

Step 3: Model training. The training set is input into CS-LAMRNet for iterative training. The loss function is optimized by the Adam algorithm, and the Softmax classifier is used for classification.

Step 4: Fault diagnosis. To achieve fault identification, the test set is input into the trained CS-LAMRNet, which outputs the accuracy over the entire test set and the identification accuracy for each individual fault type.

4. Experimental Results and Discussion

In this section, the effectiveness and generalization ability of the CS-LAMRNet method are validated using experimental data from the Northeastern University gearbox dataset (NEU dataset) and the Southeast University gearbox dataset (SEU dataset). In addition, to verify the advantages of the present model, comparative tests against several popular models are carried out. Finally, an ablation experiment is performed to evaluate the effectiveness of the various components of the present model. The batch size, learning rate, and number of training epochs of the model are set to 16, 0.0005, and 100, respectively.

4.1. Dataset Description

(1): NEU dataset

To obtain gear status and fault data, a CL-100 gear wear testbed was set up, as shown in Figure 8. The gear experiment table consists of a gearbox, a DH5922D dynamic signal test system, a loading device, an acceleration sensor, etc. The gear vibration data sampling frequency was 20 kHz, the sampling time was 20 s, and the gear speed was set to 1450 r/min. The main parameters of the gears are listed in Table 2. The experimental platform has five states, namely normal gear (NO), tooth break (MT), tooth pitting (CT), tooth crack (RF), and tooth wear (SF), which are displayed in Figure 9.

In order to achieve a good balance between computational efficiency and information retention and to ensure that the generated GADF images can provide sufficient information, for each type of gear fault data, a sliding window of 1000 data points was used for extraction. GADF was utilized to convert the data of each fault type into 1000 images, with the image size set to 224 × 224, thereby providing sufficient spatial resolution to capture the fault information. A total of 5000 images were accumulated for the five types of faults. Figure 10a shows the GADF images of the NEU datasets. Then, the corresponding dataset was divided into a training set and a test set at a ratio of 8:2. The different gear fault types are labeled and listed in Table 3.
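The segmentation and the 8:2 split described above can be sketched as follows. The stride between consecutive windows and the way samples are assigned to the two sets are not stated in the text, so the `stride` parameter, the ordered split, and both function names are assumptions made for illustration; a short integer sequence stands in for a vibration record.

```python
def sliding_windows(signal, window, stride):
    """Cut a 1D signal into fixed-length segments taken every `stride` samples."""
    last_start = len(signal) - window
    return [signal[start:start + window] for start in range(0, last_start + 1, stride)]


def split_train_test(samples, train_fraction=0.8):
    """Split in order; shuffle beforehand if a random split is wanted."""
    cut = int(round(len(samples) * train_fraction))
    return samples[:cut], samples[cut:]


signal = list(range(20))                      # stand-in for a vibration record
segments = sliding_windows(signal, window=8, stride=3)
train, test = split_train_test(segments)
```

In the setting of this section, `window` would be 1000 and each segment would then be converted into one GADF image before the split.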

(2): SEU dataset

The SEU gearbox dataset was acquired from the dynamic drivetrain simulator (DDS) presented in Figure 11, which consists of a motor, a motor controller, a parallel shaft gearbox, a load, and a load controller. Two different conditions (1200 r/min at 0 Nm and 1800 r/min at 7.32 Nm) were selected and researched [41]. The SEU dataset contains multiple types of data, and the x-axis vibration signal of the gearbox under the first condition was selected. The dataset contains four common gear faults and normal gearbox data: tooth pitting (CT), root fault (MT), tooth break (RF), tooth wear (SF), and normal state (NO). For each fault type, a sliding window of 1000 data points is used for extraction, and the data are converted into 5000 images in total by GADF. Figure 10b shows the GADF images of the SEU dataset. Each fault type contains 800 training samples and 200 test samples. A detailed description of the different types of faults for gearboxes is given in Table 4. In order to better compare the differences and connections between the two datasets, they are summarized in Table 5.

4.2. Experimental Results and Visual Analysis

(1): Experimental verification in NEU dataset

The accuracy and loss curves obtained using the CS-LAMRNet are presented in Figure 12. During the training process, the identification accuracy gradually improved and the loss diminished accordingly with increasing iterations. It should be noted that the proposed CS-LAMRNet method rapidly converges and stabilizes during training. After 20 epochs, the accuracies of the training set and test set reached 99.80%, and the loss rapidly reached its minimum and stabilized at 0.2, which indicates that the CS-LAMRNet method basically achieved convergence and that nearly all training and testing samples were identified accurately. To further investigate the feature learning and fault classification capabilities of the CS-LAMRNet, the results of the model testing procedure on the NEU dataset were visualized using a confusion matrix and t-SNE, which show the proportions of correctly classified and misclassified samples for the different gear faults. As can be seen in Figure 13a, 1.0% of the tooth wear test samples are misclassified as tooth cracks, and all other test samples are correctly classified. The fast convergence and high precision of the training process demonstrate the good performance of the proposed CS-LAMRNet.

(2): Experimental verification in SEU dataset

In order to further verify the superiority of the proposed CS-LAMRNet model, the accuracy and loss of the training and test samples were studied, as shown in Figure 14. After the first 20 epochs, the recognition accuracies almost overlapped, reached 100%, and stabilized; meanwhile, the corresponding losses fell rapidly and eventually stabilized at 0.001. Furthermore, to obtain a more comprehensive picture of the performance of the CS-LAMRNet on the SEU gear dataset, the confusion matrix and t-SNE were employed to obtain more detailed diagnostic information, as shown in Figure 15, which indicates that the proposed GFD method has an excellent classification ability.

4.3. Comparative Experiments

(1): Comparative experiments based on NEU dataset

To verify the effectiveness and feasibility of CS-LAMRNet in this research, the experimental results were compared with those of seven other methods, namely ResNet18 [42], VGG11 [43], MTF-ResNet [44], GADF-CNN [45], MobileNet V3 [38], ConvNeXt-T [39], and DRSN-CW [46]. To control for a single variable, all methods used the same dataset (NEU dataset) during training and testing. To avoid contingency and randomness and to improve the reliability of the evaluation, the average values of the accuracy and macro-F1 over ten runs of each fault diagnosis method were chosen as the model evaluation indexes. The expressions are as follows:

Accuracy = \frac{TP + TN}{TP + FP + TN + FN}

(18)

F_{1} = \frac{2 TP}{2 TP + FP + FN}

(19)

where TP/FP and TN/FN represent true/false positive examples and true/false negative examples, respectively.
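For a multi-class problem such as five gear states, both indexes can be computed from a confusion matrix: accuracy is the fraction of correctly classified samples, and macro-F1 averages the per-class F1-scores of Equation (19), treating each class in turn as the positive class. The confusion matrix below is made up for illustration and does not reproduce any result from the paper; the function names are illustrative.

```python
def accuracy(conf):
    """Overall accuracy: correctly classified samples over all samples."""
    correct = sum(conf[i][i] for i in range(len(conf)))
    total = sum(sum(row) for row in conf)
    return correct / total


def macro_f1(conf):
    """Unweighted mean of the per-class F1-scores from Equation (19)."""
    n = len(conf)
    scores = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                       # class c predicted as another class
        fp = sum(conf[r][c] for r in range(n)) - tp  # other classes predicted as c
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class).
conf = [[50, 0, 0],
        [0, 48, 2],
        [0, 0, 50]]
acc = accuracy(conf)
f1 = macro_f1(conf)
```

Averaging these two values over repeated runs gives the kind of evaluation index described above.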

Table 6 summarizes the average accuracies of the different methods, which are 87.45%, 97.72%, 96.91%, 96.90%, 98.96%, 99.30%, 99.18%, and 99.58%, respectively. The average accuracy of the VGG11 is the lowest at 87.45%, which is attributed to its small convolution kernel (3 × 3). Although the VGG11 increases the network depth, this may lead to a large loss of features and spatial information in some cases, and its feature expression ability is limited, resulting in overfitting to a certain extent. The average accuracies of the other fault diagnosis methods are all at least 96.90%. Of these, the DRSN-CW yields better results, namely an accuracy of 99.30% with an F1-score of 99.30%. However, the highest accuracy and F1-score, 99.58% and 99.83%, were obtained by the CS-LAMRNet proposed in this research. Compared with VGG11 and DRSN-CW, the accuracy increased by 12.13 and 0.28 percentage points and the F1-score by 12.32 and 0.53 percentage points, respectively, which indicates that the CS-LAMRNet has excellent diagnostic capability.

(2): Comparative experiments based on SEU dataset

In this part, the SEU dataset is utilized to evaluate the advantages of the proposed method. The calculation strategy and the fault diagnosis methods used for comparison are the same as in the preceding part. The accuracies and F1-scores are compared in Table 7. It can be seen that the proposed CS-LAMRNet model has the highest accuracy and F1-score among the compared methods, both reaching 100%. Therefore, from the previous investigation and comparison results, the conclusion can be drawn that the proposed model exhibits excellent fault classification performance and a robust diagnosis effect.

4.4. Comparative Experiment in Noisy Environments

Although the proposed CS-LAMRNet method in this research has an excellent fault diagnosis ability for the laboratory gear datasets (NEU dataset and SEU dataset), gear systems operate in complicated and volatile conditions, and the obtained gear vibration data are frequently polluted by noise in actual industrial applications. Hence, to evaluate the anti-noise ability of the proposed CS-LAMRNet, the SEU gear dataset is injected with Gaussian noise to simulate the noise in real working conditions. The signal-to-noise ratio (SNR) is applied to evaluate Gaussian noise intensity, and it can be expressed as follows:

SNR = 10 \log_{10} \frac{P_{S}}{P_{N}}

(20)

Here, P_S represents the power of the signal and P_N denotes the power of the noise.
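A minimal sketch of how Gaussian noise with a prescribed SNR can be injected according to Equation (20) is given below. It assumes zero-mean white Gaussian noise whose variance is set to P_N = P_S / 10^(SNR/10); the function names and the sinusoidal test signal are hypothetical and do not come from the SEU data.

```python
import math
import random


def add_gaussian_noise(signal, snr_db, rng):
    """Return signal plus white Gaussian noise scaled to the requested SNR in dB."""
    p_signal = sum(x * x for x in signal) / len(signal)   # P_S: mean signal power
    p_noise = p_signal / (10 ** (snr_db / 10))            # P_N solved from Eq. (20)
    sigma = math.sqrt(p_noise)                            # noise standard deviation
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR of `noisy` with respect to its clean version."""
    p_signal = sum(c * c for c in clean) / len(clean)
    p_noise = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_signal / p_noise)


rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical test signal: one second of a 50 Hz sine sampled at 20 kHz.
clean = [math.sin(2 * math.pi * 50 * t / 20000) for t in range(20000)]
noisy = add_gaussian_noise(clean, snr_db=6, rng=rng)
print(f"target 6 dB, measured {measured_snr_db(clean, noisy):.2f} dB")
```

Because the noise is random, the measured SNR only approaches the target value; the deviation shrinks as the signal length grows.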

To ensure the consistency of the analysis, the diagnostic methods mentioned above were used for comparison. The comparison experiments were carried out using the SEU dataset with different SNRs (4 dB, 6 dB, 8 dB, 10 dB, 12 dB, and 14 dB), and the results are shown in Figure 16. From the line chart in Figure 16, it can be observed that the proposed CS-LAMRNet achieves a higher accuracy than the seven other approaches under the examined SNRs. When SNR = 4 dB, the accuracies of VGG11, MobileNet V3, ResNet18, MTF-ResNet, and GADF-CNN are all below 86%, while the accuracies of ConvNeXt-T, DRSN-CW, and CS-LAMRNet remain above 92.5%. As the SNR increases, the proposed CS-LAMRNet model attains the highest diagnostic accuracy, reaching 100% when SNR ≥ 12 dB. Therefore, the results show that the proposed CS-LAMRNet model possesses excellent noise resistance.

To further evaluate the diagnostic performance of CS-LAMRNet in a noisy environment, Figure 17 displays the confusion matrices of the SEU gear dataset obtained by the aforementioned diagnostic methods at SNR = 6 dB. As shown in Figure 17a, for VGG11, 14%, 6%, and 6% of the Label 2 samples are misidentified as Label 1, Label 3, and Label 4, respectively; 30% and 7% of the Label 3 samples are misidentified as Label 2 and Label 4, respectively; and 20% and 13% of the Label 4 samples are misidentified as Label 2 and Label 3, respectively. This indicates that VGG11 cannot precisely identify the specific gear fault types in strongly noisy environments. Likewise, the specific gear faults cannot be accurately classified by the other diagnostic methods (ResNet18, MTF-ResNet, GADF-CNN, MobileNet V3, DRSN-CW, and ConvNeXt-T). In contrast, the diagnostic accuracies of CS-LAMRNet are 100% for Labels 0 and 2 and 99% for Labels 1 and 3. The lowest accuracy is obtained for Label 4, but it still reaches 94%, which indicates that the CS-LAMRNet is able to correctly identify the five gear fault categories. Therefore, compared with the other diagnostic methods, the proposed CS-LAMRNet presents a more accurate classification ability, especially for Labels 0, 1, 2, and 3, and a better generalization ability.
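The misclassification shares quoted for VGG11 can be turned into a bound on the per-label accuracy. The sketch below assumes that the confusion matrix in Figure 17a is row-normalised, so that each row sums to 100%; under that assumption, the share of correctly identified samples of a label cannot exceed 100% minus the misclassified shares listed in the text. The variable and function names are illustrative.

```python
# Misclassified shares (%) quoted in the text for VGG11 at SNR = 6 dB:
# true label -> {predicted label: share of the true label's samples}.
vgg11_errors = {
    2: {1: 14, 3: 6, 4: 6},
    3: {2: 30, 4: 7},
    4: {2: 20, 3: 13},
}


def recall_upper_bound(errors):
    """Per-label bound: 100% minus the misclassified shares listed for that label."""
    return {label: 100 - sum(shares.values()) for label, shares in errors.items()}


bounds = recall_upper_bound(vgg11_errors)
for label, value in bounds.items():
    print(f"Label {label}: at most {value}% correctly identified")
```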

To better understand the advantages and reliability of the CS-LAMRNet for GFD, the t-SNE algorithm was used to perform a visual analysis. Figure 18 shows the t-SNE diagrams of the different models at SNR = 6 dB. It can be clearly seen that the visual results of the first seven diagnostic methods (Figure 18a–g) show obvious overlap, making it difficult to accurately identify different types of gear faults. In Figure 18h, the classification using CS-LAMRNet shows a remarkable clustering effect; namely, samples of the same class are clustered together in the feature space and form distinct regions with clear boundaries. Hence, it can be concluded that the CS-LAMRNet displays good distinguishability and achieves accurate predictions for the SEU dataset.

4.5. Ablation Study

To evaluate the contribution of every component of the proposed framework, four ablation experiments were carried out on the SEU dataset by removing or replacing specific structures within the CS-LAMRNet model; these are described as follows.

Method 1: Compressed sensing is not applied to the noise-added signal, which makes it possible to observe the change in model performance without denoising.

Method 2: The 3 × 3, 5 × 5, and 7 × 7 convolution kernels in the MSFE module are all replaced with regular 3 × 3 convolutions, in order to investigate the effect of the convolution kernel size on feature extraction.

Method 3: The LATT module is removed from the IDRA module, in order to evaluate the significance of the LATT module in the complete method.

Method 4: The complete CS-LAMRNet model of this research, which serves as the reference for the previous three methods.
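The four variants can be viewed as configurations that each differ from the complete model in exactly one setting. The following sketch expresses this with a small configuration class; the class, its field names, and the helper function are hypothetical and only illustrate the experimental design, not the code used in this study.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AblationConfig:
    """Hypothetical switches for the three components examined in the ablation study."""
    use_compressed_sensing: bool = True     # CS denoising of the noisy signal
    msfe_kernel_sizes: tuple = (3, 5, 7)    # kernel sizes of the MSFE branches
    use_latt: bool = True                   # lightweight attention (LATT) module


FULL_MODEL = AblationConfig()

ABLATIONS = {
    "Method 1": AblationConfig(use_compressed_sensing=False),
    "Method 2": AblationConfig(msfe_kernel_sizes=(3, 3, 3)),
    "Method 3": AblationConfig(use_latt=False),
    "Method 4": FULL_MODEL,
}


def changed_fields(cfg, reference=FULL_MODEL):
    """List the settings in which `cfg` differs from the complete model."""
    return [f.name for f in fields(cfg)
            if getattr(cfg, f.name) != getattr(reference, f.name)]


for name, cfg in ABLATIONS.items():
    print(name, changed_fields(cfg) or "complete model")
```

Changing a single setting per variant keeps the comparison attributable to one component at a time.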

As shown in Table 8, the ablation experiments indicate that the compressed sensing and the MSFE module have a remarkable impact on diagnostic accuracy: at SNR = 4 dB, the accuracy drops from 95.56% for the complete model to 85.12% without compressed sensing (Method 1) and to 85.29% with single-scale kernels (Method 2). In other words, the compressed sensing technology presents excellent denoising ability, which effectively improves the usability of signals in a complex environment, while the multi-scale convolution kernels enhance the adequacy and diversity of feature extraction by capturing information at different scales. Additionally, the importance of the LATT module cannot be overlooked, since removing it (Method 3) lowers the accuracy at 4 dB to 92.36%. The inclusion of the attention mechanism in CS-LAMRNet enables the model to adaptively focus on key fault information, alleviates the interference of noise, and improves diagnostic accuracy and reliability. Therefore, the ablation experiments support the effectiveness and necessity of each component of the proposed CS-LAMRNet.
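The contribution of each component can be read from Table 8 as the accuracy lost, relative to the complete model (Method 4), when that component is removed or replaced. The sketch below performs this subtraction for every SNR level; the values are copied from Table 8, while the variable and function names are illustrative.

```python
SNRS_DB = [4, 6, 8, 10, 12, 14]

# Accuracies (%) copied from Table 8, ordered as in SNRS_DB.
table8 = {
    "Method 1": [85.12, 93.06, 95.24, 96.86, 97.23, 97.68],
    "Method 2": [85.29, 91.08, 94.49, 96.23, 97.15, 97.97],
    "Method 3": [92.36, 95.36, 96.45, 97.65, 98.68, 99.13],
    "Method 4": [95.56, 98.24, 99.18, 99.78, 99.94, 100.00],
}


def drop_vs_full(method):
    """Accuracy loss in percentage points relative to Method 4 at each SNR."""
    return [round(full - acc, 2)
            for full, acc in zip(table8["Method 4"], table8[method])]


for method in ("Method 1", "Method 2", "Method 3"):
    losses = ", ".join(f"{snr} dB: {d:.2f}"
                       for snr, d in zip(SNRS_DB, drop_vs_full(method)))
    print(f"{method} -> {losses}")
```

In this table, the losses are largest at 4 dB and shrink as the SNR increases, which is consistent with the components mainly contributing to noise robustness.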

5. Conclusions

In this research, a novel compressive sensing lightweight attention multi-scale residual network for gear fault diagnosis is presented in order to achieve a higher gear fault identification capability under strong-noise conditions. The main conclusions are as follows:

(1): The noise and redundant information in the collected gear data were removed by the CS method. The reconstructed data were then converted into 2D images, which enhanced the feature extraction capability of the diagnostic model.
(2): Combined with multi-scale learning strategies, an MSFE module with a strong anti-noise capability is proposed. Convolution kernels of different sizes were utilized to extract features of different scales and levels, which improves the comprehensiveness and accuracy of feature extraction.
(3): An improved deep residual attention module was constructed by introducing a lightweight attention mechanism, which improved the capacity of the model to extract and distinguish features, and the accuracy and computational efficiency improved accordingly.
(4): On the different gear fault datasets, the proposed CS-LAMRNet shows a better anti-noise capability and a more accurate diagnostic performance than the other gear fault diagnosis methods, as can be observed by comparing the accuracy, confusion matrix, and t-SNE feature visualization results.
(5): The model was verified using the NEU dataset and the SEU dataset; the average accuracy rates were 99.80% and 100%, respectively, and the accuracy rate was still 98.24% at SNR = 6 dB. Finally, the four groups of ablation experiments support the effectiveness of each part of the model proposed in this paper.

Given that the information on gear faults provided by a single sensor is limited, and considering the influence of the working environment and external interference factors, we will adopt multi-sensor information fusion technology to combine sound and vibration data, thereby enhancing the accuracy and robustness of GFD. Additionally, we will consider reducing the hardware cost of application, further improving the diagnostic capability for coupled faults, conducting real-time industrial diagnosis, and promoting the application of the method to other rotating equipment, such as turbines and motors.

Author Contributions

Conceptualization, S.Z.; formal analysis, S.Z. and X.Y.; writing – original draft, S.Z.; writing – review and editing, S.Z.; software, X.L.; visualization, Y.W. and K.J.; investigation, Z.R. All authors have read and agreed to the published version of the manuscript.

Funding

The project is supported by the National Natural Science Foundation of China (no. 52275091), the Fundamental Research Funds for the Central Universities (no. N2303011), and the Shenyang Natural Science Foundation (no. 23-503-6-02).

Data Availability Statement

Selected data, models, and code generated or used during the study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

1. Mohammed, O.D.; Rantatalo, M. Gear fault models and dynamics-based modelling for gear fault detection – A review. Eng. Fail. Anal. 2020, 117, 104798.
2. Yang, W.H.; Liu, F.B.; Huang, G.Y. Research on weak fault diagnosis of gearbox based on attention mechanism. J. Mach. Design. 2024, 1–12.
3. Sun, H.C.; Wang, C.D.; Cao, X. An adaptive anti-noise gear fault diagnosis method based on attention residual prototypical network under limited samples. Appl. Soft Comput. J. 2022, 25, 109120.
4. Wang, T.Y.; Han, Q.K.; Chu, F.L.; Feng, Z.P. Vibration based condition monitoring and fault diagnosis of wind turbine planetary gearbox: A review. Mech. Syst. Signal Process. 2019, 126, 662–685.
5. Pan, H.Y.; Zheng, J.D.; Yang, Y.; Cheng, J.S. Nonlinear sparse mode decomposition and its application in planetary gearbox fault diagnosis. Mech. Mach. Theory 2021, 155, 104082.
6. Zhao, X.L.; Yao, J.Y.; Deng, W.X.; Ding, P.; Jia, M.; Liu, Z. Intelligent fault diagnosis of gearbox under variable working conditions with adaptive intraclass and interclass convolutional neural network. IEEE Trans. Neural Networks Learn. Syst. 2023, 34, 6339–6353.
7. Lu, Y.; Du, J.; Tong, X.C.; Zhang, W. Research on multi-source sparse optimization method and its application on gearbox compound fault detection. Eng. Sci. Technol. 2024, 57, 101800.
8. Cheng, J.; Yang, Y.; Wu, Z.T.; Shao, H.D.; Pan, H.Y.; Cheng, J.S. Ramanujan fourier mode decomposition and its application in gear fault diagnosis. IEEE Trans. Ind. Inform. 2022, 18, 6079–6088.
9. Zheng, X.B.; Yang, Y.; Hu, N.Q.; Cheng, Z.; Cheng, J.S. A novel empirical reconstruction Gauss decomposition method and its application in gear fault diagnosis. Mech. Syst. Signal Process. 2024, 210, 111174.
10. Zhang, M.; Jiang, Z.N.; Feng, K. Research on variational mode decomposition in rolling bearings fault diagnosis of the multistage centrifugal pump. Mech. Syst. Signal Process. 2017, 93, 460–493.
11. Liu, Z.Z.; Kuang, Y.C.; Jiang, F.; Zhang, Y.; Lin, H.; Ding, K. Weighted distributed compressed sensing: An efficient gear transmission system fault feature extraction approach for ultra-low compression signals. Adv. Eng. Inform. 2024, 62, 102833.
12. Liu, R.J.; Wang, X.R.; Su, C.W.; Kang, Z.J.; Li, Y.; Yu, S.; Zhang, H. Bearing fault diagnosis method based on variational mode decomposition optimized by CS-PSO. J. Vib. Control 2024, 30, 973–987.
13. Xing, Z.Z.; Zhao, S.F.; Guo, W.; Meng, F.; Guo, X.; Wang, S.; He, H. Coal resources under carbon peak: Segmentation of massive laser point clouds for coal mining in underground dusty environments using integrated graph deep learning model. Energy 2023, 285, 128771.
14. Wang, H.T.; Dai, X.Y.; Shi, L.C.; Li, M.J.; Liu, Z.L.; Wang, R.H.; Xia, X.H. Data-augmentation based CBAM-ResNet-GCN method for unbalance fault diagnosis of rotating machinery. IEEE Access 2024, 12, 34785–34799.
15. Han, B.; Zhang, H.; Sun, M.; Wu, F.T. A New Bearing fault diagnosis method based on capsule network and Markov transition field/Gramian angular field. Sensors 2021, 21, 7762.
16. Xing, Z.Z.; Yang, Y.; Tan, L.; Guo, X.J. Multi-source physical information driven deep learning in intelligent education: Unleashing the potential of deep neural networks in complex educational evaluation. AIP Adv. 2025, 15, 025214.
17. Shi, L.C.; Zhang, P.; Wang, H.T.; Zhou, X.Y. Small-sample gear fault diagnosis method based on GASF and MSCAM-DenseNet. Compu. Int. Manufact. 2023, 1–21.
18. Wang, D.F.; Guo, Y.; Wu, X.; Na, J.; Litak, G. Planetary-gearbox fault classification by convolutional neural network and recurrence plot. App. Sci. 2020, 10, 932.
19. Liu, X.P.; Xia, L.J.; Shi, J.; Zhang, L.J.; Bai, L.Y.; Wang, S.P. A fault diagnosis method of rolling bearing based on improved recurrence plot and convolutional neural network. IEEE Sens. J. 2023, 23, 10767–10775.
20. Cheng, Y.W.; Lin, M.X.; Wu, J.; Zhu, H.P.; Shao, X.Y. Intelligent fault diagnosis of rotating machinery based on continuous wavelet transform-local binary convolutional neural network. Knowl.-Based Syst. 2021, 216, 106796.
21. Zhang, Y.; Liu, W.Y.; Wang, X.; Gu, H. A novel wind turbine fault diagnosis method based on compressed sensing and DTL-CNN. Renew. Energy 2022, 194, 249–258.
22. Gu, H.; Liu, W.Y.; Zhang, Y.; Jiang, X.Y. A novel fault diagnosis method of wind turbine bearings based on compressed sensing and AlexNet. Meas. Sci. Technol. 2022, 33, 115011.
23. Lin, Y.; Xiao, M.H.; Liu, H.J.; Li, Z.L.; Zhou, S.; Xu, X.M.; Wang, D.C. Gear fault diagnosis based on CS-improved variational mode decomposition and probabilistic neural network. Measurement 2022, 192, 110913.
24. Chen, Z.G.; Du, X.L.; Zhang, N.; Zhang, J.L. Application of IEWT-CS and LCNN in bearing fault diagnosis. J. Harbin Eng. Univ. 2023, 41, 463–472.
25. Weng, C.Y.; Lu, B.C.; Gu, Q. A multi-scale kernel-based network with improved attention mechanism for rotating machinery fault diagnosis under noisy environments. Meas. Sci. Technol. 2022, 33, 055108.
26. Jin, Y.R.; Qin, C.J.; Zhang, Z.N.; Tao, J.F.; Liu, C.L. A multi-scale convolutional neural network for bearing compound fault diagnosis under various noise conditions. Sci. China Technol. Sci. 2022, 65, 2551–2563.
27. Zhao, W.L.; Wang, Z.J.; Cai, W.N.; Zhang, Q.Q. Multiscale inverted residual convolutional neural network for intelligent diagnosis of bearings under variable load condition. Measurement 2022, 188, 110511.
28. Jiang, S.; Feng, S.L.; Wu, B.; Wang, W.R.; Lu, F.L.; Yuan, X.B. Bearing fault diagnosis based on multi-scale feature fusion of parallel network. Trans. Micro. Technol. 2023, 42, 121–125.
29. Wang, Y.; Wang, J.N.; Tong, P.C. Small sample fault diagnosis for wind turbine gearbox based on lightweight multiscale convolutional neural network. Meas. Sci. Technol. 2023, 34, 095111.
30. Yue, K.; Li, J.P.; Chen, J.B.; Huang, R.Y.; Li, W.H. Multiscale wavelet prototypical network for cross-component few-shot intelligent fault diagnosis. IEEE Trans. Instrum. Meas. 2023, 72, 3502411.
31. Wang, M.Y.; Yang, Y.X.; Wei, L.X.; Li, Y.S. A lightweight gear fault diagnosis method based on attention mechanism and multilayer fusion network. IEEE Trans. Instrum. Meas. 2024, 73, 3503011.
32. Zhang, S.; Liu, Z.W.; Chen, Y.P.; Jin, Y.L.; Bai, G.S. Selective kernel convolution deep residual network based on channel-spatial attention mechanism and feature fusion for mechanical fault diagnosis. ISA Trans. 2023, 133, 369–383.
33. Zhong, X.Y.; Li, Y.F.; Xia, T.Y. Parallel learning attention-guided CNN for signal denoising and mechanical fault diagnosis. J. Braz. Soc. Mech. Sci. 2023, 45, 239.
34. Zhan, S.N.; Shao, R.P.; Men, C.J.; Hao, H.M.; Wu, Z.F. Fault diagnosis method for planetary gearbox based on intrinsic feature extraction and attention mechanism. Meas. Sci. Technol. 2024, 35, 035116.
35. Yang, Q.C.; Tang, B.P.; Shen, Y.Z.; Li, Q.K. Self-Attention Parallel Fusion Network for Wind Turbine Gearboxes Fault Diagnosis. IEEE Sens. J. 2023, 23, 23210–23220.
36. Yao, Y.; Gui, G.; Yang, S.X.; Zhang, S. An adaptive anti-noise network with recursive attention mechanism for gear fault diagnosis in real-industrial noise environment condition. Measurement 2021, 186, 110169.
37. Wang, X.L.; Fu, R.Q.; Jin, H.W.; Zhang, B.W.; Li, Y.S.; He, Y.L. Performance degradation assessment of rolling bearing by fusing texture feature of Gramian angular difference field. J. Vib. Eng. 2024, 2024, 1–8.
38. Howard, A.S.; Chu, M.G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; Le, Q.V. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324.
39. Liu, Z.; Mao, H.Z.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A convnet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11966–11976.
40. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807.
41. Shao, S.Y.; McAleer, S.; Yan, R.Q.; Baldi, P. Highly-accurate machine fault diagnosis using deep transfer learning. IEEE Trans. Ind. Inform. 2019, 15, 2446–2455.
42. Shafiq, M.; Gu, Z.Q. Deep residual learning for image recognition: A survey. Appl. Sci. 2022, 12, 8972.
43. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2015, arXiv:1409.1556.
44. He, K.C.; Xu, Y.W.; Wang, Y.; Wang, J.H.; Xie, T.C. Intelligent diagnosis of rolling bearings fault based on multisignal fusion and MTF-ResNet. Sensors 2023, 23, 6281.
45. Tong, Y.; Pang, X.Y.; Wei, Z.H. Fault diagnosis method of rolling bearing based on GADF-CNN. J. Vib. Shock 2021, 40, 247–253.
46. Zhao, M.H.; Zhong, S.S.; Fu, X.Y.; Tang, B.P.; Pecht, M. Deep residual shrinkage networks for fault diagnosis. IEEE Trans. Ind. Inform. 2020, 16, 4681–4690.

Figure 1. Image coding process of the GADF vibration signal: (a) vibration signal; (b) polar coordinate domain representation; (c) GADF image generation.

Figure 2. The schematic diagram of DSC.

Figure 3. The multi-scale feature extraction module.

Figure 4. The lightweight attention module.

Figure 5. The improved depth residual attention module.

Figure 6. The architecture of CS-LAMRNet.

Figure 7. The flowchart of gear fault diagnosis system based on the CS-LAMRNet model.

Figure 8. The gear test system.

Figure 9. Gears with different fault types.

Figure 10. GADF images of different datasets. (a) GADF images of NEU dataset; (b) GADF images of SEU dataset.

Figure 11. Experimental setup of gear using SEU dataset [41].

Figure 12. The accuracy and loss function curves of CS-LAMRNet for the NEU dataset: (a) accuracy curves; (b) loss function curves.

Figure 13. The visualization results of CS-LAMRNet for the NEU dataset: (a) confusion matrix; (b) t-SNE visualization.

Figure 14. The accuracy and loss function curves of CS-LAMRNet for SEU dataset. (a) Accuracy curves; (b) loss function curves.

Figure 15. The visualization results of CS-LAMRNet for SEU dataset. (a) Confusion matrix; (b) t-SNE visualization.

Figure 16. The comparison results of different diagnostic methods for the SEU gear dataset with noise.

Figure 17. Confusion matrices for the gear damage classification of the SEU dataset with noise. (a) VGG11; (b) ResNet18; (c) MTF-ResNet; (d) GADF-CNN; (e) MobileNet V3; (f) DRSN-CW; (g) ConvNeXt-T; (h) CS-LAMRNet.

Figure 18. Visualization for the gear damage classification of the SEU dataset with noise. (a) VGG11; (b) ResNet18; (c) MTF-ResNet; (d) GADF-CNN; (e) MobileNet V3; (f) DRSN-CW; (g) ConvNeXt-T; (h) CS-LAMRNet.

Table 1. Parameters of the CS-LAMRNet model.

Network Layer	Convolutional Kernels	Stride	Output Size (Channels@H × W)
Input layer	−−	−−	3@224 × 224
MSFE Block	3 × 3/5 × 5/7 × 7	4	96@56 × 56
IDRA1-1 Block	3 × 3/1 × 1/1 × 1/3 × 3/3 × 3	1	96@56 × 56
Downsample	2 × 2	2	192@28 × 28
IDRA2-1 Block	3 × 3/1 × 1/1 × 1/3 × 3/3 × 3	1	192@28 × 28
IDRA2-2 Block	3 × 3/1 × 1/1 × 1/3 × 3/3 × 3	1	192@28 × 28
Downsample	2 × 2	2	384@14 × 14
IDRA3-1 Block	3 × 3/1 × 1/1 × 1/3 × 3/3 × 3	1	384@14 × 14
IDRA3-2 Block	3 × 3/1 × 1/1 × 1/3 × 3/3 × 3	1	384@14 × 14
Downsample	2 × 2	2	768@7 × 7
IDRA4-1 Block	3 × 3/1 × 1/1 × 1/3 × 3/3 × 3	1	768@7 × 7
GAP	−−	−−	768@1 × 1
Fully connected layer	−−	−−	5
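The spatial sizes in Table 1 can be checked by dividing the feature-map side length by the stride of each size-reducing stage. The sketch below does this under the assumption that padding is chosen so that the output side equals the input side divided by the stride; the stage list is transcribed from Table 1 and the function name is illustrative. With three stride-2 downsampling stages after the stride-4 MSFE block, the trace ends at 7 × 7 with 768 channels, in agreement with the 768@7 × 7 entry of the IDRA4-1 block.

```python
# (stage name, stride, output channels) for the size-reducing stages of Table 1.
# The IDRA blocks have stride 1 and keep the shape, so they are omitted here.
STAGES = [
    ("MSFE Block", 4, 96),
    ("Downsample 1", 2, 192),
    ("Downsample 2", 2, 384),
    ("Downsample 3", 2, 768),
]


def trace_shapes(input_size=224, stages=STAGES):
    """Return (name, channels, side) after each stage, assuming side_out = side_in // stride."""
    size = input_size
    shapes = []
    for name, stride, channels in stages:
        size //= stride
        shapes.append((name, channels, size))
    return shapes


for name, channels, size in trace_shapes():
    print(f"{name}: {channels}@{size} x {size}")
```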

Table 2. The main parameters of gears.

Parameters	Module	Number of Teeth	Pressure Angle	Modification Coefficient
Driving gear	4.5 mm	16	20°	−0.5
Driven gear	4.5 mm	24	20°	0.8532
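Some basic quantities follow directly from the parameters in Table 2. The sketch below computes the transmission ratio, the reference diameters (d = m·z), and the gear mesh frequency (f_m = z·n/60). It assumes spur gears and assumes that the rotation speed of 1450 r/min listed for the NEU dataset in Table 5 refers to the driving gear; both assumptions, as well as the function names, are illustrative and are not stated in the source.

```python
# Gear parameters copied from Table 2.
MODULE_MM = 4.5
TEETH_DRIVING = 16
TEETH_DRIVEN = 24


def reference_diameter_mm(module_mm, teeth):
    """Reference (pitch) diameter d = m * z, valid for a spur gear."""
    return module_mm * teeth


def mesh_frequency_hz(teeth, speed_rpm):
    """Gear mesh frequency f_m = z * n / 60 for a gear with z teeth turning at n r/min."""
    return teeth * speed_rpm / 60.0


ratio = TEETH_DRIVEN / TEETH_DRIVING                      # speed reduction ratio
d_driving = reference_diameter_mm(MODULE_MM, TEETH_DRIVING)
d_driven = reference_diameter_mm(MODULE_MM, TEETH_DRIVEN)
# Assumption: 1450 r/min (Table 5, NEU dataset) is the driving-gear speed.
f_mesh = mesh_frequency_hz(TEETH_DRIVING, 1450)

print(f"ratio = {ratio}, d1 = {d_driving} mm, d2 = {d_driven} mm, f_mesh = {f_mesh:.1f} Hz")
```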

Table 3. Dataset status of experiments for five fault categories.

Label	Fault Type	Number (Training Samples/Test Samples)
0	MT	800/200
1	NO	800/200
2	CT	800/200
3	RF	800/200
4	SF	800/200

Table 4. Dataset status of the five fault categories.

Label	Fault Type	Number (Training Samples/Test Samples)
0	CT	800/200
1	NO	800/200
2	MT	800/200
3	RF	800/200
4	SF	800/200

Table 5. Dataset summary.

Dataset	NEU Dataset	SEU Dataset
Number of fault types	5	5
Signal length	1000	1000
Sampling frequency	20 kHz	20 Hz, 30 Hz
Rotation speed	1450 r/min	1200 r/min, 1800 r/min
Fault type	MT, NO, CT, RF, SF	MT, NO, CT, RF, SF
Samples (training/test)	800/200	800/200
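A few derived quantities help to interpret Table 5. The sketch below assumes that the 800/200 split applies to each fault type (as in Tables 3 and 4) and that the NEU rotation speed is constant at 1450 r/min; the function names are illustrative. It also shows that rotation frequencies of 20 Hz and 30 Hz correspond exactly to the 1200 r/min and 1800 r/min listed for the SEU dataset.

```python
def rpm_from_hz(rotation_freq_hz):
    """Convert a rotation frequency in Hz to a rotation speed in r/min."""
    return rotation_freq_hz * 60


def sample_duration_s(n_points, sampling_freq_hz):
    """Time span covered by one sample of n_points at the given sampling frequency."""
    return n_points / sampling_freq_hz


def revolutions_per_sample(n_points, sampling_freq_hz, speed_rpm):
    """Number of shaft revolutions contained in one sample."""
    return speed_rpm / 60 * sample_duration_s(n_points, sampling_freq_hz)


# NEU dataset: 1000 points per sample, 20 kHz sampling frequency, 1450 r/min.
neu_duration = sample_duration_s(1000, 20_000)
neu_revs = revolutions_per_sample(1000, 20_000, 1450)

# Samples per dataset, assuming 800 training + 200 test samples per fault type.
samples_per_dataset = 5 * (800 + 200)

print(f"NEU sample: {neu_duration * 1000:.0f} ms, about {neu_revs:.2f} revolutions")
print(f"20 Hz -> {rpm_from_hz(20)} r/min, 30 Hz -> {rpm_from_hz(30)} r/min")
print(f"samples per dataset: {samples_per_dataset}")
```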

Table 6. The comparison of different diagnostic method accuracies with the NEU dataset.

Model	Average Accuracy	Variance	Average F1-Score	Variance
VGG11	87.45%	1.5457	87.51%	1.5564
ResNet18	97.72%	0.0986	97.71%	0.0984
MTF-ResNet	96.91%	0.1042	96.91%	0.1040
GADF-CNN	96.90%	0.1027	96.90%	0.1025
MobileNet V3	98.96%	0.0725	99.00%	0.0723
DRSN-CW	99.30%	0.0043	99.30%	0.0043
ConvNeXt-T	99.18%	0.0065	99.25%	0.0063
CS-LAMRNet	99.58%	0.0056	99.83%	0.0051

Table 7. The comparison of accuracy of different diagnostic methods with the SEU dataset.

Model	Average Accuracy	Variance	Average F1-Score	Variance
VGG11	90.58%	0.8563	90.12%	0.8753
ResNet18	96.60%	0.1326	96.59%	0.1325
MTF-ResNet	97.80%	0.0876	97.79%	0.0876
GADF-CNN	97.70%	0.0976	97.69%	0.0974
MobileNet V3	96.50%	0.1464	96.51%	0.1463
DRSN-CW	99.60%	0.0045	99.60%	0.0045
ConvNeXt-T	99.10%	0.0064	99.10%	0.0063
CS-LAMRNet	100%	0	100%	0

Table 8. The comparative results of the ablation experiments.

Model	4 dB	6 dB	8 dB	10 dB	12 dB	14 dB
Method 1	85.12%	93.06%	95.24%	96.86%	97.23%	97.68%
Method 2	85.29%	91.08%	94.49%	96.23%	97.15%	97.97%
Method 3	92.36%	95.36%	96.45%	97.65%	98.68%	99.13%
Method 4	95.56%	98.24%	99.18%	99.78%	99.94%	100%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

