1. Introduction
Fault diagnosis is a critical technology for the maintenance of industrial equipment, particularly in the context of smart manufacturing and Industry 4.0, where timely and accurate fault detection can substantially reduce downtime and maintenance costs [1]. Traditional fault diagnosis methods typically rely on large volumes of labeled data; however, in practical applications, obtaining sufficient fault data is often challenging, especially for specific fault conditions in critical equipment. This is due to the high cost and complexity of data acquisition, leading to the problem of sample scarcity.
In industrial environments, operating conditions such as speed, load, and temperature often differ significantly between source and target domains, leading to substantial distribution shifts in vibration signals. These differences make it challenging to collect sufficient labeled data for each specific condition, exacerbating sample scarcity. Traditional fault diagnosis methods, which rely on large amounts of labeled data for each condition, are often inadequate in such scenarios. This highlights the need for robust transfer learning approaches that can leverage data from one condition to improve diagnostic performance in another, even when labeled data are limited.
Conventional machine-learning-based fault diagnosis typically involves three key stages: data collection, feature extraction, and fault classification [2,3]. However, the manual nature of feature extraction often leads to the inclusion of irrelevant or redundant features, which can degrade the accuracy of the classification models [4]. Common classification algorithms include decision trees, support vector machines (SVM), and neural networks [5]. For example, Amarnath et al. [6] proposed a decision tree-based method for bearing fault diagnosis, while Konar and Chattopadhyay [7] applied SVM to analyze vibration signals from asynchronous motor bearings. Similarly, Tian et al. [8] integrated feature extraction with k-nearest neighbor (k-NN) distance analysis to achieve accurate motor bearing fault detection. Other notable approaches include the combination of empirical mode decomposition (EMD) energy entropy with neural networks for rolling bearing fault diagnosis [9], and the integration of particle swarm optimization (PSO) with hidden Markov models (HMM) for automatic bearing fault classification [10]. Despite their effectiveness, traditional machine learning methods often require labor-intensive manual feature selection, which becomes increasingly inefficient when dealing with large-scale data.
To reduce the cost of manual feature extraction, researchers have begun exploring deep learning methods, which can automatically learn features from data and efficiently process complex motor data for precise fault diagnosis [4,11]. Common deep learning models include convolutional neural networks, recurrent neural networks, restricted Boltzmann machines, autoencoders, and deep belief networks. Chen et al. [12] investigated a comparative diagnosis approach for motor faults using convolutional neural networks (CNN) and long short-term memory (LSTM) networks, assessing their effectiveness in fault diagnosis. Shao et al. [13] introduced a novel fault diagnosis method for electric locomotive bearings based on a convolutional deep belief network (CDBN). Chen et al. [14] proposed a data fusion model for bearing fault diagnosis, combining a sparse autoencoder with a deep belief network (DBN); the experimental results demonstrated the model’s high accuracy and robustness. Chu et al. [15] developed a bearing fault diagnosis method utilizing a Gaussian restricted Boltzmann machine (GRBM), which effectively captures latent information from fault signals, improving both accuracy and stability. Zhang et al. [16] presented a recurrent Kalman variational autoencoder (RKVAE) for monitoring complex dynamic processes, enhancing fault detection effectiveness. Shao et al. [17] introduced a novel feature learning approach based on a deep autoencoder for fault diagnosis in rotating machinery. However, the outstanding performance of deep learning typically hinges on large amounts of labeled training data, which are often scarce in real industrial environments, directly affecting the robustness and generalization ability of deep learning models.
Furthermore, deep learning models often assume that the training and testing data share the same distribution. However, in real-world applications, variations in operating conditions, noise, and fault severity frequently violate this assumption, making it challenging to maintain model performance when data are scarce or conditions change [18]. Consequently, addressing the problem of sample scarcity under variable conditions has become a crucial research focus.
To overcome this challenge, transfer learning has emerged as a promising solution, enabling the transfer of knowledge from a source domain with abundant data to a target domain with limited data. By aligning the feature distributions between source and target conditions, transfer learning can mitigate the impact of distribution shifts caused by cross-condition differences, thereby improving diagnostic accuracy even when labeled data are scarce. The primary strategies in transfer learning include fine-tuning, statistical methods, and adversarial approaches [5].
Fine-tuning approaches utilize a diagnostic model from the source domain to transfer learned models or parameters to new target scenarios. When retraining the model for the target task, a relatively small learning rate is typically employed, instead of training from scratch. Chen et al. [19] utilized open-source datasets to pre-train deep learning models, followed by fine-tuning on available data collected from real industrial scenarios for the target task. Zhao et al. [20] proposed a transfer learning framework based on a deep multi-scale convolutional neural network (CNN), optimizing the fault diagnosis model through dilated convolutions and global average pooling; the model achieved efficient transfer across different tasks via fine-tuning, enhancing its ability to detect rolling bearing faults under complex conditions. Shao et al. [21] explored the application of non-mechanical datasets, such as ImageNet, to pre-train transfer learning models; the top layers of the pre-trained model were replaced to match the number of target labels, and the model was fine-tuned using bearing fault data for fault diagnosis. In bearing fault diagnosis, it is common for the source and target domains to have different fault labels. To address this label inconsistency, Zhiyi et al. [22] applied a fine-tuning approach that replaces the output layer of the pre-trained model with a new output layer matching the dimensions of the target labels.
The basic idea behind statistics-based transfer learning is to learn domain-invariant representations by minimizing the distributional differences between the source and target domains. Guo et al. [23] proposed a neural-network-based transfer learning method for fault diagnosis, demonstrating that statistics-based transfer learning methods can improve classification accuracy and reduce training time when only a small amount of target data are available. Zhao et al. [24] introduced a novel transfer learning method using bidirectional gated recurrent units (BiGRU) and manifold-embedded distribution alignment, which proved effective with only a limited amount of labeled data. Li et al. [25] presented a two-stage knowledge transfer scheme to address knowledge transfer between different machines. Zhou et al. [26] suggested that, during domain adaptation, conditional distributions should be considered in addition to marginal distributions.
Adversarial-based methods, inspired by generative adversarial networks (GANs), aim to extract domain-invariant features by designing classifiers that learn from both source and target domain data [27,28]. Li et al. [29] proposed a deep-learning-based transfer learning method, where adversarial domain training is used to transfer diagnostic knowledge from the supervised data of multiple rotating machines to the target device, thus improving fault diagnosis performance in cases of insufficient training data. Han et al. [30] introduced a multi-domain discriminator to enhance domain-invariant feature extraction, further improving fault diagnosis performance.
While transfer learning exhibits significant potential in various scenarios, direct transfer across different operating conditions often encounters challenges, due to considerable differences in data distribution. Simply transferring a source condition model to a target condition may lead to a significant performance degradation or even negative transfer. Moreover, in cases of extreme sample scarcity, models may become prone to overfitting, hindering effective generalization. Thus, the ability to leverage rich data from source conditions for efficient transfer diagnosis under sample scarcity remains a crucial research topic in fault diagnosis.
To address these challenges, this paper proposes a novel transfer learning framework, the TTLN, which focuses on transferring fault diagnosis models between different operating conditions. This framework leverages the Transformer deep learning model, which has achieved remarkable success in natural language processing. Its core component, the self-attention mechanism, effectively captures long-range dependencies within sequences, demonstrating outstanding performance on sequential data. The attention mechanism’s ability to focus on all input parts simultaneously is particularly advantageous for bearing fault diagnosis. Unlike CNNs, which rely on local receptive fields [31], or RNNs, which process data sequentially, the self-attention mechanism in Transformers can directly model long-range dependencies and capture global context in vibration signals. This is crucial for identifying fault-related patterns that may span distant time steps or manifest in different frequency ranges. The framework employs statistical strategies for model transfer, aligning both marginal and conditional distributions, ensuring that even with scarce samples in the target condition, the fault knowledge from the source condition can be fully utilized to maintain high diagnostic accuracy. This method not only mitigates the negative impact of data distribution differences but also alleviates the limitations posed by sample scarcity in model training. The effectiveness of the proposed TTLN model is validated through bearing fault diagnosis examples, highlighting its advantages in scenarios with limited samples.
The main contributions of this study are as follows:
1. A novel TTLN framework is proposed for efficient transfer learning in bearing fault diagnosis, specifically addressing scenarios with limited samples.
2. Experimental validation demonstrated the effectiveness of the TTLN model in multiple bearing conditions, particularly in maintaining high diagnostic precision when samples of target conditions are scarce.
3. This research provides a new perspective on the management of cross-condition fault diagnosis and sample scarcity issues, with significant theoretical and practical application value.
The remainder of this paper is organized as follows. Section 2 introduces the design and implementation of the Transformer-based bearing fault diagnosis model. Section 3 presents the TTLN model, detailing its architecture, transfer strategies, and implementation details. Section 4 provides experimental results, demonstrating the efficacy of the TTLN model in bearing fault diagnosis. Finally, Section 5 concludes the paper and discusses potential future research directions.
2. Fault Diagnosis of Motor Bearings Under Single Operating Conditions Based on Transformer Models
In motor bearing fault diagnosis, traditional methods typically rely on manually designed features and machine learning models for classification, whereas Transformer models demonstrate significant advantages in processing time series data, owing to their powerful attention mechanisms [32]. The core of the Transformer architecture lies in the self-attention mechanism [33], which allows simultaneous focus on all parts of the input sequence, effectively capturing long-range dependencies within the signal. Unlike traditional recurrent neural networks (RNNs) and convolutional neural networks (CNNs), Transformers do not require sequential processing, thus avoiding issues such as vanishing gradients over long time steps. Furthermore, Transformers do not depend on extensive feature engineering and can directly extract useful features from raw vibration signals through model learning.
Transformers, through their self-attention mechanism, can learn patterns directly from raw sequential data, eliminating the need for manual feature extraction. This stands in contrast to traditional methods, which often involve pre-processing steps such as signal decomposition or handcrafted feature extraction (e.g., frequency components or statistical features). The self-attention mechanism allows Transformers to automatically learn hierarchical representations, making them particularly effective for complex time-series data, such as vibration signals in fault diagnosis tasks. The TTLN algorithm integrates a Transformer-based fault diagnosis model with transfer learning techniques to address the challenges of data scarcity and domain shifts. It employs domain adaptation strategies, such as marginal and conditional distribution alignment, to ensure robust performance even when target condition data are limited.
Bearing fault diagnosis is fundamentally a multiclass classification problem. For a single-operating-condition bearing fault diagnosis task, let the given input dataset be $X = \{x_i\}_{i=1}^{n}$, where $x_i$ denotes the $i$th sample, and let the corresponding label space be $Y = \{y_i\}_{i=1}^{n}$, where $y_i$ represents the label associated with the $i$th sample.
The application of Transformer models in motor bearing fault diagnosis primarily involves the processing and feature extraction of vibration signals. A Transformer-based model for diagnosing motor bearing faults in a single operating condition is illustrated in Figure 1. This model first segments the vibration signal into multiple time-step input sequences and then converts each time-step signal into high-dimensional vectors that the model can process through an embedding layer. In this process, positional encoding is added to the input sequences to retain temporal information.
Subsequently, the vibration signals pass through multiple layers of self-attention, where each layer focuses on different time-step features according to varying weight distributions. This enables the model to effectively capture the characteristics of vibration signals under different fault modes. The self-attention mechanism enables the model to assign attention weights to all time steps in the input signal simultaneously, allowing it to focus on the most informative parts of the signal, while filtering out irrelevant noise. This capability is particularly important for bearing fault diagnosis, where fault characteristics (e.g., inner race faults, outer race faults) may exhibit complex patterns across different time scales and frequency ranges. For instance, the vibration characteristics of inner race and outer race faults are typically distributed across different frequency ranges, which the Transformer can capture simultaneously through its multi-head attention mechanism.
After processing through the self-attention mechanism, the signals are forwarded to a feedforward network (FFN), also known as a multilayer perceptron (MLP). The MLP consists of two linear layers and a GeLU activation function; the first layer expands the input dimensions, while the second layer reduces them back to the original dimensions. The primary role of the MLP is to further combine and refine the features extracted from the self-attention mechanism through nonlinear transformations, enhancing the model’s classification capability.
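To make this architecture concrete, the following is a minimal PyTorch sketch of a single-condition diagnosis model of this kind; the layer sizes, segment lengths, and the use of `nn.TransformerEncoder` are illustrative assumptions rather than the exact implementation used in this paper.

```python
import torch
import torch.nn as nn

class VibrationTransformer(nn.Module):
    """Minimal sketch of a Transformer classifier for segmented vibration signals.

    Assumed input shape: (batch, n_steps, step_len), e.g., a 1024-point
    sample split into 16 time steps of 64 points each (illustrative choice).
    """
    def __init__(self, step_len=64, n_steps=16, d_model=128,
                 n_heads=4, n_layers=4, n_classes=10, dropout=0.1):
        super().__init__()
        # Embedding layer: project each time-step segment to d_model dimensions.
        self.embed = nn.Linear(step_len, d_model)
        # Learnable positional encoding to retain temporal order.
        self.pos = nn.Parameter(torch.zeros(1, n_steps, d_model))
        # Encoder layers: self-attention followed by a GeLU feedforward
        # network that expands to 4*d_model and reduces back to d_model.
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            dropout=dropout, activation="gelu", batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def features(self, x):                 # x: (batch, n_steps, step_len)
        h = self.embed(x) + self.pos       # add positional information
        return self.encoder(h).mean(dim=1) # pool over time steps

    def forward(self, x):
        return self.head(self.features(x)) # class logits
```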
To accommodate the specific scenarios of motor bearing fault diagnosis, the model’s loss function employs cross-entropy loss combined with L2 regularization to prevent overfitting. Additionally, adjustments to the learning rate and batch size are made to optimize the model’s training process, ensuring that it can effectively learn vibration signal features under different fault modes.
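As a sketch, this loss and optimizer configuration might look as follows in PyTorch, with L2 regularization expressed through the optimizer’s weight-decay term; the specific learning rate and decay values are illustrative assumptions.

```python
import torch

model = VibrationTransformer()             # from the sketch above
criterion = torch.nn.CrossEntropyLoss()    # cross-entropy classification loss
# L2 regularization enters through weight_decay; the values are illustrative.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, weight_decay=1e-4)
```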
In a single operating condition, the Transformer model achieves satisfactory fault diagnosis results. In this controlled environment, the patterns of vibration signals are relatively stable, making fault-type characteristics more apparent. The Transformer quickly and accurately captures key information in fault signals through its self-attention mechanism, thereby reducing the rate of misclassification.
However, in practical applications, variations in operating conditions (such as load, speed, and temperature) may lead to changes in the vibration signal patterns, which are not accounted for in a single operating condition model. Consequently, while the Transformer performs excellently under controlled conditions, its applicability remains limited. To address this issue, subsequent chapters will explore a transfer learning-based approach for multi-condition fault diagnosis, to enhance the model’s generalization capability and adapt to more complex industrial scenarios.
3. Fault Diagnosis of Motor Bearings Under Multiple Operating Conditions Based on Transfer Learning
Fault diagnosis of motor bearings under multiple operating conditions presents greater challenges, due to the significant variations in operating conditions, such as changes in speed, load, and temperature, in practical industrial applications. These factors cause alterations in the distribution and pattern of vibration signals across different operating conditions, making it difficult for models trained under a single condition to generalize effectively. Traditional machine learning and deep learning methods typically require large amounts of data across multiple conditions for training, but in many real-world scenarios, data collection is costly, and data from different conditions are often scarce.
To address this issue, this paper proposes a transfer-learning-based multi-condition fault diagnosis method, utilizing a Transformer model pre-trained under a single operating condition. The TTLN model is designed to adapt to new operating conditions.
In the context of transfer learning, given a feature space $X$ and a marginal distribution $P(x)$, a domain can be expressed as $D = \{X, P(x)\}$. Correspondingly, a task consists of two components: a label space $Y$ and a conditional probability distribution $P(y|x)$. The labeled source domain and the unlabeled target domain datasets are denoted as $D_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$ and $D_t = \{x_j^t\}_{j=1}^{n_t}$, respectively. Additionally, the output labels for the source domain task and the target domain task must be consistent to ensure that domain knowledge can be transferred across samples between domains.
The core of transfer learning lies in transferring knowledge learned from the source condition to the target condition [18], improving the model’s generalization capability across different conditions. In this study, domain adaptation techniques and fine-tuning strategies were employed to ensure the high accuracy and robustness of the model under multiple operating conditions.
The key to statistics-based transfer strategies is domain adaptation (DA), which emphasizes reducing the gap between the target and source domains while learning domain-invariant features, to achieve distribution alignment and knowledge transfer between the two domains [34]. The domain adaptation process is implemented through the design of the transfer model’s loss function, which aims to reduce the distribution discrepancies between source and target domain data, including both marginal and conditional distribution discrepancies [35]. The goal of domain adaptation is to minimize the differences between the marginal and conditional distributions while learning feature transformations [36].
In this study, we employ marginal and conditional distribution alignment strategies, specifically using maximum mean discrepancy (MMD) and correlation alignment (CORAL), to bridge the distribution gap between the source and target domains in the motor bearing fault diagnosis task. MMD and CORAL are relatively simple to implement and computationally efficient, making them ideal for real-world industrial applications where fast computation and ease of implementation are critical. While adversarial methods (e.g., GAN-based approaches) and manifold alignment techniques have been successful in domain adaptation tasks, they often require intricate training procedures, are computationally expensive, and may be more difficult to stabilize. For example, adversarial methods involve a generator and discriminator, which can be challenging to optimize, particularly when working with small datasets. Similarly, manifold alignment approaches can be highly sensitive to the choice of alignment functions and may not always scale well to high-dimensional data. In contrast, MMD and CORAL provide a more straightforward and computationally feasible alternative for aligning distributions, particularly in fault diagnosis scenarios, where the focus is on robustness and ease of application. While adversarial methods could potentially improve performance, they would also require careful tuning and may not provide significant benefits in the context of our fault diagnosis task, where the alignment of marginal and conditional distributions is sufficient.
(1) Marginal Distribution Alignment (MDA): In transfer learning, the emphasis is on the proximity of features between the source and target domain data. The aim is to adjust the marginal distributions in the feature space to make the marginal distributions of the source and target domains as similar as possible, thereby improving model performance in the target domain. In this section, MMD and CORAL are combined to create a new marginal distribution discrepancy (MDD) metric, as shown in Equation (1):
$$ \mathrm{MDD}(X_s, X_t) = \mathrm{MMD}(X_s, X_t) + \mathrm{CORAL}(X_s, X_t), \tag{1} $$
where $X_s$ is the feature space of the source domain, and $X_t$ is the feature space of the target domain.
MMD is the most commonly used distribution distance metric in transfer learning tasks for measuring marginal distribution alignment, defined as shown in Equation (2):
$$ \mathrm{MMD}(X_s, X_t) = \left\| \frac{1}{n_s}\sum_{i=1}^{n_s}\phi(x_i^s) - \frac{1}{n_t}\sum_{j=1}^{n_t}\phi(x_j^t) \right\|_{\mathcal{H}}^2, \tag{2} $$
where $n_s$ and $n_t$ denote the batch sizes of the source and target domain samples, respectively; $\mathcal{H}$ represents the reproducing kernel Hilbert space (RKHS); and $\phi(\cdot)$ denotes the mapping into the Hilbert space, typically determined by a kernel function.
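A biased empirical estimate of Equation (2) can be written compactly in PyTorch; the Gaussian (RBF) kernel and the bandwidth below are assumptions, since the kernel choice is not specified here.

```python
import torch

def gaussian_kernel(a, b, sigma=1.0):
    """RBF kernel matrix between two batches of feature vectors."""
    d2 = torch.cdist(a, b) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd(xs, xt, sigma=1.0):
    """Biased empirical estimate of Equation (2): squared MMD in the RKHS
    induced by a Gaussian kernel (an assumed kernel choice)."""
    k_ss = gaussian_kernel(xs, xs, sigma).mean()   # source-source term
    k_tt = gaussian_kernel(xt, xt, sigma).mean()   # target-target term
    k_st = gaussian_kernel(xs, xt, sigma).mean()   # cross term
    return k_ss + k_tt - 2 * k_st
```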
CORAL measures the covariance alignment by calculating the Frobenius-norm difference between the covariance matrices of the source and target distributions, as shown in Equation (3):
$$ \mathrm{CORAL}(X_s, X_t) = \frac{1}{4d^2}\left\| C_s - C_t \right\|_F^2, \tag{3} $$
where $C_s$ and $C_t$ are the covariance matrices of the source and target domain features, and $d$ is the feature dimension.
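Equation (3) translates directly into code; this sketch follows the common CORAL normalization by $4d^2$, which is an assumption about the exact scaling used.

```python
def coral(xs, xt):
    """CORAL loss per Equation (3): Frobenius-norm difference between the
    source and target feature covariance matrices."""
    d = xs.size(1)
    cs = torch.cov(xs.T)                  # (d, d) source covariance
    ct = torch.cov(xt.T)                  # (d, d) target covariance
    return ((cs - ct) ** 2).sum() / (4 * d * d)
```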
(2) Conditional Distribution Alignment (CDA): This addresses class-level discrepancies between the source and target domains, aiming to resolve the mismatch in data distribution between the two domains, as shown in Equation (4):
$$ \mathrm{CDD}(X_s, X_t) = \frac{1}{C}\sum_{c=1}^{C}\left[ \mathrm{MMD}_c(X_s^c, X_t^c) + \mathrm{CORAL}_c(X_s^c, X_t^c) \right], \tag{4} $$
where $X_s^c$ and $X_t^c$ are feature vectors in the source and target domain feature spaces belonging to class $c$; $\mathrm{MMD}_c$ and $\mathrm{CORAL}_c$ represent the MMD and CORAL values for class $c$; and $C$ is the total number of classes. CDA aligns the conditional distributions between the source and target domains within the same class during the training process, reducing domain discrepancies and enhancing the model’s generalization ability in the target domain.
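A minimal sketch of the class-wise alignment in Equation (4), reusing the `mmd` and `coral` helpers above; following the pseudo-labeling described later, the target labels are taken to be model predictions, and skipping under-populated classes within a mini-batch is an implementation assumption.

```python
def cda(xs, ys, xt, yt_pseudo, n_classes):
    """Class-wise alignment per Equation (4): average of per-class MMD and
    CORAL terms; target labels are pseudo-labels, as in the TTLN setup."""
    loss, used = 0.0, 0
    for c in range(n_classes):
        xs_c, xt_c = xs[ys == c], xt[yt_pseudo == c]
        if len(xs_c) < 2 or len(xt_c) < 2:   # skip classes absent in a batch
            continue
        loss = loss + mmd(xs_c, xt_c) + coral(xs_c, xt_c)
        used += 1
    return loss / max(used, 1)
```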
(3) Integrated Loss Function: The optimization objective in transfer-learning-based multi-condition fault diagnosis models is the integrated loss function of the transfer learning network, which consists of several components: the marginal distribution discrepancy, the conditional distribution discrepancy, the classification loss for the source domain, and the classification loss for the target domain [37]. The integrated loss function is a linear combination of these components, where the marginal distribution discrepancy and source domain classification loss are typically assigned higher weights, while the conditional distribution discrepancy and target domain classification loss are given lower weights, to better regulate the model’s performance and generalization, as shown in Equation (5):
$$ \mathcal{L} = \alpha \mathcal{L}_s + \beta \, \mathrm{MDD} + \gamma \, \mathrm{CDD} + \delta \mathcal{L}_t. \tag{5} $$
The integrated loss function is designed to balance multiple objectives, ensuring effective domain adaptation and robust fault diagnosis. Since the source domain contains a complete dataset and label space, it serves as the foundation for feature transfer, and thus the source domain classification loss ($\mathcal{L}_s$) is assigned a higher weight ($\alpha$) to prioritize learning robust features from the source domain. MDA, computed using MMD and CORAL, is the core of domain adaptation: the term $\beta \, \mathrm{MDD}$ minimizes the discrepancy between the marginal distributions of the source and target domains, promoting feature proximity and enabling effective knowledge transfer. CDD, represented by $\gamma \, \mathrm{CDD}$, ensures class-level discrimination by aligning the conditional distributions of the source and target domains within each class; this prevents misclassification due to overly small class distances and enhances the model’s ability to generalize across domains. In the target domain, pseudo-labeling techniques are employed to compute the classification loss ($\mathcal{L}_t$). However, due to potential errors in pseudo-labels compared to actual labels, the weight of this term ($\delta$) is reduced to enhance the model’s robustness. By carefully balancing these components, the integrated loss function ensures that the model prioritizes domain adaptation while maintaining high classification accuracy, even in scenarios with limited target domain data.
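Putting the pieces together, Equation (5) can be sketched as a weighted sum of the four terms; the specific values of $\alpha$, $\beta$, $\gamma$, and $\delta$ below are illustrative, chosen only to reflect the weighting priorities described above.

```python
import torch.nn.functional as F

def ttln_loss(logits_s, ys, logits_t, yt_pseudo,
              feats_s, feats_t, n_classes,
              alpha=1.0, beta=1.0, gamma=0.5, delta=0.1):
    """Equation (5) as a weighted sum; the weights are illustrative, with
    the source loss and MDA terms weighted above the pseudo-labeled
    target loss, as described in the text."""
    l_src = F.cross_entropy(logits_s, ys)                     # source CE loss
    l_tgt = F.cross_entropy(logits_t, yt_pseudo)              # pseudo-label CE loss
    l_mda = mmd(feats_s, feats_t) + coral(feats_s, feats_t)   # Equation (1)
    l_cda = cda(feats_s, ys, feats_t, yt_pseudo, n_classes)   # Equation (4)
    return alpha * l_src + beta * l_mda + gamma * l_cda + delta * l_tgt
```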
The complete TTLN model is shown in Figure 2. The Transformer is used as the feature extractor in the TTLN model. After pre-training the model with source domain data, the TTLN model is fine-tuned with a small amount of data from the target condition. During fine-tuning, only a portion of the Transformer model’s parameters (the self-attention and MLP layers) are updated, while other layers remain unchanged. This approach reduces dependency on the target condition data and improves the model’s generalization capability, enabling it to perform well under new conditions. To prevent overfitting, appropriate regularization methods (L2 regularization and dropout) are applied during fine-tuning.
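A sketch of this partial fine-tuning in PyTorch, assuming the parameter naming of the `nn.TransformerEncoder`-based model above (where `self_attn`, `linear1`, and `linear2` are the self-attention and MLP sub-layers):

```python
# Freeze everything, then unfreeze only the self-attention and MLP
# (feedforward) sub-layers for fine-tuning on the target condition.
for name, p in model.named_parameters():
    p.requires_grad = any(k in name for k in ("self_attn", "linear1", "linear2"))

# Rebuild the optimizer over the trainable subset; L2 regularization is
# again applied via weight_decay (values are illustrative).
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad),
    lr=5e-4, weight_decay=1e-4)
```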
To optimize the performance of the TTLN model, we performed a grid search of key hyperparameters, including the number of epochs, learning rate, and batch size. The search space for each hyperparameter was determined based on preliminary experiments and prior knowledge. The optimal hyperparameters were selected based on validation accuracy and training stability.
Regarding the discussion of negative transfer, the TTLN indirectly reduces the risk of negative transfer through the following mechanisms. By simultaneously aligning both marginal and conditional distributions using MMD and CORAL, the TTLN ensures that the feature distributions between the source and target domains are well-matched, reducing the risk of a misalignment that could lead to negative transfer. Additionally, the TTLN dynamically adjusts the weights of marginal and conditional distribution alignment losses during training, prioritizing the alignment of more critical features and reducing the impact of irrelevant or misleading features. Furthermore, the Transformer-based feature extractor in the TTLN is inherently robust to distribution shifts due to its self-attention mechanism, which focuses on the most informative parts of the signal, while filtering out noise and irrelevant variations.
To illustrate the training and optimization workflow, Algorithm 1 presents the pseudocode for the TTLN model. This algorithm outlines the key steps, including model initialization; transfer learning strategies; and the iterative process of distribution alignment, loss computation, and parameter updates. The detailed procedure is as follows:
Algorithm 1 TTLN Model for Fault Diagnosis Transfer Learning
1: Input:
2:   $D_s$: Source domain dataset (features and labels)
3:   $D_t$: Target domain dataset (features only)
4:   $N$: Number of training epochs
5:   $\eta$: Learning rate
6:   $\lambda$: Regularization parameter
7: Output:
8:   Trained model for fault diagnosis on the target domain
9:
10: Initialize the source domain model (pre-trained)
11: Initialize the target domain model with random weights
12: for epoch = 1 to $N$ do
13:   for each mini-batch $(x_s, y_s)$ in $D_s$ do
14:     Forward pass: compute source predictions $\hat{y}_s$
15:     Compute the source classification loss $\mathcal{L}_s$
16:     Backward pass: update the parameters $\theta$ using $\eta$ and $\lambda$
17:   end for
18:   for each mini-batch $x_t$ in $D_t$ do
19:     Forward pass: compute target predictions $\hat{y}_t$
20:     Apply the statistical transfer strategy: align the marginal and conditional distributions of $D_s$ and $D_t$
21:     Compute the transfer loss $\mathcal{L}$ using cross-domain alignment (Equation (5))
22:     Update $\theta$ with $\eta$ and $\lambda$
23:   end for
24: end for
25: Return: trained model for target domain fault diagnosis
The pseudocode demonstrates the TTLN model’s iterative training process, which is divided into two distinct phases: source domain pre-training, and target domain fine-tuning. During each epoch, both the source and target domain data are processed. In the first phase, supervised learning is performed on the source domain to learn general fault diagnosis features. In the second phase, the pre-trained model is fine-tuned on the target domain data, with only a portion of the model’s parameters (e.g., self-attention and MLP layers) updated to adapt to the target condition. During fine-tuning, a statistical transfer strategy is applied to align the source and target domain distributions. This strategy involves aligning the marginal and conditional distributions between $D_s$ and $D_t$ using a combination of MMD and CORAL. MMD minimizes the distance between the overall feature distributions of $D_s$ and $D_t$, while CORAL aligns the second-order statistics (covariance) of $D_s$ and $D_t$ within each class. These techniques are integrated into the loss function to ensure robust domain adaptation. By aligning the marginal and conditional distributions using MMD and CORAL, the model reduces the impact of domain shifts and improves the diagnostic accuracy in cross-condition scenarios, even with limited labeled data.
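The two-phase loop can be sketched as follows, reusing the model and loss helpers from the earlier snippets; the `DataLoader` setup and the batch-wise pairing of source and target data via `zip` are implementation assumptions.

```python
import torch.nn.functional as F

def train_ttln(model, source_loader, target_loader, optimizer,
               n_classes=10, n_epochs=300):
    """Sketch of Algorithm 1; loaders yielding (signal, label) batches for
    the source and unlabeled signal batches for the target are assumed."""
    for epoch in range(n_epochs):
        # Phase 1: supervised learning on the labeled source domain.
        for xs, ys in source_loader:
            loss = F.cross_entropy(model(xs), ys)
            optimizer.zero_grad(); loss.backward(); optimizer.step()
        # Phase 2: cross-domain alignment with pseudo-labeled target data.
        for (xs, ys), xt in zip(source_loader, target_loader):
            feats_s, feats_t = model.features(xs), model.features(xt)
            logits_s, logits_t = model.head(feats_s), model.head(feats_t)
            yt_pseudo = logits_t.argmax(dim=1).detach()   # pseudo-labels
            loss = ttln_loss(logits_s, ys, logits_t, yt_pseudo,
                             feats_s, feats_t, n_classes)
            optimizer.zero_grad(); loss.backward(); optimizer.step()
    return model
```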
4. Experimental Study
This section validates the proposed model using the CWRU bearing dataset [38] and the PU bearing dataset [39] under various operating conditions, demonstrating the effectiveness of the proposed method.
4.1. Motor Bearing Fault Diagnosis Under Single Operating Condition
Under a single operating condition, the Transformer-based fault diagnosis model was trained on two bearing datasets, with specific details as follows:
(1) CWRU Dataset: The CWRU bearing dataset is a standard dataset widely used in the field of fault diagnosis, provided by the Department of Electrical Engineering and Computer Science at Case Western Reserve University. It was specifically designed for motor bearing fault research and has been extensively utilized to evaluate the performance of machine learning and deep learning models in fault diagnosis. The dataset primarily consists of vibration signals from a set of motor bearings, recorded using accelerometers under various operating conditions. During the experiments, the motor operated at different speeds and under different load conditions, generating vibration signals for both healthy and faulty bearings. The fault signals included nine types of data corresponding to different locations and diameters of faults. The tests simulated four operating conditions generated by running at four different loads: 0 hp, 1 hp, 2 hp, and 3 hp, with the accelerometer sampling frequency set at 12,000 Hz.
(2) PU Dataset: The PU dataset, provided by Christian Lessmeier and others, is aimed at data-driven bearing fault diagnosis. It includes artificially induced fault bearings, real fault bearings produced by accelerated life testing, and healthy bearings, all of the 6203 deep groove ball bearing type. Data collection synchronously captures high-frequency signals at 64 kHz, recording motor current and vibration signals under four different speeds and loads, as shown in Table 1. The dataset contains data for 26 different bearing damage states and 6 healthy states, with high-frequency synchronous collection of motor current and vibration signals across the specified conditions.
For the CWRU dataset, each category consists of 1000 samples, totaling 10 categories and 10,000 samples overall. The original data were segmented using a sliding window technique to augment the fault samples, with overlapping portions between samples; each sample contained 1024 data points. The dataset was split into training, validation, and testing sets in a ratio of 7:2:1, and the model was trained, validated, and tested under the aforementioned four operating conditions.
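For illustration, the sliding-window segmentation described above might be implemented as follows; the 50% overlap (a step of 512 points) is an assumption, since the exact stride is not stated.

```python
import numpy as np

def sliding_window(signal, length=1024, step=512):
    """Segment a 1-D vibration signal into overlapping 1024-point samples."""
    n = (len(signal) - length) // step + 1
    return np.stack([signal[i * step : i * step + length] for i in range(n)])

# Usage: samples = sliding_window(raw_signal)  ->  array of shape (n, 1024)
```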
For the PU dataset, each category has 5000 samples, resulting in 4 categories and a total of 20,000 samples. Unlike the CWRU dataset, the PU dataset has sufficient data length, so the sliding window technique was not used, and there was no overlap between samples; each sample also contained 1024 data points. The dataset was divided into training, validation, and testing sets in a ratio of 7:2:1, and the model was trained, validated, and tested across the four different operating conditions.
Figure 3 illustrates the model training process using the 0 hp condition data from the CWRU dataset. It can be seen that, after the 50th epoch, the loss rapidly decreased to nearly 0, with the accuracy approaching 100%.
Common models in the fault diagnosis field, such as CNN, LSTM, and SVM, were selected as comparison models, and experiments were conducted under various conditions for both the CWRU and PU datasets.
For each dataset, the experiments with the main model and the three comparison models were repeated ten times, with the performance of each model summarized in Table 2. In the four operating conditions, the Transformer-based fault diagnosis model achieved average accuracies of approximately 99.58%, 99.55%, 99.99%, and 99.99% on the CWRU test set, and 99.77%, 98.01%, 99.93%, and 99.79% on the PU test set. These results exceeded those of the CNN, LSTM, and SVM comparison models, with the accuracy deviations generally smaller than those of the other models, indicating that the Transformer-based single-condition fault diagnosis model was highly accurate and stable, approaching 100%. Notably, the CNN model displayed relatively higher accuracy and stability, leading to its selection as the comparison model for subsequent experiments.
4.2. Motor Bearing Fault Diagnosis Under Multiple Operating Conditions
In the context of multiple operating conditions, the TTLN model was utilized to transfer the data from Condition 1 for each dataset, validating the model’s capability to migrate from a single condition to other conditions and assessing its generalization ability under varying operating scenarios. Initially, data from Condition 1 were set as the source domain dataset and migrated to the target domain datasets of Conditions 2, 3, and 4.
To ensure the optimal performance of the TTLN model, we conducted a systematic hyperparameter search using a grid search approach, focusing on key parameters such as the number of epochs, learning rate, and batch size. After evaluating various combinations, we selected the optimal settings for the TTLN model: 300 epochs, a learning rate of 0.0005, and a batch size of 128. These settings were chosen based on validation performance, balancing training stability and convergence speed. For instance, a learning rate of 0.0005 was found to prevent overshooting, while ensuring efficient convergence, and a batch size of 128 provided a good balance between computational efficiency and gradient estimation accuracy.
We evaluated the impact of key hyperparameters on the performance of the TTLN model. For instance, increasing the number of epochs beyond 300 did not significantly improve the accuracy but increased the training time, while reducing the number of epochs below 300 led to underfitting. Similarly, a learning rate higher than 0.0005 caused training instability, while a lower learning rate slowed down the convergence. These findings guided our selection of optimal hyperparameters.
The TTLN model was employed to conduct migration training from Condition 1 to Condition 2, tracking the changes in the various losses (overall loss, domain adaptation loss, category alignment loss, source domain loss, and target domain loss) and classification accuracy throughout the training process, as illustrated in Figure 4.
From Figure 4, it is evident that, after a series of iterations, the losses of the TTLN model exhibited a convergence trend, while the testing accuracy for the target domain data consistently improved. After approximately 300 iterations, the testing accuracy stabilized at around 99.4%.
To further analyze the model’s performance, we provide confusion matrices for the CWRU dataset (see Figure 5). The confusion matrices reveal that the model achieved high accuracy for most fault types, with only minor misclassifications observed in certain classes, such as ball faults. These misclassifications may be attributed to the similarity in vibration patterns between certain fault types under specific operating conditions.
To evaluate the impact of the weighting factors on model performance, we conducted experiments with different weight configurations. The results, as shown in Table 3, indicate that setting $\alpha$ higher than $\delta$ led to better generalization, as the model prioritized learning from the source domain while reducing the influence of potentially noisy pseudo-labels in the target domain. Additionally, a balanced setting of $\beta$ and $\gamma$ ensured effective domain adaptation. Specifically, $\beta$ (the weight for marginal distribution alignment) played a central role in domain adaptation, as it minimized the discrepancy between the source and target domains using MMD and CORAL, promoting feature proximity and enabling effective knowledge transfer. On the other hand, $\gamma$ (the weight for conditional distribution alignment) served as an auxiliary term, ensuring class-level discrimination and preventing misclassification due to overly small class distances. This balanced approach allowed the model to achieve robust domain adaptation while maintaining high classification accuracy.
To simulate a sample-scarce environment, the sample size for the target condition was reduced, and the designed TTLN model was utilized for transfer learning from Condition 1 to the other target conditions. Additionally, the direct training results of the Transformer model on the corresponding target conditions, as well as the results of a Transformer model pre-trained on the source domain data and fine-tuned on the target condition data, were compared with the transfer results of the TTLN model, to evaluate the effectiveness of the transfer learning strategy in data-scarce scenarios.
Figure 6 illustrates the change in accuracy of the different models during the sample size reduction process. The Transformer model, without transfer learning, maintained an accuracy above 90% when the dataset sample size exceeded 1000, but this was still lower than that of the TTLN model. When the sample size fell below 1000, the accuracy of the Transformer model rapidly dropped to below 50%. Similarly, the fine-tuned Transformer model (pre-trained on the source domain and fine-tuned on the target domain) showed an improved performance compared to the non-transfer learning Transformer model, achieving an accuracy of around 95% when the sample size exceeded 1000. However, its performance also degraded significantly when the sample size was reduced below 1000, dropping to around 65% accuracy. In contrast, the TTLN model demonstrated remarkable resilience, with only a marginal decline in performance, consistently achieving an accuracy above 95%, even with substantially reduced sample sizes.
This clearly demonstrates that the TTLN model showed strong adaptability in data-scarce situations, achieving a significantly higher accuracy than the directly trained Transformer model and the fine-tuned Transformer model. The Transformer model struggled to adequately learn the fault patterns in the target conditions due to an insufficient sample size, resulting in poor generalization ability. Therefore, the TTLN model effectively identified faults in the target conditions, while minimizing the reliance on large amounts of new condition data.
The training time for the TTLN model varied depending on the dataset and hardware configuration. For the CWRU dataset with 7000 samples, the model typically took approximately 15 min to complete 300 epochs using a single NVIDIA RTX 3090 GPU. To handle large-scale training data, we employed mini-batch gradient descent with a batch size of 64, which balanced computational efficiency and gradient estimation accuracy. Additionally, data augmentation techniques, such as sliding window segmentation, were used to increase the diversity of the training data, without significantly increasing the computational overhead [40].
5. Conclusions
In this study, we propose a transformer transfer learning network (TTLN) for motor bearing fault diagnosis, addressing key challenges such as limited labeled data and varying operating conditions. The main contributions of this work are as follows:
1. Methodological Contribution: We introduced the TTLN, a model that combines domain adaptation and transfer learning to enhance diagnostic accuracy across diverse operational conditions. By aligning both marginal and conditional distributions, the TTLN effectively adapts to both source and target domains.
2. Experimental Validation: Comprehensive experiments on the CWRU and PU datasets demonstrated the effectiveness of the TTLN, achieving an average accuracy of 99.58% and 99.77%, respectively. The TTLN outperformed baseline models, such as CNN and LSTM, particularly in scenarios with limited training data, maintaining over 95% accuracy when the sample sizes fell below 1000, while the baseline models exhibited significant performance degradation.
3. Practical Implications: The TTLN’s robustness in data-scarce environments and across multiple operating conditions underscores its potential for real-world applications, particularly in fault detection systems where obtaining labeled data is costly or impractical. This approach offers a reliable solution for enhancing the performance of diagnostic systems in dynamic and resource-constrained industrial settings.
While this study demonstrates the effectiveness of the TTLN in handling varying operating conditions and limited data, it is important to acknowledge that real-world vibration signals are often contaminated by noise [41,42], which can impact model performance. Noise in raw sensor data can lead to misclassifications, particularly in challenging conditions with scarce labeled samples. Future work will explore strategies to improve the TTLN’s robustness to noise, such as incorporating noise-resilient preprocessing techniques or noise-robust training approaches, to further enhance the diagnostic accuracy in industrial applications.
Additionally, it is worth noting that while load variations are an important factor in industrial applications, they were not the primary focus of this study. Instead, our research addressed the broader challenge of cross-condition fault diagnosis, where operating conditions (including load, speed, and temperature) may vary significantly. The proposed TTLN model demonstrated robustness to these variations through domain adaptation strategies, achieving high diagnostic accuracy across different conditions. While this study focused on cross-condition fault diagnosis, the impact of specific factors such as load variations warrants further investigation in future work.
To further improve the computational efficiency, compressed sensing (CS) techniques, such as adaptive step size forward-backward pursuit (ASFBP), could be integrated into our framework. These techniques reduce the dimensionality of the input data while preserving critical fault-related features, thereby reducing training time and memory requirements. For example, ASFBP has been successfully applied in the acoustic emission (AE)-based health state assessment of high-speed train bearings, demonstrating its effectiveness in handling large-scale datasets [43,44,45,46]. Future work could explore the integration of ASFBP with the TTLN model to enhance its scalability and efficiency, particularly in scenarios with large-scale data and limited computational resources.
While the CWRU and PU datasets provided a solid foundation for evaluating the TTLN model, they may not fully capture the complexities and noise present in real industrial environments. In future work, we plan to test the TTLN model on more realistic datasets that incorporate noise, varying operational conditions, and real-life fault patterns, further validating its practical applicability in industrial settings.
While the proposed TTLN demonstrated superior performance in motor-bearing fault diagnosis, its deployment in real-world industrial environments must consider several practical constraints, such as memory usage, inference speed, and the potential need for hardware acceleration. Transformers typically demand more memory than traditional models like CNNs or RNNs, due to their self-attention mechanism, which computes pairwise interactions between all time steps. While this can be a limitation in resource-constrained environments, the use of modern hardware (e.g., GPUs) can mitigate this issue. Additionally, although Transformers may have a higher computational complexity, their parallelizable nature allows for faster inference on GPUs or TPUs, making them suitable for real-time fault diagnosis in industrial settings. However, deploying Transformers may require hardware acceleration, which is increasingly feasible given the growing availability of affordable GPU-based solutions.
In conclusion, this work advances both the theory of transfer learning and domain adaptation, while providing a practical framework for predictive maintenance in industrial systems. The exploration of noise resilience in future work will further strengthen the model’s applicability in real-world environments.