Learn Then Adapt: A Novel Test-Time Adaptation Method for Cross-Domain Fault Diagnosis of Rolling Bearings

Li, Wei; Chen, Yan; Li, Jiazhu; Wen, Jiajin; Chen, Jian

doi:10.3390/electronics13193898


by Wei Li ¹, Yan Chen ², Jiazhu Li ¹, Jiajin Wen ¹ and Jian Chen ¹,*

¹ Institute of Sound and Vibration, Hefei University of Technology, Hefei 230009, China

² School of Electronic and Electrical Engineering, Bengbu University, Bengbu 233030, China

* Author to whom correspondence should be addressed.

Electronics 2024, 13(19), 3898; https://doi.org/10.3390/electronics13193898

Submission received: 6 September 2024 / Revised: 22 September 2024 / Accepted: 25 September 2024 / Published: 2 October 2024


Abstract:

Cross-domain fault diagnosis enhances the generalization capability of diagnostic models across different operating conditions and machines. Current studies tackle the domain shift problem by adapting the model during training with data from the target domain or multiple source domains. However, a more realistic and less explored scenario is automatically adapting a trained (developed) model at test time (deployment period) using limited normal-condition data. To bridge this research gap, we propose a novel test-time adaptation framework to rapidly and effectively adapt the trained model, which only requires mini-batch test data (normal condition). Specifically, we first transform input signals into informative signal embeddings and mitigate their noise with a reconstruction loss. Then, we decompose the signal embedding into the domain-related healthy component and the domain-invariant faulty component to better leverage the normal-condition data. Finally, we adapt the model by re-identifying the normal signals of the target domain during the test stage. Extensive experiments verify the effectiveness of our method, demonstrating performance improvements across public and private datasets.

Keywords:

deep learning; bearing fault diagnosis; cross-domain; test-time adaptation

1. Introduction

Fault diagnosis is crucial for ensuring the reliability and safety of mechanical equipment in modern industry. Rolling bearings are an essential mechanical component in rotating machinery, which can reduce friction between rotating or moving parts [1,2,3]. Traditional studies on rolling bearing diagnosis strongly rely on a simplified assumption; that is, the training (source) data and test (target) data are independently and identically distributed (IID). However, in industrial practice, the common scenario we encounter is out-of-distribution (OOD) generalization. As a result, the performance of diagnostic models tends to degrade when there is a domain shift between the training and testing environments, such as changes in operating conditions or variations in sensor data.

Recently, several methods have introduced transfer learning (TL), domain adaptation (DA), and domain generalization (DG) techniques to mitigate the impact of the discrepancies in data distribution [4]. Transfer learning and domain adaptation leverage labeled or unlabeled data from the target domain to guide the adaptation of a source-domain-trained model. This means that sufficient diagnostic data in the target domain must be accessible for model adaptation. Nevertheless, this is not always feasible in practice, and it seriously impedes the large-scale deployment of models to new or unseen mechanical devices. To overcome this problem, some studies adopt domain generalization to train a model using data from single or multiple related source domains, enabling the model to generalize well to any OOD target domain. For example, ref. [3] constructed a causal graph to separate the domain-invariant and domain-related factors in bearing fault signals. Ref. [5] adopted knowledge distillation to train a student model that can extract invariant features across domains from the training data, enhancing the generalization ability in unseen data. However, DG methods do not impose assumptions on the target domain and therefore lack performance guarantees on unseen domains. This uncertainty makes it difficult to determine when and how well these methods will work in practice.

In this paper, we explore a more realistic industrial scenario for rolling bearing diagnosis known as Test-Time Adaptation (TTA), also referred to as Test-Time Training (TTT). In this scenario, only single or mini-batch test data are used for model tuning prior to inference [4,6]. Specifically, once the model has been trained (developed) in a specific domain, it is typically deployed on a new device in a different domain. Fortunately, we can obtain normal-condition data (i.e., normal vibration signals) from the target domain, which is beneficial for adapting the trained model, i.e., test-time adaptation. This scenario aligns well with industrial practices, and normal-condition data are accessible for model adaptation in many real-world applications.

In the context of rolling bearing diagnosis, we are the first to study a TTA scenario that relies only on normal-condition data to adapt the model. This study lies at the intersection of domain adaptation and domain generalization, facilitating efficient model usage across various devices and environments with minimal test data. The recent work [7] adapts the model at test time using unlabeled target data that include both normal-condition and fault signals. In contrast, we depend only on normal-condition data from the target domain, which is considerably more practical. However, how to effectively adapt the fault diagnosis model in this setting remains unexplored and faces two crucial challenges:

C1. How to use only the normal-condition data to adapt the model during the test stage. Rolling bearing fault data are often scarce. For new equipment, only normal-condition data are available, with no fault-condition data. This absence of fault information can negatively affect test-time adaptation, as the model struggles to extract helpful faulty information from only normal-condition data, making it difficult to make effective adjustments.

C2. How to avoid irrelevant noise affecting model inference. Diagnostic signals from rolling bearings, such as vibration signals, often contain noise that can degrade signal representation and destabilize the adaptation process, impairing diagnostic performance.

To address these challenges, we propose an efficient test-time adaptation method for rolling bearing diagnosis that can rapidly adjust the model to the target domain using only minimal normal-condition data. Specifically, we train an encoder to learn informative signal embeddings and suppress noise by reconstructing the input signals. Then, we devise a decoupler to separate the signal into the domain-related healthy component and the domain-invariant faulty component. Finally, we adapt the model to the target domain by re-identifying the normal signals in the test samples.

The main contributions of this paper are as follows:

We explore, for the first time, the test-time adaptation scenario in which only normal-condition data are used to adapt the model.
We propose a test-time adaptation framework including the TTA model and TTA strategy. This framework can effectively adjust the model with only normal-condition data by decomposing the input signals into the domain-related healthy component and the domain-invariant faulty component.
Extensive experiments on public and private datasets of rolling bearings demonstrate significant performance improvements compared to various baseline methods. The satisfactory performance indicates that the TTA deployment strategy is feasible in industrial scenarios.

2. Related Works

2.1. Cross-Domain Fault Diagnosis for Rolling Bearings

In the task of fault diagnosis, the working environment of mechanical equipment is complex and varied [8,9,10]. This usually leads to distribution shifts between the training (source) distribution and the test (target) distribution, for example when diagnosis signals are captured from different machines, operating conditions, or acquisition devices. Such shifts pose significant challenges for deep learning models deployed in the wild. Transfer learning aims to transfer knowledge learned from one (or more) problem/domain/task to another different but related one [6]. Pretraining followed by fine-tuning is a commonly used strategy for TL, and it requires target data to fine-tune the model for new downstream tasks. Domain adaptation and domain generalization aim to tackle the domain shift problem encountered in new test environments. DA needs to access the target data during model learning, while DG considers the scenarios where target data are inaccessible during model learning. We summarize these related topics in Table 1. Test-time adaptation is on the boundary between DA and DG, and it supports online learning during testing [4]. Intuitively, it uses the information of test samples to adjust the model adaptively during testing to improve the prediction performance on the test distribution.

Current cross-domain research studies in rolling bearing diagnosis mainly focus on transfer learning [2,11,12,13], domain adaptation [10,14], and domain generalization methods [3,5,8]. However, test-time adaptation in rolling bearing diagnosis remains largely unexplored. Zheng et al. [8] propose a deep domain generalization network for fault diagnosis (DDGFD) that adopts a generalization regularization term to learn domain-invariant features across multiple domains. The Deep Causal Factorization Network (DCFN) [3] is a cross-machine bearing diagnosis method that eliminates the need for target domain data during training, thereby enhancing domain generalization ability. It utilizes causal inference techniques to disentangle fault representations as causal factors from domain-related representations as non-causal factors. Wen et al. [15] propose a clustering graph convolutional network for cross-domain fault diagnosis of bearings. In this framework, deep clustering and adversarial learning jointly facilitate the knowledge transfer between different bearings, thereby enhancing diagnostic accuracy in the presence of domain shifts. Dual Invariant Feature Domain Generalization (DIFDG) [5] is based on a knowledge distillation framework. In this approach, the original 1D signal undergoes a Fourier transform to extract phase information, which is then input into the teacher network. The teacher network utilizes this phase information to learn internally invariant features and guide the student network in categorizing the fault data.

Divide to Customize and Contrast (DtCC) [7] explores TTA for cross-domain machinery fault diagnosis for the first time. It divides each mini-batch of online unlabeled target data into certain-aware and uncertain-aware sets for subsequent fine-grained online model adaptation. The limitation of this framework lies in its requirement for target data to encompass a range of fault states, thereby diminishing its practicality in industrial contexts.

2.2. Test-Time Adaptation

Test-time adaptation (TTA) aims to adapt a pre-trained model from the source domain to the target domain before making predictions, which leverages a small amount of test data from the target domain. The most related topics are DA and DG, because they also deal with the domain shift problem [4]. On the one hand, TTA closely resembles source-free DA [16], as both assume that source data are inaccessible after model training. On the other hand, TTA differs from DA in that it uses only single or mini-batch test data [17] for model tuning, often conducted in an online manner and without human supervision.

Depending on how the test data are presented, TTA can be divided into three distinct categories: test-time batch adaptation (TTBA), test-time domain adaptation (TTDA), and online test-time adaptation (OTTA) [18]. TTBA individually adapts the pre-trained model for each mini-batch. In other words, the predictions generated for each mini-batch are not influenced by the predictions from other mini-batches. TTDA is a kind of source-free domain adaptation that utilizes m mini-batches for multi-epoch adaptation before generating predictions. OTTA adapts the pre-trained model in an online manner [19], where each mini-batch can only be observed once, so the model continuously learns from streaming data. Among these categories, the one most closely related to our work is OTTA. However, in the context of fault diagnosis, only normal-condition data are available during the test stage, making it challenging to achieve the objective of TTA.
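The difference between per-batch adaptation (TTBA) and online adaptation (OTTA) can be illustrated with a toy sketch. Here the "model" is reduced to a single scalar state and the hypothetical `adapt` step simply moves that state toward the batch mean; neither is part of any cited method, they only make the control flow visible.

```python
def adapt(state, batch, rate=0.5):
    # Toy adaptation step: move a scalar "model state" toward the batch mean.
    return state + rate * (sum(batch) / len(batch) - state)

def ttba(pretrained, batches):
    # TTBA: every mini-batch is adapted from the same pre-trained state,
    # so no batch influences the result for another batch.
    return [adapt(pretrained, b) for b in batches]

def otta(pretrained, batches):
    # OTTA: the adapted state is carried forward and each batch is seen once.
    states, state = [], pretrained
    for b in batches:
        state = adapt(state, b)
        states.append(state)
    return states

batches = [[2.0, 2.0], [4.0, 4.0]]
ttba_states = ttba(0.0, batches)
otta_states = otta(0.0, batches)
```

In this toy run the two strategies agree on the first batch and diverge on the second, because only the online variant remembers what it learned from the first batch.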

3. Methodology

In this section, we provide a detailed introduction of our method, which contains a test-time training model and a test-time adaptation strategy. As shown in Figure 1, our proposed model consists of four modules: encoder, decoder, decoupler, and classifier. The encoder is designed to transform vibration signals into a latent space. Then, we use the decoder to reconstruct the signals, utilizing self-supervised learning to ensure that the encoder generates high-quality signal embeddings. To make a diagnosis, we adopt a decoupler to divide the original embeddings into healthy and faulty components, aiming to separate out the domain-related baseline signal (in a manner similar to how Fourier and wavelet transforms decompose a signal). We argue that fault signals act as disturbances to healthy signals, and the fault components of the same fault type (e.g., inner ring fault, outer ring fault) exhibit similarity across different datasets (domains).

Intuitively, the healthy component represents a normal (baseline) signal, while the faulty component is an additional increment over the healthy component. The complete signal is the sum of both the healthy and faulty components. By only adjusting the decoupled healthy component during testing, our model can quickly adapt to the target domain and effectively identify the bearing fault in other domains. Next, we will introduce the details of each module in our method.

3.1. Encoder

The encoder is designed to embed vibration signals $X$ into a low-dimensional latent space $H$. Without loss of generality, we utilize a Multilayer Perceptron (MLP) as the encoder. As illustrated in Figure 2, the encoder consists of seven layers. In the first six layers, each layer is composed of a linear layer, a normalization layer, and an activation function, which can be formulated as follows:

$$h^{(l)} = \mathrm{ReLU}\left(\mathrm{BN}\left(W^{(l)} h^{(l-1)} + b^{(l)}\right)\right), \tag{1}$$

where $h^{(l)}$ is the hidden representation of input signals at layer $l$, and $h^{(0)}$ is the original input signal. $W^{(l)}$ and $b^{(l)}$ are the trainable weight matrix and bias vector at layer $l$.

The last (seventh) layer is a single linear layer used to map the representation to the hidden space and generate the low-dimensional embedding $H$ of the input vibration signals. It will be used for subsequent signal reconstruction and fault diagnosis tasks.
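As a minimal sketch of the encoder forward pass, the following NumPy code stacks six linear + batch-norm + ReLU layers followed by one purely linear layer. The layer widths, the random initialization, and the omission of the learned batch-norm scale and shift are assumptions made for illustration; the text does not specify them.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize each feature over the batch (learned scale and shift omitted).
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def init_layers(dims, rng):
    # One (W, b) pair per consecutive pair of widths in dims.
    return [(rng.normal(0.0, 0.1, size=(d_in, d_out)), np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def encode(x, layers):
    h = x  # h^(0): the input signals, one row per sample
    for W, b in layers[:-1]:
        # Eq. (1) in row-vector form: linear map, batch norm, then ReLU.
        h = np.maximum(batch_norm(h @ W + b), 0.0)
    W, b = layers[-1]
    return h @ W + b  # seventh layer: linear only, yields the embedding H

rng = np.random.default_rng(0)
dims = [512, 256, 256, 128, 128, 64, 64, 64]  # hypothetical widths (8 sizes -> 7 layers)
layers = init_layers(dims, rng)
H = encode(rng.normal(size=(8, 512)), layers)  # batch of 8 signals -> 8 embeddings
```

A decoder with the mirrored width list would follow the same pattern, mapping the embedding back to the input dimension.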

3.2. Decoder

The decoder reconstructs the original input signal from the hidden representation using a structure symmetrical to that of the encoder. This process can be represented as follows:

$$\hat{h}^{(l)} = \mathrm{ReLU}\left(\mathrm{BN}\left(W^{(l)} \hat{h}^{(l-1)} + b^{(l)}\right)\right). \tag{2}$$

After seven layers, we obtain the reconstructed signals $\hat{X}$ from the hidden representation $H$. We then use reconstruction as a self-supervised training signal, implemented via the Mean Squared Error (MSE) between the original vibration signals and the output of the decoder. The signal reconstruction process ensures that the signal embedding is informative and preserves the essential information for diagnosis. By optimizing the reconstruction loss, the encoder improves its ability to capture essential features, which guides the training process and helps prevent underfitting or overfitting. In the adaptation phase, this self-supervised loss can help the encoder learn the distribution of normal signals.

The self-supervised loss can be calculated as follows:

$$L_{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left\| X_{i} - \hat{X}_{i} \right\|_{2}^{2}, \tag{3}$$

where $X_{i}$ and $\hat{X}_{i}$ are the i-th original and reconstructed signals in a batch of N samples. Please note that we use an MLP as an example of an encoder and decoder, and other architectures (e.g., transformers, CNNs, etc.) can also be adopted as appropriate.
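The reconstruction loss of Eq. (3) can be written in a few lines. This sketch follows the formula literally: the squared L2 norm of each sample's error, averaged over the batch. The two small arrays are made-up values for illustration.

```python
import numpy as np

def reconstruction_loss(X, X_hat):
    # Eq. (3): squared L2 norm of each sample's error, averaged over the batch.
    per_sample = np.sum((X - X_hat) ** 2, axis=1)
    return float(per_sample.mean())

X = np.array([[1.0, 2.0], [0.0, 0.0]])      # original signals (toy values)
X_hat = np.array([[1.0, 0.0], [3.0, 4.0]])  # reconstructions (toy values)
loss = reconstruction_loss(X, X_hat)        # (4 + 25) / 2
```

Note that a built-in MSE that averages over every element, rather than summing over the signal dimension first, differs from this value only by a constant factor equal to the signal length.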

3.3. Decoupler

Rolling bearing fault data are often scarce, and only normal-condition data are typically available for new equipment. To better adapt our model to the new dataset (target domain), we decouple the vibration signal into healthy and faulty components and adjust the model during the test stage to recognize the target-domain healthy component. Notably, this process only uses a limited number of normal-condition samples from the test data.

In this work, we hold that the faulty component is an incremental disturbance to the healthy component and is domain-invariant due to the physical nature of the faults. Consequently, the fault components of normal-condition signals should converge to zero, and the fault components for the same types of faults should be similar. To achieve this goal, we propose a decoupler that decomposes the input signal into healthy and faulty components and then adopts a classifier to predict the fault probability for the healthy component and the sum of the healthy and faulty components. We force the classifier to recognize the healthy component as a normal condition while correctly identifying the fault types of the sum of healthy and faulty components. By doing this, our method can separate and capture the desired healthy and faulty components.

The architecture of the decoupler is illustrated in Figure 3, where dual MLPs are devised to process the original hidden representation separately into healthy and faulty components. The proposed decoupler can be formulated as follows:

$$H_{HC} = W_{2}\, \mathrm{ReLU}\left(\mathrm{BN}\left(W_{1} H + b_{1}\right)\right) + b_{2}, \tag{4}$$

$$H_{FC} = W_{2}^{'}\, \mathrm{ReLU}\left(\mathrm{BN}\left(W_{1}^{'} H + b_{1}^{'}\right)\right) + b_{2}^{'}, \tag{5}$$

where $H_{HC}$ and $H_{FC}$ are the hidden representations of the healthy and faulty components, and $H$ is the hidden representation of the input vibration signal. During the training phase, we train the decoupler to divide the signal embedding into healthy and faulty components. For normal-condition signals, the faulty component decomposed by the decoupler should approach zero. For faulty signals, the decoupler decomposes them into the corresponding healthy and faulty components. In the testing phase, when we adapt our model with normal-condition signals, the decoupler learns the distribution of healthy components. Therefore, when it encounters a faulty test signal, the decoupler adaptively extracts the healthy component and the faulty component. In this way, our decoupler can separate the two components from the embedded signal during testing.
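The two-branch structure of Eqs. (4) and (5) can be sketched as follows. Both branches share the same form but hold separate parameters; the hidden width of 64, the random initialization, and the simplified batch norm are assumptions for illustration only.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize each feature over the batch (learned scale and shift omitted).
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def make_branch(d, rng):
    # Parameters (W1, b1, W2, b2) of one two-layer branch.
    return (rng.normal(0.0, 0.1, size=(d, d)), np.zeros(d),
            rng.normal(0.0, 0.1, size=(d, d)), np.zeros(d))

def branch(H, W1, b1, W2, b2):
    # Eqs. (4)/(5) in row-vector form: linear -> BN -> ReLU -> linear.
    return np.maximum(batch_norm(H @ W1 + b1), 0.0) @ W2 + b2

rng = np.random.default_rng(0)
d = 64                                # hypothetical embedding width
healthy_params = make_branch(d, rng)  # W1, b1, W2, b2
faulty_params = make_branch(d, rng)   # W1', b1', W2', b2'

H = rng.normal(size=(8, d))           # stand-in for encoder embeddings
H_hc = branch(H, *healthy_params)     # healthy component
H_fc = branch(H, *faulty_params)      # faulty component
H_full = H_hc + H_fc                  # sum that is classified for diagnosis
```

Keeping the parameters of the two branches separate is what allows the healthy branch to absorb domain-specific baseline information while the faulty branch is trained toward a domain-invariant disturbance.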

3.4. Classifier

After obtaining the healthy and faulty components of the input signal, we use a classifier to diagnose the input representation. The classifier must complete two tasks: identify the healthy component as a normal signal, and identify the sum of the healthy and faulty components (i.e., the original signal) as its corresponding fault category. The classifier maps hidden representations to the fault category space, which can be mathematically expressed as follows:

$$\hat{y} = \mathrm{ReLU}\left(W_{C} H_{c} + b\right), \tag{6}$$

where $W_{C} \in R^{d \times n}$ is the trainable weight matrix, $b$ is the bias vector, d is the hidden dimension, and n is the number of fault categories. $H_{c}$ is the hidden representation of one component. Then, we calculate the cross-entropy loss between the predicted and the ground-truth fault category as follows:

$$L_{CLS} = - \frac{1}{N} \sum_{i=1}^{N} y_{i} \log \hat{y}_{i}. \tag{7}$$

When $H_{c} = H_{HC}$, the label $y_{i}$ is set to 0, which is the same as for normal signals, and the loss is called the discrepancy loss. When $H_{c} = H_{HC} + H_{FC}$, the label $y_{i}$ is set to the fault category of the input $X_{i}$ (e.g., 1 for an inner-ring fault and 2 for an outer-ring fault), and the loss is called the diagnosis loss.
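The two classification terms can be sketched as below. One assumption is made loudly here: the sketch turns the classifier output into probabilities with a softmax before applying the cross-entropy of Eq. (7), which is the common way to make the logarithm well defined; the text itself writes a ReLU in Eq. (6). The zero-initialized classifier is only a convenient test case.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    # Eq. (7) with integer labels: mean negative log-probability of the true class.
    p = softmax(logits)  # assumption: softmax instead of the ReLU in Eq. (6)
    return float(-np.log(p[np.arange(len(labels)), labels]).mean())

def classification_losses(H_hc, H_fc, labels, W_c, b_c):
    normal = np.zeros_like(labels)  # label 0 for every healthy component
    discrepancy = cross_entropy(H_hc @ W_c + b_c, normal)
    diagnosis = cross_entropy((H_hc + H_fc) @ W_c + b_c, labels)
    return discrepancy, diagnosis

rng = np.random.default_rng(0)
d, n = 64, 3                                # hidden width (assumed), 3 categories
W_c, b_c = np.zeros((d, n)), np.zeros(n)    # untrained classifier: uniform predictions
H_hc, H_fc = rng.normal(size=(6, d)), rng.normal(size=(6, d))
labels = np.array([0, 0, 1, 1, 2, 2])       # 0 normal, 1 inner-ring, 2 outer-ring
discrepancy, diagnosis = classification_losses(H_hc, H_fc, labels, W_c, b_c)
```

With a zero-weight classifier every class receives probability 1/3, so both terms start at log 3 and should decrease as training proceeds.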

3.5. Test-Time Adaptation Strategy

During the training stage, we have three optimization objectives for the model: (1) signal reconstruction loss, (2) discrepancy loss, and (3) diagnosis loss. Like other cross-domain methods, we train the model on source domain data. The overall loss function can be written as follows:

$$L = α L_{MSE} + L_{CLS} + λ \left\| Θ \right\|_{2}^{2}, \tag{8}$$

where $α$ and $λ$ are scaling factors, $L_{CLS}$ is the classification loss comprising the discrepancy loss and the diagnosis loss, and $Θ$ denotes all trainable parameters. We introduce L2 regularization by adding the squared sum of all trainable parameters to the loss function, which prevents overfitting by discouraging large weights and thus reduces model complexity. This leads to smoother, more generalized decision boundaries, improving the model’s ability to generalize to unseen data.

During the adaptation (test) stage, we leverage a small number of normal vibration signals to adjust the model, thanks to the decoupled healthy and faulty components. We optimize the same reconstruction loss and discrepancy loss mentioned above to adapt all modules to the target domain (test data). In this way, our method recognizes the domain-invariant faulty component after training and identifies the healthy component of the target domain by adapting the model.

For inference, we use an adjusted decoupler to separate the input into healthy and faulty components and then employ a classifier to accurately identify the fault type based on the sum of the healthy and faulty components.
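The difference between the training objective and the adaptation objective can be summarized in a short sketch. Two points are assumptions rather than statements from the text: the classification loss is taken as the plain sum of the discrepancy and diagnosis terms, and the adaptation objective reuses the same weight α without the L2 term.

```python
import numpy as np

def l2_penalty(params):
    # Squared sum of all trainable parameters.
    return float(sum(np.sum(p ** 2) for p in params))

def training_objective(l_mse, l_disc, l_diag, params, alpha=0.3, lam=1e-4):
    # Eq. (8), with L_CLS assumed to be discrepancy + diagnosis.
    return alpha * l_mse + (l_disc + l_diag) + lam * l2_penalty(params)

def adaptation_objective(l_mse, l_disc, alpha=0.3):
    # Test stage: only normal-condition samples are available, so the
    # diagnosis term is dropped and only reconstruction + discrepancy remain.
    return alpha * l_mse + l_disc

params = [np.array([1.0, 2.0])]  # toy parameter list
train_loss = training_objective(2.0, 0.5, 1.0, params, alpha=0.5, lam=0.1)
adapt_loss = adaptation_objective(2.0, 0.5, alpha=0.5)
```

Because the adaptation objective never touches fault labels, it can be evaluated on a handful of normal-condition samples from the target machine.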

4. Experiments

4.1. Datasets

In this paper, we adopt two publicly available datasets and one private dataset of rolling bearing diagnosis to verify the effectiveness of our proposed method. We focus on the most common bearing health states for cross-domain adaptation: the normal condition (NC), inner race failure (IF), and outer race failure (OF).

Dataset A: Case Western Reserve University (CWRU) bearing dataset (https://engineering.case.edu/bearingdatacenter, (accessed on 1 August 2024)). The test stand is equipped with accelerometers to collect vibration data. The dataset covers a range of motor loads of 0 to 3 horsepower and motor speeds of 1797 to 1720 RPM. We use the data of drive end bearings, and the raw signals are collected at 12,000 samples/second. The number of samples in each type of bearing fault is 1200.
Dataset B: Mechanical Failures Prevention Group (MFPT) Society bearing dataset (https://www.mfpt.org/fault-data-sets/, (accessed on 2 August 2024)). This dataset includes baseline conditions, outer-ring fault conditions under various loads, and inner-ring fault conditions collected from bearing test benches. The normal-condition samples are collected under the same load of 270 lbs with a sampling frequency of 97,656 Hz, while the other samples are collected under three different loads (200 lbs, 250 lbs, 300 lbs) with a sampling frequency of 48,828 Hz. To unify the sampling frequency of the samples in this dataset, all the vibration signals are resampled to 12 kHz. The number of samples in each type of bearing fault is 600.
Dataset C: This private dataset is from Hefei University of Technology, China. As shown in Figure 4, the experimental data were collected from the aero-engine bearing test bench in the laboratory. The test bench comprises a spindle testing machine, a refrigeration system, a hydraulic loading system, and a lubrication system. The test objects are single-row cylindrical roller bearings: the NU1010EM (with a detachable inner race) and the N1010EM (with a detachable outer race), both manufactured by NSK. To simulate varying degrees of damage, a laser-marking machine and a wire-cutting machine were used to process healthy bearings, creating single- and multi-point failures with damage dimensions of 9 mm (length) × 0.2 mm (width). During data acquisition, the axial load was set to 2 kN, the motor speed was set to 2000 rpm, and the sampling frequency was set to 20.48 kHz. The number of samples in each type of bearing fault is 2000.

We divide the dataset into training, validation, and test sets in an 8:1:1 ratio. We will terminate the training if the model’s performance on the validation set does not improve within five epochs to avoid overfitting and test its final performance on the test sets. For the TTA evaluation, we train and validate on the source domain and test on the target domain.
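The data split and the early-stopping rule can be sketched with the standard library alone. The shuffling seed and the helper names are hypothetical; the 8:1:1 ratio and the patience of five epochs come from the text.

```python
import random

def split_indices(n, seed=0):
    # Shuffle sample indices and cut them 8:1:1 into train / validation / test.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train, n_val = n * 8 // 10, n // 10
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def early_stop_epoch(val_scores, patience=5):
    # Return the epoch index at which training stops: the first epoch that is
    # `patience` epochs past the best validation score without improvement.
    best, best_epoch = float("-inf"), -1
    for epoch, score in enumerate(val_scores):
        if score > best:
            best, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_scores) - 1

train_idx, val_idx, test_idx = split_indices(1200)
stop = early_stop_epoch([0.5, 0.6, 0.7, 0.69, 0.68, 0.7, 0.66, 0.65, 0.9])
```

In the toy score list the best value is reached at epoch 2 and never exceeded in the next five epochs, so training stops at epoch 7 and the late improvement at epoch 8 is never seen.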

The type of input data and the normalization method have a significant impact on the performance of our method and the baselines. To explore the performance upper bounds of various methods, we tried the four input types mentioned in [20]: time domain input, frequency domain input, time-frequency domain input, and wavelet domain input. Finally, we adopt the frequency domain input with $[-1, 1]$ normalization as the input type of all methods in our experiments.
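A frequency-domain input of this kind could be produced as sketched below, assuming the magnitude of a one-sided FFT followed by min-max scaling to the range from -1 to 1. The segment length of 1024 and the 1.5 kHz test tone are hypothetical choices for the example; the exact preprocessing of [20] may differ.

```python
import numpy as np

def frequency_input(signal):
    # One-sided magnitude spectrum of a real-valued vibration segment.
    spectrum = np.abs(np.fft.rfft(signal))
    lo, hi = spectrum.min(), spectrum.max()
    if hi == lo:
        return np.zeros_like(spectrum)  # constant spectrum: nothing to scale
    # Min-max scaling so that the smallest bin maps to -1 and the largest to 1.
    return 2.0 * (spectrum - lo) / (hi - lo) - 1.0

fs = 12_000                        # 12 kHz, the common rate of datasets A and B
t = np.arange(1024) / fs           # hypothetical segment of 1024 samples
x = np.sin(2 * np.pi * 1500 * t)   # hypothetical 1.5 kHz test tone
features = frequency_input(x)
peak_hz = np.argmax(features) * fs / 1024
```

For a segment of 1024 samples the one-sided spectrum has 513 bins, and the test tone falls exactly on bin 128.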

4.2. Comparison Methods

We compare our method with various competing methods, which can be divided into three categories: autoencoder, CNN, and RNN.

Autoencoder (AE) [21]: Autoencoders were first introduced in [22] as neural networks that are trained to reconstruct their input. Autoencoders learn two functions (i.e., an encoder and a decoder), which try to reconstruct the input from the output low-dimensional embeddings of an encoder.
Denoising autoencoder (DAE) [23]: Denoising autoencoders make the representations robust to input corruption. They can be seen as a regularization option and can be used for error correction. This architecture disrupts the input with Gaussian noise or erasures via dropout, and the autoencoders are expected to reconstruct clean and repaired input.
Sparse autoencoder (SAE) [24]: To deal with the bias–variance tradeoff, sparse autoencoders enforce sparsity on the hidden representations by using an MSE loss regularized with a sparsity constraint (e.g., KL divergence). Sparse autoencoders minimize the distance between the output of the decoder and the input to train the autoencoder.
Symmetric Wasserstein autoencoder (SWAE) [25]: SWAEs symmetrically align the joint distribution of the data with the latent representation in the autoencoder while incorporating the reconstruction loss into the cost function to balance these two representations.
AlexNet [26]: AlexNet is a deep convolutional neural network that was introduced in 2012. It achieved breakthrough performance in the ImageNet competition, significantly advancing image classification using multiple convolutional layers, ReLU activations, and GPU acceleration. AlexNet set a new benchmark in computer vision and popularized deep learning.
ResNet [27]: ResNet is a deep convolutional neural network that introduces residual learning to address the vanishing gradient problem in deep networks. By using shortcut connections, ResNet allows gradients to flow more easily through layers, enabling the training of networks with hundreds or even thousands of layers. This innovation significantly improved performance in image recognition tasks and set new benchmarks in the field. In this work, we use ResNet18 as a baseline.
BiLSTM [28]: Bidirectional Long Short-Term Memory (BiLSTM) networks enhance traditional LSTM models by processing sequences in both forward and backward directions, capturing context from both past and future data. This bidirectional approach improves performance in sequence tasks, such as natural language processing and time-series prediction, by providing a more complete understanding of the data.
c-GCN-MAL [15]: The clustering graph convolutional network with multiple adversarial learning (c-GCN-MAL) is an autoencoder-based deep clustering network for cross-domain fault diagnosis that enhances the clustering and transfer abilities of GCNs on new datasets.
DCFN [3]: The deep causal factorization network (DCFN) is proposed for cross-machine bearing diagnosis. By leveraging the structural causal model, DCFN identifies the cross-machine generalized fault representations as causal factors and the domain-related representations as non-causal factors. We evaluate it on the single training dataset (treated as multiple source domains due to potential distribution differences).

4.3. Implementation Details

For a fair comparison, we ensure that the hidden dimensions are consistent when encoding the input signal. In other layers of the baseline networks, we referred to the original baseline paper for parameter adjustments as much as possible to achieve their optimal performances. We unify the dimensions of all input and output layers of the baseline to 64. All baselines share the same classifier architecture. We initialize all model parameters using the Xavier initializer [29], and these parameters are optimized using the Adam optimizer. The learning rate is initialized to 0.001, and the coefficient of the ℓ₂ regularization is set to $10^{-4}$. The batch size is 64, and we train all models for 100 epochs. Because the later adaptation stage requires a lower learning rate, we adopt the exponential LR strategy to adjust the learning rate. For the t-th epoch, the learning rate can be calculated as follows:

$$η^{(t)} = η^{(t-1)} \cdot γ, \tag{9}$$

where $γ \in (0, 1)$ is the decay factor, so that $η^{(t)} = η^{(0)} γ^{t}$.
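An exponential schedule can be sketched as follows, assuming the usual form in which the rate is multiplied by a constant factor γ once per epoch. The initial rate of 0.001 and the 100 epochs come from the text; the value γ = 0.95 is a hypothetical choice, since the decay factor is not reported.

```python
def exponential_lr(initial_lr, gamma, epochs):
    # Multiply the learning rate by gamma once per epoch.
    rates = [initial_lr]
    for _ in range(1, epochs):
        rates.append(rates[-1] * gamma)
    return rates

rates = exponential_lr(0.001, 0.95, 100)  # gamma = 0.95 is an assumed value
final_lr = rates[-1]                      # rate in the last training epoch
```

A schedule of this shape leaves the model with a small learning rate by the end of training, which matches the stated motivation of entering the adaptation stage with a lower rate.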

Other hyper-parameters are determined via a grid search on the validation set. All datasets are split into training, validation, and testing subsets with a ratio of 8:1:1. For test-time adaptation on various datasets, we unified the types of bearing faults into three categories: normal condition (NC), inner race failure (IF), and outer race failure (OF). To implement the test-time adaptation setting, we first train all baselines and our model on a source dataset (e.g., dataset A), then we use a limited number (1, 5, 10, 50, 100) of normal-condition samples in the target dataset (e.g., dataset B) to adapt the pre-trained model. Finally, we test the performance (diagnosis accuracy) of all baselines and our model on the target dataset (e.g., dataset B) to test the models’ effectiveness under test-time adaptation settings. We compare the performance of various models in detail to verify the superiority of our proposed method and strategy.
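The few-shot selection step of this protocol can be sketched as below: for each budget, a fixed number of normal-condition samples is drawn from the target dataset for adaptation. The label encoding (0 for the normal condition), the seed, and the helper name are assumptions; how the selected samples relate to the evaluation set is not specified in the text and is left open here.

```python
import random

NORMAL = 0                   # assumed label of the normal condition
SHOTS = (1, 5, 10, 50, 100)  # adaptation budgets used in the experiments

def pick_adaptation_samples(target_labels, k, seed=0):
    # Draw k distinct indices of normal-condition samples from the target set.
    normal_idx = [i for i, y in enumerate(target_labels) if y == NORMAL]
    return random.Random(seed).sample(normal_idx, k)

# Toy target dataset shaped like dataset B: 600 samples per category.
target_labels = [0] * 600 + [1] * 600 + [2] * 600
subsets = {k: pick_adaptation_samples(target_labels, k) for k in SHOTS}
```

Each subset would then be passed to the adaptation step before accuracy is measured on the target dataset.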

4.4. Performance Comparison

To ensure the reliability of the experimental results, we run each method five times and take the average value. The final average accuracies of all baselines and our method are reported in Table 2, which shows that our method outperforms the baselines of all three categories, supporting the value of our decoupling and training strategies under test-time adaptation settings.

Among all the autoencoder baselines, vanilla AE performs the worst because it only optimizes the reconstruction loss and classifies the bearing faults. It does not adopt any regularization terms to improve the generalization and robustness. Therefore, it cannot generalize or quickly adapt to new data distributions when facing new datasets. DAE and SAE use their regularization terms to improve the model’s generalization performance. Although they cannot rapidly adapt to new data distributions with a few samples, their inherent generalization makes their performance better than vanilla autoencoders under test-time adaptation settings. SWAE performs the best among all the autoencoder baselines, since it jointly optimizes the modeling losses in both the data and the latent spaces with the loss in the data space, leading to the denoising effect. The overall performance of the CNN and RNN baselines is better than the autoencoder thanks to their excellent feature extraction capabilities for sequential data.

The two baselines designed for domain adaptation and domain generalization perform best among the comparison methods due to their cross-domain abilities. The c-GCN-MAL adopts GCN and AE to learn more information about the data themselves and the structured relatedness of the source and target domains, combined with adversarial learning to enhance its domain adaptability. DCFN extracts domain-invariant factors from multiple domains to ensure performance in different domains.

Finally, our method outperforms all comparison methods regarding fault diagnosis accuracy in all test-time adaptation cases. We attribute these significant improvements mainly to decoupling the complete signals into healthy and faulty components. The proposed decoupler learns the normal data distribution on the target dataset with a small number of normal samples, and it enables the classifier to diagnose the correct fault type using similar faulty components.

The performance of all methods is similar when each is trained and tested on a single dataset, as shown in Table 3: every method reaches 100% on datasets A and C, and on dataset B our method (95.75%) is on par with the strongest baseline, DCFN (95.81%). From a performance perspective, the classification task on the MFPT dataset is more difficult than on the other two datasets. From Table 2, it can also be observed that when the target dataset under the test-time adaptation setting is MFPT (B), each model performs worse than when the target dataset is CWRU (A). When the target dataset is C, the performance of each model is the worst. We believe that the private dataset has a more distinctive distribution than the public datasets, so the models have more difficulty generalizing to it.
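The target-wise ordering described above can be checked directly from Table 2. The short script below copies the accuracies of our method and of DCFN from Table 2 and averages them per target dataset; the helper names are illustrative only.

```python
# Accuracies (%) copied from Table 2; keys are (source, target).
OURS = {("A", "B"): 87.86, ("A", "C"): 58.39, ("B", "A"): 96.84,
        ("B", "C"): 61.02, ("C", "A"): 86.32, ("C", "B"): 60.10}
DCFN = {("A", "B"): 80.78, ("A", "C"): 47.81, ("B", "A"): 85.16,
        ("B", "C"): 58.07, ("C", "A"): 83.24, ("C", "B"): 57.33}

def mean_by_target(results):
    """Average accuracy over all source domains for each target domain."""
    grouped = {}
    for (_source, target), acc in results.items():
        grouped.setdefault(target, []).append(acc)
    return {t: sum(v) / len(v) for t, v in grouped.items()}

def gains(ours, baseline):
    """Per-case accuracy difference (percentage points) over a baseline."""
    return {case: ours[case] - baseline[case] for case in ours}

ours_by_target = mean_by_target(OURS)
dcfn_by_target = mean_by_target(DCFN)
gain_by_case = gains(OURS, DCFN)

for target in ("A", "B", "C"):
    print(f"target {target}: ours {ours_by_target[target]:.2f}, "
          f"DCFN {dcfn_by_target[target]:.2f}")
```

For both methods the averaged accuracy decreases from target A to target B to target C, and the per-case gain of our method over DCFN is positive in all six cases.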

5. Analysis Experiments

5.1. Impact of Parameter $α$

As shown in Figure 5, we analyze the impact of the training loss weight $α$ on test performance. This hyperparameter controls the impact of the reconstruction loss and is varied from 0 to 1 with a step size of 0.1 on the three datasets. The experimental results indicate that selecting an appropriate weight (ranging from 0.3 to 0.6) enhances diagnostic accuracy in TTA settings across all datasets. For instance, in the cases of A→B and A→C, the models achieve optimal results with an accuracy of around 0.87 at $α$ = 0.3 and 0.59 at $α$ = 0.6, respectively. We attribute the performance improvements to high-quality signal embeddings: the encoder–decoder paradigm reconstructs the signal in a self-supervised manner, which can generate informative and denoised signal embeddings.
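The sweep over the loss weight can be sketched as follows. This is an illustration under stated assumptions: the total loss is written as the classification loss plus the weighted reconstruction loss, which is one common form and may differ from the exact objective used in our method, and `evaluate` stands for a hypothetical routine that trains and adapts a model for a given weight and returns its target-domain accuracy.

```python
def alpha_grid(step=0.1):
    """Candidate weights from 0 to 1 inclusive with the given step size."""
    n = round(1 / step)
    return [round(i * step, 10) for i in range(n + 1)]

def total_loss(cls_loss, rec_loss, alpha):
    """Assumed objective: classification loss plus weighted reconstruction loss."""
    return cls_loss + alpha * rec_loss

def best_alpha(evaluate, grid):
    """Return the weight in the grid with the highest score under `evaluate`."""
    return max(grid, key=evaluate)

# Toy stand-in for train-adapt-evaluate: a curve peaked at 0.3.
# It is NOT derived from the experiments and only exercises the sweep.
def toy_evaluate(alpha):
    return -(alpha - 0.3) ** 2

grid = alpha_grid()
selected = best_alpha(toy_evaluate, grid)
```

Setting the weight to 0 removes the reconstruction term entirely, which corresponds to the left end of each curve in Figure 5.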

5.2. Impact of Test Sample Size

We also analyze the impact of test sample size on accuracy, as shown in Figure 6. We vary the number of testing samples (1, 5, 10, 50, and 100) and observe the influence on the model’s adaptation performance across the three datasets. Each sub-figure represents a case of test-time adaptation, corresponding to different combinations of source (Dataset A, B, C) and target (Dataset A, B, C) domains.

Across all cases, there is a general trend of accuracy improvement as the number of testing samples increases. This trend is most noticeable when moving from a sample size of 1 to 5. However, performance gains tend to saturate as the number of testing samples increases beyond a certain point (e.g., between 50 and 100 samples), implying that there is an optimal number of testing samples beyond which additional data provide minimal benefits. Additionally, the extent of improvement varies depending on the similarity between the source and target domains. For domains that are more similar, fewer testing samples are required to achieve a high accuracy, while for more distant domains, a larger number of samples is necessary.
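One simple way to locate such a saturation point is to take the smallest sample size whose accuracy lies within a tolerance of the best observed accuracy. The sketch below illustrates this rule; the function name, the tolerance of one percentage point, and the accuracy values are all hypothetical and are not read from Figure 6.

```python
def smallest_sufficient_size(acc_by_size, tol=1.0):
    """Smallest sample size whose accuracy is within `tol` points of the best."""
    best = max(acc_by_size.values())
    for size in sorted(acc_by_size):
        if best - acc_by_size[size] <= tol:
            return size

# Placeholder accuracies (%) per number of normal-condition samples.
placeholder_curve = {1: 50.0, 5: 70.0, 10: 75.0, 50: 80.0, 100: 80.5}
saturation_size = smallest_sufficient_size(placeholder_curve, tol=1.0)
```

With a stricter tolerance the rule selects a larger sample size, so the tolerance should reflect how much accuracy one is willing to trade for collecting fewer normal-condition samples.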

6. Discussion

In our experiments, we demonstrate that our test-time adaptation method and strategy outperform several fault diagnosis baselines. We attribute these significant improvements primarily to decoupling the complete signals into healthy and faulty components. The encoder and decoder leverage a self-supervised reconstruction task to enhance the encoding ability and generate more accurate hidden representations of the original signals. The proposed decoupler effectively captures the target data distribution solely using a small number of normal samples. By doing this, the classifier can accurately diagnose fault types by utilizing similar faulty components from different domains.

This study has certain limitations that should be addressed and expanded upon in future work. First, we simplified the feature extraction method and used the MLP architecture to encode the raw input signals in order to maintain generality. Alternative feature extraction methods, e.g., wavelet time scattering [30], LSTM, and ResNet, can be explored in the future. Second, we focused on decomposing signals into healthy and faulty components. However, the signal decomposition method can be further extended to capture more granular features or subcomponents, such as transient and stationary elements, to isolate more complex patterns of fault progression. This may enable the prediction of the remaining lifespan of bearings. Third, although our study primarily concentrated on vibration signals for bearing diagnosis, the proposed approach has broader applicability across different signal types and fault diagnosis contexts.

7. Conclusions

In this study, we introduced a novel test-time adaptation method for cross-domain fault diagnosis of rolling bearings. Our approach addresses the challenge of domain shift in industrial settings, where trained models often face new and varying conditions during deployment. By decoupling the signal into domain-related healthy components and domain-invariant faulty components, our method efficiently adapts to new environments using only limited normal-condition data from the target domain.

Extensive experiments on public and private datasets demonstrated the effectiveness of our proposed TTA framework in terms of cross-domain diagnosis accuracy. Specifically, we found that (1) the reconstruction loss can generate better signal embeddings when an appropriate weight $α$ is applied, and (2) with 50 to 100 normal-condition test samples, our method achieves notable performance improvements in the target domain. Our study not only provides a robust solution for the adaptation of fault diagnosis models at test time but also opens avenues for further research into more complex model architectures and adaptation strategies. Overall, the proposed TTA method offers a practical and scalable solution to enhance the generalization and reliability of fault diagnosis methods in realistic industrial environments.

Author Contributions

Formal analysis, W.L. and J.C.; investigation, Y.C.; methodology, W.L. and J.C.; project administration, J.L.; resources, J.W. and J.L.; data curation, W.L. and J.W.; supervision, J.L.; validation, W.L. and J.L.; writing—original draft preparation, W.L.; writing—review and editing, J.C. and J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the project ‘On-board Fault Monitoring and Warning System for High-Speed Railway Operation Safety Based on Acoustic and Vibration Signal Feature Recognition’, supported by the Anhui Provincial Department of Science and Technology (No. 711285818079).

Data Availability Statement

This research employed both publicly available and private datasets for experimental studies. Owing to privacy concerns, the private data used in this study are available only from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Peng, B.; Bi, Y.; Xue, B.; Zhang, M.; Wan, S. A survey on fault diagnosis of rolling bearings. Algorithms 2022, 15, 347.
2. Pei, X.; Su, S.; Jiang, L.; Chu, C.; Gong, L.; Yuan, Y. Research on rolling bearing fault diagnosis method based on generative adversarial and transfer learning. Processes 2022, 10, 1443.
3. Jia, S.; Li, Y.; Wang, X.; Sun, D.; Deng, Z. Deep causal factorization network: A novel domain generalization method for cross-machine bearing fault diagnosis. Mech. Syst. Signal Process. 2023, 192, 110228.
4. Wang, J.; Lan, C.; Liu, C.; Ouyang, Y.; Qin, T.; Lu, W.; Chen, Y.; Zeng, W.; Yu, P.S. Generalizing to unseen domains: A survey on domain generalization. IEEE Trans. Knowl. Data Eng. 2022, 35, 8052–8072.
5. Xie, Y.; Shi, J.; Gao, C.; Yang, G.; Zhao, Z.; Guan, G.; Chen, D. Rolling Bearing Fault Diagnosis Method Based On Dual Invariant Feature Domain Generalization. IEEE Trans. Instrum. Meas. 2024, 73, 3501311.
6. Zhou, K.; Liu, Z.; Qiao, Y.; Xiang, T.; Loy, C.C. Domain generalization: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 4396–4415.
7. Zhu, M.; Liu, J.; Hu, Z.; Liu, J.; Jiang, X.; Shi, T. Cloud-Edge Test-Time Adaptation for Cross-Domain Online Machinery Fault Diagnosis via Customized Contrastive Learning. Adv. Eng. Inform. 2024, 61, 102514.
8. Zheng, H.; Yang, Y.; Yin, J.; Li, Y.; Wang, R.; Xu, M. Deep domain generalization combining a priori diagnosis knowledge toward cross-domain fault diagnosis of rolling bearing. IEEE Trans. Instrum. Meas. 2020, 70, 3501311.
9. Liang, P.; Wang, W.; Yuan, X.; Liu, S.; Zhang, L.; Cheng, Y. Intelligent fault diagnosis of rolling bearing based on wavelet transform and improved ResNet under noisy labels and environment. Eng. Appl. Artif. Intell. 2022, 115, 105269.
10. Zhang, Y.; Ji, J.; Ren, Z.; Ni, Q.; Gu, F.; Feng, K.; Yu, K.; Ge, J.; Lei, Z.; Liu, Z. Digital twin-driven partial domain adaptation network for intelligent fault diagnosis of rolling bearing. Reliab. Eng. Syst. Saf. 2023, 234, 109186.
11. Che, C.; Wang, H.; Fu, Q.; Ni, X. Deep transfer learning for rolling bearing fault diagnosis under variable operating conditions. Adv. Mech. Eng. 2019, 11, 1687814019897212.
12. Wang, Z.; He, X.; Yang, B.; Li, N. Subdomain adaptation transfer learning network for fault diagnosis of roller bearings. IEEE Trans. Ind. Electron. 2021, 69, 8430–8439.
13. Li, X.; Jiang, H.; Xie, M.; Wang, T.; Wang, R.; Wu, Z. A reinforcement ensemble deep transfer learning network for rolling bearing fault diagnosis with multi-source domains. Adv. Eng. Inform. 2022, 51, 101480.
14. Zhang, Y.; Ren, Z.; Zhou, S.; Feng, K.; Yu, K.; Liu, Z. Supervised contrastive learning-based domain adaptation network for intelligent unsupervised fault diagnosis of rolling bearing. IEEE/ASME Trans. Mechatronics 2022, 27, 5371–5380.
15. Wen, H.; Guo, W.; Li, X. A novel deep clustering network using multi-representation autoencoder and adversarial learning for large cross-domain fault diagnosis of rolling bearings. Expert Syst. Appl. 2023, 225, 120066.
16. Kundu, J.N.; Venkat, N.; Babu, R.V. Universal source-free domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 4544–4553.
17. Iwasawa, Y.; Matsuo, Y. Test-time classifier adjustment module for model-agnostic domain generalization. Adv. Neural Inf. Process. Syst. 2021, 34, 2427–2440.
18. Liang, J.; He, R.; Tan, T. A comprehensive survey on test-time adaptation under distribution shifts. Int. J. Comput. Vis. 2024, 1–34.
19. Sun, Y.; Wang, X.; Liu, Z.; Miller, J.; Efros, A.; Hardt, M. Test-time training with self-supervision for generalization under distribution shifts. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual Event, 13–18 July 2020; pp. 9229–9248.
20. Zhao, Z.; Li, T.; Wu, J.; Sun, C.; Wang, S.; Yan, R.; Chen, X. Deep Learning Algorithms for Rotating Machinery Intelligent Diagnosis: An Open Source Benchmark Study. ISA Trans. 2020, 107, 224–255.
21. Baldi, P. Autoencoders, unsupervised learning and deep architectures. In Proceedings of the 2011 International Conference on Unsupervised and Transfer Learning Workshop, Bellevue, WA, USA, 28 June–2 July 2011; Volume 27, pp. 37–50.
22. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations; MIT Press: Cambridge, MA, USA, 1986; pp. 318–362.
23. Vincent, P.; Larochelle, H.; Bengio, Y.; Manzagol, P.A. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 5–9 July 2008; pp. 1096–1103.
24. Ranzato, M.; Poultney, C.; Chopra, S.; LeCun, Y. Efficient learning of sparse representations with an energy-based model. In Proceedings of the 19th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 4–7 December 2006; pp. 1137–1144.
25. Sun, S.; Guo, H. Symmetric Wasserstein Autoencoders. In Proceedings of the UAI, Online, 27–30 July 2021.
26. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
27. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
28. Huang, Z.; Xu, W.; Yu, K. Bidirectional LSTM-CRF Models for Sequence Tagging. arXiv 2015, arXiv:1508.01991.
29. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Artificial Intelligence and Statistics, Sardinia, Italy, 13–15 May 2010; pp. 1–8.
30. Rezazadeh, N.; de Oliveira, M.; Perfetto, D.; De Luca, A.; Caputo, F. Classification of unbalanced and bowed rotors under uncertainty using wavelet time scattering, LSTM, and SVM. Appl. Sci. 2023, 13, 6861.

Figure 1. The framework of our proposed method.

Figure 2. The architecture of the encoder, where raw vibration signals are encoded as signal embeddings.

Figure 3. The architecture of the decoupler, where the signal embedding is decoupled into healthy and faulty components.

Figure 4. The aero-engine bearing test bench of the private dataset in Hefei University of Technology: (a) the components of the test bench and (b) accelerometer measurement locations.

Figure 5. The impacts of various values of the $α$ parameter in all cases.

Figure 6. The impacts of various numbers of testing samples on adapting in all cases.

Table 1. Comparison between test-time adaptation and its related topics, where ✓ indicates “Yes” (available), and ✗ indicates “No” (not available).

Learning Paradigm	Access to Test Data	Online Learning
Transfer learning	✓	✗
Domain adaptation	✓	✗
Domain generalization	✗	✗
Test-time adaptation	✓ ^†	✓

^†: Limited in quantities, like a single example or mini-batch.

Table 2. Diagnosis accuracy (%) comparison with various baselines and three datasets under the test-time adaptation setting. The arrow → represents the adaptation from the source domain to the target domain.

Dataset	AE	DAE	SAE	SWAE	Alex-Net	Res-Net18	Bi-LSTM	c-GCN-MAL	DCFN	Ours
A→B	28.74	31.46	44.47	52.82	32.62	28.93	40.58	77.90	80.78	87.86 ± 0.6
A→C	33.33	33.60	32.43	38.13	33.87	35.06	34.18	41.33	47.81	58.39 ± 1.1
B→A	43.16	45.26	46.84	51.73	51.58	55.26	58.55	81.01	85.16	96.84 ± 0.9
B→C	33.87	35.48	34.87	39.49	34.76	44.12	45.74	52.12	58.07	61.02 ± 1.2
C→A	42.63	51.58	57.89	67.14	65.26	72.11	68.42	80.67	83.24	86.32 ± 0.8
C→B	32.23	30.68	34.37	38.33	44.66	35.53	45.63	55.74	57.33	60.10 ± 1.8

Table 3. The IID diagnosis accuracies (%) of the baselines and our method across the three datasets.

Dataset	AE	DAE	SAE	SWAE	Alex-Net	Res-Net18	Bi-LSTM	c-GCN-MAL	DCFN	Ours
A	100	100	100	100	100	100	100	100	100	100
B	91.69	93.40	94.95	93.90	92.04	95.34	93.20	95.06	95.81	95.75
C	100	100	100	100	100	100	100	100	100	100

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
