Article

Residual Adversarial Subdomain Adaptation Network Based on Wasserstein Metrics for Intelligent Fault Diagnosis of Bearings

1 School of Mechanical and Electrical Engineering, Henan University of Science and Technology, Luoyang 471003, China
2 LongMen Laboratory, Luoyang 471003, China
3 Collaborative Innovation Center of High-End Bearing, Luoyang 471000, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2024, 14(19), 9057; https://doi.org/10.3390/app14199057
Submission received: 19 August 2024 / Revised: 27 September 2024 / Accepted: 4 October 2024 / Published: 7 October 2024

Abstract

Subdomain adaptation plays a significant role in the field of bearing fault diagnosis. It aligns the distributions of the relevant subdomains and thereby addresses the loss of local category information that is common in global domain adaptation. Nonetheless, this approach overlooks the quantitative discrepancy between the sample distributions of the source and target domains, which leads to vanishing gradients during model training. To tackle this challenge, this paper proposes a bearing fault diagnosis method based on a Wasserstein-metric residual adversarial subdomain adaptation network (RASAN-W). The Wasserstein metric is introduced as the optimization objective of the domain discriminator in RASAN-W, quantitatively measuring the distribution discrepancy between source and target domain samples while the relevant subdomain distributions of the two domains are aligned. Finally, extensive experiments on two real-world datasets show that the diagnostic accuracy of this method is significantly higher than that of several leading bearing fault diagnosis techniques.

1. Introduction

High-speed trains operate at high speeds and under heavy loads, and ensuring the safety of passengers’ lives and property is the top priority of high-speed train operation. The key to this effort is to ensure the stable operation of the bearings in the running gear of high-speed trains. Since high-speed trains run under changing working conditions, the bearings are frequently subjected to variable loads, variable speeds, and other actual operating conditions, which degrades their stability. Therefore, high-efficiency and high-accuracy fault diagnosis of high-speed rail bearings is of great significance.
Intelligent diagnosis based on deep learning (DL) has become an important alternative to traditional fault diagnosis methodologies owing to its high efficiency, high accuracy, and low requirement for professional expertise [1,2,3]. However, DL-based intelligent diagnosis methods rely on large, accurately labeled datasets, and the training and testing samples must also satisfy the prerequisite of being independent and identically distributed [4]. In practice, damaged high-speed rail bearings are not allowed to operate under real working conditions [5]. Consequently, there are obvious distribution discrepancies between the characteristics of the abundant laboratory bearing failure samples currently available and those of the small number of samples collected under actual operating conditions, as shown in Figure 1. A large number of experiments have shown that DL methods cannot achieve satisfactory accuracy when the transfer task is performed between laboratory bearing failure samples and actual operating samples [6].
The basic idea of domain adaptation is to bound the target domain loss through the source domain loss, and its key is to reduce the distribution discrepancies between domains. The problem is called unsupervised domain adaptation (UDA) when the source domain is rich in labeled instances while the target domain contains only unlabeled instances [7]. Research on unsupervised domain adaptation mainly follows the directions below. Statistics-based domain adaptation approaches focus on aligning data distributions and enhancing knowledge transfer between the source and target domains; representative metrics include Maximum Mean Discrepancy (MMD) [8], Correlation Alignment (CORAL) [9], and the Wasserstein distance [10]. Long et al. introduced the Deep Adaptation Network (DAN), which employs Multi-Kernel Maximum Mean Discrepancy (MK-MMD) to align the features of source and target domain samples across several layers in a Reproducing Kernel Hilbert Space (RKHS) [11]. This approach accounts for the global distribution of the features but may overlook fine-grained information in the feature data, neglecting certain critical details of the sample feature distribution. Zhu et al. presented the Deep Subdomain Adaptation Network (DSAN) [12], which uses the local maximum mean discrepancy (LMMD) to evaluate and reduce distribution discrepancies within each subdomain; by adjusting the feature extractor’s parameters, the subdomain-specific distributions can be aligned across domains.
Meanwhile, adversarial domain adaptation strategies employ a domain discriminator to foster confusion between domains, effectively minimizing the relevant subdomain discrepancies. In the Domain Adversarial Neural Network (DANN), a domain discriminator is introduced to blur the boundary between domains, and a gradient reversal layer (GRL) is placed between the feature extractor and the domain discriminator; this setup enables the feature extractor to learn domain-invariant features during backpropagation [13]. The Deep Adversarial Subdomain Adaptation Network (DASAN) was subsequently proposed [14]; it aligns the distributions of relevant subdomains by minimizing the LMMD between same-category samples across domains, as shown in Figure 2. Reconstruction-based DA methods typically use shared encoders to capture domain-invariant representations while minimizing reconstruction losses in the source and target domains to preserve domain-specific information [15]. Cai et al. proposed a Sparse Stacked Denoising Autoencoder (SSDAE), which extracts features from target domain samples by directly inheriting the mapping learned on source domain samples and fine-tunes that mapping with only a small amount of target domain data to achieve good diagnostic results [16]. However, in the domain adaptation work discussed above, the primary focus lies in optimizing the alignment of subdomain distributions among the relevant categories; the quantitative relationship between the sample distributions of the source and target domains in the domain discriminator is often overlooked, even though this relationship plays an extremely critical role in domain adaptation.
To address the aforementioned challenges, this article proposes a Residual Adversarial Subdomain Adaptation Network based on the Wasserstein metric (RASAN-W). More precisely, this article introduces the Wasserstein metric into subdomain adaptation, enabling a more accurate quantification of the distribution dissimilarity between domains. The method captures fine-grained domain information, effectively aligns subdomain distributions within the same category, and achieves global domain distribution alignment, allowing it to adjust dynamically to diverse data distributions and task demands. The main contributions of our work are as follows:
  • A deep transfer neural network method based on residual adversarial subdomain adaptation with the Wasserstein metric is proposed and applied to bearing fault diagnosis.
  • The domain-invariant distribution features of the relevant subdomains are aligned through LMMD, and the Wasserstein metric is introduced as the optimization objective of the domain discriminator in RASAN-W, which effectively quantifies the discrepancy between the sample distributions of the source and target domains.
  • Extensive experiments and ablation studies were conducted on multiple bearing fault datasets, and comparative analysis with several mainstream methods verified the effectiveness and superior generalization ability of the proposed method.

2. Proposed Method

2.1. Problem Definition

To facilitate the reader’s understanding, this section defines unsupervised domain adaptation as follows: we are given a source domain D_s = {(x_1^s, y_1^s), …, (x_{n_s}^s, y_{n_s}^s)} containing n_s labeled samples, where y_i^s ∈ R^C is the one-hot label vector of x_i^s and C is the number of fault categories in the dataset, and a target domain D_t = {x_1^t, …, x_{n_t}^t} containing n_t unlabeled samples x_i^t. D_s and D_t obey the marginal probability distributions P and Q, respectively. We assume that D_s and D_t share the feature and label spaces (X_s = X_t, Y_s = Y_t), while the marginal distributions (P(x^s) ≠ Q(x^t)) and the conditional distributions (P(y^s | x^s) ≠ Q(y^t | x^t)) differ between D_s and D_t. The goal of unsupervised domain adaptation in this article is to learn a classifier f that aligns the distribution discrepancies between the domains, so that the target loss L_t(f) = E_{(x^t, y^t)∼Q}[f(x^t) ≠ y^t] can be bounded by the combination of the source loss L_s(f) = E_{(x^s, y^s)∼P}[f(x^s) ≠ y^s], the distribution discrepancy L(P, Q), and the error of the ideal joint hypothesis.
The end-to-end architecture proposed in this article for RASAN-W is depicted in Figure 3. Subsequent sections will offer a thorough analysis of every component of the methodology. Moreover, the mathematical theory underlying each step of the methodology will be elucidated.

2.2. Feature Extractor

The Residual Convolutional Neural Network (ResNet) effectively addresses a problem inherent in traditional Convolutional Neural Networks (CNNs): an overly deep convolutional architecture may exacerbate gradient vanishing or explosion during training [17]. In this study, ResNet-50 is deployed as the feature extractor within the RASAN-W framework, providing robust capabilities for extracting fault features from bearing samples in both the source and target domains. The foundational element of the ResNet-50 model is the residual block, which contains a succession of convolutional layers, batch normalization layers (BN) [18], and Rectified Linear Unit (ReLU) [20] activation functions. The architecture of ResNet-50 stacks multiple 3 × 3 convolutional layers within several residual blocks and finally outputs the feature information through a global average pooling (GAP) layer [19]. Specifically, given an input sample x_i, the corresponding output of the feature extractor is G_f(x_i; θ_f), where G_f denotes the feature extractor and θ_f its differentiable parameters.
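As a concrete illustration, the following is a minimal PyTorch sketch of such a feature extractor, built from torchvision’s ResNet-50 with the classification head removed. The input adaptation for vibration samples (e.g., reshaping signals into image-like tensors) is not specified in the text and is left as an assumption here.

```python
import torch
import torch.nn as nn
from torchvision import models

class FeatureExtractor(nn.Module):
    """ResNet-50 backbone truncated before the classification head.

    A minimal sketch of G_f: inputs are assumed to be preprocessed into
    (B, 3, H, W) tensors; how the vibration signals are reshaped is an
    assumption, not something the paper specifies.
    """
    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights=None)
        # Keep everything up to and including global average pooling,
        # dropping only the final fully connected layer.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.out_dim = backbone.fc.in_features  # 2048

    def forward(self, x):
        h = self.features(x)        # (B, 2048, 1, 1)
        return torch.flatten(h, 1)  # (B, 2048) feature vectors
```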

2.3. Label Classifier Based on Local Minimum Mean Discrepancy

In unsupervised domain adaptation, no labels are available for the target domain samples. Therefore, the first step is to leverage the labeled samples from the source domain to train the label classifier within the RASAN-W framework. This label classifier consists of two fully connected layers, with a ReLU activation after the first layer and a softmax activation after the second. The optimization objective of the label classifier combines the cross-entropy loss, the LMMD loss, and the pseudo-label loss.
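A minimal sketch of this label classifier is given below; only the layer count and activations are specified in the text, so the hidden width of 256 is an assumption.

```python
import torch.nn as nn

class LabelClassifier(nn.Module):
    """Two fully connected layers, as described for G_y: ReLU after the
    first layer; the second layer's softmax is left to the loss function
    (nn.CrossEntropyLoss operates on logits). Hidden width is assumed."""
    def __init__(self, in_dim=2048, hidden=256, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, f):
        # Returns logits; apply softmax where probabilities are needed.
        return self.net(f)
```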

2.3.1. Cross-Entropy Loss Function

The cross-entropy loss in the label classifier measures the discrepancy between the predicted labels and the true labels; minimizing it optimizes the differentiable parameters of the label classifier and ensures classification accuracy during bearing fault diagnosis. The cross-entropy loss function is expressed as follows:
$$ L_{\mathrm{cls}}(\theta_f, \theta_y) = \frac{1}{M} \sum_{i=1}^{M} J_y(\hat{y}_i^s, y_i^s) = \frac{1}{M} \sum_{i=1}^{M} J_y\big(G_y(G_f(x_i^s; \theta_f); \theta_y), y_i^s\big) = -\frac{1}{M} \sum_{i=1}^{M} \sum_{c=1}^{C} \mathbb{I}[y_i^s = c] \log G_y\big(G_f(x_i^s; \theta_f); \theta_y\big)_c \quad (1) $$
where G_f(x_i^s; θ_f) denotes the features output by the feature extractor for the input source domain sample x_i^s, θ_f are the differentiable parameters of the feature extractor, M is the number of samples in each training batch, G_y is the label classifier with differentiable parameters θ_y, and y_i^s is the label of the input source domain sample x_i^s.

2.3.2. LMMD Loss Function

Traditional domain adaptation approaches primarily employ the MMD or JMMD loss functions to align the global distributions across domains [21,22]. However, the alignment of category distributions between the domains is often overlooked, which loses fine-grained sample information and subsequently reduces the accuracy of bearing fault diagnosis. To tackle this issue, this study incorporates LMMD as one of the loss functions of the label classifier. LMMD gives precedence to aligning the distributions of the same categories in the source and target domains [12]. The formula is as follows:
$$ L_{\mathrm{lmmd}} = \mathbb{E}_c \left\| \mathbb{E}_{p^{(c)}}[\phi(x^s)] - \mathbb{E}_{q^{(c)}}[\phi(x^t)] \right\|_{\mathcal{H}}^2 = \frac{1}{C} \sum_{c=1}^{C} \left\| \sum_{x_i^s \in D_s} \omega_i^{sc} \phi(x_i^s) - \sum_{x_j^t \in D_t} \omega_j^{tc} \phi(x_j^t) \right\|_{\mathcal{H}}^2 \quad (2) $$
where x^s and x^t are samples from D_s and D_t, respectively, and p^{(c)} and q^{(c)} are the distributions of the subdomains D_s^{(c)} and D_t^{(c)}, respectively. Minimizing Equation (2) draws together the distributions of related subdomains of the same category. The parameters ω_i^c are weights attached to the sample x_i. Given a sample x_i, ω_i^c is calculated as follows:
$$ \omega_i^c = \frac{y_{ic}}{\sum_{(x_j, y_j) \in D} y_{jc}} \quad (3) $$
where y_{ic} is the cth element of the label vector y_i. For source domain samples x_i^s, the weight is computed from the true label y_i^s. However, target domain samples x_j^t carry no labels y_j^t; in the proposed RASAN-W model, pseudo-labels ŷ_j^t are used to calculate ω_j^{tc}.
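For readers who want to reproduce Equations (2) and (3), the following is a minimal LMMD sketch in PyTorch. The paper does not specify the kernel configuration, so the single-bandwidth Gaussian kernel and its bandwidth are assumptions.

```python
import torch

def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel matrix k(x_i, y_j); a single bandwidth is assumed
    here for brevity (multi-kernel variants are also common)."""
    d2 = torch.cdist(x, y) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))

def lmmd(f_s, f_t, y_s_onehot, y_t_prob):
    """Local MMD of Eq. (2): class-weighted MMD between subdomains.
    f_s, f_t: features (M, d); y_s_onehot: source one-hot labels (M, C);
    y_t_prob: target softmax outputs used as pseudo-labels (M, C)."""
    # Per-class weights of Eq. (3): w_i^c = y_ic / sum_j y_jc
    w_s = y_s_onehot / (y_s_onehot.sum(0, keepdim=True) + 1e-8)  # (M, C)
    w_t = y_t_prob / (y_t_prob.sum(0, keepdim=True) + 1e-8)      # (M, C)

    k_ss = gaussian_kernel(f_s, f_s)
    k_tt = gaussian_kernel(f_t, f_t)
    k_st = gaussian_kernel(f_s, f_t)

    loss = 0.0
    C = y_s_onehot.shape[1]
    for c in range(C):
        ws, wt = w_s[:, c:c + 1], w_t[:, c:c + 1]  # (M, 1)
        # ||sum_i w_i phi(x_i^s) - sum_j w_j phi(x_j^t)||^2 in the RKHS
        loss += (ws.T @ k_ss @ ws + wt.T @ k_tt @ wt
                 - 2 * ws.T @ k_st @ wt).squeeze()
    return loss / C
```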

2.3.3. Pseudo-Label Loss Function

The pseudo-label loss function is used when calculating the weights ω_j^{tc} of the target domain samples. The label classifier G_y, trained on samples from the source domain, is employed to predict the labels of the target domain samples x_j^t [23]. However, this approach inevitably introduces additional prediction errors. To mitigate this issue, pseudo-label learning is adopted; the pseudo-label loss function is defined as follows [24]:
$$ L_{\mathrm{pseudo}} = -\frac{1}{M} \sum_{j=1}^{M} \sum_{m=1}^{C} p(\hat{y}_j^t = m \mid x_j^t) \log p(\hat{y}_j^t = m \mid x_j^t) \quad (4) $$
Finally, by combining Equations (1), (2) and (4), the loss function of the label classifier in RASAN-W proposed in this article is obtained:
$$ L_y = L_{\mathrm{cls}} + \lambda L_{\mathrm{lmmd}} + \mu L_{\mathrm{pseudo}} \quad (5) $$
where λ and μ are trade-off parameters for the respective losses.
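Below is a minimal sketch of the pseudo-label entropy term of Equation (4) and the combined classifier loss of Equation (5), reusing the lmmd function sketched above; the trade-off values are placeholders, not the tuned settings.

```python
import torch
import torch.nn.functional as F

def pseudo_label_loss(logits_t):
    """Entropy of target-domain predictions, Eq. (4): minimizing it
    pushes the classifier toward confident pseudo-labels."""
    p = F.softmax(logits_t, dim=1)
    return -(p * torch.log(p + 1e-8)).sum(dim=1).mean()

def classifier_loss(logits_s, y_s, f_s, f_t, logits_t, lam=0.1, mu=0.01):
    """Label classifier loss of Eq. (5); lam and mu stand for the
    trade-off parameters lambda and mu (placeholder values)."""
    l_cls = F.cross_entropy(logits_s, y_s)                    # Eq. (1)
    y_s_onehot = F.one_hot(y_s, logits_s.shape[1]).float()
    l_lmmd = lmmd(f_s, f_t, y_s_onehot,
                  F.softmax(logits_t, dim=1))                 # Eq. (2)
    return l_cls + lam * l_lmmd + mu * pseudo_label_loss(logits_t)
```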

2.4. Domain Discriminator Based on Wasserstein Metric

In the original GANs, a generator is defined to map the distribution P_z of noise samples drawn from a low-dimensional latent space to the distribution P_g of generated images in the image space [21]. To train the generator, a discriminator is also defined: it is trained to distinguish real samples, drawn from P_r, from the generator’s outputs, while the generator is trained to make its outputs be judged as real. This forms the famous minimax game between the generator G(·) and the discriminator D(·). The optimization problem in GANs is expressed as follows [9,25,26]:
$$ \arg \min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_r}[\log D(x)] + \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z)))] \quad (6) $$
where D(x) represents the probability that x comes from the real data rather than the generator, and G(z) represents the mapping of the generator into the data space. Undoubtedly, the original GANs have made remarkable strides in addressing domain adaptation challenges; nonetheless, they consistently exhibit instability during training. In [27], the authors show that, when the discriminator is trained to near-optimality, minimizing the generator loss approximates minimizing the Jensen–Shannon Divergence (JSD) between P_r and P_g, and they note that when the overlap between the distributions is negligible, the JSD converges to a constant, at which point the gradient of the generator becomes zero. To address this issue, RASAN-W employs a Wasserstein Generative Adversarial Network (WGAN) as the domain discriminator. This design minimizes the global distribution differences across domains while ensuring that the feature distributions of the source domain closely match those of the target domain, effectively tackling the gradient vanishing that the domain discriminator in DASAN often encounters during model training.
It is worth underscoring that WGAN measures the similarity between the source and target domain distributions by approximately fitting the Wasserstein distance, rather than the Jensen–Shannon or Kullback–Leibler Divergence. The Wasserstein metric is given as follows:
$$ \mathrm{Wass}(P_r, P_g) = \sup_{\|f\|_L \le 1} \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)] \quad (7) $$
where sup denotes the supremum and ‖f‖_L ≤ 1 is the 1-Lipschitz constraint, which limits the rate of change of the critic and thus mitigates the risk of gradient explosion or vanishing during training [28]. To enforce the 1-Lipschitz constraint, a gradient penalty term is added:
$$ L_{\mathrm{penalty}} = \big( \left\| \nabla_{\hat{z}} G_d(\hat{z}; \theta_d) \right\| - 1 \big)^2 \quad (8) $$
where ẑ = ε G_f(x^s) + (1 − ε) G_f(x^t) with ε ∈ [0, 1). Combining Equations (6)–(8), the domain discriminator loss function based on the Wasserstein metric is defined as follows:
$$ L_w(\theta_f, \theta_d) = \frac{1}{M} \sum_{i=1}^{M} \big[ G_d(G_f(x_i^s; \theta_f); \theta_d) - G_d(G_f(x_i^t; \theta_f); \theta_d) \big] - \delta L_{\mathrm{penalty}} \quad (9) $$
where δ is the trade-off parameter of the gradient penalty term.
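A compact sketch of this critic objective, covering Equations (8) and (9), is shown below. Here `critic` stands for the domain discriminator G_d operating on extracted features; implementation details beyond the equations (detaching the interpolates, the penalty weight) are assumptions.

```python
import torch

def gradient_penalty(critic, f_s, f_t):
    """Gradient penalty of Eq. (8): push the critic's gradient norm at
    points interpolated between source and target features toward 1,
    enforcing the 1-Lipschitz constraint."""
    eps = torch.rand(f_s.size(0), 1, device=f_s.device)
    z_hat = (eps * f_s + (1 - eps) * f_t).detach().requires_grad_(True)
    d_hat = critic(z_hat)
    grads = torch.autograd.grad(d_hat.sum(), z_hat, create_graph=True)[0]
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()

def critic_loss(critic, f_s, f_t, delta=0.1):
    """Domain discriminator loss of Eq. (9): the Wasserstein estimate
    between source and target features minus the weighted penalty.
    The discriminator is trained to maximize this quantity."""
    wass = critic(f_s).mean() - critic(f_t).mean()
    return wass - delta * gradient_penalty(critic, f_s, f_t)
```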

2.5. RASAN-W

The target loss function of the RASAN-W proposed in this article consists of the following four parts:
  • Cross-entropy loss function L_cls;
  • Local maximum mean discrepancy loss function L_lmmd;
  • Pseudo-label loss function L_pseudo;
  • Wasserstein loss function L_w.
Terms (1)–(3) serve as the loss functions of the label classifier, and term (4) serves as the loss function of the domain discriminator. Combining the above loss functions, the target loss function of RASAN-W is as follows:
$$ L = L_y - \alpha L_w \quad (10) $$
where α is a trade-off parameter for the loss L_w. The optimization objectives of the model are therefore given by Equations (10)–(12):
$$ (\hat{\theta}_f, \hat{\theta}_y) = \arg \min_{\theta_f, \theta_y} L(\theta_f, \theta_y, \theta_d) \quad (11) $$
$$ \hat{\theta}_d = \arg \max_{\theta_d} L(\theta_f, \theta_y, \theta_d) \quad (12) $$
By optimizing Equations (10)–(12), a set of saddle-point parameters (θ̂_f, θ̂_y, θ̂_d) is obtained when the model converges. At the saddle point, the loss of the label classifier is minimized and the loss of the domain discriminator is maximized. The training strategy of RASAN-W is shown in Algorithm 1, and the network parameters θ_f, θ_y, θ_d are updated with the SGD optimizer.
Algorithm 1 Training algorithm for the proposed RASAN-W.
1: Initialization: Input samples D_s and D_t. Initialize the parameters θ_f, θ_y, θ_d. Learning rate lr, batch size M, trade-off parameters {λ, μ, δ}.
2: Training:
3: for each epoch do
4:     Forward propagation: Randomly sample x^s = {x_i^s}_{i=1}^{M} and x^t = {x_j^t}_{j=1}^{M}. Extract features from the source and target domain samples layer by layer. Compute the input sample features H^s and H^t and the subdomain features z_i^{sl} and z_j^{tl}. Compute the classification labels and the domain labels of the samples.
5:     Back propagation: Compute the cross-entropy loss with Equation (1), the LMMD loss with Equation (2), the pseudo-label loss with Equation (4), and the Wasserstein metric loss with Equation (9). Update the model parameters layer by layer with the SGD optimizer.
6: end for
7: Testing: Store all RASAN-W parameters and predict labels for the target domain samples x^t.
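The following condensed PyTorch sketch mirrors Algorithm 1, assuming the module and loss sketches above together with user-provided data loaders (src_loader, tgt_loader); the critic architecture and the fixed α are assumptions. A gradient reversal layer realizes the saddle-point objective of Equations (10)–(12) in a single backward pass.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer: identity in the forward pass,
    multiplies gradients by -alpha in the backward pass."""
    @staticmethod
    def forward(ctx, x, alpha):
        ctx.alpha = alpha
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.alpha * grad_output, None

# Modules from the sketches above; the critic architecture is assumed.
f_ext, clf = FeatureExtractor(), LabelClassifier()
critic = torch.nn.Sequential(torch.nn.Linear(2048, 256),
                             torch.nn.ReLU(), torch.nn.Linear(256, 1))
params = (list(f_ext.parameters()) + list(clf.parameters())
          + list(critic.parameters()))
opt = torch.optim.SGD(params, lr=0.1, momentum=0.9, weight_decay=5e-4)
alpha = 0.1  # fixed here for brevity; annealed via Eq. (13) in the paper

for epoch in range(100):
    for (x_s, y_s), (x_t, _) in zip(src_loader, tgt_loader):
        f_s, f_t = f_ext(x_s), f_ext(x_t)
        l_y = classifier_loss(clf(f_s), y_s, f_s, f_t, clf(f_t))  # Eq. (5)
        # Reversed gradients: the critic ascends the Wasserstein estimate
        # L_w while the feature extractor descends it, Eqs. (10)-(12).
        r_s = GradReverse.apply(f_s, alpha)
        r_t = GradReverse.apply(f_t, alpha)
        l_w = critic_loss(critic, r_s, r_t)                       # Eq. (9)
        loss = l_y - l_w  # one SGD step lowers L_y, raises L_w for G_d
        opt.zero_grad()
        loss.backward()
        opt.step()
```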

3. Experiments

To comprehensively evaluate the proposed RASAN-W method, we compare it with five fault diagnosis methods on the CWRU and JNU datasets.

3.1. Datasets

CWRU Bearing Dataset: This study uses bearing vibration data from the Bearing Data Center of Case Western Reserve University; details of the dataset are given in Table 1. The dataset includes data from the drive end bearing (model SKF6205) and the fan end bearing (model SKF6203), acquired at a sampling rate of 12 kHz. The dataset covers four motor load states: 0 HP, 1 HP, 2 HP, and 3 HP. Each load condition includes three fault types with different fault sizes, Roller Fault (RF), Inner Ring Fault (IF), and Outer Ring Fault (OF), as well as the Normal state (Normal); therefore, there are ten different health states for each load. This article constructs four datasets according to the load borne by the bearings, namely {C0, C1, C2, C3}, and designs six transfer tasks between these datasets.
JNU Bearing Dataset: The JNU bearing dataset is also used for the transfer tasks. It was collected from a centrifugal fan system driven by a Mitsubishi SB-JB induction motor (a 3.7 kW three-phase induction motor), with a sampling frequency of 50 kHz. The dataset contains three speed conditions: 600 rpm, 800 rpm, and 1000 rpm. Each speed condition includes four health states, Normal state (Normal), Roller Fault (RF), Inner Ring Fault (IF), and Outer Ring Fault (OF), giving twelve working conditions in total. This article constructs three datasets {J0, J1, J2} according to the rotation speed of the bearings, as shown in Table 2, and constructs four transfer tasks among them.

3.2. Transfer Results

3.2.1. Comparison Methods

To verify the effectiveness of RASAN-W, its transfer results are compared with those of current bearing fault diagnosis methods. ResNet uses only the feature extractor and label classifier in Figure 3 as the overall network structure; it is trained solely on source domain samples without considering target samples and, after training, is tested on target domain samples without any domain adaptation. Deep Domain Confusion (DDC) is a metric-based domain adaptation method [8] that uses MMD to reduce the discrepancy in first-order statistics and align the global domain distributions. DANN consists of three parts: a feature extractor, a label classifier, and a domain discriminator [13]. During DANN training, the feature extractor and label classifier are trained to minimize the classification error, while the domain discriminator is trained adversarially, forcing the feature extractor to learn features that are insensitive to domain information. Compared with DANN, DSAN focuses on the discrepancies between subdomains [12]: it introduces LMMD as one of the losses of the label classifier, which enables DSAN to attend to interdomain class information while aligning the global distribution. The key idea of DASAN is to introduce adversarial training into DSAN [14]; DASAN also adds a pseudo-label loss to the label classifier to further improve fault diagnosis accuracy.
In the training phase of the RASAN-W model, the following parameter configuration is employed. To ensure that the label classifier effectively captures fault features while suppressing domain noise during the initial training phase, the trade-off parameter α is increased incrementally according to:
$$ \alpha = \frac{\exp(l \cdot p) - 1}{\exp(l \cdot p) + 1} \quad (13) $$
where l is set to 1 and p is the ratio of the current training batch to the maximum number of training batches. λ is set to 0.1, μ is selected from {0.001, 0.01, 0.1}, and δ is set to 0.1. SGD is selected as the optimizer for updating the network parameters, with a momentum of 0.9 and a weight decay of 5 × 10⁻⁴.
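For clarity, the annealing of Equation (13) can be written as a one-line helper; this is a direct transcription of the reconstructed formula, not code from the authors.

```python
import math

def alpha_schedule(p, l=1.0):
    """Annealing of Eq. (13): alpha grows smoothly from 0 as training
    progresses (p = current batch / max batches), suppressing the noisy
    domain signal early in training."""
    return (math.exp(l * p) - 1) / (math.exp(l * p) + 1)
```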
When introducing the Wasserstein metric to measure the distribution discrepancy between the source and target domains, gradient explosion may occur during model training. To avoid this, this article clips the gradients by their L2 norm: when the L2 norm of the gradient exceeds a threshold, the oversized gradient values are scaled back into the specified range. The gradient clipping rule is formulated as follows:
$$ \nabla_w L_{\mathrm{total}} = \begin{cases} \nabla_w L_{\mathrm{total}}, & \text{if } \|\nabla_w L_{\mathrm{total}}\| \le G_{\mathrm{norm}} \\ \nabla_w L_{\mathrm{total}} \times \mathrm{clip\_coef}, & \text{if } \|\nabla_w L_{\mathrm{total}}\| > G_{\mathrm{norm}} \end{cases} \quad (14) $$
where ∇_w L_total is the gradient of the model parameters and clip_coef = G_norm / ‖∇_w L_total‖ is the ratio of the norm threshold G_norm to the gradient norm. The learning rate is set to 0.1, and the batch size and the number of training epochs are set to 48 and 100, respectively.
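In PyTorch, this rule corresponds to the built-in norm-based clipping utility; continuing the training-loop sketch above, and with the threshold value assumed since the paper does not report it:

```python
import torch

# clip_grad_norm_ scales all gradients by clip_coef = G_norm / total_norm
# whenever the total L2 norm exceeds G_norm, exactly as in Eq. (14).
G_norm = 1.0  # assumed threshold
loss.backward()                                      # populate gradients
torch.nn.utils.clip_grad_norm_(params, max_norm=G_norm)
opt.step()
```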

3.2.2. Ablation Investigations

To assess the contribution of the key components of RASAN-W, the accuracy of the transfer task is used as the comparison standard while all remaining parameters are kept the same, and the method is evaluated by removing one component at a time. Specifically, the evaluated combinations are: cross-entropy loss (L_cls); cross-entropy loss and maximum mean discrepancy loss (L_cls + L_mmd); cross-entropy loss and local maximum mean discrepancy loss (L_cls + L_lmmd); and cross-entropy loss, local maximum mean discrepancy loss, and pseudo-label loss (L_cls + L_lmmd + L_pseudo). A comprehensive ablation analysis is conducted on the transfer task (C0→C2) of the CWRU dataset, and the results in Figure 4 confirm the effectiveness of all the proposed modules.

3.2.3. Results of Transfer Tasks at CWRU

For the unsupervised domain adaptation problem described above, this study trains RASAN-W with labeled data from the source domain and unlabeled data from the target domain. To minimize the influence of randomness and assess the robustness of the model, each domain adaptation task is run ten times, and the average accuracies are presented in Table 3. The results indicate that RASAN-W maintains the highest accuracy in the majority of the transfer tasks, with an average accuracy of 99.1%. ResNet exhibits the worst bearing fault diagnosis accuracy, with an average of 71.2%. The domain-adaptation-based methods, DDC (87.7%), DANN (92.7%), DSAN (97.9%), DASAN (94.2%), and RASAN-W (99.1%), all achieve substantially higher accuracy than this baseline, reflecting the superiority of domain adaptation when the sample distributions of the two domains are not independent and identically distributed. Meanwhile, the subdomain-adaptation methods DSAN (97.9%), DASAN (94.2%), and RASAN-W (99.1%) hold a clear advantage over DDC (87.7%) and DANN (92.7%), which ignore category information, demonstrating the necessity of considering category information in domain adaptation. In addition, to visually compare the diagnostic performance of each method, the output features of the feature extractor are reduced to two dimensions using t-distributed stochastic neighbor embedding (t-SNE); the feature visualization results are shown in Figure 5. Figure 5a shows that ResNet neither clusters same-category sample features nor cleanly separates different categories, with obvious confusion between classes. In Figure 5c, same-category features are well clustered, but the separation between categories fails and a large number of features are confounded, which makes the feature categories indistinguishable and causes diagnosis errors. In Figure 5b,d,e, the different category features are well separated, but the clustering of same-category samples is poor, which reduces diagnostic accuracy. In Figure 5f, it is easy to see that, compared with the current mainstream domain adaptation methods, the proposed RASAN-W effectively aligns the global and local feature distributions, giving the sample features better interclass separation and intraclass compactness and reducing the distribution differences between domains, which effectively improves the diagnostic accuracy of RASAN-W. Taking the task (C0→C2) as an example, the confusion matrix of the experimental results is shown in Figure 6a.
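As a pointer for reproducing visualizations like Figure 5, a minimal t-SNE sketch is shown below; `features` (N, 2048) and `labels` (N,) are assumed to be collected from the trained feature extractor on the test sets of both domains, and the t-SNE settings are assumptions.

```python
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Reduce the extractor's output features to 2-D and color by fault class.
emb = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(features)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab10", s=5)
plt.show()
```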

3.2.4. Results of Transfer Tasks at JNU

To demonstrate the generalization performance of RASAN-W on different datasets, this paper also implements four transfer tasks on the Jiangnan University dataset. As with the transfer tasks on the Case Western Reserve University dataset, five methods are compared with the proposed RASAN-W, and the average bearing fault diagnosis accuracies are listed in Table 4. On the JNU dataset, ResNet again exhibits the worst classification accuracy, with an average transfer-task accuracy of 73.5%, while the domain-adaptation-based methods all reach at least 92.7%. The proposed RASAN-W still maintains the highest accuracy (96.6%), showing that it has the best transfer ability and strong robustness across datasets. Compared with DDC (92.8%) and DANN (92.7%), the methods that consider category information, DSAN (92.8%), DASAN (95.2%), and RASAN-W (96.6%), overall achieve better results, verifying on a different dataset the necessity of considering category information during domain adaptation.
The visualization results for the JNU transfer tasks are shown in Figure 7. Comparing Figure 7f with Figure 7a–e, the proposed RASAN-W achieves the best clustering of same-category sample features and the best separation of different-category sample features. This enables the method to effectively transfer knowledge from the source domain to the target domain and to accurately diagnose the health state of the bearing. Meanwhile, the confusion matrix of the task (J0→J2) on the JNU dataset is shown in Figure 6b.

4. Conclusions

This article proposes a novel bearing fault diagnosis method termed RASAN-W. On the basis of subdomain adaptation, we introduce the Wasserstein metric into the domain discriminator to precisely measure the disparities across domains, thereby enhancing both the performance and robustness of the model. Even when the sample distributions exhibit significant discrepancies, the differentiable parameters of the model remain optimizable. The proposed method is validated on the CWRU and JNU datasets to evaluate its effectiveness in bearing fault diagnosis, and rigorous ablation studies are conducted to verify its superiority. Finally, across multiple transfer tasks on the CWRU dataset, the proposed method achieves an average accuracy of 99.1%, a gain of 1.2 percentage points over the DSAN (97.9%) fault diagnosis method, which is a substantial improvement. This demonstrates the capability of the proposed method to extract transferable fault features from analogous sample distributions and to precisely diagnose the bearing condition in the target domain using the fault diagnosis knowledge acquired from the source domain.

Author Contributions

Conceptualization, H.C. and B.Y.; methodology, B.Y.; software, Y.X. (Yujun Xue); validation, Y.X. (Yujun Xue) and Y.X. (Yanwei Xu); investigation, B.Y.; writing—original draft preparation, B.Y.; writing—review and editing, H.C.; visualization, B.Y.; supervision, H.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Longmen Laboratory Frontier Exploration Project (LMQYTSKT022).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available in a publicly accessible repository.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  2. Zhang, S.; Zhang, S.; Wang, B.; Habetler, T.G. Deep learning algorithms for bearing fault diagnostics-a comprehensive review. IEEE Access 2020, 8, 29857–29881. [Google Scholar] [CrossRef]
  3. Hakim, M.; Omran, A.A.B.; Ahmed, A.N.; Al-Waily, M.; Abdellatif, A. A systematic review of rolling bearing fault diagnoses based on deep learning and transfer learning: Taxonomy, overview, application, open challenges, weaknesses and recommendations. Ain Shams Eng. J. 2023, 14, 101945. [Google Scholar] [CrossRef]
  4. He, M.; He, D. Deep learning based approach for bearing fault diagnosis. IEEE Trans. Ind. Appl. 2017, 53, 3057–3065. [Google Scholar] [CrossRef]
  5. Hoang, D.-T.; Kang, H.-J. A survey on deep learning based bearing fault diagnosis. Neurocomputing 2019, 335, 327–335. [Google Scholar] [CrossRef]
  6. Zhang, W.; Peng, G.; Li, C. Bearings fault diagnosis based on convolutional neural networks with 2-d representation of vibration signals as input. MATEC Web Conf. 2017, 95, 13001. [Google Scholar] [CrossRef]
  7. Chen, Z.; Wu, J.; Deng, C.; Wang, C.; Wang, Y. Residual deep subdomain adaptation network: A new method for intelligent fault diagnosis of bearings across multiple domains. Mech. Mach. Theory 2022, 169, 104635. [Google Scholar] [CrossRef]
  8. Xiao, N.; Zhang, L. Dynamic weighted learning for unsupervised domain adaptation. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 15237–15246. Available online: https://ieeexplore.ieee.org/document/9578316/ (accessed on 5 June 2024).
  9. Guo, J.; Chen, K.; Liu, J.; Ma, Y.; Wu, J.; Wu, Y.; Xue, X.; Li, J. Bearing fault diagnosis based on deep discriminative adversarial domain adaptation neural networks. Comput. Model. Eng. Sci. 2024, 138, 2619–2640. [Google Scholar] [CrossRef]
  10. Chen, P.; Zhao, R.; He, T.; Wei, K.; Yang, Q. Unsupervised domain adaptation of bearing fault diagnosis based on join sliced Wasserstein distance. ISA Trans. 2022, 129, 504–519. [Google Scholar] [CrossRef] [PubMed]
  11. Long, M.; Cao, Y.; Wang, J.; Jordan, M.I. Learning transferable features with deep adaptation networks. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015. [Google Scholar]
  12. Zhu, Y.; Zhuang, F.; Wang, J.; Ke, G.; Chen, J.; Bian, J.; Xiong, H.; He, Q. Deep subdomain adaptation network for image classification. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 1713–1722. [Google Scholar] [CrossRef] [PubMed]
  13. Ganin, Y.; Lempitsky, V. Unsupervised domain adaptation by backpropagation. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015. [Google Scholar]
  14. Liu, Y.; Wang, Y.; Chow, T.W.S.; Li, B. Deep adversarial subdomain adaptation network for intelligent fault diagnosis. IEEE Trans. Ind. Inform. 2022, 18, 6038–6046. [Google Scholar] [CrossRef]
  15. Wen, L.; Gao, L.; Li, X. A new deep transfer learning based on sparse auto-encoder for fault diagnosis. IEEE Trans. Syst. Man Cybern. Syst. 2019, 49, 136–144. [Google Scholar] [CrossRef]
  16. Cai, R.; Chen, J.; Li, Z.; Chen, W.; Zhang, K.; Ye, J.; Li, Z.; Yang, X.; Zhang, Z. Time series domain adaptation via sparse associative structure alignment. Proc. AAAI Conf. Artif. Intell. 2021, 35, 6859–6867. [Google Scholar] [CrossRef]
  17. Lu, X.; Jiang, Q.; Shen, Y.; Lin, X.; Xu, F.; Zhu, Q. Enhanced residual convolutional domain adaptation network with CBAM for RUL prediction of cross-machine rolling bearing. Reliab. Eng. Syst. Saf. 2024, 245, 109976. [Google Scholar] [CrossRef]
  18. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167. [Google Scholar]
  19. Lin, M.; Chen, Q.; Yan, S. Network in network. In Proceedings of the 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, 14–16 April 2014. [Google Scholar]
  20. Nair, V.; Hinton, G.E. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, 21–24 June 2010. [Google Scholar]
  21. Long, M.; Zhu, H.; Wang, J.; Jordan, M.I. Deep transfer learning with joint adaptation networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017. [Google Scholar]
  22. Long, M.; Wang, J.; Ding, G.; Sun, J.; Yu, P.S. Transfer feature learning with joint distribution adaptation. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 2200–2207. Available online: http://ieeexplore.ieee.org/document/6751384/ (accessed on 3 July 2024).
  23. Luo, X.; Hu, M.; Song, T.; Wang, G.; Zhang, S. Semi-supervised medical image segmentation via cross teaching between cnn and transformer. In Proceedings of the International Conference on Medical Imaging with Deep Learning, Zurich, Switzerland, 6–8 July 2022. [Google Scholar]
  24. Kang, Q.; Yao, S.; Zhou, M.; Zhang, K.; Abusorrah, A. Effective visual domain adaptation via generative adversarial distribution matching. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 3919–3929. [Google Scholar] [CrossRef] [PubMed]
  25. Ragab, M.; Chen, Z.; Wu, M.; Li, H.; Kwoh, C.-K.; Yan, R.; Li, X. Adversarial multiple-target domain adaptation for fault classification. IEEE Trans. Instrum. Meas. 2021, 70, 1–11. [Google Scholar] [CrossRef]
  26. Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A. Improved training of Wasserstein GANs. In Proceedings of the Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  27. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein GAN. arXiv 2017, arXiv:1701.07875. [Google Scholar]
  28. Adler, J.; Lunz, S. Banach Wasserstein GAN. In Proceedings of the Annual Conference on Neural Information Processing Systems 2018, Montreal, QC, Canada, 3–8 December 2018. [Google Scholar]
Figure 1. Blue and yellow geometric shapes represent samples in the source domain (a) and target domain (b), with different shapes representing sample categories in different domains and different dashed colors representing different decision boundaries.
Figure 2. Our method utilizes the category information of domain samples to align the global and local feature distributions, resulting in better interclass discrimination and intraclass compactness of sample features, reducing the differences in sample distribution between domains.
Figure 3. The proposed RASAN-W architecture consists of three parts: a feature extractor G_f (orange), a label classifier G_y (green), and a domain discriminator G_d (blue). The basic framework of the feature extractor is ResNet-50, and the label classifier and domain discriminator are composed of different fully connected layers. The gradient reversal layer is inserted between the feature extractor and the domain discriminator to obfuscate the features of the source and target domains, reducing the differences in feature distribution between domains.
Figure 4. Ablation study on the CWRU dataset (a) and the JNU dataset (b) with different losses.
Figure 5. The visualization of the learned features on the CWRU dataset. (a) ResNet; (b) DDC; (c) DANN; (d) DSAN; (e) DASAN; (f) RASAN-W.
Figure 6. Confusion matrices of the RASAN-W on the CWRU dataset (a) and the JNU dataset (b).
Figure 7. The visualization of the learned features on the JNU dataset. (a) ResNet; (b) DDC; (c) DANN; (d) DSAN; (e) DASAN; (f) RASAN-W.
Table 1. Description of the bearing fault dataset at Case Western Reserve University.

| Datasets | Load | Fault Type | Number of Training Samples | Number of Test Samples |
|---|---|---|---|---|
| C0 | 0 HP | Normal, IF, OF, RF | 5600 | 400 |
| C1 | 1 HP | Normal, IF, OF, RF | 5600 | 400 |
| C2 | 2 HP | Normal, IF, OF, RF | 5600 | 400 |
| C3 | 3 HP | Normal, IF, OF, RF | 5600 | 400 |
Table 2. Description of the bearing fault dataset at Jiangnan University.

| Datasets | RPM | Fault Type | Number of Training Samples | Number of Test Samples |
|---|---|---|---|---|
| J0 | 600 | Normal, IF, OF, RF | 1600 | 400 |
| J1 | 800 | Normal, IF, OF, RF | 1600 | 400 |
| J2 | 1000 | Normal, IF, OF, RF | 1600 | 400 |
Table 3. Experimental results of the six transfer tasks using different fault diagnosis methods on the CWRU dataset (%).

| Methods | C0→C1 | C0→C2 | C0→C3 | C1→C2 | C2→C3 | C3→C0 | Average |
|---|---|---|---|---|---|---|---|
| ResNet | 64.9 | 73.5 | 74.0 | 74.5 | 69.1 | 71.3 | 71.2 |
| DDC | 90.8 | 84.7 | 83.4 | 93.3 | 80.9 | 92.8 | 87.7 |
| DANN | 92.6 | 92.1 | 96.8 | 94.8 | 85.1 | 94.6 | 92.7 |
| DSAN | 98.5 | 96.8 | 98.3 | 97.8 | 97.3 | 98.8 | 97.9 |
| DASAN | 99.3 | 96.5 | 89.6 | 98.0 | 85.1 | 96.5 | 94.2 |
| RASAN-W | 99.5 | 100.0 | 98.5 | 99.8 | 99.0 | 97.5 | 99.1 |
Table 4. Experimental results of the four transfer tasks using different fault diagnosis methods on the JNU dataset (%).

| Methods | J0→J1 | J0→J2 | J1→J2 | J2→J0 | Average |
|---|---|---|---|---|---|
| ResNet | 66.7 | 75.7 | 80.2 | 71.5 | 73.5 |
| DDC | 94.3 | 91.5 | 93.3 | 92.0 | 92.8 |
| DANN | 93.8 | 93.0 | 96.8 | 87.3 | 92.7 |
| DSAN | 96.5 | 90.5 | 94.3 | 89.8 | 92.8 |
| DASAN | 93.5 | 94.5 | 98.8 | 94.0 | 95.2 |
| RASAN-W | 98.0 | 95.5 | 99.8 | 93.0 | 96.6 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

