1. Introduction
Bearings are key components of rotating machinery, and their performance deterioration or failure directly impacts the equipment’s ability to operate effectively. Therefore, the safe operation of a mechanical system depends largely on the smooth operation of the bearings [1,2,3,4].
Driven by significant breakthroughs in artificial intelligence technology, the successful application of fault diagnosis based on deep learning has garnered widespread attention from scholars worldwide [5,6,7,8].
However, in real-world engineering, obtaining large volumes of richly labeled fault data is highly challenging due to cost and safety constraints, particularly for rare fault types, where the available field data are often severely limited. Additionally, machinery and equipment are typically operated under variable conditions, resulting in differences in the distribution of measured samples. Consequently, it is challenging to directly apply a model trained on one operating condition to samples from other operating conditions [9]. Transfer learning methods have been introduced to address this problem, and the study of unsupervised cross-domain transfer learning for bearing fault diagnosis holds significant practical importance [10,11]. The fault diagnosis performance in the target domain can be improved by transferring essential features from the richly labeled data of the source domain to the unlabeled or sparsely labeled data of the target domain [12]. Therefore, unsupervised domain-adaptive fault diagnosis offers an effective solution to the issues of data distribution discrepancy and label scarcity.
As a crucial transfer learning strategy, unsupervised domain adaptation addresses challenges such as distributional inconsistencies between the source and target domains, and it has attracted sustained attention from researchers in recent years. Pei et al. [13] proposed a method that employs a multi-domain discriminator to align different data distributions by capturing multimodal structures. Long et al. [14] proposed a joint adaptation network that aligns the joint distributions of multiple domain-specific layers across domains, thereby facilitating the learning of a transfer network. Kang et al. [15] proposed an adaptive network that explicitly models both intra-class and inter-class domain discrepancies to generate more discriminative features, addressing the neglect of class information in earlier methods, which led to feature misalignment and poor transferability. By adding an extra classifier for the target data, Liang et al. [16] presented a pseudo-labeling framework that lowers classifier bias while improving pseudo-label quality and performance. Nevertheless, when there is a large distribution discrepancy between two domains, conventional transfer learning methods find it difficult to accomplish fine-grained hierarchical alignment, which results in negative transfer. Therefore, it is essential to incorporate simulation data based on fault mechanisms to facilitate more fine-grained domain adaptation for transfer learning.
The fault simulation signals, generated based on the bearing damage mechanism, encompass damage characteristics under various operating conditions and serve as an ideal source of supervisory information. Several studies have utilized fault simulation signals to assist in transfer fault diagnosis. Hou et al. [17] modeled faulty vibration signals by combining constructed fault pulses with measured normal baseline data. Qin et al. [18] proposed an innovative dynamic model for rolling bearings exhibiting defects. Li et al. [19] proposed a mathematical model for multi-DOF angular contact ball bearings employing an enhanced iterative method grounded in internal raceway control theory and nonlinear elastic Hertz contact theory. By constructing simulation domains to guide domain-adversarial transfer learning, diagnostic accuracy and model generalization can be improved. Simulation domains help models better learn different features and patterns during training, enabling fine-grained alignment between multiple domains through an effective transfer learning mechanism.
The methods proposed in the aforementioned literature demonstrate promising transfer results in fault diagnosis based on simulation data, offering valuable insights and references for research in this field. However, several challenges remain to be addressed in order to further enhance the performance of bearing fault diagnosis.
(1) In most recent research, the bearing source-domain fault data are real measured data, which are difficult to adapt to the changing demands for fault data under varying operating conditions. By contrast, numerical simulation technologies make it straightforward to generate a large volume of simulation data with comprehensive fault annotations [20,21], thereby reducing reliance on experimental platform data.
(2) The alignment of conditional distributions is often overlooked in favor of aligning only the marginal distributions between two domains. This oversight may lead to the misclassification of samples near the category boundaries in the target domain [22].
(3) In domain adaptation, samples are typically assigned equal weights. Source or simulation domain samples that differ substantially from the target domain therefore receive the same weight as similar ones, which may lead to negative transfer [23].
To address the aforementioned challenges, this work proposes a multi-adversarial domain transfer learning fault diagnosis algorithm that utilizes bearing dynamics simulation data. The innovations of this study are as follows:
(1) Simulated vibration signals representing bearing faults are generated using bearing dynamics equations, and a domain adversarial transfer learning network that integrates bearing simulation data is developed.
(2) A loss function embedded with the maximum mean discrepancy metric is formulated, and simulation data are integrated into the design of subdomain classifiers, facilitating fine-grained alignment from the source domain to the target domain, as well as simultaneous alignment of both marginal and conditional distributions in the context of unsupervised fault diagnosis.
(3) A domain similarity-guided weight assignment mechanism is proposed to suppress negative transfer by assigning varying weights to each source domain and simulation domain sample, based on their similarity to the target domain samples.
3. Proposed Method
3.1. Simulation Domain Constructed Based on Simulation Data
Based on the definitions of the source and target domains, this paper constructs the simulation domain using the faulty bearing sample set derived from Equation (6) in Section 2.2. The simulation data retain the primary fault signal characteristics, serving as a reasonable simplification of the bearing system.
This paper models the ER-16K rolling bearings manufactured by Timken (North Canton, OH, USA) and 6203 rolling bearings manufactured by SKF (Gothenburg, Sweden), which are consistent with those used in the subsequent fault diagnosis experiments. Specific dimensional parameters are provided in Table 1.
The simulation signals generated from the dynamic equations are used to illustrate the auxiliary guiding role of bearing simulation characteristics in fault diagnosis. The set formed by these simulation signals is called the simulation domain.
Combining the representation method of source domain and target domain , the simulation domain is represented by . Combining the label predictor and the global domain classifier , a subdomain classifier is proposed as a representation. Combining the representation of the label predictor loss function and the domain classifier , the simulation domain loss is represented by .
3.2. Improved Loss Function Design with Embedded Simulation Domain
In the unsupervised DANN algorithm, the feature extractor is trained through a gradient reversal layer so that the domain classifier cannot distinguish source features from target features, which helps reduce the distributional discrepancy between the two domains in the feature space. However, this alignment mechanism relies on adversarial training and does not explicitly optimize the distributional discrepancy between the two domains. Therefore, in this paper, we use maximum mean discrepancy (MMD [28]) to explicitly reduce the feature discrepancy between the two domains.
The MMD term is embedded into the loss function of the classical DANN, which promotes the domain adaptation of the DANN in unsupervised transfer learning. The MMD formula is as follows:
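As a hedged illustration of the quantity being embedded, the sketch below computes the standard biased empirical estimate of the squared MMD between two sets of feature vectors. The Gaussian kernel, the bandwidth `sigma`, and the function names are assumptions made for this example; the kernel choice used in this paper is not stated in this section.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors given as lists of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(source, target, sigma=1.0):
    """Biased empirical estimate of the squared MMD between two sample sets.

    MMD^2 = E[k(s, s')] + E[k(t, t')] - 2 E[k(s, t)],
    with each expectation replaced by an average over all sample pairs.
    """
    def mean_kernel(xs, ys):
        total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


# Toy features (made up for illustration): the second set is shifted away from the first.
source_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
target_features = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]
print(mmd_squared(source_features, target_features))
```

The estimate is zero when both sets are identical and grows as the two feature distributions move apart, which is why minimizing it with respect to the feature extractor pulls the domains together.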
Additionally, to further avoid negative transfer while achieving fine-grained alignment, this paper builds a subdomain classifier
by analogy with the global domain classifier
. The global domain classifier evaluates the similarity between two domains, whereas the subdomain classifier measures the similarity between real and simulated data. The simulated data are generated with the same operational parameters as those of the target domain. Therefore, these data can serve as a subdomain to aid in achieving finer alignment during domain adaptation. The loss function for the global domain classifier
is defined as follows:
The loss function of the subdomain classifier
is defined as follows:
In this equation,
and
are the parameters of
and
, respectively. The subdomain classifier aligns the conditional distributions, whereas the global domain classifier aligns the marginal distributions. Equation (
4) is updated with the addition of the subdomain classifier
to the following regularized equation:
Building on the previous discussion, the optimization goal of the domain classifier in this paper is to achieve domain alignment by integrating the subdomain classifier, the global domain classifier, and the maximum mean discrepancy, which helps suppress negative transfer. The complete loss function after integration is as follows:
In the equation, hyperparameters are employed to adjust the weights of the maximum mean discrepancy, subdomain classifier, and global domain classifier. This optimization function seeks to enhance the alignment between two domains by minimizing both the domain classification loss and the model alignment loss.
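The way the terms combine can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact loss: the binary cross-entropy form of the domain losses, the hyperparameter names `alpha`, `beta`, and `gamma`, and the sign convention (the saddle-point form of DANN, in which the feature extractor minimizes the label loss while the domain losses enter with a negative sign) are all assumed here.

```python
import math


def domain_bce(probs, domain_labels, eps=1e-12):
    """Mean binary cross-entropy of a domain classifier.

    probs[i] is the predicted probability that sample i belongs to domain 1;
    domain_labels[i] is the true domain (0 or 1).
    """
    total = 0.0
    for p, d in zip(probs, domain_labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(d * math.log(p) + (1 - d) * math.log(1.0 - p))
    return total / len(probs)


def total_objective(label_loss, global_loss, sub_loss, mmd,
                    alpha=1.0, beta=1.0, gamma=1.0):
    """Objective seen by the feature extractor (hypothetical weighting).

    alpha weights the global domain classifier loss, beta the subdomain
    classifier loss, and gamma the MMD term.
    """
    return label_loss + gamma * mmd - alpha * global_loss - beta * sub_loss


# A domain classifier that cannot tell the domains apart outputs 0.5 everywhere,
# which gives a cross-entropy of ln 2.
confused = domain_bce([0.5, 0.5, 0.5, 0.5], [0, 0, 1, 1])
print(confused)
print(total_objective(label_loss=0.8, global_loss=confused, sub_loss=confused, mmd=0.1))
```

In an actual network the negative signs are usually realized with a gradient reversal layer rather than by forming this sum explicitly.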
3.3. Development of Sample Weight Allocation Mechanisms
To adapt to the target domain, traditional unsupervised domain adversarial adaptation techniques often assign the same weight to each sample from the source and simulation domains. However, samples from the simulation and source domains can differ significantly from those in the target domain. If equal weighting is maintained, the transfer learning fault diagnosis model may suffer from negative transfer.
Domain classifiers find it more challenging to distinguish samples that are highly similar across domains, whereas samples with large differences are more easily distinguished. The weight allocation mechanism can therefore be computed from the domain prediction errors of the source domain [
29] and simulation domain samples. The specific weight
of the
ith sample in the source domain is as follows:
The weight allocation for simulation domain samples is defined analogously. The specific weight
of the
ith sample in the simulation domain is as follows:
After applying min–max normalization, the normalized weight
of
is as follows:
In this equation,
and
. After applying min–max normalization, the normalized weight
of
is as follows:
In this equation,
and
. By substituting the normalized weight
into Equation (
10), the new global domain classifier cross-entropy loss
is redefined as follows:
By substituting the normalized weight
into Equation (
11), the new subdomain classifier cross-entropy loss
is redefined as follows:
By substituting Equations (18) and (19) into Equation (13), the final improved loss function
can be obtained as follows:
To some extent, the weight allocation process for simulation and source domain samples can promote positive transfer while mitigating negative transfer.
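The weighting pipeline described in this subsection can be sketched as below. The min–max normalization and the reweighted loss follow the text; the raw weight itself is an assumption made for illustration, taken here as the per-sample cross-entropy error of the domain classifier (the negative log of the probability it assigns to the sample's true domain), so that samples the classifier finds hard to place receive larger weights. The function names are hypothetical.

```python
import math


def domain_error_weights(p_true_domain, eps=1e-12):
    """Raw weight per sample: cross-entropy error of the domain classifier.

    p_true_domain[i] is the probability the classifier assigns to the true
    domain of sample i. A confident, correct prediction (sample clearly
    unlike the target domain) gives a small weight.
    """
    return [-math.log(max(p, eps)) for p in p_true_domain]


def min_max_normalize(weights):
    """Rescale weights linearly to the interval [0, 1]."""
    lo, hi = min(weights), max(weights)
    if hi == lo:  # all samples equally similar: fall back to equal weights
        return [1.0 for _ in weights]
    return [(w - lo) / (hi - lo) for w in weights]


def weighted_mean_loss(per_sample_losses, weights):
    """Domain classifier loss with each sample scaled by its normalized weight."""
    return sum(w * l for w, l in zip(weights, per_sample_losses)) / len(per_sample_losses)


# Made-up classifier outputs for three source samples.
p_true = [0.95, 0.70, 0.52]
raw = domain_error_weights(p_true)
norm = min_max_normalize(raw)
print(norm)  # the easily distinguished sample gets weight 0, the most ambiguous gets 1
print(weighted_mean_loss(raw, norm))
```

The same two functions would be applied separately to the source domain and simulation domain batches, giving one normalized weight vector per domain.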
3.4. Model Architecture and Optimization Methods
Figure 3 provides a detailed description of the model architecture and workflow proposed in this study. The feature extractor
extracts features from the input data. Subsequently, the losses for the global domain classifier
, the subdomain classifier
, and the label predictor
are computed. Concurrently, the optimization objective is to maximize the losses
and
of the domain classifier while minimizing the loss
of the label predictor to facilitate the extraction of domain-invariant features.
The optimization problem is to determine the parameters
,
,
, and
that satisfy the given conditions.
The model employs the adversarial loss from the global domain classifier for adversarial training. However, global adversarial alignment alone may misalign the bearing health status features of different classes in the feature space. To address this issue, the model further introduces a subdomain classifier adversarial loss to ensure the alignment of distributions for same-category samples from different domains and reduce category misalignment. Additionally, the maximum mean discrepancy is embedded to optimize the loss function. Finally, a mechanism for allocating weights to source and simulation domain samples is introduced to suppress negative transfer and promote positive transfer.
To handle large-scale data and reduce computational costs, thereby accelerating the model’s convergence, the SGD algorithm is used to update the model parameters
,
,
, and
.
The parameters of the feature extractor, label predictor, global domain classifier, and subdomain classifier are denoted as , , , and , respectively, where denotes the learning rate. These parameters are further updated by categorization loss , global domain adversarial loss , subdomain adversarial loss , and MMD distance .
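For orientation, the update rules of this kind of adversarial network can be written in the standard DANN form extended with the subdomain and MMD terms. The symbols below are assumptions introduced for this sketch, since the original notation is not reproduced in this text: $\theta_f$, $\theta_y$, $\theta_d$, and $\theta_{sd}$ for the four parameter sets, $\mu$ for the learning rate, and $\alpha$, $\beta$, $\gamma$ for hypothetical trade-off weights.

```latex
\begin{aligned}
\theta_f    &\leftarrow \theta_f - \mu \left(
                 \frac{\partial L_y}{\partial \theta_f}
               + \gamma \frac{\partial L_{\mathrm{MMD}}}{\partial \theta_f}
               - \alpha \frac{\partial L_d}{\partial \theta_f}
               - \beta  \frac{\partial L_{sd}}{\partial \theta_f} \right), \\
\theta_y    &\leftarrow \theta_y - \mu \frac{\partial L_y}{\partial \theta_y}, \\
\theta_d    &\leftarrow \theta_d - \mu \frac{\partial L_d}{\partial \theta_d}, \\
\theta_{sd} &\leftarrow \theta_{sd} - \mu \frac{\partial L_{sd}}{\partial \theta_{sd}}.
\end{aligned}
```

The negative signs in the first line are what a gradient reversal layer implements: the classifiers descend on their own losses, while the feature extractor ascends on the domain losses so that its features become harder to assign to a domain.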
4. Experiments
4.1. Simulation Dataset Description
As an example, three distinct bearing health states are considered from the bearing dataset at Paderborn University. Based on Equation (6) in Section 2.2, the ODE45 solver was employed for numerical simulation to generate data representing the healthy state, inner race fault, and outer race fault. A simulation domain dataset was constructed by collecting simulated vibration acceleration signals.
Figure 4 presents an example of the simulation signal for the 6203 bearing. For the simulation data of the 6203 bearing with an outer race fault, the characteristic frequency is as follows:
In the frequency domain representation of the simulation data, the outer race fault exhibits a peak at 76.36 Hz, which aligns closely with the theoretical calculation results. This indicates that the simulation data capture the kinematic characteristics of rolling bearings well.
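The theoretical value referred to above comes from the standard kinematic formula for the ball pass frequency of the outer race (BPFO). The short sketch below evaluates that formula. The geometry and shaft speed used in the example (8 rolling elements, 6.75 mm ball diameter, 28.55 mm pitch diameter, zero contact angle, 25 Hz shaft frequency) are assumptions chosen for illustration and are not stated in this section; the values in Table 1 should be used in practice.

```python
import math


def bpfo(shaft_hz, n_balls, ball_d_mm, pitch_d_mm, contact_angle_deg=0.0):
    """Ball pass frequency of the outer race, in Hz (outer race stationary)."""
    ratio = (ball_d_mm / pitch_d_mm) * math.cos(math.radians(contact_angle_deg))
    return 0.5 * n_balls * shaft_hz * (1.0 - ratio)


def bpfi(shaft_hz, n_balls, ball_d_mm, pitch_d_mm, contact_angle_deg=0.0):
    """Ball pass frequency of the inner race, in Hz (outer race stationary)."""
    ratio = (ball_d_mm / pitch_d_mm) * math.cos(math.radians(contact_angle_deg))
    return 0.5 * n_balls * shaft_hz * (1.0 + ratio)


# Assumed 6203 geometry and shaft speed (see the note above).
f_outer = bpfo(shaft_hz=25.0, n_balls=8, ball_d_mm=6.75, pitch_d_mm=28.55)
print(round(f_outer, 2))  # about 76.36 Hz under these assumed parameters
```

Under these assumed parameters the formula reproduces the 76.36 Hz peak reported for the simulated outer race fault.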
4.2. Introduction to the Dataset
Relevant experiments were performed in this study on two publicly available datasets. Case 1 uses an experimentally validated bearing fault diagnosis dataset supplied by Huazhong University of Science and Technology (HUST) in Wuhan, China [
30].
Figure 5 shows the experimental setup employed for the dataset. In this case, the ER-16K bearing was chosen for experimental analysis. The fault conditions in this experiment were artificially preset, with the source domain rotation speed being 20 Hz and the target domain rotation speed being 30 Hz. The health status of a bearing includes seven conditions: medium inner race fault, medium ball fault, medium outer race fault, severe inner race fault, severe ball fault, severe outer race fault, and normal. A detailed description of this case, regarding its source domain, target domain, and simulation domain, is provided in
Table 2.
Case 2 utilizes a failure dataset from a bearing test bench provided by Paderborn University (PU) in Paderborn, Germany [
31].
Figure 6 shows the experimental setup used in the Paderborn bearing test bench. The bearing is a type 6203 rolling bearing, and the sampling frequency is 64 kHz. The failure data were obtained through accelerated life tests to reflect real damage, without distinguishing the size of the failure, and cover three states: normal, inner race fault, and outer race fault. A detailed description of this case, regarding its source domain, target domain, and simulation domain, is provided in
Table 3.
4.3. Introduction to the Experimental Setup and Comparison Methods
This research compared the proposed method to five different methods to verify its superiority. To train the model, the SGD optimization algorithm was employed. The experiment was conducted ten times under each condition. In the two scenarios, the proposed method’s primary parameters were as follows: the learning rate was 0.001, the batch size was 32, and the number of iterations N was 120.
(1) In order to verify the necessity of using transfer learning for fault diagnosis, the proposed method (Method 1) was compared with a convolutional neural network (CNN, Method 2).
(2) To evaluate the performance of the proposed model, a series of comparisons were made with traditional transfer learning fault diagnosis methods. These include JAN, CDAN, MADA, and FMIA [32], which are denoted as Method 3 to Method 6 in turn.
4.4. Analysis of the Experimental Results
In this study, we employed average accuracy (Table 4), iteration accuracy (Figure 7 and Figure 8), confusion matrices (Figure 9 and Figure 10), and F1 scores (Figure 11) to evaluate the diagnostic validity of each approach on the target domain test set. In addition, t-SNE visualizations (Figure 12 and Figure 13) are used to show the adaptation performance of the different methods in the feature space.
From the above results, several conclusions can be drawn:
(1) A comparative analysis of Method 1 (the proposed method) with Methods 2–6 reveals that Method 1 achieves higher average accuracy and a lower standard error of the mean (SEM). The higher accuracy indicates superior generalization ability in the unsupervised setting and an advantage in mitigating negative transfer, while the lower SEM indicates better stability. The average accuracy of Method 1 was 96.436% and 89.457% in the two cases, respectively, whereas the best of the other methods reached only 92.635% and 79.855%. This result further verifies the effectiveness and stability of Method 1 in negative transfer suppression and cross-domain transfer.
(3) A comparison of the confusion matrices from the fourth run of the six methods in Cases 1 and 2, as shown in Figure 10, indicates that Method 1 improves the diagnostic accuracy for the healthy, inner race fault, and outer race fault states by 22%, 2%, and 4%, respectively, compared with MADA (Method 5), reflecting the validity of the proposed method.
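Since each condition is run ten times and the results are reported as an average accuracy with its SEM, the computation of those two summary statistics can be sketched as follows. The accuracy values in the example are made-up placeholders, not results from this study.

```python
import math
import statistics


def mean_and_sem(values):
    """Return the sample mean and the standard error of the mean (SEM).

    SEM = s / sqrt(n), where s is the sample standard deviation (n - 1 in
    the denominator) and n is the number of repeated runs.
    """
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Placeholder accuracies (%) from ten hypothetical repeated runs.
runs = [95.1, 96.8, 97.2, 94.9, 96.0, 97.5, 95.7, 96.3, 96.9, 95.4]
mean_acc, sem_acc = mean_and_sem(runs)
print(f"{mean_acc:.3f} ± {sem_acc:.3f}")
```

A lower SEM for the same number of runs means the run-to-run spread is smaller, which is the sense in which it is used as a stability indicator here.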
This study employed the t-distributed stochastic neighbor embedding (t-SNE) algorithm for visualization. In the t-SNE visualization for Case 1, the defective bearings are categorized into two groups based on fault severity, denoted 1 and 2. The seven classes, covering different fault types and fault levels, are labeled Normal, Inner-1, Inner-2, Ball-1, Ball-2, Outer-1, and Outer-2. In Case 2, the source domain is denoted by the label suffix “src”, while the target domain is represented by the prefix “tra”. “Inner” represents the bearing inner race fault, “Norm” indicates the normal state, and “Outer” corresponds to the outer race fault. The results of the six methods are shown in
Figure 12 and
Figure 13.
Compared with the other methods, the t-SNE feature distribution of the proposed method exhibits a clearer clustering structure: the boundaries between different fault types are more distinct, and the features overlap less. This shows that the proposed method can better separate the various classes of fault features while maintaining intra-class compactness, which enhances the effectiveness of feature adaptation. The reason is that the proposed method combines a global domain classifier and a subdomain classifier, which together realize fine-grained alignment of the marginal and conditional distributions. Additionally, the adaptive weight allocation mechanism helps suppress negative transfer.
4.5. Ablation Experiment
To further analyze the contributions of the MMD module, the subdomain classifier module built on simulation data, and the weight allocation mechanism module to transfer learning fault diagnosis, ablation experiments were conducted to verify their effectiveness.
The four network configurations, as shown in Table 5, are as follows:
Network A: removal of the weighting mechanism module, subdomain classifier module, and MMD module;
Network B: removal of the weighting mechanism module and subdomain classifier module;
Network C: removal of the weighting allocation mechanism module;
Network D: the proposed method.
Initially, Network A was used as a base model for transfer learning bearing fault diagnosis. Subsequently, additional modules were progressively introduced to assess their performance improvements. Network B incorporated the MMD module into Network A. The improved loss function with embedded MMD helps to reduce the difference in feature distribution between the source and target domain data. Network C adds a subdomain classifier module to Network B by introducing simulation domain data, which serve as effective supervisory data that guarantee a lower limit on the information transfer effect during adversarial training. To promote positive transfer and suppress negative transfer, Network D incorporated a weight allocation mechanism module into Network C.
Based on the above experiments, the following conclusions can be drawn:
(1) As shown in Table 6, the average diagnostic accuracy of the proposed method (Network D) is 11% and 16% higher in the two cases than that of the original DANN (Network A). Network D, which adds the weight allocation mechanism module, MMD module, and subdomain classifier module to DANN, outperforms Network A in both cases, indicating that the proposed method is effective for bearing transfer learning.
(2) The ablation results for Network A, Network B, and Network C in Cases 1 and 2, shown in Figure 14, Figure 15 and Figure 16, indicate that the combination of the subdomain classifier and the global domain classifier enhances the cross-domain adaptability of the model through fine-grained alignment. Meanwhile, the improved loss function with embedded MMD helps to reduce the feature distribution discrepancy between domains, thereby improving the overall diagnostic accuracy for each category. As shown in Figure 17, the accuracy of Network B in the normal state is 3% higher than that of Network A, while the accuracy of Network C in the normal state is a further 15% higher than that of Network B.
(3) A comparison of four metrics (accuracy, F1 score, recall, and precision) for Network C and Network D in Cases 1 and 2, as shown in Figure 18 and Figure 19, shows that the designed sample weight allocation mechanism enhances the stability of the model while improving the recognition performance of the diagnostic model. This mechanism adaptively assigns weights to the source domain samples and simulation domain samples, thus suppressing negative transfer and promoting positive transfer.
The results show that the MMD module, simulation domain, and weight allocation mechanism in the proposed method not only effectively suppress negative transfer but also improve the effectiveness and stability of unsupervised cross-domain transfer.
4.6. Experimental Results of Noise Immunity
Vibration and friction between bearing components can generate considerable noise in real-world working environments. This noise interferes with the collection of vibration signals by sensors and masks fault information within the signal. To evaluate the robustness and effectiveness of the proposed model in a noisy environment that more closely reflects real-world conditions, Gaussian noise of varying intensities is added to the test signal to assess the model’s anti-noise capability. The signal-to-noise ratio (SNR) measures the relationship between signal strength and noise strength and is frequently used to assess signal quality under noise interference. It is the ratio of signal power to noise power, as shown below:
where
is the effective power of the signal and
is the effective power of the noise. In this study, four Gaussian noise levels were used, with signal-to-noise ratios of 6 dB, 4 dB, 0 dB, and −2 dB, respectively. These levels range from mild to severe noise to simulate working conditions from slight interference to severe signal pollution in bearing fault diagnosis.
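The noise injection step can be sketched as below, using the usual decibel definition of SNR, 10·log10(P_signal / P_noise). The function names and the sinusoidal test signal are assumptions made for illustration; only the standard library is used.

```python
import math
import random


def signal_power(x):
    """Mean power of a sampled signal."""
    return sum(v * v for v in x) / len(x)


def add_gaussian_noise(signal, target_snr_db, rng=None):
    """Add zero-mean Gaussian noise so that the result has the requested SNR in dB."""
    rng = rng or random.Random()
    # From SNR_dB = 10 * log10(P_signal / P_noise), solve for the noise power.
    noise_power = signal_power(signal) / (10.0 ** (target_snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in signal]


def measured_snr_db(signal, noisy):
    """SNR in dB recovered from a clean signal and its noisy version."""
    noise = [n - s for s, n in zip(signal, noisy)]
    return 10.0 * math.log10(signal_power(signal) / signal_power(noise))


# Hypothetical test signal: a 50 Hz sine sampled at 10 kHz for 2 s.
clean = [math.sin(2.0 * math.pi * 50.0 * k / 10000.0) for k in range(20000)]
for snr in (6.0, 4.0, 0.0, -2.0):
    noisy = add_gaussian_noise(clean, snr, random.Random(0))
    print(snr, round(measured_snr_db(clean, noisy), 2))
```

At 0 dB the noise power equals the signal power, and at −2 dB the noise is stronger than the signal, which is why these levels represent the severe end of the test range.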
Through the comparison of the above results, the following conclusions can be drawn:
(1) As illustrated in Figure 20 and Figure 21, once noise is introduced, the accuracy of the six network models decreases to some extent on both datasets, and diagnosis becomes progressively more difficult. As the noise intensity increases further, diagnostic accuracy continues to decline. This is because noise interferes with the model’s feature extraction, making it difficult to distinguish between fault and normal state features. Furthermore, noise weakens the model’s ability to consistently identify feature patterns, thereby impairing the generalization performance of the diagnostic model. Among the six methods, our method demonstrates superior noise resistance. After introducing 4 dB noise, the accuracy of our method in the two cases is 92.62% and 83.56%, respectively, which exceeds the diagnostic accuracy of the other five methods. This further confirms that our method remains robust and stable under noise interference and can maintain a useful level of diagnostic accuracy under complex working conditions.
(2) A comparison of the model accuracy of the six methods in the two cases, as shown in Figure 20 and Figure 21, indicates that the noise resistance advantage of our method over the other five methods is more pronounced under high noise than under low noise. Following the introduction of 0 dB noise, the accuracy of our method decreased by 8.08% and 9.78% in the two cases, respectively. Under high noise conditions, this decrease was smaller than that observed for the other five methods.
(3) Figure 22 and Figure 23 compare the confusion matrices of all methods in the two cases in the noise resistance experiment. In Figure 22, compared with MADA, the diagnostic accuracy of our method for the medium ball fault, medium outer race fault, severe inner race fault, severe ball fault, and normal states is improved by 11%, 15%, 5%, 6%, and 6%, respectively, which reflects the effectiveness and stability of the proposed method.
The results demonstrate that the MMD module, simulation domain, and weight distribution mechanism in our method not only effectively suppress negative transfer but also enhance the model’s robustness against noise to some extent.
5. Conclusions
This study presents a multi-adversarial domain transfer learning method for rolling bearing fault diagnosis that is assisted by bearing dynamics simulation information. The method aims to diagnose rolling bearing faults under various operating conditions, suppressing negative transfer and promoting positive transfer to improve model performance. First, a subdomain classifier and a global domain classifier are constructed. For the subdomain classifier, a dynamic equation is used to generate simulated vibration data containing extensive bearing fault label information. These data substitute for label predictions in the target domain and facilitate the alignment of both the marginal and conditional distributions. Simultaneously, the enhanced MMD loss function reduces the differences between the feature distributions of the source and target data. Finally, a sample weight allocation mechanism is added to suppress negative transfer.
In the context of unsupervised fault diagnosis, the proposed model learns more generalized data features. By introducing simulated signals generated by the bearing dynamics equation, the lower bound of the algorithm’s performance is raised, which suppresses negative transfer and improves the model’s stability. The results indicate that the proposed method achieves higher accuracy and a lower standard error of the mean (SEM) on the two datasets, with values of 96.436 ± 1.264 and 89.457 ± 1.385 for Case 1 and Case 2, respectively. After multiple rounds of training, the accuracy is improved compared with the other methods. More importantly, the method shows better stability in indicators such as accuracy, F1 score, and recall, with a lower SEM. At the same time, the model has better noise resistance and shows clear advantages in suppressing negative transfer.
This work investigated a cross-condition transfer learning method based on generated bearing simulation fault data. In the future, the authors will further explore methods for generating simulation fault data for critical mechanical components other than bearings, together with the related transfer learning fault diagnosis algorithms.