1. Introduction
Rolling bearings are widely used in industrial equipment, and their health is central to the fault prediction and health management (PHM) of machinery. For this reason, the condition monitoring of rolling bearings has received attention from users at different stages. Condition monitoring generates a large amount of data, and deep learning facilitates the creation of end-to-end rolling bearing fault diagnosis models that can process this data effectively. The application of such extensive monitoring data [1] is anticipated to enhance the fault prediction and health management of machinery and equipment.
Neural network methods have become a hot topic in the field of rolling bearing fault diagnosis [2,3,4]. The advancement of rolling bearing fault diagnosis parallels the rapid development of machine learning theories and techniques [5]. Early researchers primarily focused on signal processing methods to extract features containing fault information. For instance, in the early stages of bearing failure, the temperature may be normal, the noise slightly increased, and the total vibration velocity and acceleration slightly elevated, while a significant rise in vibration spike energy can be observed; empirical mode decomposition (EMD) or signal kurtosis can then be used to extract fault characteristics. These extracted features can be input into a shallow classifier, such as an artificial neural network (ANN), for fault diagnosis. Current deep learning methods focus on constructing end-to-end fault diagnosis models, including Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs), and Auto-Encoders (AEs). The earliest CNN models were proposed by LeCun et al. [6], followed by the VGG Net [7] and the 152-layer ResNet [8], which illustrate the evolution of deep networks. Lei et al. [5] introduced classical deep learning into the fault diagnosis of rotating machinery, including bearings, and sparked a wave of research on deep neural networks for bearing fault diagnosis.
However, the remarkable success of deep learning in bearing fault diagnosis relies on a key assumption: that the dataset used to train the neural network (the source domain) follows the same distribution as the dataset in the target application scenario (the target domain). In engineering practice, this assumption often does not hold. Transfer learning is anticipated to address this challenge. The core issue in transfer learning is handling the distributional differences between the source and target domain data, which essentially involves reducing the discrepancies between their marginal and conditional distributions. To enhance transfer performance, Gretton et al. [9] extended the Maximum Mean Discrepancy (MMD) to the Multiple Kernel Maximum Mean Discrepancy (MK-MMD), which better characterizes the differences between the source and target domains. Li et al. [10] employed an optimal integrated deep transfer network to automate the classification of different faults in rolling bearings. Yu et al. [11] designed a broad convolutional neural network (BCNN) with incremental learning capability for fault detection in industrial processes [12].
The methods proposed in the above literature have performed well in unsupervised domain adaptation (UDA); however, some challenging issues remain. (1) In the absence of sufficient real data, the significant differences in bearing fault signatures under different operating conditions leave traditional methods insufficiently robust for cross-condition diagnosis. (2) The introduction of simulated data raises the questions of how to effectively integrate real and simulated data and how to establish a transfer learning mechanism that handles fine-grained alignment between multiple domains. (3) Under variable operating conditions, traditional methods are prone to negative transfer because they lack a clear supervisory signal to guide model learning, which degrades performance in the target domain.
We aim to build a model that reflects the characteristics of the bearing itself and serves as a guide for fault diagnosis. The domain that contains mechanistic knowledge of the transferred object is called the model domain. Representing the source domain as $D_s$ and the target domain as $D_t$, we use $D_m$ to denote the model domain. Generally, the source domain, such as bearing failure data collected in the laboratory, contains many fully labeled samples. In most scenarios, the target domain is the opposite: it typically has either fully unlabeled data (UDA) or only a few labeled samples (semi-supervised domain adaptation) [13]. Deep learning models trained on the source domain often suffer performance degradation when applied to the target domain because of the differences between the two domains. Applying transfer learning to bearing fault diagnosis helps build diagnostic models with generalization capabilities that can adapt to variations in operating conditions, loads, and components [14].
Based on the preceding discussion, transfer learning can be guided by analyzing the target object under diagnosis, such as a bearing, to acquire model domain data or probability functions. Taking rolling bearings as an example, a common approach involves analyzing the damage mechanism and creating analytical models of the bearings. McFadden et al. [15] modeled the shock vibrations generated by a rolling element passing through a single point of failure on the inner ring as an impulse sequence function. Newer studies [16,17,18,19] have shown that the fault excitation in a faulty bearing consists of a cyclic impulse force of equal amplitude with a small random component, i.e., a random slipping phenomenon. These works refine the difference between simulated and actual signals by establishing sets of differential equations that consider factors such as the random slipping of rolling elements, lubricant film stiffness, damping, and the high-frequency resonance of the bearing housing. However, the smaller the difference, the more complicated the set of differential equations, making it difficult to apply to transfer learning. Based on ABAQUS explicit dynamics, Liu et al. [19] analyzed the changes in contact stresses and contact stress distributions of the rings, rollers, and cage during operation.
Computer simulation software methods can reveal new phenomena that are difficult to obtain through theoretical analysis or experimental observation. They can also provide the intuitive bearing dynamic response and faulty bearing vibration signals, which are directly applicable to bearing fault diagnosis. This paper’s contributions are summarized as follows:
- (1)
We develop a domain-adversarial transfer learning network that fuses bearing simulation data. This network copes with significant differences in bearing fault signal features under different operating conditions and improves the robustness and reliability of fault diagnosis under variable operating conditions.
- (2)
A novel transfer learning mechanism for bearing simulation signals is proposed, comprising a global domain classifier and a subdomain classifier based on simulation signals. This mechanism supports the precise alignment of multiple domains, thereby promoting positive transfer, mitigating negative transfer, and enhancing the model's generalization capability under different operating conditions.
- (3)
The optimal supervision method for simulation signals is determined. The reconstruction of the loss function and the design of the domain adversarial network serve to effectively guide the model in learning the target domain's fault characteristics, which improves the robustness and reliability of the model and resolves the tendency toward negative transfer.
2. Theoretical Foundation
Some researchers have carried out related studies and made certain progress in UDA.
The domain adversarial neural network (DANN) [20] aims to integrate domain adaptation and deep feature learning into a single training process, so that classification relies on invariant features whose distributions are similar in both the source and target domains. The system's structure is illustrated in Figure 1, primarily comprising a feature extractor $G_f$, a domain classifier $G_d$, and a label predictor $G_y$.
Given a labeled sample set $D_s=\{(x_i^s,y_i^s)\}_{i=1}^{n_s}$ in the source domain and an unlabeled sample set $D_t=\{x_j^t\}_{j=1}^{n_t}$ in the target domain, the loss of the label predictor is

$$L_y^i(\theta_f,\theta_y)=L_y\big(G_y(G_f(x_i;\theta_f);\theta_y),y_i\big). \quad (1)$$

The optimization objective for the source domain is as follows:

$$\min_{\theta_f,\theta_y}\ \frac{1}{n_s}\sum_{i=1}^{n_s}L_y^i(\theta_f,\theta_y)+\lambda R(\theta_f,\theta_y). \quad (2)$$

In this equation, $L_y^i$ represents the label prediction loss for the $i$th sample, and $R(\theta_f,\theta_y)$ is an additional term used to prevent overfitting of the neural network. The regularizer is optional, and its weight is determined by the hyperparameter $\lambda$.
The loss of the domain classifier $G_d$ is as follows:

$$R(\theta_f,\theta_d)=-\frac{1}{n_s}\sum_{x_i\in D_s}L_d^i(\theta_f,\theta_d)-\frac{1}{n_t}\sum_{x_j\in D_t}L_d^j(\theta_f,\theta_d). \quad (3)$$

The loss of an adversarial transfer network is composed of two parts: the label predictor loss (the training loss of the network) and the domain discrimination loss. The DANN total objective function is

$$E(\theta_f,\theta_y,\theta_d)=\frac{1}{n_s}\sum_{i=1}^{n_s}L_y^i(\theta_f,\theta_y)+\lambda R(\theta_f,\theta_d). \quad (4)$$

The parameters of the feature extractor and the label predictor are updated by minimizing the objective function $E$, and the parameters of the domain classifier are updated by maximizing $E$. According to Ganin et al. [20], Equations (1)–(4) can be summarized.
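The adversarial update behind Equation (4) can be illustrated with a minimal numeric sketch: the domain classifier descends its binary cross-entropy loss, while the feature extractor receives the reversed (negated) gradient of that loss. The one-parameter model below is purely illustrative and is not the network used in this paper; all names are ours.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, d):
    # binary cross-entropy between predicted domain probability p and domain label d
    return -np.mean(d * np.log(p + 1e-12) + (1 - d) * np.log(1 - p + 1e-12))

def dann_step(w_f, w_d, x, dom, lr=0.1, lam=1.0):
    """One saddle-point update on a toy 1-D model.

    Features: f = w_f * x; domain probability: p = sigmoid(w_d * f).
    The domain classifier parameter w_d descends the domain loss L_d,
    while the feature extractor parameter w_f ascends it (gradient reversal).
    """
    f = w_f * x
    p = sigmoid(w_d * f)
    dz = (p - dom) / len(x)          # dL_d/dz for BCE with a sigmoid output
    grad_wd = np.sum(dz * f)         # dL_d / dw_d
    grad_wf = np.sum(dz * w_d * x)   # dL_d / dw_f
    w_d_new = w_d - lr * grad_wd             # minimize L_d (classifier)
    w_f_new = w_f + lr * lam * grad_wf       # reversed gradient: maximize L_d
    return w_f_new, w_d_new, bce(p, dom)
```

With `lam = 0` the feature extractor is frozen and the update reduces to ordinary logistic-regression training of the domain classifier.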
3. Proposed Architecture
3.1. Model Domain Construction Based on Kinetic Finite Element Models
The set of faulty bearings obtained by the modeling approach defines the model domain sample set, which contains the main factors that influence the fault signal; this simplification of the bearing system is reasonable. The construction of the model domain is based on kinetic finite element models.
We establish a three-dimensional solid finite element model of a healthy rolling bearing based on the explicit algorithm. The analysis considers frictional contact, velocity, and load conditions to examine the variation in vibration acceleration in the vertical direction of the bearing. Using the healthy rolling bearing model, we simulated local spalling faults in the outer ring, inner ring, and rolling element by setting unit defects, thereby constructing the model domain dataset.
The paper models 6205 and 6203 rolling bearings for subsequent fault diagnosis. Table 1 shows the main dimensional parameters of these bearing models, which are also used in the test cases.
The modeling process for the ten types of faulty bearings is the same as that for the three types of faulty bearings. Using the ten-class drive-end bearing failure data from the Case Western Reserve University bearing dataset as an example, Figure 2 shows the three-dimensional dynamic model of the bearing, which consists of the inner ring, outer ring, rolling elements, and cage. Figure 3 shows the fault locations after meshing, assuming constant stiffness and damping of the main components; the fault locations are the inner ring, outer ring, and rolling elements. For inner-ring, outer-ring, and rolling element failures, the fault sizes are 7 mils, 14 mils, and 21 mils each, giving nine failure types plus one normal condition, represented by finite element meshes of varying sizes.
The experimental bearing failures manifested as grooves produced by electrical discharge machining (EDM), and the simulation mimics the same artificial damage. The finite element model was meshed using Hypermesh software (version 2021.2) so that each mesh width was 7 mils. Bearing failures of different sizes exhibited grooves with a circumferential length of 7 mils and an axial depth of 7–21 mils.
The delineated mesh was submitted to ANSYS (version 2020 R2) for finite element calculations. The equation of motion for a rolling bearing, taking the damping effect into account, is

$$M\ddot{x}(t)=P(t)-F(t)+H(t)-C\dot{x}(t), \quad (5)$$

where $M$ is the system mass matrix; $x$, $\dot{x}$, and $\ddot{x}$ are the position coordinate vector, velocity vector, and acceleration vector of the nodes, respectively; and $P$, $F$, $H$, and $C$ are the load vector, the internal force vector, the hourglass resistance vector, and the damping matrix, respectively. The explicit central difference method is used to integrate the system equations in time.
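The explicit central difference integration used here can be sketched for a single degree of freedom. In the actual FE model, the mass, damping, and internal-force terms are large matrices and vectors; the scalar version below is only a conceptual sketch of the time-stepping scheme, with all parameter values chosen for illustration.

```python
import numpy as np

def central_difference(m, c, k, p, x0, v0, dt, n_steps):
    """Explicit central-difference integration of m*a + c*v + k*x = p(t)
    for a single degree of freedom (a toy stand-in for the full FE system).

    Update rule: (m/dt^2 + c/(2dt)) x_{i+1}
                 = p_i - (k - 2m/dt^2) x_i - (m/dt^2 - c/(2dt)) x_{i-1}.
    Stable for dt below roughly 2/omega_max.
    """
    x = np.zeros(n_steps + 1)
    x[0] = x0
    a0 = (p(0.0) - c * v0 - k * x0) / m
    x_prev = x0 - dt * v0 + 0.5 * dt**2 * a0   # fictitious step x_{-1}
    lhs = m / dt**2 + c / (2 * dt)
    for i in range(n_steps):
        t = i * dt
        rhs = p(t) - (k - 2 * m / dt**2) * x[i] - (m / dt**2 - c / (2 * dt)) * x_prev
        x_prev = x[i]
        x[i + 1] = rhs / lhs
    return x
```

For a damped free oscillation the computed response decays as expected, which is a quick sanity check of the scheme.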
3.2. Subdomain Classifier and Global Domain Classifier Design
Current domain adversarial neural networks feature a single domain classifier that aligns the overall distributions of the source and target domains but does not consider the alignment of the corresponding categories. This can confuse data from different categories across the source and target domains and affect the discriminative component, resulting in misclassification and misalignment. This paper tackles the issue by introducing simulation data and constructing a multimodal structure, which is expected to achieve fine-grained, category-level alignment with multiple domain classifiers corresponding to different data distributions.
When faced with the task of transferring multiple categorical labels, it is crucial to consider the direction of transfer, i.e., to ensure that the features of samples from the same category are aligned. Existing studies can be summarized into two main categories [21]: (a) instance reweighting, which reuses source domain samples via a weighting technique; and (b) feature matching, which implements subspace learning through subspace geometries. To achieve the fine-grained alignment of different data distributions based on multiple domain classifiers, Pei et al. [22] proposed one local domain classifier per category to handle domain adaptation for each class and optimize the conditional probability distribution. Yu et al. [21] proposed a deep adversarial network model that dynamically adjusts the relationship between marginal and conditional distributions by introducing a conditional domain discriminant block and an integrated dynamic adjustment factor. Zhu et al. [23] defined subdomains based on class labels, grouping samples of the same class into a subdomain.
Current methods for fine-grained alignment in unsupervised domain adaptation depend on predicted labels for the target domain data. Although this approach recognizes the phenomenon of negative transfer, it does not entirely remove the reliance on source domain predictions during transfer. Therefore, to achieve fine-grained alignment while avoiding negative transfer, it is necessary to introduce new data whose labels are independent of the source and target domain data. Notably, the model domain can provide exactly such labeled information, revealing a multimodal structure. Through modeling, we can acquire a labeled model domain sample set that shares the same working conditions as the target domain, and we can replace the gradient reversal results computed from predicted target domain labels in the subdomain classifier with the labeled model domain samples; see Figure 4 for a visual representation. After label matching, the subdomain classifier computes the domain classification loss between the model domain and the source domain for each batch. Similarly, the global domain classifier computes the domain classification loss between the source domain and the target domain.
The loss of the global domain classifier $G_d$ is

$$L_d(\theta_f,\theta_d)=-\frac{1}{n_s+n_t}\sum_{x_i\in D_s\cup D_t}\big[d_i\log\hat{d}_i+(1-d_i)\log(1-\hat{d}_i)\big], \quad (6)$$

where the binary domain label is denoted by $d_i$. The loss of the subdomain classifier $G_d^k$ for category $k$ is defined as

$$L_d^k(\theta_f,\theta_d^k)=-\frac{1}{n_s^k+n_m^k}\sum_{x_i\in D_s^k\cup D_m^k}\big[d_i\log\hat{d}_i^k+(1-d_i)\log(1-\hat{d}_i^k)\big]. \quad (7)$$

In this case, the domain label $d_i$ no longer distinguishes the source domain from the target domain; it can still be treated as a binary label indicating whether a sample comes from the source domain or the model domain.
Referring to Equation (3) gives the following regularizer:

$$R(\theta_f,\theta_d,\theta_d^1,\ldots,\theta_d^K)=-\alpha L_d(\theta_f,\theta_d)-\beta\frac{1}{K}\sum_{k=1}^{K}L_d^k(\theta_f,\theta_d^k), \quad (8)$$

where $K$ is the number of fault categories. The optimization objective of the domain classifier combines the subdomain classifiers and the global domain classifier, and the weights of the global domain classifier and the subdomain classifiers are adjusted using the hyperparameters $\alpha$ and $\beta$.
The DANN model includes only the global domain classifier $G_d$. Fine-grained alignment methods, such as Multi-Adversarial Domain Adaptation (MADA) and the Dynamic Adversarial Adaptation Network (DAAN), feed their subdomain classifiers with the model's predicted labels for the target domain samples. Our proposed methodology, however, does not rely on such predicted labels and hence mitigates negative transfer to a greater degree.
The neural network structure responsible for domain classification is shown in Figure 5. The global domain classifier and the subdomain classifiers use the same network structure to distinguish whether the inputs belong to the same domain. The domain classifier contains convolutional layers, a self-attention module, and fully connected layers. The binary cross-entropy loss is calculated by comparing the domain classifier's output with the real domain label, where $\hat{d}_i$ or $\hat{d}_i^k$ is output by the global domain classifier $G_d$ or the subdomain classifier $G_d^k$, respectively.
The self-attention module is added to help distinguish between different domains. It enhances the domain classifier's ability to focus on important features and to capture complex dependencies in the input data, thus improving the efficiency of the domain adaptation task. The dual-classifier framework ensures a comprehensive and detailed domain adaptation process, and the shared architecture simplifies design and implementation while maintaining high performance.
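The core of such a self-attention module can be sketched as scaled dot-product attention over the extracted feature sequence. The projection matrices and dimensions below are illustrative only and do not reflect the paper's actual layer sizes.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a feature sequence x of shape
    (seq_len, d_model); wq/wk/wv are (d_model, d_k) projection matrices.
    Returns the attended features and the attention weight matrix."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights
```

Each output position is a weighted mixture of all positions, which is what lets the classifier emphasize discriminative regions of the feature map.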
3.3. Improved Loss Function Design for Embedded Model Domains
The subdomain classifier indicates the similarity between the real data and the model domain for each fault. The global domain classifier reflects the similarity between the source and target domains. The model domain data is based on simulations conducted under the same operating conditions as the target domain. It reflects the degree of similarity between the source and target domains in terms of categories to some extent. The subdomain classifier aligns the conditional distribution, while the global domain classifier aligns the marginal distribution. To enhance the feature representation of the model and better express differences in data distribution in high-dimensional space, we have opted for MK-MMD instead of the conventional MMD metric.
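A biased MK-MMD estimate built from a small family of RBF kernels can be sketched as follows. The kernel bandwidths here are arbitrary illustrative choices, not the kernel family used in the paper.

```python
import numpy as np

def mk_mmd(xs, xt, gammas=(0.5, 1.0, 2.0)):
    """Biased multi-kernel MMD^2 estimate between sample sets xs and xt
    (each of shape (n, d)), using a sum of RBF kernels exp(-g * ||a-b||^2)
    over the given bandwidth parameters."""
    def kernel(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
        return sum(np.exp(-g * d2) for g in gammas)
    k_ss = kernel(xs, xs).mean()
    k_tt = kernel(xt, xt).mean()
    k_st = kernel(xs, xt).mean()
    return k_ss + k_tt - 2.0 * k_st
```

The estimate is near zero when the two samples come from the same distribution and grows as the distributions separate, which is what makes it usable as an alignment penalty.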
One improvement over previous methods is the ability to simultaneously promote the positive transfer of relevant data and mitigate the negative transfer of irrelevant data. This is achieved by introducing the model domain sample set $D_m$ instead of the network's predicted labels for the target domain samples. Therefore, the complete optimization objective of Equation (4) can be rewritten as follows:

$$E(\theta_f,\theta_y,\theta_d,\theta_d^1,\ldots,\theta_d^K)=\frac{1}{n_s}\sum_{i=1}^{n_s}L_y^i(\theta_f,\theta_y)-\alpha L_d(\theta_f,\theta_d)-\beta\frac{1}{K}\sum_{k=1}^{K}L_d^k(\theta_f,\theta_d^k).$$

The subdomain classifier adversarial loss is utilized to achieve conditional distribution alignment from the source domain to the target domain, while the global domain classifier adversarial loss is employed to achieve marginal distribution alignment from the source domain to the target domain.
3.4. Model Structure and Optimization Methods
Figure 6 illustrates the fundamental architecture of the network model proposed in this study. The model comprises three modules: a feature extractor, a classifier, and a domain classifier.
In each epoch, the feature extractor processes the input samples. Then, the label predictor's loss $L_y$ on the source domain samples, the global domain classifier's loss $L_d$ between the source and target domain samples, and the subdomain classifiers' losses $L_d^k$ between the source and model domain samples are computed.
To extract domain-invariant features $f$, we aim to minimize the label predictor loss $L_y$ while maximizing the domain classifier losses $L_d$ and $L_d^k$. The feature extractor $G_f$'s parameters $\theta_f$ are learned by maximizing $L_d$ and $L_d^k$ to ensure domain invariance. The parameters of the domain classifiers $G_d$ and $G_d^k$ are learned by minimizing the corresponding domain classifier losses. To improve the network's feature representation, we introduced CBAM [24] into the feature extractor.
Due to the large amount of data, an iterative algorithm is required to reduce the computational cost and accelerate convergence. We chose the SGD algorithm as the optimizer for parameter updates to improve the training efficiency and performance of the model.
The optimization problem is to find the parameters $\hat{\theta}_f$, $\hat{\theta}_y$, $\hat{\theta}_d$, and $\hat{\theta}_d^k$ that jointly satisfy

$$(\hat{\theta}_f,\hat{\theta}_y)=\arg\min_{\theta_f,\theta_y}E(\theta_f,\theta_y,\hat{\theta}_d,\hat{\theta}_d^1,\ldots,\hat{\theta}_d^K),$$
$$(\hat{\theta}_d,\hat{\theta}_d^1,\ldots,\hat{\theta}_d^K)=\arg\max_{\theta_d,\theta_d^1,\ldots,\theta_d^K}E(\hat{\theta}_f,\hat{\theta}_y,\theta_d,\theta_d^1,\ldots,\theta_d^K).$$

The global domain classifier adversarial loss is used to reduce the difference between the source and target domains in the feature space; still, it may misalign the features of different bearing health states, resulting in negative transfer. The subdomain classifier adversarial loss is used to align the distributions of samples of the same class from different domains, updating the parameters $\theta_f$, $\theta_y$, $\theta_d$, and $\theta_d^k$ as

$$\theta_f\leftarrow\theta_f-\eta\left(\frac{\partial L_y}{\partial\theta_f}-\alpha\frac{\partial L_d}{\partial\theta_f}-\beta\frac{1}{K}\sum_{k=1}^{K}\frac{\partial L_d^k}{\partial\theta_f}\right),\quad
\theta_y\leftarrow\theta_y-\eta\frac{\partial L_y}{\partial\theta_y},\quad
\theta_d\leftarrow\theta_d-\eta\frac{\partial L_d}{\partial\theta_d},\quad
\theta_d^k\leftarrow\theta_d^k-\eta\frac{\partial L_d^k}{\partial\theta_d^k}, \quad (16)$$

where $\eta$ is the learning rate. In the forward propagation, the label predictor $G_y$ calculates $L_y$ through Equation (1). The global domain classifier $G_d$ compares the source and target domain feature vectors and calculates $L_d$ through Equation (6). The subdomain classifier $G_d^k$ compares the feature vectors of the model and source domains and calculates $L_d^k$ through Equation (7). In the backpropagation, the SGD optimizer updates $\theta_f$, $\theta_y$, $\theta_d$, and $\theta_d^k$ based on Equation (16).
The work described above has achieved three objectives:
- (1)
The accuracy of predictions is maximized.
- (2)
The marginal distribution is aligned from the source domain to the target domain.
- (3)
The approach utilizes network-independent model domain data instead of predicted labels for the target domain. Each subdomain classifier matches the corresponding classes of source and model domain data to achieve conditional distribution alignment from the source domain to the target domain.
4. Experiments and Analysis
4.1. Description of Datasets
The method’s effectiveness is demonstrated through two cross-domain bearing fault diagnosis cases.
(1) Case 1
Case 1 uses an experimental bearing dataset for condition monitoring (CM) based on vibration and motor current signals, provided by the University of Paderborn, Germany [25]. Figure 7 displays the test rig, which comprises several modules arranged from left to right: motor, torque measuring shaft, rolling bearing test module, flywheel, and load motor. The test bearing is a 6203 rolling bearing, and the vibration signals are sampled at 64 kHz. The signals cover three health conditions: normal, inner-ring fault, and outer-ring fault.
Table 2 provides further details on the three domains.
(2) Case 2
The CWRU bearing dataset comes from the Case Western Reserve University (CWRU) bearing data center [26]. Figure 8 shows the recorded vibration signals of bearings with inner-ring, outer-ring, and ball failures, as well as normal bearings, at 0, 1, 2, and 3 hp loads. The vibration signals of the drive end (DE), sampled at 12 kHz, are used in this paper.
Table 3 provides detailed information on the three domains.
4.2. Model Domain Dataset
The data in the model domain consist of bearing vibration signals from four states obtained through the finite element simulation method. The simulation method is illustrated in Figure 9, and the simulation parameters are consistent with those of the target domain. The bearing failures in the experiments manifested as grooves of different depths produced by EDM, and the finite element simulation modeled the same bearing failures, deepening along the radial direction, so that the fault types in the simulations matched those in the experiments. The signal processing method used in the model domain is consistent with that of the source and target domains. Using the rolling bearing simulation data for the 14 mil outer-ring fault as an example, Figure 10 displays the envelope signals of the vibration acceleration signals and their spectra, collected when the rotary axis has a rotational speed of 1796 r/min and a radial force of 2000 N.
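The envelope spectrum referred to above is conventionally obtained from the analytic signal via the Hilbert transform. A minimal FFT-based sketch follows; it is a generic implementation, not the exact processing chain used for Figure 10.

```python
import numpy as np

def envelope_spectrum(x, fs):
    """Envelope spectrum of a vibration signal x sampled at fs Hz.

    The analytic signal is built with the FFT-based Hilbert transform;
    the magnitude spectrum of its (demeaned) envelope exposes fault
    repetition frequencies such as the BPFO."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)                 # one-sided spectrum multiplier
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(X * h)
    env = np.abs(analytic)
    env = env - env.mean()          # remove DC before taking the spectrum
    spec = np.abs(np.fft.rfft(env)) / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, spec
```

For an amplitude-modulated test signal, the dominant line of the envelope spectrum sits at the modulation frequency, mimicking how a defect's repetition rate appears in Figure 10.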
During the uniform acceleration and uniform loading stage, the rolling bearing experiences a radial load of 2000 N while rotating at a fixed speed of 1796 r/min. After 0.005 s, the rotational speed is fixed, and the time-domain waveform presents a series of equal-amplitude pulses. When a local defect exists on the bearing surface, an impact excitation force is generated each time the defective part contacts another part during rotation. This impact has a clear periodicity, resulting in sharp peaks in the spectrogram. The ball pass frequency of the outer race (BPFO) is calculated as

$$f_{\mathrm{BPFO}}=\frac{n}{2}\cdot\frac{N}{60}\left(1-\frac{d}{D}\cos\varphi\right),$$

where $d$ represents the diameter of the rolling element, $D$ represents the pitch diameter at the centers of the rolling elements, $\varphi$ represents the contact angle, $n$ represents the number of rolling elements, and $N$ represents the rotational speed of the shaft in r/min.
Note: the rolling bearing is assumed to operate without sliding, its geometry remains constant, and the outer ring of the bearing is fixed and does not rotate.
Using the 6205 deep groove ball bearing as an example, the simulated spectrum shows obvious spikes around 107.305 Hz and its harmonics, indicating the model's effectiveness in simulating the fault state of rolling bearings.
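As a quick check of the BPFO value quoted above, the calculation can be sketched as follows. The 6205 geometry values in the usage line are nominal catalogue figures assumed here for illustration (9 balls, ball diameter about 7.94 mm, pitch diameter about 39.04 mm), not values taken from Table 1.

```python
import math

def bpfo(n_balls, shaft_rpm, d_ball, d_pitch, contact_angle_deg=0.0):
    """Ball pass frequency of the outer race (Hz), assuming no sliding
    and a fixed, non-rotating outer ring."""
    fr = shaft_rpm / 60.0   # shaft rotation frequency, Hz
    ratio = d_ball / d_pitch * math.cos(math.radians(contact_angle_deg))
    return n_balls / 2.0 * fr * (1.0 - ratio)

# Assumed nominal 6205 geometry at 1796 r/min:
f_outer = bpfo(9, 1796, 7.94, 39.04)
```

With these assumed dimensions the result lands close to the 107.305 Hz spike reported for the simulated spectrum.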
4.3. Experimental Setting
To determine the effectiveness of the proposed method, it was compared with seven alternative methods.
- (1)
To verify the necessity of the transfer strategy, the proposed method (denoted as Method 1) is compared with a standard 2DCNN (denoted as Method 2).
- (2)
To reflect the superiority of the proposed method, it is compared with classical UDA methods, including DANN, the Deep Adaptation Network (DAN), and Multi-Adversarial Domain Adaptation (MADA), denoted as Methods 3 to 5.
- (3)
To demonstrate the respective contributions of model domain samples and the improved neural network, comparisons were made with a DANN with only the model domain added and with a DANN with only the improved network structure, denoted as Methods 6 and 7, respectively.
- (4)
Industrial processes face noise interference [27,28]; to reflect the anti-interference capability of the proposed method, the training results are compared with those obtained from data with added Gaussian noise, denoted as Method 8.
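The Gaussian-noise corruption used for Method 8 can be sketched as noise injection at a target signal-to-noise ratio. The SNR-based formulation below is an illustrative scheme of ours; the paper does not state the exact noise level it used.

```python
import numpy as np

def add_gaussian_noise(x, snr_db, rng=None):
    """Add white Gaussian noise to signal x at a target SNR in dB.

    Noise power is set from the measured signal power:
    P_noise = P_signal / 10^(SNR_dB / 10)."""
    rng = np.random.default_rng(rng)
    p_signal = np.mean(x ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(p_noise), size=x.shape)
    return x + noise
```

For long signals, the empirically measured SNR of the corrupted output stays close to the requested value.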
To reduce the effect of random initialization, 10 replications of the experiment were performed in each case. In both cases, the main parameters of the proposed method are set as follows: the learning rate is 0.0001, the batch size is 16, the number of iterations N is 150, and the optimizer is SGD. A grid search is employed to identify the optimal parameter combination for tuning. The model parameter settings for the proposed and compared methods are presented in Table 4.
The proposed method's specific training and testing flow is illustrated in Figure 11. During training, the global domain classifier aims to produce domain-invariant features by achieving domain alignment through adversarial training, while the simulation data serve as a kind of "pseudo-label" that assists the subdomain classifiers in achieving more precise feature alignment and enhanced generalization performance.
4.4. Analysis of the Comparison Results
In this paper, we compare the diagnostic effectiveness of each method on the target domain test set using the average accuracy (Table 5), the average F1 score (Figure 12), the accuracy over iterations for the different methods (Figure 13), and the confusion matrices (Figure 14 and Figure 15).
In the confusion matrix, the normal condition and the slight (7 mil), moderate (14 mil), and severe (21 mil) inner-ring, ball, and outer-ring faults in Case 1 are set to labels 0 to 9, respectively. In Case 2, the labels for the normal, inner-ring fault, and outer-ring fault states are set to 0, 1, and 2, respectively. The abscissa is the predicted label and the ordinate is the real label.
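The confusion-matrix convention described above (ordinate = real label, abscissa = predicted label) can be sketched with a small helper; the example labels below are arbitrary illustrations, not results from the paper.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Confusion matrix with row = real label (ordinate) and
    column = predicted label (abscissa)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm
```

The trace of the matrix counts correct predictions, so accuracy is `np.trace(cm) / cm.sum()`.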
The following main conclusions can be drawn:
- (1)
When comparing Method 1 with Methods 2–5, it is evident that Method 1 has a higher average accuracy and a lower standard error of the mean, demonstrating stronger generalization in unsupervised scenarios. Methods 2 and 3 can be considered models trained on the source domain that directly predict the target domain; the proposed method's better post-transfer results in terms of accuracy, F1 score, and other dimensions reflect its suppression of negative transfer. Method 1 achieves average accuracies of 99.435% and 86.667% in the two cases, respectively, while the best of the remaining methods achieves 98.039% and 83.406%.
- (2)
A comparison of Method 1 with Methods 3, 5, 6, and 7 shows that the diagnostic accuracy for the fault categories is improved by the subdomain classifiers and the global domain classifier, which optimize the conditional and marginal probability distributions, respectively, and avoid negative transfer. When comparing Method 7 with Method 3, the introduction of the MK-MMD metric increased the accuracy from 79.167% to 79.870% in Experiment 1 and from 86.239% to 86.360% in Experiment 2, enhancing the ability to discriminate inter-domain variability. Method 1 had diagnostic accuracies of 86.667% and 99.435% in the two cases, while the highest diagnostic accuracies of the other methods were 83.406% and 99.040%.
- (3)
Comparing the unsupervised cross-domain diagnostic accuracy over iterations of the eight methods in Cases 1 and 2 shows that Method 1 and Method 8 (which trains the neural network with added Gaussian noise) exhibit faster convergence, higher diagnostic accuracy, and greater stability after convergence. The simulation data may be inaccurate with respect to real-world scenarios, but they constitute supervised data with a lower-bound guarantee. Adapting source domain samples that differ too much from the target domain may lead to negative transfer. The proposed method maximally suppresses negative transfer and facilitates positive transfer by characterizing the features common to the source, target, and simulation domains.
The dimensionality of the output feature vectors is reduced using t-distributed stochastic neighbor embedding (t-SNE) to reflect the feature adaptation performance of the proposed method.
The feature adaptation results of the eight methods for the two cases in one experiment are shown in Figure 16 and Figure 17. In the label annotations, Outer denotes a bearing outer-ring failure, Inner denotes an inner-ring failure, Norm denotes a normal bearing, and Ball denotes a ball failure.
For Case 1, the feature adaptation results for the source and target domains in one experiment of the eight methods are shown in Figure 16, where the label prefix S denotes the source domain and the prefix T denotes the target domain.
For Case 2, the faulty bearings are classified into three classes, 1, 2, and 3, according to fault size, and Figure 17 shows the feature adaptation results on the target domain test set for one experiment of the eight methods.
Based on the comparison results, the proposed method extracts deep features common to both the source and target domains more effectively, indicating better domain-invariant properties. This is due to the subdomain classifiers involving the simulation data, which align the features of same-class samples, i.e., they facilitate conditional distribution alignment rather than merely making the source and target domain features indistinguishable. Meanwhile, compared with MADA, the introduction of simulation data avoids reliance on predicted labels for the target domain data, which further suppresses negative transfer.