1. Introduction
In the modern manufacturing industry, bearings play a key role: they not only support mechanical transmission but also reduce the friction between parts [
1]. The stability and safety of bearings have a direct and important impact on the normal operation of mechanical equipment [
2]. In view of this, monitoring the operating status of bearings and studying fault detection are extremely important. Deep learning (DL) has been widely used in the field of bearing fault diagnosis and has become a popular direction in industrial research because it extracts features automatically, reduces the reliance on manual experience, and offers powerful nonlinear modeling capabilities [
3,
4,
5]. Numerous cutting-edge intelligent diagnostic techniques based on deep learning, such as the deep Boltzmann machine (DBM) [
6], graph neural network (GNN) [
7], and physics-informed neural network (PINN) [
8], are emerging. These methods break through the limitations of traditional data-driven methods and significantly improve the accuracy and reliability of bearing fault identification. However, in the real world, it is not easy to obtain a large amount of suitable and real fault data due to the limitations of various factors, such as high cost of data collection, low and unpredictable probability of fault occurrence, and restricted access to data [
9]. Many current studies depend heavily on a large amount of fault data and, as a result, do not apply well to the working environment of real machines [
10].
To address the difficulty of obtaining a large number of fault samples, commonly used data generation and data augmentation methods include generative adversarial networks (GANs) and their variants [
11], and diffusion models [
12]. These methods improve the accuracy of fault diagnosis by generating new data samples or augmenting existing data for model training. Although GANs perform well in generating data, the distribution of the generated data may still differ considerably from that of real data, so the generated data cannot fully replace real data for model training. Training diffusion models usually requires a large amount of computational resources and time, especially when generating high-quality and diverse data.
Some scholars have shifted their research focus to meta-learning, transfer learning, or the combination of the two. Meta-learning, also known as “learning to learn”, aims to train a model on episodic tasks using only a small amount of sample data so that it can quickly adapt to new tasks and generalize well [
13], thus showing a broad application prospect in many fields. Under the meta-learning paradigm, Zhang et al. [
14] proposed a small-sample bearing fault diagnosis method based on model-agnostic meta-learning (MAML), which solves the problem of scarcity of fault samples in real industrial scenarios by means of a meta-learning strategy that optimizes model parameters. Li et al. [
15] proposed a multiscale weighted integration model based on a light gradient boosting machine, which solves the problem of insufficient feature representation and weak model generalization in small-sample fault diagnosis by fusing multiscale features with a dynamic weight allocation mechanism. Wang et al. [
16] proposed a small-sample bearing fault diagnosis method based on a self-embedding transformer, which solves the problem of traditional methods relying on manual feature engineering and poor model interpretability through adaptive feature embedding with an interpretable attention mechanism. More interestingly, Wang et al. [
17] designed a brain-like learning algorithm for spiking neural networks, based on the learning mechanism of biological neural systems, and introduced a meta-learning strategy to apply it to the sample-scarce bearing fault diagnosis task. The core strength of transfer learning is its ability to transfer knowledge: it can efficiently apply existing knowledge to new tasks even when data are scarce and the source domain differs significantly from the target domain. This ability gives transfer learning unique value in complex and changing real-world application scenarios [
18]. Liang et al. [
19] proposed a diagnostic method based on a hybrid convolutional neural network model combined with an incremental transfer learning strategy, which is able to adapt to different processing conditions. Experimental results show that this method increases the average accuracy substantially. Zhang et al. [
20] applied the transfer learning strategy in the framework of a finite element integrated neural network, and not only verified its significant effect in improving computational efficiency but also confirmed the effectiveness of the strategy in many different scenarios, such as elasticity, elastoplasticity, and multimaterial problems. In addition, some researchers have combined the strengths of meta-learning and transfer learning. Meta-learning is good at quickly adapting to new tasks, while transfer learning excels in knowledge transfer, and the combination of the two provides new ideas and methods for solving complex problems. This fusion strategy shows strong adaptability and efficiency in scenarios with scarce data, significant domain differences, and rapidly changing tasks. For example, the augmented meta-transfer learning method proposed by Ma et al. [
21] achieves high-accuracy bearing fault diagnosis using a small amount of data under changing operating conditions. Experimental results show that the average accuracy of fault diagnosis reaches 95.2% after the introduction of augmented meta-transfer learning. Zhong et al. [
22] proposed a cross-domain fault diagnosis method that builds on the ability of meta-learning to quickly adapt to new domains. On this basis, they further introduced a domain adaptation strategy, which effectively reduces the distribution difference between the source and target domains. With this combination, the model not only learns the features of new tasks quickly but also significantly improves the accuracy of cross-domain fault diagnosis.
Overall, meta-learning and transfer learning have made significant progress in addressing the difficulty of obtaining fault samples. However, most of these advances rely on datasets of human-induced faults, whose fault patterns are relatively standardized and controlled. In contrast, faults in real industrial environments tend to be more complex, diverse, and unpredictable and differ significantly from artificially simulated faults. Although human-induced fault data can provide a basis for research to a certain extent, their high regularity makes it difficult for them to cover the complexity of real faults, so models trained on them cannot be directly applied to actual industrial machine fault diagnosis. Nevertheless, combining meta-learning with transfer learning still shows great potential and has achieved leading results in related fields. To be applied in real industrial scenarios, further exploration is needed to bridge the gap between human-induced faults and natural faults.
Inspired by this, this paper hypothesizes that human-induced fault data contain some of the characteristics of natural faults. If the diagnostic knowledge from human-induced fault data can be used appropriately and effectively transferred to natural fault diagnosis, the challenge of obtaining expensive equipment fault data in real-world scenarios is expected to be significantly alleviated. Therefore, this paper proposes a domain-adaptive meta-relation network (DAMRN) for bearing fault diagnosis. DAMRN is a meta-learning framework designed for knowledge transfer from human-induced faults to natural faults. It aims to reduce the cross-domain discrepancy between laboratory-simulated faults (human-induced faults) and real-scenario faults (natural faults) and thus to use human-induced fault data efficiently in support of the diagnosis of natural faults. First, through meta-task episodic training, DAMRN captures task-irrelevant general features from human-induced fault samples, enabling the model to adapt to target domain tasks quickly. Second, a complementary domain adaptation strategy that combines explicit alignment with implicit adversarial learning is adopted, effectively reducing the domain discrepancy between human-induced faults and natural faults. This combination not only leverages the rapid adaptation of meta-learning to new tasks but also transfers knowledge effectively through domain adaptation techniques.
Compared with previous work, our contributions are as follows:
- (1) This paper proposes a new solution path for human-induced fault to natural fault knowledge transfer, which is used to solve the problem of not being able to obtain a large number of fault samples for real-world precision devices.
- (2) DAMRN is a meta-learning framework for cross-domain fault diagnosis, which incorporates multigranularity domain adaptation mechanisms, including explicit distributional alignment and implicit adversarial learning, based on the new task adaptation capability of meta-learning. The design aims to address the cross-domain robustness challenge under the new task.
- (3) We demonstrate the feasibility of DAMRN with extensive experimental validation on two human-induced fault datasets and one natural fault dataset.
The remainder of this paper is organized as follows:
Section 2 describes the underlying theory of relation networks.
Section 3 describes the implementation details of DAMRN.
Section 4 describes the experimental datasets and presents the experimental results with discussion.
Section 5 concludes this work.
2. Meta-Relation Network
Meta-learning, also known as “learning to learn”, is centered on equipping models with the ability to adapt quickly to new tasks, much in the same way that humans learn [
23]. For example, when children learn about a new animal, they can often recognize it accurately from a small number of pictures or videos rather than from a large number of samples. Traditional machine learning methods usually rely on a large number of data samples for model training and directly learn the mapping between features and labels, which makes the learned model highly task-specific. As a result, these models often need to be retrained when the application environment changes, which is particularly inconvenient when data collection is difficult. In contrast, meta-learning accumulates task experience by training on multiple tasks, thus enabling rapid adaptation when faced with new tasks. Mainstream meta-learning models include the Siamese network [
24], prototypical network [
25], relation network [
26], MAML [
27], and matching network [
28]. These models achieve fast learning and adaptation to new tasks with a small number of data samples through different mechanisms, such as learning the similarity between samples and optimizing the initialization parameters of the model.
Before detailing the method proposed in this paper, it is necessary to briefly introduce the relation network because DAMRN is constructed based on the relation network. The structure of the relation network is shown in Figure 1, which is mainly composed of two parts: the feature extractor $f_{\varphi}$ and the relation metric $g_{\phi}$.
In the learning process of the relation network, the dataset is divided into support set $S = \{(x_i, y_i)\}$ and query set $Q = \{(x_j, y_j)\}$, where $x_i$ and $x_j$ denote the input samples, and $y_i$ and $y_j$ denote the category labels of the samples, respectively. The main role of the support set is to generate prototype features for each category; this process is achieved by superimposing all the sample features under the same category in the support set, i.e., the features of each category are summed element by element to obtain a prototype representation of the category. The query set is then used as a training sample to evaluate the performance of the model on new samples and update the model parameters accordingly. Specifically, the model predicts the category of a query sample by calculating the relation scores between the features of the samples in the query set and the prototype features of each category, i.e., it classifies the samples by learning the relationships between them. The process is described as follows: First, $f_{\varphi}$ maps $x_i$ and $x_j$ to a uniform feature space to obtain $f_{\varphi}(x_i)$ and $f_{\varphi}(x_j)$. Then, the features of the support set samples and the query set samples are combined to obtain the combined feature $C(f_{\varphi}(x_i), f_{\varphi}(x_j))$. Finally, $C(f_{\varphi}(x_i), f_{\varphi}(x_j))$ is input into the relation metric to calculate the relation score $r_{i,j}$:
$$r_{i,j} = g_{\phi}\big(C(f_{\varphi}(x_i), f_{\varphi}(x_j))\big)$$
where $C(\cdot,\cdot)$ is the concatenation operation.
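The scoring procedure described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the feature extractor is replaced by a single random linear layer with ReLU, the relation metric by a single linear layer with a sigmoid, and the names `embed`, `class_prototypes`, and `relation_scores` are hypothetical. The actual relation network uses trained convolutional modules.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(x, w_embed):
    """Stand-in feature extractor: one linear layer followed by ReLU."""
    return np.maximum(x @ w_embed, 0.0)

def class_prototypes(support_feats, support_labels, n_classes):
    """Sum the support features of each class element by element."""
    return np.stack([
        support_feats[support_labels == c].sum(axis=0)
        for c in range(n_classes)
    ])

def relation_scores(query_feats, prototypes, w_rel):
    """Concatenate every (prototype, query) pair and map it to a score in [0, 1]."""
    n_query, n_classes = query_feats.shape[0], prototypes.shape[0]
    q = np.repeat(query_feats[:, None, :], n_classes, axis=1)   # (Q, N, D)
    p = np.repeat(prototypes[None, :, :], n_query, axis=0)      # (Q, N, D)
    pairs = np.concatenate([p, q], axis=-1)                     # (Q, N, 2D)
    logits = pairs @ w_rel                                      # (Q, N)
    return 1.0 / (1.0 + np.exp(-logits))                        # sigmoid

# Toy 3-way 5-shot episode with random inputs of length 16 (assumed sizes).
n_classes, k_shot, n_query, in_dim, feat_dim = 3, 5, 4, 16, 8
w_embed = rng.normal(size=(in_dim, feat_dim))
w_rel = rng.normal(size=(2 * feat_dim,))

support_x = rng.normal(size=(n_classes * k_shot, in_dim))
support_y = np.repeat(np.arange(n_classes), k_shot)
query_x = rng.normal(size=(n_query, in_dim))

protos = class_prototypes(embed(support_x, w_embed), support_y, n_classes)
scores = relation_scores(embed(query_x, w_embed), protos, w_rel)
predicted = scores.argmax(axis=1)  # class with the highest relation score
```

Each row of `scores` holds one query sample's relation score against every class prototype, and the predicted class is the one with the highest score. In a trained model, both weight sets would be learned jointly from episodes rather than drawn at random.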
Siamese networks, prototypical networks, and matching networks typically use a predefined, fixed similarity metric function to measure the similarity between features. For example, Siamese networks extract features through subnetworks with shared weights and use the Euclidean distance to compute the similarity between two samples. Prototypical networks classify new samples by computing the distance between each class prototype and the sample; commonly used distance metrics include the Euclidean distance and the cosine distance. Matching networks, on the other hand, perform classification by calculating the similarity between the support set and the query set, usually using cosine similarity as the metric function. However, these fixed functions cannot cover all types of relationships. In [26], the authors demonstrate through a simple experiment that manually designed metric functions tend to fail when dealing with complex nonlinear relationships. In contrast, relation networks train a convolutional neural network as a learnable nonlinear similarity metric function that measures different relationships more effectively. This learnable similarity measure has enabled the relation network to achieve significant performance gains in few-shot and zero-shot classification tasks, setting state-of-the-art records in several domains.
In addition, the
N-way
K-shot strategy of the relation network guides the model to learn task-independent generic feature representations so that it can better adapt to new tasks [
29,
30]. Therefore, to achieve knowledge transfer from human-induced faults to natural faults, this paper proposes DAMRN, which is based on relation networks. The next section introduces the specific structure and implementation of DAMRN in detail.
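As a rough illustration of the N-way K-shot strategy mentioned above, the following sketch draws one episode from a labeled pool. The function name `sample_episode`, the number of query samples per class, and the placeholder sample identifiers are assumptions made for this example and are not taken from the paper.

```python
import random

def sample_episode(data_by_class, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode: (support, query) lists of (sample, label)."""
    classes = rng.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for label in classes:
        # Draw support and query samples together so they never overlap.
        picked = rng.sample(data_by_class[label], k_shot + n_query)
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return support, query

# Toy pool: three fault classes with 20 placeholder sample ids each.
pool = {
    "normal": [f"normal_{i}" for i in range(20)],
    "inner_ring": [f"inner_{i}" for i in range(20)],
    "outer_ring": [f"outer_{i}" for i in range(20)],
}
rng = random.Random(0)
support, query = sample_episode(pool, n_way=3, k_shot=5, n_query=10, rng=rng)
```

Training over many such randomly drawn episodes, rather than over one fixed classification problem, is what encourages the model to learn features that are not tied to a single task.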
4. Experimental Results and Discussion
4.1. Experimental Data
CWRU [
33]: To validate the effectiveness of the DAMRN, this paper selected data from Case Western Reserve University (CWRU) as the source domain data, which is the easily collected human-induced fault dataset. This dataset is derived from a precision-designed bearing fault simulation platform, as shown in
Figure 4. Single-point damage was introduced by electrical discharge machining (EDM). Acceleration sensors were then used to collect the vibration signals of the faulty bearings at a sampling frequency of 12 kHz. Three types of samples are selected for this work, namely, inner-ring damage, outer-ring damage, and normal state, and the sample parameters are detailed in
Table 1.
PU [
34]: Natural faults were selected from the dataset provided by Paderborn University. The PU’s electromechanical drive system test platform (shown in
Figure 5) consists of core components such as electric motors, torque sensing shafts, bearing test modules, inertial flywheels, and load motors. The platform uses deep groove ball bearings of type 6203 as test objects. The data on natural faults are obtained through accelerated life experiments. The data acquisition system synchronously records three-axis vibration signals, motor current signals, and mechanical parameters to form a multidimensional monitoring system: the vibration acceleration signals and motor current signals are sampled at 64 kHz, the mechanical parameters (radial load, rotational speed, and torque) are recorded at 4 kHz, and the temperature is recorded at 1 Hz. Prior work confirms that natural damage samples are more closely related to actual degradation mechanisms than human-induced fault samples [
35], highlighting their unique value. The sample parameters are detailed in
Table 2.
4.2. Experimental Details
4.2.1. Sample Processing Details
In the sample preparation stage, human-induced fault samples are used for training and natural fault samples are used for testing. Because the recorded signals are short, this study follows related works [
36,
37,
38] and introduces the sliding window resampling technique (the window step size is 100 data points, and the sampling length is 2048 points), which yields a large number of samples. The specific data processing flow is shown in
Figure 6. The statistics of the number of training and testing samples are detailed in
Table 3, where 700 samples of each class are randomly selected for the training set, totaling 2100 samples each, and 100 samples of each class are randomly selected for the testing set.
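The sliding window resampling described above can be illustrated with a short sketch that uses the stated window length (2048 points) and step size (100 points). The function name `sliding_windows` and the length of the toy recording are assumptions made for this example.

```python
def sliding_windows(signal, window=2048, step=100):
    """Return overlapping segments of length `window`, shifted by `step` points."""
    last_start = len(signal) - window
    return [signal[s:s + window] for s in range(0, last_start + 1, step)]

# Hypothetical recording of 120,000 points (about 10 s at 12 kHz).
signal = list(range(120_000))
segments = sliding_windows(signal)
print(len(segments))  # (120000 - 2048) // 100 + 1 = 1180
```

Because consecutive windows overlap by 1948 points, a recording that would give only 58 non-overlapping segments of 2048 points yields 1180 samples here, which is why the technique produces enough data for random selection of training and testing samples.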
4.2.2. Experimental Environment
In this study, the computational experimental platform is built in a strictly controlled environment, with the training system based on an NVIDIA GeForce RTX 3060 graphics processor and the PyTorch 1.40 deep learning framework. For hyperparameter optimization, the configuration is determined by grid search: the batch size is fixed at 64 to balance memory consumption and gradient stability, and the learning rate is set to 0.001 based on preliminary experiments. The model is trained for 200 epochs so that it converges fully while limiting the risk of overfitting. To enhance the statistical reliability of the experimental results, each experiment is repeated five times independently under identical conditions, and the average value is taken as the performance index.
4.3. CASE1: Human-Induced Fault to Natural Fault Transfer in the Same Machine
In this section, we focus on the knowledge transfer capability of DAMRN from human-induced faults to natural faults in the same machine environment, constructing a D → E transfer scenario. In order to fully evaluate the performance of DAMRN, we compare it with a variety of other approaches, including the classical deep network model WDCNN, as well as the baseline models in the field of transfer learning, M_MMD [
39], Fine-tuning (FT) [
40], and DANN [
32]. In addition, we include state-of-the-art methods in the field of transfer learning, such as MRN [
41], S(t) [
38], and TRN [
42], so that the strengths and weaknesses of DAMRN in knowledge transfer tasks can be measured more accurately. All experiments were repeated five times, and the average classification accuracy (%) was used as the evaluation index; the specific results are shown in
Figure 7.
From the experimental results in
Figure 7, it can be seen that DAMRN achieves a classification accuracy of 99.62% in the D → E transfer scenario, which is higher than all compared methods. This indicates that DAMRN has a clear advantage in the task of knowledge transfer from human-induced faults to natural faults in the same machine environment. This result surprised us, because such accuracy was difficult to achieve in our previous work; we confirmed its reliability through repeated checks and by comparing the performance of the other methods. The typical deep learning method WDCNN has the lowest accuracy (84.17%) among all methods, 15.45 percentage points below DAMRN. This indicates that WDCNN adapts poorly to the differences between the source and target domains in knowledge transfer tasks. The average classification accuracy of M_MMD is 89.95%, an improvement over WDCNN but still lower than the other methods. This is because M_MMD relies only on the statistical alignment of multiple MMDs, which leads to alignment failure when the higher-order feature distributions of human-induced and natural faults differ significantly. The average classification accuracy of FT is 94.39%, indicating the effectiveness of fine-tuning in the knowledge transfer task. However, FT requires a small amount of labeled target domain data to operate effectively. DANN, MRN, S(t), and TRN also achieve good results.
Overall, DAMRN is clearly superior in knowledge transfer from human-induced faults to natural faults in the same machine environment. Its high accuracy indicates that DAMRN can effectively adapt to the differences between human-induced faults and natural faults for efficient knowledge transfer. In contrast, other methods such as DANN, MRN, S(t), and TRN, although effective to some degree, still fall short of DAMRN in overall performance.
4.4. CASE2: Human-Induced Fault to Natural Fault Transfer Across Machines
In real-world scenarios, high-end equipment such as aero-engine spindles and medical CT machine bearings cannot be deliberately damaged to obtain fault data because of its importance and value. In view of this, this section focuses on knowledge transfer under different equipment conditions to validate the practical performance of the proposed approach in knowledge transfer from human-induced faults to natural faults across machines. Specifically, low-cost testbeds (which allow manual fault implantation) are used to construct source domain training sets. The diagnostic knowledge acquired from low-cost testbeds is transferred to high-value target devices by means of feature space mapping. This research direction matches realistic conditions closely; it has high research value but also poses considerable challenges. Three different sets of data, A, B, and C, shown in
Table 1, are used as the source domain, and high-value device E, listed in
Table 2, is used as the target domain. Four transfer scenarios are constructed: single-source domain transfer (A → E/B → E/C → E) and multi-source domain joint transfer (ABC → E). The specific results are shown in
Table 4, with bold representing the optimal performance.
From the experimental results in
Table 4, it can be seen that DAMRN achieves the best diagnostic accuracy (94.03–98.49%) in all transfer tasks, improving the average accuracy by 2.95 percentage points over the second-best method, TRN. In addition, in single-source domain transfer scenarios (e.g., C → E), DAMRN (95.79%) improves by 24.59 percentage points over the traditional domain adaptation method DANN (71.20%), which demonstrates the effectiveness of DAMRN in mitigating the feature distribution shift caused by differences between devices. The joint transfer of multiple source domains (ABC → E) significantly improves the performance of all methods, with DAMRN reaching a peak accuracy of 98.49%. This validates that the complementary nature of multiclass fault data enhances model generalization.
Compared to Case 1, some of the methods, such as WDCNN, FT, and DANN, show severe performance degradation, indicating the very high difficulty of cross-device transfer. However, DAMRN, TRN, and MRN still maintain high accuracy, indicating that their transfer strategies remain effective in the more difficult scenarios.
4.5. Impact of K-Shot
The learning process of DAMRN follows the meta-learning criterion of
N-way
K-shot. In order to fully evaluate the impact of
K-shot on model performance, this paper designed a series of experiments to observe the performance of DAMRN in different tasks by varying the value of
K. The experimental results are shown in
Figure 8, which demonstrates in detail the average classification accuracy of DAMRN on each task (A → E, B → E, C → E, D → E) with different
K settings.
As shown in Figure 8, the average classification accuracy of DAMRN on all tasks generally increases with the number of shots. This indicates that more samples in the support set help the model to measure the relationship between the source and target domains. Each test sample is compared one by one with each of the K samples of each category in the support set, and the average of the K scores of each category is taken, which reduces the effect of a single outlier sample. Specifically, the accuracy of D → E is consistently close to 100%. The accuracy of A → E gradually increases from 94.13% to 99.11% from 1-shot to 10-shot. The rest of the transfer tasks show similar improvements. However, from 6-shot to 10-shot, the accuracy of DAMRN tends to stabilize on all tasks, indicating that the model has already reached a high level of performance with a certain number of samples, and that a further increase in the number of samples has a limited effect on performance.
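The averaging over the K scores of each category can be made concrete with a toy example. The score values below are invented for illustration, and `classify_by_mean_score` is a hypothetical helper; the point is only to show how a single outlier score is damped by the average.

```python
from statistics import mean

def classify_by_mean_score(scores_per_class):
    """Average the K relation scores of each class and pick the best class."""
    averaged = {label: mean(scores) for label, scores in scores_per_class.items()}
    return max(averaged, key=averaged.get), averaged

# Hypothetical 3-shot scores for one query; 0.95 is an outlier for "outer_ring".
scores = {
    "normal": [0.10, 0.15, 0.05],
    "inner_ring": [0.80, 0.75, 0.85],
    "outer_ring": [0.95, 0.20, 0.20],
}
label, averaged = classify_by_mean_score(scores)
print(label)  # inner_ring
```

With only the first shot of each class, the outlier score of 0.95 would assign the query to `outer_ring`; after averaging over three shots, `inner_ring` (mean 0.80) wins over `outer_ring` (mean 0.45). This is consistent with the observation that accuracy rises with K and then levels off.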
4.6. Validation of the Effectiveness of the Transfer Strategy
The excellent knowledge transfer capability of DAMRN is mainly attributed to two key factors: firstly, the meta-learning paradigm of relation networks, which itself possesses a strong task-independent feature learning capability and can quickly adapt to new tasks; and secondly, the domain adaptation method we designed in DAMRN, which further enhances the performance of the model. To examine the designed domain adaptation method, this paper designed four sets of comparison experiments, as shown in
Table 5.
The results are shown in Figure 9. In all the tasks (A → E, B → E, C → E, D → E), the performance of the Baseline is relatively low, especially in the A → E and B → E tasks, where the gap with the best method is significant, suggesting that relying solely on the meta-learning framework cannot effectively overcome domain shift. Compared to the Baseline, MMD-Only improves performance in all tasks. Specifically, MMD-Only improves significantly over the Baseline in the A → E and B → E tasks but improves only marginally in the C → E task, staying close to the Baseline level, which reflects that explicit alignment based on statistical distributions adapts insufficiently to some complex domain differences. Adversarial-Only outperforms MMD-Only in the B → E and D → E tasks, suggesting that adversarial learning can dynamically capture the local features of domain differences; however, it is weaker than MMD-Only in the A → E task, probably due to the instability of adversarial training. DAMRN outperforms the other three sets of experiments in all tasks, showing that the simultaneous use of MMD and adversarial domain adaptation reduces the discrepancy between the source and target domains more effectively and validating the complementary nature of explicit and implicit alignment. An example illustrating explicit and implicit alignment is given in Section 3, paragraph 2.
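To give a sense of what the explicit alignment term measures, the following is a minimal sketch of a squared maximum mean discrepancy (MMD) estimate with a single Gaussian kernel. The kernel choice, the bandwidth `sigma`, and the synthetic features are assumptions made for illustration; the MMD configuration actually used in DAMRN is specified in Section 3 and may differ.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """RBF kernel matrix between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    """Biased estimate of the squared maximum mean discrepancy."""
    k_ss = gaussian_kernel(source, source, sigma).mean()
    k_tt = gaussian_kernel(target, target, sigma).mean()
    k_st = gaussian_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 4))    # stand-in source features
aligned = rng.normal(0.0, 1.0, size=(200, 4))   # same distribution as source
shifted = rng.normal(1.5, 1.0, size=(200, 4))   # mean-shifted distribution

print(mmd2(source, aligned) < mmd2(source, shifted))  # True
```

Minimizing such a term pulls the source and target feature distributions together in a statistical sense, whereas an adversarial domain discriminator aligns them implicitly through a learned critic. The ablation above suggests that the two are complementary.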
4.7. Additional Discussion
If the DAMRN model can be successfully deployed in practical industrial environments, it will benefit the intelligent diagnosis and maintenance of machines. With real-time monitoring and warning functions, the downtime and maintenance costs caused by equipment failures can be effectively reduced. In real industrial applications, the computational requirements of the DAMRN model depend mainly on factors such as data volume and real-time constraints. Taking the experimental hardware used in this study (detailed information is given in Section 4.2.2) as an example, the average time taken by DAMRN to diagnose a sample is about $2.09 \times 10^{-3}$ s in this hardware configuration. Meanwhile, based on the sample acquisition method described in Section 4.2.1, the time to generate a sample to be tested for the two experimental platforms, CWRU (with a sampling rate of 12 kHz) and PU (with a sampling rate of 64 kHz), is about $8.34 \times 10^{-3}$ s and $1.56 \times 10^{-3}$ s, respectively. These figures show that DAMRN meets the real-time requirements of the CWRU experimental platform; at the 64 kHz sampling rate of the PU experimental platform, however, samples are generated faster than they can be diagnosed, and the data stream will back up. In other words, for industrial equipment monitoring scenarios with high requirements for both sampling rate and real-time performance, the hardware running the DAMRN model must be more powerful, whereas for scenarios with lower sampling rates and real-time requirements, DAMRN can easily meet the needs. Overall, applying model compression techniques to DAMRN is necessary to adapt it better to the needs of different industrial applications.
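The real-time comparison above can be checked with a short calculation. The sketch assumes that a new sample becomes available each time the sliding window advances by its step of 100 points, which approximately reproduces the reported generation times; this reading is an assumption, and `sample_period` is a hypothetical helper.

```python
def sample_period(step_points, sampling_rate_hz):
    """Seconds between consecutive windows when the window advances by step_points."""
    return step_points / sampling_rate_hz

INFERENCE_TIME_S = 2.09e-3   # per-sample diagnosis time reported in the text
STEP_POINTS = 100            # sliding-window step from Section 4.2.1

for name, rate_hz in [("CWRU", 12_000), ("PU", 64_000)]:
    period = sample_period(STEP_POINTS, rate_hz)
    keeps_up = INFERENCE_TIME_S <= period
    print(f"{name}: new sample every {period * 1e3:.2f} ms, keeps up: {keeps_up}")
```

Under the same assumption, inference keeps pace at 64 kHz only if the window step is at least 134 points (2.09 ms × 64 kHz ≈ 133.8) or if the per-sample inference time falls below about 1.56 ms, for example through model compression or faster hardware.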
In addition, model maintenance is key to ensuring long-term reliable operation. In practical applications, model performance, such as prediction accuracy and response time, must be monitored regularly so that performance degradation can be identified and resolved in a timely manner. At the same time, as data accumulate and the operating environment changes, the model needs to be updated, optimized, and fine-tuned within a continuous learning framework so that it keeps adapting to changing application requirements. It must be clarified that the current research focuses on specific fault types and datasets, mainly because of the preliminary nature of the research and the scope of the available data. To apply the DAMRN model widely in practical industrial scenarios, its performance needs to be evaluated comprehensively on a wider range of workshop datasets and fault types.