A Domain Generation Diagnosis Framework for Unseen Conditions Based on Adaptive Feature Fusion and Augmentation

Zhang, Tong; Chen, Haowen; Mao, Xianqun; Zhu, Xin; Xu, Lefei

doi:10.3390/math12182865


by Tong Zhang ¹, Haowen Chen ¹, Xianqun Mao ¹,*, Xin Zhu ¹ and Lefei Xu ²

¹ Marine Design and Research Institute of China, China State Ship-Building Corporation Limited, Shanghai 200011, China

² School of Traffic & Transportation Engineering, Central South University, Changsha 410004, China

* Author to whom correspondence should be addressed.

Mathematics 2024, 12(18), 2865; https://doi.org/10.3390/math12182865 (registering DOI)

Submission received: 29 June 2024 / Revised: 31 August 2024 / Accepted: 12 September 2024 / Published: 14 September 2024

(This article belongs to the Topic AI and Data-Driven Advancements in Industry 4.0)


Abstract:

Deep learning-based fault diagnosis methods have advanced rapidly in industrial scenarios with varying working conditions. However, the prerequisite of obtaining target data in advance limits the application of these models to practical engineering scenarios. To address the challenge of fault diagnosis under unseen working conditions, a domain generation framework for unseen conditions fault diagnosis is proposed, which consists of an Adaptive Feature Fusion Domain Generation Network (AFFN) and a Mix-up Augmentation Method (MAM) for both the data and domain spaces. AFFN is utilized to fuse domain-invariant and domain-specific representations to improve the model’s generalization performance. MAM enhances the model’s exploration ability for unseen domain boundaries. The diagnostic framework with AFFN and MAM can effectively learn more discriminative features from multiple source domains to perform different generalization tasks for unseen working loads and machines. The proposed framework is validated on the SDUST and PU datasets, where it achieves peak diagnostic accuracies of 94.15% and 93.27%, respectively.

Keywords:

intelligence fault diagnosis; domain generation; unseen working conditions; feature fusion

MSC:

68T07

1. Introduction

As industry progresses towards informatization and intelligence, advanced intelligent fault diagnosis (IFD) models have garnered more attention [1]. Currently, various IFD models have demonstrated satisfactory performance, helping to reduce unplanned downtime and enable predictive maintenance [2]. These models are highly appreciated for their autonomy and high accuracy, which greatly enhance the efficiency of equipment maintenance and management [3].

However, there can be limitations on the use of intelligent methods for practical industrial applications. In order to achieve a high accuracy of fault diagnosis, these traditional IFD methods often require a large number of labeled data which have the same distribution as the testing data [4,5]. Owing to variations in operational conditions and interference from environmental noise, rotating machines frequently operate under diverse working scenarios, significantly altering the distribution of collected data. Consequently, when confronted with practical industrial issues, there is often an insufficiency of labeled data to fulfill the model’s training requirements [6]. Furthermore, given the high cost of human labor, it is impractical to gather a vast amount of fault samples and generate corresponding ground-truth labels under all potential working conditions for model training [7]. Hence, cross-domain diagnosis poses a significant challenge in developing reliable and precise IFD methods.

Recently, significant efforts have been invested in developing domain adaptation (DA) models that strive to overcome the domain shift issues stemming from diverse conditions and foster the successful deployment of IFD models with limited labeled data. The central concept of DA methodologies revolves around reducing the discrepancy between the training source domain and the testing target domain by aligning the distributions of fault features in a higher-dimensional space and extracting domain-invariant features for diagnostic purposes [8]. The differences between traditional data-driven fault diagnosis and cross-domain intelligent fault diagnosis are shown in Figure 1a,b [9]. These models aim to generalize the knowledge learned from the limited data collected from a single machine (referred to as the source domain) to enable cross-domain fault diagnosis on varying machines or the same machine operating under differing conditions (designated as the target domain). Lu et al. [10] first applied a model-based learning technique to solve DA issues for fault diagnosis under different working conditions. Shao et al. [11] proposed an adversarial domain adaptation method based on deep transfer learning for rolling element bearings with insufficient labeled data. Xiong et al. [12] utilized a Wasserstein gradient-penalty generative adversarial network with a deep auto-encoder for intelligent fault diagnosis. Zhao and Shen [13] implemented a local class cluster module to explore the domain-invariant representation space and obtain discriminative representation structures to enhance the model’s generalization ability. Xu et al. [14] used Support Vector Data Description (SVDD) as the unknown fault identification method, and optimized the relevant parameters using a Particle Swarm Optimization (PSO) algorithm. Liao et al. [8] constructed a deep semi-supervised domain generalization network (DSDGN) which employed causal learning techniques to acquire causal invariant fault information across different machines. Despite the potential of DA-based IFD models, obstacles persist in their application to contemporary industrial processes. The core purpose of these models is to minimize the divergence between training and testing data through techniques such as distance metrics and subspace alignment. Obtaining adequate data from the target domain is therefore crucial for guiding the distribution adaptation process and enabling effective knowledge transfer during model training [15]. However, in many industrial scenarios, acquiring data from target machines or operating conditions prior to their occurrence is not feasible.

Domain generation (DG) is a transfer learning method that aims to generalize a model trained on a diverse array of source datasets to unseen target domains. It focuses on extracting key domain-invariant feature representations from specific source domains that are both discriminative across various classes and resilient to domain shifts, thereby enhancing the robustness and adaptability of data-driven approaches to varying conditions [16]. Domain generation-based models can be categorized into data manipulation [17], representation learning [18], and learning strategies [19]. Li et al. [20] were the first to integrate domain generalization into intelligent fault diagnosis, leveraging adversarial training and metric learning techniques to extract generalized features that transcend specific domains. In a parallel effort, Han et al. [21] advocated minimizing the triplet loss as a means to foster intra-class compactness and inter-class separability, thereby strengthening the generalization ability of fault diagnosis models. Chen et al. [22] devised a domain-regressive framework centered on learning domain-agnostic representations, tailored specifically for rotating machinery fault diagnosis. Zheng et al. [23] fused prior diagnostic knowledge with a deep domain generalization network, augmenting it with an instance-based discriminative loss. This integration yielded a robust framework capable of performing cross-domain fault diagnosis across disparate bearing datasets. Although these DG methods do not require prior knowledge of the target domain, one significant limitation persists: they require a massive and varied collection of samples from multiple source domains.

In practical industrial applications, acquiring useful samples from varying working conditions can be challenging, often restricting utilization to data sampled under a single working condition. Meanwhile, the absence of target data during the training phase inherently elevates the likelihood of overfitting to the source data, posing a significant challenge. Consequently, DG-powered IFD models ought to prioritize strategies that enhance the integration of data acquired under diverse operational conditions into a cohesive, unified representation. This approach aims to mitigate the risk of overfitting and enhance the models’ ability to generalize across varying scenarios.

To this end, we propose a novel domain generation framework for unseen condition fault diagnosis which comprises an Adaptive Feature Fusion Domain Generation Network (AFFN) and a Mix-Up Augmentation Method (MAM). The key idea of the AFFN and MAM is to learn both domain-specific and domain-invariant representations and fuse them dynamically in a unified deep neural network. The main contributions of this study are summarized as follows:

(1) A domain generation-based fault diagnosis framework for unseen conditions is proposed to complement the online health management operations of industrial applications for robust and generalized fault recognition.
(2) A novel domain generation network, the Adaptive Feature Fusion Domain Generation Network (AFFN), is proposed, which can learn both domain-invariant and domain-specific deep representations to enhance the generalization capability of the framework.
(3) A well-designed data Mix-up Augmentation Method (MAM) is integrated into the framework, which fully utilizes the existing data to extend the model’s exploration and learning boundaries to improve the fit of the unseen domain.

The organization of the remainder of this paper is outlined as follows: Section 2 provides an introduction to pertinent techniques and background knowledge. Section 3 delves into the structure and specifics of the proposed diagnostic framework for unseen conditions. Section 4 presents the experimental validation and real-world application in detail. Finally, Section 5 brings this paper to a conclusion.

2. Preliminaries

2.1. Domain Generalization Problem Definition

The unseen-domain diagnosis problem addressed in this paper can be regarded as a cross-domain classification task, for which DG has emerged as a prominent research trend. As depicted in Figure 1c,d, the DG diagnostic model is trained on multiple source domain datasets and is subsequently applied directly to the target diagnostic tasks.

Let $x = \{x_{1}, x_{2}, \dots, x_{n}\}$ be a set of observations from the feature space $X$ and $y = \{y_{1}, y_{2}, \dots, y_{n}\}$ be the set of $n$ corresponding class labels from the output space $Y$, with $P_{XY}$ denoting the joint distribution. Herein, the source component is denoted as “$S$” and the target component is denoted as “$T$”. The $M$ source domains are defined as $D^{S} = \{D_{m}^{S}\}_{m=1}^{M}$, where $D_{m}^{S} = \{(x_{i}^{S,m}, y_{i}^{S,m})\}_{i=1}^{N_{s,m}}$ denotes the $m^{th}$ domain, and the labeled observations in all source domains are sufficiently available for model training. The joint distributions of the source domains differ from one another, that is, $P_{XY}^{m_{1}} \neq P_{XY}^{m_{2}}$ for $m_{1} \neq m_{2}$ [6]. Similarly, the target domain, which is unseen and unlabeled in the training stage, is denoted as $D^{T} = \{D_{m}^{T}\}_{m=1}^{M}$; its joint distribution is different from that of every source domain, namely $P_{XY}^{T} \neq P_{XY}^{S,m}$. Hence, the goal of domain generalization is to learn a robust and generalizable predictive function $h : X \to Y$ from the $M$ training source domains that achieves a minimum prediction error on an unseen test domain $D^{T,test}$ (i.e., $D^{T,test}$ cannot be accessed in training):

$\min_{h} E_{(x, y) \in D^{T,test}} [l(h(x), y)]$

(1)

where $E$ is the expectation and $l(\cdot, \cdot)$ is the loss function.

2.2. Data Augmentation

It is widely recognized that by increasing the variety and volume of training data, the generalization performance of intelligent models can be enhanced. When dealing with a finite set of training data, data augmentation stands as one of the most prevalent and straightforward methods to enrich the diversity of the training dataset.

When carrying out classification tasks, it is common practice to define the neighborhood or vicinity of a sample as the set encompassing its horizontal reflections or linear transformations. Notably, data augmentation has consistently demonstrated its ability to enhance the generalization capabilities of models. Zhang et al. [24] introduced a mix-up learning methodology that involves combining two observations in a linear fashion, smoothing the transition between decision boundaries. This approach involves training a neural network on convex combinations of pairs of examples and their corresponding labels.

$\mu(\tilde{x}, \tilde{y} \mid x_{i}, y_{i}) = \frac{1}{n} \sum_{j=1}^{n} E_{\lambda} [\delta(\tilde{x} = \varphi(x_{i}, x_{j}), \tilde{y} = \varphi(y_{i}, y_{j}))]$

(2)

where $\varphi(a, b) = \lambda a + (1 - \lambda) b$, and $\lambda \sim \mathrm{Beta}(\alpha, \alpha)$ for $\alpha \in (0, \infty)$. $\delta(\tilde{x}, \tilde{y})$ is a Dirac mass centered at $(\tilde{x}, \tilde{y})$. The mix-up hyper-parameter $\alpha$ controls the strength of interpolation between feature–target pairs. In addition, mix-up reduces the memorization of corrupt labels, increases the robustness to unseen examples, and stabilizes the training of generalization networks.
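As a minimal sketch of the interpolation in Equation (2), the following standard-library Python snippet mixes one pair of examples. The function name `mixup_pair`, the toy vectors, and the value of `alpha` are illustrative assumptions and are not taken from the paper.

```python
import random

def mixup_pair(x_i, y_i, x_j, y_j, alpha, rng):
    """Return one convex combination of two feature/label pairs."""
    lam = rng.betavariate(alpha, alpha)  # lambda ~ Beta(alpha, alpha)
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix, lam

# Toy example: two 2-D feature vectors with one-hot labels (hypothetical values).
rng = random.Random(0)
x_mix, y_mix, lam = mixup_pair([1.0, 2.0], [1.0, 0.0],
                               [3.0, 6.0], [0.0, 1.0],
                               alpha=0.2, rng=rng)
print(lam, x_mix, y_mix)
```

Because the same $\lambda$ weights both the features and the labels, the mixed label remains a valid probability vector whose entries sum to one.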

3. Methods

3.1. Unseen Conditions Diagnostic Framework

The proposed unseen conditions diagnostic framework is depicted in Figure 2 and mainly consists of three parts: data augmentation, adaptive feature network training, and online diagnostic application. The MAM method is used for data and feature augmentation, including feature augmentation, label fusion, and domain mix-up. The constructed adaptive feature network aims to learn both domain-specific and domain-invariant representations and fuse them.

3.1.1. Data Augmentation and Fusion

The mix-up method was proposed by Zhang et al. [24] to generate a new feature–label pair based on the weighted sum of features and the corresponding one-hot label vectors. However, the originally proposed data augmentation method only takes the data level into consideration, resulting in limited volume and features. In this paper, the mix-up method is extended to the label space and the domain space. Thus, the lack of information on different fault classes and domains is compensated for and the diversity of the category and domain spaces is increased.

As defined previously, $x_{i}$ is an observation from the feature space $X$, $y_{i}$ is the corresponding fault class in the label space $Y$, and $d_{i}$ is the domain label in $D$. For all the samples in the source domains (i.e., $D = \{D_{m}^{S}\}_{m=1}^{M}$), data augmentation can be realized by the following equations:

$\begin{cases} x_{i}^{g} = \lambda x_{j} + (1 - \lambda) x_{k} \\ y_{i}^{g} = \lambda y_{j} + (1 - \lambda) y_{k} \\ d_{i}^{g} = \lambda d_{j} + (1 - \lambda) d_{k} \end{cases}$

(3)

where $\lambda \sim \mathrm{Beta}(\alpha, \alpha)$ for $\alpha \in (0, \infty)$, and the original $x_{j}$ and $x_{k}$ are randomly selected from the overall set of source domains; moreover, the class label $y_{i}^{g}$ and domain label $d_{i}^{g}$ are no longer one-hot vectors. The normalized density of $\mathrm{Beta}(\alpha, \alpha)$ is generally expressed as

$\mathrm{Beta}(\lambda \mid \alpha, \alpha) = \frac{\Gamma(2\alpha)}{\Gamma(\alpha)^{2}} [\lambda(1 - \lambda)]^{\alpha - 1}$

(4)

In the domain and label fusion process, the empirical distribution is utilized to measure augmentation quality, which can be expressed as

$P_{\delta}(x, y, d) = \frac{1}{n} \sum_{i=1}^{n} \delta(x = x_{i}, y = y_{i}, d = d_{i})$

(5)

where $\delta(x = x_{i}, y = y_{i}, d = d_{i})$ is the Dirac mass centered at $(x_{i}, y_{i}, d_{i})$, and $n$ is the number of all augmented samples. Hence, in order to achieve the goal of label fusion and domain mix-up, the expectation of the extended generalization can be expressed as

$\begin{cases} \min E_{\lambda} [\delta(x_{i}^{g} = \varphi(x_{j}, x_{k}, \lambda), y_{i}^{g} = \varphi(y_{j}, y_{k}, \lambda))] \\ \max E_{\lambda} [\delta(x_{i}^{g} = \varphi(x_{j}, x_{k}, \lambda), d_{i}^{g} = \varphi(d_{j}, d_{k}, \lambda))] \end{cases}$

(6)

where $\varphi(a, b, \lambda) = \lambda a + (1 - \lambda) b$ and $\lambda \sim \mathrm{Beta}(\alpha, \alpha)$. The equation above indicates that, by minimizing the discrepancy of the augmented fault features, the fault labels can be well fitted, and by maximizing the domain discrepancy for unseen domain exploration, the domain boundaries can be fused and extended.
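The joint mixing of data, class labels, and domain labels described by Equation (3) can be sketched as follows. This is an illustrative standard-library Python version under stated assumptions: the helper names (`one_hot`, `mix`, `mam_augment`), the toy samples, and the choice $\alpha = 0.5$ are hypothetical, and class and domain labels are assumed to be integer indices that are converted to one-hot vectors before mixing.

```python
import random

def one_hot(index, size):
    """One-hot vector of length `size` with a 1 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def mix(a, b, lam):
    """phi(a, b, lambda) = lambda * a + (1 - lambda) * b, element-wise."""
    return [lam * u + (1.0 - lam) * v for u, v in zip(a, b)]

def mam_augment(samples, n_classes, n_domains, n_new, alpha, rng):
    """samples: list of (x, class_index, domain_index) pooled over all source domains.

    Returns n_new triples (x_g, y_g, d_g) with soft class and domain labels.
    """
    augmented = []
    for _ in range(n_new):
        (xj, yj, dj), (xk, yk, dk) = rng.choice(samples), rng.choice(samples)
        lam = rng.betavariate(alpha, alpha)  # one lambda shared by x, y and d
        augmented.append((
            mix(xj, xk, lam),
            mix(one_hot(yj, n_classes), one_hot(yk, n_classes), lam),
            mix(one_hot(dj, n_domains), one_hot(dk, n_domains), lam),
        ))
    return augmented

# Hypothetical pooled source samples: (features, fault class, source domain).
samples = [([0.1, 0.2], 0, 0), ([0.4, 0.3], 1, 1), ([0.9, 0.8], 2, 2)]
aug = mam_augment(samples, n_classes=3, n_domains=3, n_new=5,
                  alpha=0.5, rng=random.Random(1))
for x_g, y_g, d_g in aug:
    print(x_g, y_g, d_g)
```

Sharing a single $\lambda$ across the three spaces keeps each augmented sample consistent: its soft class label and soft domain label describe the same blend as its features.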

3.1.2. Adaptive Feature Fusion Network

The Adaptive Feature Fusion Network (AFFN) concurrently exploits both domain-invariant and domain-specific features. The domain-specific representation learning module captures the distinguishing attributes of each source domain, thereby improving the network’s generalization ability. The AFFN consists of four interconnected modules: the feature extraction module (orange blocks), the fault classification module (blue blocks), the domain-specific representation learning module (yellow), and the domain-invariant learning module (green), as illustrated in Figure 2.

The learning objective of AFFN can be expressed as follows:

$L = L_{cls} + \beta L_{dsr} + \gamma L_{dir}$

(7)

where $L_{cls}$ is the classification loss, $L_{dsr}$ is the loss of domain-specific representation learning, and $L_{dir}$ is the loss of domain-invariant representation learning. $\beta$ and $\gamma$ are the tradeoff hyperparameters.

(1): The feature extraction module distills features from the raw sensor data. Two CNN layers accompanied by max-pooling operations are employed to extract these features. Notably, these layers are shared across all training domains, thus minimizing the number of parameters. For classification, the cross-entropy is taken as the classification loss:

$L_{cls} = - \frac{1}{N} \sum_{i=1}^{N} y_{i} \log P(y_{i} \mid x_{i})$

(8)

where $N = \sum_{k=1}^{K} n_{k}$ is the number of samples from all the training domains.

(2): The domain-specific representation learning module specializes in acquiring features that are tailored to each domain; hence, it is not shared and remains unique to each domain. Following feature extraction, $K$ fully connected (FC) layers are implemented, one for each domain. Additionally, a weighting function has been devised to integrate the specific information of each domain. Given new test data $x$, their feature $z$ is formulated as

$z = \sum_{k=1}^{K} w_{k} f_{k}(f_{e}(x))$

(9)

where $w_{k}$ ($w_{k} > 0$ and $\sum_{k=1}^{K} w_{k} = 1$) is the weight on domain $D^{k}$, indicating the similarity between the data in domain $D^{k}$ and the input data under unseen conditions. $f_{k}$ is the feature learning function of domain $D^{k}$ and $f_{e}$ is the shared feature extraction function.
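The weighted fusion in Equation (9) can be illustrated with a small standard-library Python sketch. It assumes, purely for illustration, that the weights $w_{k}$ are obtained by applying a softmax to domain-similarity scores; the paper only requires the weights to be positive and to sum to one. The names `softmax` and `fuse` and all numeric values are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax; outputs are positive and sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(branch_features, domain_scores):
    """z = sum_k w_k * f_k(f_e(x)), with w = softmax(domain_scores) (assumption)."""
    weights = softmax(domain_scores)
    dim = len(branch_features[0])
    z = [sum(w * feats[d] for w, feats in zip(weights, branch_features))
         for d in range(dim)]
    return z, weights

# K = 3 hypothetical branch outputs f_k(f_e(x)) for one test sample.
branch_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
z, w = fuse(branch_features, domain_scores=[2.0, 0.5, 0.5])
print(w, z)
```

In this toy case the first source domain receives the largest weight, so the fused feature is dominated by the first branch.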

The combination of $f_{k}$ and $f_{e}$ guarantees that the initial layers tend to acquire general features, whereas the later layers gravitate toward domain-tailored features. During the training phase, the domain label $d_{k} \equiv k$ for each sample is known beforehand, thus allowing the domain-specific loss for each domain $k$ to be computed as

$L_{dsr}^{k} = \frac{1}{n_{k}} \sum_{i=1}^{n_{k}} l(f_{d}(f_{e}(x_{i})), d_{k})$

(10)

where $f_{d}$ denotes the domain classifier and $n_{k}$ is the number of samples from domain $k$. Then, the total domain-specific loss can be obtained by averaging the losses on all domains:

$L_{dsr} = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{n_{k}} \sum_{i=1}^{n_{k}} l(f_{d}(f_{e}(x_{i})), d_{k})$

(11)
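The per-domain loss and its average over domains can be sketched in a few lines of standard-library Python. The sketch assumes that $l$ is the cross-entropy between the domain classifier output and the true domain index, which the text does not state explicitly; the function names and probability values are hypothetical.

```python
import math

def cross_entropy(probs, target_index):
    """Cross-entropy of one predicted distribution against an integer label."""
    return -math.log(probs[target_index])

def domain_specific_loss(domain_probs_per_domain):
    """domain_probs_per_domain[k] holds the predicted domain distributions
    for the samples that come from source domain k."""
    per_domain = []
    for k, batch in enumerate(domain_probs_per_domain):
        # Mean loss over the n_k samples of domain k, as in Equation (10).
        per_domain.append(sum(cross_entropy(p, k) for p in batch) / len(batch))
    # Average of the K per-domain losses.
    return sum(per_domain) / len(per_domain), per_domain

# Two source domains with different sample counts (hypothetical outputs).
total, per_domain = domain_specific_loss([
    [[0.9, 0.1], [0.8, 0.2]],  # samples from domain 0
    [[0.3, 0.7]],              # sample from domain 1
])
print(per_domain, total)
```

Averaging per domain first gives every source domain the same influence on the total loss, regardless of how many samples it contributes.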

3.2. Training and Application

During training, the data augmentation and fusion method enriches the data, label, and domain spaces. The feature extractor captures lower-level features and feeds them to domain-specific branches. The domain classifier uses these features to fuse domain-specific characteristics and learns domain-invariant representations. These fused features then enable fault classification.

For application, the model parameters are fixed, and domain-specific and domain-invariant representations are derived for an unseen target domain. Without knowing the target domain’s label, the domain classifier’s output indicates the similarity between the target and source domains. The label loss in the source domain and the domain fusion loss are calculated in every epoch. The Adam optimizer is used to optimize the parameters of the feature extractor and part of the parameters of the domain classifiers, with the learning rate set to 0.001. The explicit steps of adversarial transfer learning are shown in Table 1. The learning rate schedule decreases the learning rate by a factor of 0.1 every 20 epochs. Adam combines the strength of Adagrad in dealing with sparse gradients with the strength of RMSprop in dealing with non-stationary objectives; it automatically adjusts the learning rate, converges faster, and performs better in complex networks. Considering the size of each training batch as a proportion of the overall amount of data, and the savings in training time, the batch size is set to 128 and the number of training epochs is set to 100.
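The stated learning rate schedule (initial value 0.001, reduced by a factor of 0.1 every 20 epochs, 100 epochs in total) can be written as a small standard-library Python function. The function name `step_lr` is an illustrative assumption; in practice the same behaviour is usually obtained from a step scheduler of the deep learning framework in use.

```python
def step_lr(epoch, initial_lr=0.001, factor=0.1, step_size=20):
    """Learning rate after `epoch` completed epochs under step decay."""
    return initial_lr * factor ** (epoch // step_size)

# Learning rate at the start of each 20-epoch stage of a 100-epoch run.
for epoch in range(0, 100, 20):
    print(epoch, step_lr(epoch))
```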

The complete learning process of the whole framework for unseen faults is summarized in Algorithm 1.

Algorithm 1: MAM-AFFN for unseen fault diagnosis.

Input: $K$ source domains $D_{1}^{S}, \dots, D_{K}^{S}$ for training, training steps $t$, and $\beta$, $\gamma$.
Output: Classification results on test domain $D^{T}$.
Data, label, and domain $(X, Y, D \mid \{D_{1}^{S}, \dots, D_{K}^{S}\})$ augmented by MAM;
Randomly initialize the model parameters $\theta$, $i = 0$;
while $i < t$ do
1. Sample a batch $B = \{B_{1}, \dots, B_{K}\}$ from $K$ domains;
2. Extract the lower-level features $f_{e}(x)$ by the feature extractor;
3. Extract the domain-specific features $f_{k}(f_{e}(x))$ by $K$ domain-specific fully connected layers;
4. Calculate the domain-specific loss $L_{dsr}$ and output the weight for each source branch;
5. Calculate the domain-invariant loss $L_{dir}$;
6. Fuse the domain-specific weight according to Equation (9);
7. Calculate the total loss of AFFN according to Equation (7);
8. Update the model parameter $\theta$ using AdamW.
end while
Application on the target data of unseen domain $D^{T}$.
return Classification results of unseen domain $D^{T}$.

4. Experiments

Without loss of generality, we consider the challenging yet crucial task of fault diagnosis for rotating bearings, where the fault characteristics exhibit periodicity and regularity. To assess the effectiveness of the general framework, we undertook two experimental case studies using test rigs from SDUST [25] and the Paderborn University bearing dataset (PU dataset) [26] for bearing fault diagnosis, as shown in Figure 3. Within these datasets, we designated a specific working condition or machine as the unseen target domain, serving as the benchmark for evaluating the model’s performance, while the data pertaining to the remaining conditions were used for model training. Normalization was applied for data preprocessing, which keeps the data within the same interval to ensure effective convergence of the model. For each dataset, the number of samples in each training category was fixed at 150 to balance the data. The first scenario involved a constant domain shift where the rig operated at varying but constant speeds, whereas the second scenario examined a constant domain shift where the rig functioned under numerous conditions with varying rotational speeds and loads. These experiments validated the efficacy of the MAM-AFFN method, which employs domain generalization principles, for demanding industrial applications.

4.1. Experiment Setting

4.1.1. Description of SDUST Dataset and Case 1

The SDUST test rig comprised a motor, a shaft coupling, a rotor, a testing bearing, a gearbox, and a brake. Specifically, the bearing type used was N205EU (manufactured by SKF Co., Goteborg, Sweden), and data were gathered under four distinct health conditions: normal (NOR), inner ring fault (I), rolling element fault (B), and outer ring fault (O). Additionally, four working conditions were examined at speeds of 1000, 1500, 2000, and 2500 rpm. The experimental setup included four diagnostic scenarios for unseen working conditions across domains, namely T1000, T1500, T2000, and T2500, as shown in Table 1.
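The four diagnostic scenarios can be read as a leave-one-condition-out protocol: each speed is held out once as the unseen target while the other three serve as source domains. The standard-library Python sketch below builds such a task list. It assumes that the task name T1000 denotes the case in which the 1000 rpm condition is the unseen target; the function name `leave_one_out_tasks` is hypothetical.

```python
SPEEDS_RPM = [1000, 1500, 2000, 2500]

def leave_one_out_tasks(conditions):
    """Map each task name to (source conditions, unseen target condition)."""
    tasks = {}
    for target in conditions:
        sources = [c for c in conditions if c != target]
        tasks[f"T{target}"] = (sources, target)
    return tasks

tasks = leave_one_out_tasks(SPEEDS_RPM)
for name, (sources, target) in tasks.items():
    print(name, "train on", sources, "test on", target)
```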

4.1.2. Description of PU Dataset and Case 2

The PU dataset’s test rig was a comprehensive assembly, primarily encompassing a motor, a torque measurement shaft, a dedicated bearing test module, a flywheel, and a load motor. This setup facilitates the evaluation of seven distinct health conditions, including the baseline normal (N) state, three levels of inner-race faults (IF1, IF2, IF3) that progressively escalate in severity, two grades of outer-race faults (OF1, OF2) showcasing varying degrees of damage, and a complex compound fault (CF) scenario, which integrates both inner and outer race faults (IF and OF).

Bearings exhibiting genuine damage were sourced from an accelerated lifetime testing procedure, ensuring the authenticity of fault conditions. Vibration signals were meticulously captured under four diverse operational scenarios, each characterized by specific rotational frequencies (Hz), load torques (Nm), and radial forces (N). The data acquisition was performed at a high sampling frequency of 64 kHz, ensuring precision and accuracy in capturing the dynamic behavior of the bearings. These operational scenarios collectively defined four distinct domains: P1, P2, P3, and P4, leading to four diagnosis cases, as outlined in Table 2. Each category in unseen working conditions comprises 2000 samples.

4.1.3. Optimization and Compared Methods Description

The optimization methodology was tailored to maximize performance on the validation set during training. For domain generalization (DG) tasks, we employed a 10-fold cross-validation approach, dividing the data into 70% training and 30% validation subsets randomly. Balancing computational efficiency and resource utilization, we adopted a batch size of 128. To combat overfitting, we incorporated an early stopping criterion with a patience threshold of 10 epochs. Our training routine limited the maximum number of epochs to 50, using Stochastic Gradient Descent (SGD) with an initial learning rate (LR) of 0.001. A dropout rate of 0.5 was employed for regularization. Furthermore, to guarantee model convergence, we employed an exponentially decaying learning rate schedule, commencing at 0.001 and adaptively adjusting as training progressed.

$lr = \mathit{initial\_lr} \cdot (\varepsilon \cdot \mathit{epoch} + 1)^{\gamma}$

(12)

where $\mathit{initial\_lr}$ is the initial learning rate and $\mathit{epoch}$ is the number of completed training epochs. $\varepsilon$ and $\gamma$ are two parameters that control the rate of change of the LR.
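Equation (12) can be evaluated directly, as in the following standard-library Python sketch. The values of `eps` and `gamma` are not reported in the text and are chosen here only for illustration; note that the learning rate decays only when the exponent `gamma` is negative.

```python
def decayed_lr(epoch, initial_lr=0.001, eps=0.1, gamma=-0.75):
    """Equation (12): lr = initial_lr * (eps * epoch + 1) ** gamma.

    eps and gamma are hypothetical values, not taken from the paper.
    """
    return initial_lr * (eps * epoch + 1.0) ** gamma

# Learning rate at a few points of a 50-epoch run.
for epoch in (0, 10, 25, 50):
    print(epoch, decayed_lr(epoch))
```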

For comparative analysis, several cutting-edge techniques for domain generalization (DG) were evaluated, encompassing MLDG [27], ANDMask [25], and CORAL [16]. Additionally, drawing inspiration from Matsuura and Harada [28], we devised a deep model incorporating adversarial and entropy losses, referred to as DAE, and another deep model without generalization capabilities, labeled DeepNG. The integration of Mixup and DAE within an MDAE model aimed to demonstrate the effectiveness of the domain-based discrepancy metric module.

4.2. Experimental Results

4.2.1. Performance Comparisons

The experimental results of the proposed method and the comparison methods under eight tasks in Case 1 (SDUST dataset) and Case 2 (PU dataset) are shown in Figure 4. The details are further listed in Table 3.

Several conclusions can be drawn from the analysis. Firstly, the baseline DeepNG approach achieved significantly lower average test accuracies of 76.46% and 80.63% in the two scenarios, respectively. This suggests that there was interference among the data distributions during training, which negatively impacted the model’s generalization capabilities. Moreover, DG methods employing MLDG, ANDMask, and CORAL demonstrated improvements over the basic DAE and MADAE models. Specifically, in Case 1, they achieved average accuracy gains of 1.57%, 12.06%, and 10.05%, respectively, while in Case 2, the gains were 4.04%, 5.27%, and 3.49%. These methods were designed to mitigate distributional discrepancies between source domains and learn representations that are invariant to domain shifts. Finally, the proposed method exhibited the best diagnostic performance in nearly all DG tasks, achieving peak average accuracies of 94.15% and 93.27% in the two experimental cases, respectively.

Furthermore, the model demonstrated superior stability compared to the other methods in most tasks. Figure 4 illustrates the classification accuracy of the various diagnostic tasks for Case 1 and Case 2, providing a visual aid for comparing the diagnostic results. Figure 5 illustrates the confusion matrices of the proposed method in the four diagnostic tasks of Case 1. A plausible explanation for this finding is that the data distribution without loads differs significantly from that with loads, thereby exacerbating the issue of distribution shift. Furthermore, the proposed MAM model surpasses the other models in terms of average accuracy across all tasks. In essence, the empirical findings validate the viability and efficacy of the proposed model.

To mitigate the potential bias stemming from the unique characteristics of training samples, the final analysis presents the mean and variance of five independent experiments. As evident from the graphical representation, our proposed method consistently outperforms others in diagnosing the seven distinct bearing failure scenarios, underscoring its effectiveness and robustness.

4.2.2. Feature Visualization and Analysis

For an intuitive and detailed comparison, the t-SNE technique [29,30] was adopted on one trial of Task T1. Four key methodologies—DeepNG, DAE, ANDMask, and CORAL—were selected for a visual assessment of their feature representations. As depicted in Figure 6a–e, the visualizations reveal distinct insights. DeepNG (Figure 6a) exhibits weak inter-class separation, indicating a potential for misdiagnosis due to category confusion. Both DAE (Figure 6b) and ANDMask (Figure 6c) show improved inter-class separation but struggle with intra-class clustering, evidenced by fragmented clusters within the same class across domains. This can lead to misclassification near decision boundaries. In contrast, our proposed approach exhibits remarkable harmony in feature distribution, significantly enhancing both inter- and intra-class coherence. This improvement can be attributed to two possibilities: significant variations between Task T1 and other conditions, or the subtle characteristics of I04 and R2 samples in T1, making them prone to misclassification.

Drawing on the above analysis, we consider the integration of fault-mechanism and signal-processing expertise a promising way to further refine our approach, a direction we aim to explore in future work. Notably, the current method extracts domain-invariant feature representations across diverse distributions, and its generalization capability is evident from the results in Table 3.

4.2.3. Hyperparameter Sensitivity Analysis

The performance of the proposed MAM-AFFN model depends on the hyperparameters β and γ in the combined loss function as shown in Equation (7). These parameters strike a delicate balance between domain-specific and domain-invariant modules, pivotal for maintaining diagnostic accuracy amidst intricate engineering scenarios. Given the substantial data volume, we resorted to the grid search approach for parameter optimization to search β and γ from {0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50} and {0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500}, respectively; the values of γ selected were ten times those of β due to domain-invariant representation being more ambiguous and difficult to access. The weight coefficients were chosen from the 100 possible combinations of the two sets, utilizing the accuracy index as the benchmark to identify the optimal pairing. We provide a detailed account of this process using the SDUST dataset as an example.
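The search over the 100 (β, γ) pairs can be sketched as below. This is an illustrative outline rather than the authors' code: `grid_search` and `toy_evaluate` are hypothetical names, and `toy_evaluate` is a stand-in for training MAM-AFFN with the given weights and measuring accuracy, which is not reproduced here.

```python
from itertools import product

# Candidate sets taken from the text; each gamma is ten times the matching beta.
BETAS = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50]
GAMMAS = [0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500]

def grid_search(evaluate, betas=BETAS, gammas=GAMMAS):
    """Return (best_accuracy, best_beta, best_gamma) over all pairs."""
    best = None
    for beta, gamma in product(betas, gammas):
        acc = evaluate(beta, gamma)
        if best is None or acc > best[0]:
            best = (acc, beta, gamma)
    return best

def toy_evaluate(beta, gamma):
    # Hypothetical placeholder for "train, then measure accuracy (%)".
    # It peaks at an arbitrary point and says nothing about the real model.
    return 100.0 - abs(beta - 1) - 0.1 * abs(gamma - 10)

best_acc, best_beta, best_gamma = grid_search(toy_evaluate)
print(len(BETAS) * len(GAMMAS), best_beta, best_gamma)
```

In practice `evaluate` would wrap a full training run, so the cost of the search is 100 trainings per task; the accuracy used for selection should come from held-out source-domain data rather than the unseen target domain.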

Figure 7 illustrates the MAM-AFFN’s accuracy across varying hyperparameter combinations. Notably, the accuracy experiences significant fluctuations with alterations in γ, while β exerts a more subtle and consistent effect. This outcome suggests that the domain-based discrepancy metric module associated with γ holds a pivotal role in the framework, outweighing the influence of the module corresponding to β. Therefore, we adopted the optimal combination (β, γ) = (5, 300) in this experiment for the relatively best results.

The hyperparameter α in the data augmentation module regulates the extent of interpolation between original feature vector pairs. The coefficient λ, derived from the Beta(α, α) distribution, as shown in Equation (4), remains constant for each minibatch, after which the mix-up method is implemented on the shuffled minibatch. Consequently, we explored the influence of gradually varying α from 0.1 to 1 on cross-domain diagnostic performance. For comparison, we also considered fixed λ values ranging from 0.1 to 1. Figure 8 depicts the findings on the SDUST dataset, revealing a superior average accuracy achieved by the ten value settings in the fixed α group compared to the fixed λ group. This outcome is attributed to the fact that diverse λ values sampled from the Beta(α, α) distribution with a constant α enhance the variability of the original data, subsequently enhancing the model’s generalization capabilities. As a result, α = 0.7, which yielded the highest accuracy, was selected for this study.
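The per-minibatch mix-up step described above can be sketched as follows. The sketch assumes the standard mix-up interpolation of Zhang et al. [24], x̃ = λ·x_i + (1 − λ)·x_j, applied identically to features and one-hot labels; the exact form of Equation (4) is not reproduced here, and `mixup_batch` and the toy batch are hypothetical.

```python
import random

def mixup_batch(features, labels, alpha, rng=random):
    """Mix a minibatch with a shuffled copy of itself.

    features: list of feature vectors (lists of floats)
    labels:   list of one-hot label vectors (lists of floats)
    A single lambda ~ Beta(alpha, alpha) is drawn and shared by the batch.
    """
    lam = rng.betavariate(alpha, alpha)
    perm = list(range(len(features)))
    rng.shuffle(perm)  # partner index for each sample

    def mix(a, b):
        return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]

    mixed_x = [mix(features[i], features[j]) for i, j in enumerate(perm)]
    mixed_y = [mix(labels[i], labels[j]) for i, j in enumerate(perm)]
    return mixed_x, mixed_y, lam

# Toy batch (hypothetical values), using alpha = 0.7 as selected in the text.
rng = random.Random(0)
x = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
y = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
mx, my, lam = mixup_batch(x, y, alpha=0.7, rng=rng)
print(lam, mx, my)
```

Because λ is resampled for every minibatch, a fixed α still exposes the model to a range of interpolation strengths, which is the variability the comparison with fixed λ values is meant to isolate.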

5. Conclusions

In this paper, a new DG model, MAM-AFFN, is proposed to solve the problem of poor domain generalization ability under unseen working conditions in extreme real industrial applications. A domain generation framework is introduced for fault diagnosis in unseen conditions, encompassing an Adaptive Feature Fusion Domain Generation Network (AFFN) and a Mix-up Augmentation Method (MAM) that applies to both data and domain spaces. The framework incorporates thoughtfully designed modules for data augmentation, domain discriminators, and domain-based discrepancy metrics, resulting in a well-integrated domain generation diagnosis application for unseen domain conditions. The feasibility of MAM-AFFN was successfully demonstrated through a range of fault diagnosis generalization tasks conducted on two publicly available datasets as well as a practical real-world scenario. Based on the comprehensive experimental results, several significant conclusions can be inferred:

(1): A thoughtfully crafted data Mix-up Augmentation Method (MAM) is integrated within the framework, effectively leveraging the available data to broaden the model’s exploration and learning horizons, ultimately enhancing its adaptability to unseen domains.
(2): The comparison results with the six other methods prove the superiority of MAM-AFFN for extracting both domain-invariant and domain-specific deep representations to enhance the generalization capability of the unseen fault diagnosis.
(3): The experimental results in eight bearing diagnosis tasks from two public bearing datasets demonstrate the reliability and generalization of the proposed method, as the proposed MAM-AFFN produces higher classification accuracy than the traditional model and some other state-of-the-art DG-based IFD methods.

In essence, the proposed MAM-AFFN model adeptly addresses domain shift challenges stemming from varying workloads and even disparate machines, accurately executing IFD tasks under unseen conditions. Furthermore, the model’s flexibility allows operators to customize adjustable thresholds, achieving a practical balance between performance and computing resources for real-world applications.

Author Contributions

Conceptualization, T.Z. and L.X.; methodology, L.X.; software, X.Z. and X.M.; validation, T.Z. and L.X.; formal analysis, H.C.; investigation, X.M. and H.C.; data curation, X.M.; writing—original draft preparation, L.X. and X.M.; writing—review and editing, L.X.; visualization, T.Z. and L.X.; supervision, H.C.; project administration, T.Z.; funding acquisition, X.M. and T.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Defense Basic Scientific Research Program, grant number 2022601C009.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

Authors Tong Zhang, Haowen Chen, Xianqun Mao and Xin Zhu were employed by the China State Ship-Building Corporation Limited. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Yu, X.L.; Zhao, Z.; Zhang, X.; Zhang, Q.; Yilong, L.; Sun, C.; Chen, X. Deep-Learning-Based Open Set Fault Diagnosis by Extreme Value Theory. IEEE Trans. Ind. Inform. 2022, 18, 185–196.
2. Lu, S.X.; Gao, Z.; Xu, Q.; Jiang, C.; Zhang, A.; Wang, X. Class-Imbalance Privacy-Preserving Federated Learning for Decentralized Fault Diagnosis with Biometric Authentication. IEEE Trans. Ind. Inform. 2022, 18, 9101–9111.
3. Li, X.; Zhang, W.; Xu, N.X.; Ding, Q. Deep Learning-Based Machinery Fault Diagnostics with Domain Adaptation Across Sensors at Different Places. IEEE Trans. Ind. Electron. 2020, 6, 6785–6794.
4. Li, Q.; Shen, C.; Chen, L.; Zhu, Z. Knowledge mapping-based adversarial domain adaptation: A novel fault diagnosis method with high generalizability under variable working conditions. Mech. Syst. Signal Process. 2021, 147, 107095.
5. Fan, Z.H.; Xu, Q.; Jiang, C.; Ding, S.X. Weighted quantile discrepancy-based deep domain adaptation network for intelligent fault diagnosis. Knowl.-Based Syst. 2022, 240, 13.
6. Wang, J.D.; Lan, C.; Liu, C.; Ouyang, Y.; Qin, T.; Lu, W.; Chen, Y.; Zeng, W.; Yu, S.P. Generalizing to Unseen Domains: A Survey on Domain Generalization. IEEE Trans. Knowl. Data Eng. 2023, 35, 8052–8072.
7. Xie, J.S.; Guo, Z.; Wang, T.; Jinsong, Y. A diagnostic framework with a novel simulation data augmentation method for rail damages based on transfer learning. Struct. Health Monit. Int. J. 2023, 22, 3437–3450.
8. Liao, Y.X.; Huang, R.; Jipu, L.; Chen, Z.; Li, W. Deep Semisupervised Domain Generalization Network for Rotary Machinery Fault Diagnosis Under Variable Speed. IEEE Trans. Instrum. Meas. 2020, 69, 8064–8075.
9. Zhao, C.; Zio, E.; Shen, W.M. Domain generalization for cross-domain fault diagnosis: An application-oriented perspective and a benchmark study. Reliab. Eng. Syst. Saf. 2024, 245, 18.
10. Lu, W.N.; Liang, B.; Cheng, Y.; Meng, D.; Yang, J.; Zhang, T. Deep Model Based Domain Adaptation for Fault Diagnosis. IEEE Trans. Ind. Electron. 2017, 64, 2296–2305.
11. Shao, J.J.; Huang, Z.W.; Zhu, J.M. Transfer Learning Method Based on Adversarial Domain Adaption for Bearing Fault Diagnosis. IEEE Access 2020, 8, 119421–119430.
12. Xiong, X.; Hongkai, J.; Xingqiu, L.; Niu, M. A Wasserstein gradient-penalty generative adversarial network with deep auto-encoder for bearing intelligent fault diagnosis. Meas. Sci. Technol. 2020, 31, 26.
13. Zhao, C.; Shen, W.M. Adaptive open set domain generalization network: Learning to diagnose unknown faults under unknown working conditions. Reliab. Eng. Syst. Saf. 2022, 226, 12.
14. Xu, E.R.; Li, Y.; Peng, L.; Mingshun, Y.; Liu, Y. An unknown fault identification method based on PSO-SVDD in the IoT environment. Alex. Eng. J. 2021, 60, 4047–4056.
15. Fan, Z.H.; Xu, Q.; Jiang, C.; Ding, S.X. Deep Mixed Domain Generalization Network for Intelligent Fault Diagnosis Under Unseen Conditions. IEEE Trans. Ind. Electron. 2024, 71, 965–974.
16. Zhao, C.; Shen, W.M. Adversarial Mutual Information-Guided Single Domain Generalization Network for Intelligent Fault Diagnosis. IEEE Trans. Ind. Inform. 2023, 19, 2909–2918.
17. Peng, X.; Qiao, F.C.; Zhao, L. Out-of-Domain Generalization from a Single Source: An Uncertainty Quantification Approach. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 1775–1787.
18. Zhao, Q.; Yu, W.T.; Ji, T.Y. Style Elimination and Information Restitution for generalizable person re-identification. J. Vis. Commun. Image Represent. 2024, 98, 11.
19. Zhao, Y.Q.; Cheung, N.M. FS-BAN: Born-Again Networks for Domain Generalization Few-Shot Classification. IEEE Trans. Image Process. 2023, 32, 2252–2266.
20. Guo, Z.; Wang, T.; Xie, J.; Yang, Y.; Peng, Q. A Deep Transfer Learning-Based Open Scenario Diagnostic Framework for Rail Damage Using Ultrasound Guided Waves. IEEE Trans. Instrum. Meas. 2024, 73, 1–17.
21. Han, T.; Li, Y.F.; Qian, M. A Hybrid Generalization Network for Intelligent Fault Diagnosis of Rotating Machinery Under Unseen Working Conditions. IEEE Trans. Instrum. Meas. 2021, 70, 3520011.
22. Chen, L.; Li, Q.; Shen, C.; Zhu, J.; Wang, D.; Xia, M. Adversarial Domain-Invariant Generalization: A Generic Domain-Regressive Framework for Bearing Fault Diagnosis Under Unseen Conditions. IEEE Trans. Ind. Inform. 2022, 18, 1790–1800.
23. Zheng, H.L.; Yang, Y.; Xu, M. Deep Domain Generalization Combining a priori Diagnosis Knowledge Toward Cross-Domain Fault Diagnosis of Rolling Bearing. IEEE Trans. Instrum. Meas. 2021, 70, 11.
24. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez Paz, D. Mixup Beyond Empirical Risk Minimization. arXiv 2017, arXiv:1710.09412.
25. Jia, S.X.; Wang, J.; Han, B.; Zhang, G.; Wang, X.; He, J. A Novel Transfer Learning Method for Fault Diagnosis Using Maximum Classifier Discrepancy with Marginal Probability Distribution Adaptation. IEEE Access 2020, 8, 71475–71485.
26. Li, X.; Zhang, W.; Ding, Q.; Sun, J.Q. Intelligent rotating machinery fault diagnosis based on deep learning using data augmentation. J. Intell. Manuf. 2020, 31, 433–452.
27. Chen, K.Y.; Zhuang, D.; Chang, J.M. Discriminative adversarial domain generalization with meta-learning based cross-domain validation. Neurocomputing 2022, 467, 418–426.
28. Matsuura, T.; Harada, T. Domain Generalization Using a Mixture of Multiple Latent Domains. arXiv 2019, arXiv:1911.07661.
29. Van der Maaten, L.; Hinton, G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
30. Yang, B.; Wang, T.; Xie, J.; Yang, J. Deep Adversarial Hybrid Domain-Adaptation Network for Varying Working Conditions Fault Diagnosis of High-Speed Train Bogie. IEEE Trans. Instrum. Meas. 2023, 72, 1–10.

Figure 1. Illustration of (a) traditional data-driven fault diagnosis and (b) cross-domain intelligent fault diagnosis. The values inside the symbols denote the class label for each sample. (c) Domain adaptation problem definition and (d) domain generalization problem definition.

Figure 2. Structure of the proposed unseen conditions diagnostic framework.

Figure 3. Illustration of (a) the SDUST experimental platform and (b) the PU bearing test-rig.

Figure 4. Classification accuracy of the different diagnostic tasks of Case 1 and Case 2.

Figure 5. The diagnostic accuracies under four unseen working conditions in Case 1 utilizing the proposed method.

Figure 6. Visualization of the t-SNE embeddings under unseen target working conditions: (a) DeepNG, (b) DAE, (c) ANDMask, (d) CORAL, (e) the proposed method.

Figure 7. Diagnosis accuracy obtained using different values of the hyperparameters γ and β.

Figure 8. The strength of interpolation between data pairs significantly impacts the diagnosis performance.

Table 1. Case 1: diagnostic cases from the SDUST dataset.

Diagnostic Cases	Seen Domain Condition	Unseen Domain Condition
T1	1500 r/min, 2000 r/min, 2500 r/min	1000 r/min
T2	1000 r/min, 2000 r/min, 2500 r/min	1500 r/min
T3	1000 r/min, 1500 r/min, 2500 r/min	2000 r/min
T4	1000 r/min, 1500 r/min, 2000 r/min	2500 r/min

Table 2. Case 2: diagnostic cases from the PU dataset.

Domain	Rotational Frequency	Load	Cases	Seen Domain	Unseen Domain
p1	25 Hz	1000 N	R1	p2, p3, p4	p1
p2	15 Hz	400 N	R2	p1, p3, p4	p2
p3	15 Hz	1000 N	R3	p1, p2, p4	p3
p4	25 Hz	400 N	R4	p1, p2, p3	p4

Table 3. Diagnostic accuracy (%) of Case 1 and Case 2.

	Tasks	DeepNG	DAE	MLDG	ANDMask	CORAL	MADAE	Proposed
Case1	T1	59.73 ± 1.43	59.85 ± 2.48	65.8 ± 3.05	66.51 ± 2.2	73.75 ± 1.91	73.27 ± 3.64	93.03 ± 2.17
	T2	84.65 ± 2.72	83.14 ± 1.28	94.95 ± 2.9	94.81 ± 3.27	95.19 ± 1.52	94.48 ± 2.86	95.6 ± 2.64
	T3	86.28 ± 2.47	89.51 ± 3.21	96.28 ± 1.98	96.76 ± 1.8	98.35 ± 0.84	97.27 ± 0.74	98.06 ± 1.42
	T4	83.19 ± 3.89	83.62 ± 4.67	84.98 ± 6.26	83.96 ± 3.6	90.42 ± 0.76	86.06 ± 2.49	91.37 ± 2.94
	Average	76.46 ± 2.63	78.03 ± 4.66	88.52 ± 3.55	86.51 ± 2.72	88.42 ± 1.26	88.77 ± 2.93	94.15 ± 2.54
Case2	R1	75.92 ± 1.38	92.75 ± 0.69	90.26 ± 2.87	85.86 ± 1.59	90.61 ± 2.17	91.73 ± 1.26	97.6 ± 0.92
	R2	83.26 ± 2.42	72.64 ± 3.57	70.19 ± 4.15	82.47 ± 3.73	83.74 ± 3.33	77.54 ± 3.89	92.36 ± 3.02
	R3	87.33 ± 3.07	90.59 ± 2.26	92.57 ± 3.42	93.52 ± 1.24	84.98 ± 0.94	96.02 ± 1.96	97.29 ± 1.55
	R4	72.04 ± 1.53	78.73 ± 2.59	60.61 ± 4.11	65.73 ± 3.97	86.17 ± 3.12	79.39 ± 2.78	89.81 ± 3.31
	Average	80.63 ± 2.10	84.67 ± 2.27	79.41 ± 3.64	85.89 ± 2.63	87.38 ± 2.39	87.17 ± 2.47	93.27 ± 2.21

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

