Article

FSN: Feature Shift Network for Load-Domain (LD) Domain Generalization

School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(12), 5204; https://doi.org/10.3390/app14125204
Submission received: 13 May 2024 / Revised: 7 June 2024 / Accepted: 13 June 2024 / Published: 14 June 2024
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Conventional deep learning methods for fault detection often assume that the training and the testing sets share the same fault domain spaces. However, some fault patterns are rare, and many real-world faults do not appear in the training set. As a result, it is hard for the trained model to achieve desirable performance on the testing set. In this paper, we introduce a novel domain generalization scenario, Load-Domain (LD) domain generalization, which is based on an analysis of the Case Western Reserve University (CWRU) bearing dataset and takes advantage of the physical information contained in this dataset. For this scenario, we propose a feature shift model called the Feature Shift Network (FSN). FSN is trained to shift features between adjacent source domains and ultimately shifts target domain features into an adjacent source domain feature space, thereby achieving domain generalization. Furthermore, a hybrid classification method effectively improves the generalization performance of the model on unseen target domains. Results on the CWRU bearing dataset demonstrate that FSN outperforms existing models in LD domain generalization, and an additional test on rotated MNIST shows that FSN also achieves the best performance there.

1. Introduction

The complexity of machinery has been increasing over the past few years, and traditional methods of mechanical fault detection cannot keep up with the growing level of automation. This paper focuses on mechanical faults in rotating machinery; bearing failure detection is the primary focus of research on real-time monitoring of rotating machinery.
Deep learning is undoubtedly the most widely used method for data-driven fault diagnosis. Deep neural network algorithms have achieved remarkable success in a variety of fields since the theory of deep learning was established [1]. However, training a neural network typically assumes that the training set and the testing set share the same fault pattern space, i.e., that the feature data used for training cover all potential operating conditions. Despite our best efforts in dataset collection, there are still unseen faults in actual industrial production, and models trained on existing operating conditions may not necessarily perform well in diagnosing faults under different operating conditions [2]. To address the issue that real-world operating conditions may not be represented in the training set, this study incorporates domain generalization into the fault diagnosis model and improves on previously used adversarial domain generalization methods.
In real-world industrial production, fault diagnosis is a major challenge. After more than 60 years of development, many mature methods for mechanical fault diagnosis exist, thanks to the establishment of numerous international research facilities in the 1960s and beyond.
To enhance the generalization capability of fault diagnosis models on datasets with unknown working conditions, domain generalization has become one of the current research hotspots in deep learning.
Cui et al. used traditional methods such as the seagull optimization algorithm (SOA) to determine the system parameters [3], fed the signal into their proposed coupled multi-stable stochastic resonance (CMSR) system, applied a Fourier transform to the output, and obtained the fault diagnosis results. Time–frequency methods mainly include the wavelet transform [4,5], the short-time Fourier transform [6,7], and related techniques. Based on the Case Western Reserve bearing dataset with added Gaussian noise, Chen et al. organized the data points into grayscale images [8], processed them with a wavelet transform, and fed them into a neural network with Average Row Pooling (ARP) to obtain classification results, achieving a good anti-noise effect.
Traditional fault diagnosis methods require not only manual feature extraction but also specific professional knowledge, so they are neither adaptable nor robust. Machine-learning algorithms can automatically build models through iteration based on selected features, which greatly reduces labor costs. For example, the Support Vector Machine (SVM) is a supervised learning method [9]. In classification, the dimension of the feature vector does not affect the classification performance, which gives SVM good generalization ability in fault diagnosis and makes it one of the important tools in this field. Fernández-Francos et al. fed vibration features extracted from the envelope spectrum into an SVM to separate healthy data from faulty data [10] and then performed fault diagnosis on the faulty data to determine the fault mode. Aishwarya and Brisilla used various machine-learning techniques (such as SVM, K-nearest neighbors (k-NN), the multilayer perceptron (MLP), Random Forest (RF), and Decision Tree (DT)) to implement a fault detection strategy for induction motors under variable load conditions [11]. Dutta et al. presented a case study of a machine-learning (ML)-based computational technique for automatic fault detection in a cascade pumping system driven by a variable frequency drive [12]. These studies have improved the accuracy of fault diagnosis.
Although traditional machine-learning methods can be trained automatically, features still need to be extracted manually, which may lead to incomplete feature selection and slow processing. Deep learning overcomes this shortcoming and has become the most popular tool in the field of fault diagnosis. The first work to detect bearing faults using convolutional neural networks (CNNs) was published in 2016 [13], and since then, papers using CNNs for bearing fault diagnosis have continued to appear. A large body of work has been carried out in the field of fault diagnosis, providing new ideas for using CNNs in this task [14,15,16].
A recurrent neural network (RNN) is suitable for processing sequence data, so it can be used for fault diagnosis. In 2015, Abed et al. applied recurrent neural networks to fault diagnosis [17]. In their paper, firstly, a discrete wavelet transform was used to extract features, then the features were analyzed and selected, and finally, an RNN was used for classification.
Xie and Zhang proposed a fault diagnosis method based on generative adversarial networks [18]. In their model, an adversarial network is used to balance imbalanced signals. Experiments show that the classification performance of this method is better than that of other data balancing methods on imbalanced datasets.
Domain generalization belongs to transfer learning. It is a data-based approach that maps data from different domains using a specific mapping, measures the distributional differences between the mapped domains, and trains the mapping to reduce these differences, thereby finding a mapping that minimizes the distribution discrepancy between domains and achieves domain generalization.
The domain generalization problem was first introduced as a machine-learning problem by Blanchard et al. to solve cell classification problems in medicine [19] and has since been applied to transfer models between multiple domains. Muandet et al. used a complex nonlinear neural network to minimize the variance between the source and target domain features [20], and Li et al. made the source and target domain feature distributions more similar by minimizing the MMD distance between the source and target domain feature distributions and using adversarial learning [21]. Without using distance methods, Motiian et al. achieved the domain generalization task using minimized contrastive loss [22]. In autopilot scenarios, the model does not change according to real-world domain changes, which hinders the generalization of object detection across different real-world domains, and normalization perturbations are proposed to cope with classification domain generalization [23]. Bai et al. proposed a temporal domain generalization with drift-aware dynamic neural network to learn models under temporally changing data distributions and generalize to unseen data distributions following the trends of the change [24].
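As a concrete point of reference for the distance-based alignment line of work above (e.g., the MMD minimization in [21]), the sketch below computes a standard RBF-kernel estimate of the squared maximum mean discrepancy between two batches of features; the tensor shapes and kernel bandwidth are illustrative assumptions, not settings from any of the cited papers.

```python
import torch

def rbf_mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between feature batches x and y
    under an RBF kernel k(a, b) = exp(-||a - b||^2 / (2 * sigma^2))."""
    def kernel_mean(a, b):
        sq_dists = torch.cdist(a, b, p=2) ** 2      # pairwise squared distances
        return torch.exp(-sq_dists / (2 * sigma ** 2)).mean()

    return kernel_mean(x, x) + kernel_mean(y, y) - 2 * kernel_mean(x, y)

# Adding this term to the training loss pushes a feature extractor to make
# the two domains' feature distributions indistinguishable under the kernel.
feat_domain_a = torch.randn(64, 128)
feat_domain_b = torch.randn(64, 128) + 0.5
print(rbf_mmd2(feat_domain_a, feat_domain_b).item())
```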
In order to achieve the goal of domain generalization, this paper introduces a special scenario for domain generalization called Load-Domain (LD) domain generalization and proposes a feature shift model (FSN) for it. LD domain means that domains are classified by their load (an attribute in the CWRU bearing dataset), so the domain labels carry physical information (the actual horsepower load of the bearing) that we can use. FSN is designed for this scenario: features are shifted from the target domain into the source domain to achieve domain generalization. In addition, the fully connected classifier and the relational classifier enhance the generalization ability of the feature extractor of FSN and achieve high accuracy on unseen target domains. In summary, our contributions are as follows:
  • Section 1 reviews the research status of traditional methods, machine-learning methods, deep learning approaches, and domain generalization methods for fault diagnosis.
  • Section 2 and Section 3 introduce a special domain generalization scenario called Load-Domain domain generalization, also known as LD domain generalization, and propose FSN for the LD domain together with a mixed classifier (combining a fully connected and a relational classifier) to improve the generalization performance.
  • In Section 4, experiments are conducted on the CWRU bearing dataset for both the mixed classifier and FSN. We compared FSN to classical fault diagnosis and domain generalization methods, and the findings show that the proposed method outperforms both. Additionally, experiments are performed on the rotated MNIST dataset to evaluate several domain generalization methods. The comparison shows that FSN, which is the solution suggested in this paper, performs the best in the LD scenarios.

2. LD Domain and Problem Definition

We start by introducing some of the symbols used to describe the problem. Let $X$ denote the input space and $Y$ the label space. A domain can then be defined as a joint distribution $P_{XY}$ on $X \times Y$. For a specific domain $P_{XY}$, $P_X$ denotes the marginal distribution on $X$, $P_{Y|X}$ denotes the posterior distribution of $Y$ given $X$, and $P_{X|Y}$ denotes the class-conditional distribution of $X$ given $Y$.
In the scenario described in this section, we can obtain $K$ similar but different source domains, where the $k$-th source domain $S_k$ corresponds to a joint distribution $P_{XY}^{k}$. Let $\mathcal{S}$ denote the source domain space, i.e., $\mathcal{S} = \{S_k\}_{k=1}^{K} = \{(x^{k}, y^{k})\}_{k=1}^{K}$. It should be noted that for $k \neq k'$ with $k, k' \in \{0, \dots, K\}$, we have $P_{XY}^{k} \neq P_{XY}^{k'}$. The goal in this scenario is to obtain a predictive model $f: X \to Y$ from the source domain data that achieves the minimum prediction error on the unknown target domain data $T = \{x^{T}\}$. The joint distribution of the target domain is denoted $P_{XY}^{T}$, and similarly $P_{XY}^{T} \neq P_{XY}^{k}$ for $k \in \{0, \dots, K\}$.
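For reference, the objective stated above can be written compactly as the following risk minimization; the loss function $\ell$ is a generic classification loss and is our notation rather than the paper's.
$$
\min_{f:\,X \to Y} \; \mathbb{E}_{(x,\,y) \sim P_{XY}^{T}} \big[ \ell\big(f(x),\, y\big) \big],
\qquad f \ \text{trained only on } \{S_k\}_{k=1}^{K}.
$$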
The most important difference between the fault diagnosis scenario described in this paper and the common domain generalization setting is that, for $k \in \{0, \dots, K\}$, the domain number $k$ has physical meaning (the actual horsepower load of the bearing). In the example shown in Figure 1, the domain transfer dataset used in Figure 1a is a collection of images in different styles, and the domain labels contain no information; they are only labels. However, for the CWRU bearing dataset shown in Figure 1b, the domain labels represent the rotor load of the bearing during operation. Therefore, we can obtain additional information from the domain numbers of the source domain data and the target domain data and use it in domain generalization. As a result, we introduce a new domain generalization scenario, called Load-Domain (LD) domain generalization, to describe this special setting.

3. Feature Shift Networks

In Section 2, a specific domain generalization scenario is introduced. In this section, we will propose a feature shift model based on the inter-domain relationship for the LD domain generalization scenario. When the model is trained in the source domain, it can perform well in fault diagnosis on unknown domains with a specific relationship to the source. We refer to this model as feature shift networks (FSN).

3.1. Theoretical Analyses

The design of FSN is based on two ideas: a domain alignment conjecture and a domain distribution regularity conjecture.
Domain Alignment Conjecture. The solution proposed in this section is based on the idea of domain alignment. A domain can be modeled as a joint distribution $P(X, Y)$ (for ease of exposition in this section, we write $P(X, Y)$ for $P_{XY}$). This joint distribution can be decomposed as
$$P(X, Y) = P(Y \mid X)\,P(X) = P(X \mid Y)\,P(Y).$$
A common assumption in domain generalization is that shifts in the data distribution occur only in the marginal $P(X)$, with the posterior $P(Y \mid X)$ remaining relatively stable. From the perspective of deep learning, if the marginal distributions of two domains can be aligned, then a predictive model trained on one domain can also perform effectively on the other; in this case, the two domains can be viewed as having the same distribution. From the perspective of causal learning, if the above assumption holds, then aligning the marginal distribution $P(X)$ is only valid when $X$ is the cause of $Y$, because in this case $P(Y \mid X)$ is not coupled with $P(X)$ and therefore remains stable when $P(X)$ changes. Of course, it is also possible that $Y$ is the cause of $X$, in which case a shift of $P(X)$ will also affect $P(Y \mid X)$.
One of the theoretical bases of the solution proposed in this section is the above assumption: $X$ is the cause of $Y$, $P(Y \mid X)$ is not coupled with $P(X)$, and therefore the data distribution shift occurs only in the marginal $P(X)$. In this case, as long as the marginal $P(X)$ is aligned, the model can be transferred from one distribution to another.
Distribution Law Conjecture. In addition to the posterior-stability idea above, the model proposed in this section is based on another idea: when the domain labels correspond to physical meanings in reality, if two pairs of domain labels share the same internal relationship, then the feature distributions of the two corresponding pairs of domains also share the same relationship. More precisely, denote the $K$ domain labels as $D_1, D_2, \dots, D_K$, corresponding to domains $P_1, P_2, \dots, P_K$. If there exists a relationship mapping $F_d$ such that $F_d(D_k) = D_{k'}$ and $F_d(D_m) = D_{m'}$ for $k, k', m, m' \in \{1, \dots, K\}$, then there exists a relationship mapping $F_p$ between the domain features such that, if $F_p(f(P_k)) = f(P_{k'})$, then $F_p(f(P_m)) = f(P_{m'})$.
As shown in Figure 2, within the domain labels there is a mapping $F_d$ that maps one label to another; correspondingly, there is a mapping $F_p$ between the domains associated with these labels, which maps the feature distribution of one domain onto that of another.
To illustrate the above more concretely, consider a simple example. The source domain labels 0, 1, and 2 of the CWRU dataset follow a linear distribution on the number line. We assume that the distribution of the domain features corresponding to these labels is likewise a kind of "linear" distribution in some feature space. Figure 3 shows that these domain feature spaces visually appear to have a "linear" relationship. Because each source domain label actually corresponds to the bearing load during operation, and the label value directly corresponds to the load size (it is not a random number), if there is a relatively simple relationship between the domain labels (a linear one in this dataset), we have reason to believe that the relationship between the feature spaces of the corresponding domains can also be obtained in some way. If the mapping between these domain feature spaces can be obtained, the data in the feature space of one domain can be mapped completely into the feature space of another domain. Even if this mapping is not perfectly accurate, as long as it fits the mapping between the existing domain feature spaces reasonably well, it should be better than domain generalization without any additional information to guide it.
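To make this "linear" intuition concrete, the toy sketch below fits a per-unit-load feature shift from the source domains and extrapolates it to an unseen load; the synthetic feature means and the constant-offset assumption are ours for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-domain feature means for loads 0, 1, and 2 HP. We assume each
# unit of load shifts the feature distribution by roughly the same offset,
# mirroring the linear spacing of the domain labels on the number line.
base = rng.normal(size=16)
step = rng.normal(scale=0.3, size=16)
mean_d0, mean_d1, mean_d2 = base, base + step, base + 2 * step

# Estimate the per-step shift from adjacent source domains only ...
estimated_step = ((mean_d1 - mean_d0) + (mean_d2 - mean_d1)) / 2

# ... and extrapolate to the unseen target domain 3 (load 3 HP).
predicted_mean_d3 = mean_d2 + estimated_step
true_mean_d3 = base + 3 * step
print(np.abs(predicted_mean_d3 - true_mean_d3).max())  # ~0 in this toy setting
```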

3.2. Transfer Methods between Different Distributions

One of the most popular approaches is domain alignment, which aligns the feature distributions across the source domains so that predictive models trained on the source domains transfer better to the unknown target domain. The reasoning behind this approach is as follows: if a feature learned on the source domains is insensitive to distributional shifts between the source domains, then it should also be relatively robust to distributional shifts on the unseen target domain. Currently, most domain generalization methods are developed based on the idea of domain alignment, and such methods generally align the marginal distribution $P(X)$, the class-conditional distribution $P(X \mid Y)$, or the posterior distribution $P(Y \mid X)$.
In the scenario discussed in this section, such a general approach wastes information. Therefore, when the actual regularities between domains can be observed, new domain generalization methods and procedures are needed to retain and learn these regularities.
New Approach. In the scenario discussed in this section, each source and target domain label corresponds to an actual working condition of the bearing, and the numbers (0, 1, 2, etc.) carry physical meaning. Thus, an increase in the domain label from 0 to 1 to 2 represents an increase in the corresponding bearing load, and since the numbers are related linearly, it is natural to expect that the feature spaces of the corresponding domains also bear some relationship to one another. Extending this to unseen domains, if the target domain number also has a corresponding physical meaning, then the target domain feature space should also have some relationship with certain source domain feature spaces. The facts that the domain numbers carry physical meaning and that the target domain number is known are the biggest differences between this scenario and the ordinary domain generalization problem. The traditional approach of domain generalization through feature-space alignment makes no use of this added information, leaving it underutilized. In order to make full use of the regular information between domains, this paper proposes a new feature shift network, FSN, which is optimized for this special domain generalization scenario with additional information and improves the practical effect of domain generalization.

3.3. Relational Classifiers

Recent studies have indicated that contrastive-based classification methods slightly outperform traditional fully connected classification methods [25,26,27]. This paper introduces a novel relational classifier to further enhance domain generalization networks by leveraging the relational structure among data points. Specifically, we employ a neural network as a distance metric to facilitate the learning of more generalized features, thereby enhancing generalization performance.
Relational classifiers are a subset of contrast-based classification methods and are extensively utilized in self-supervised learning. In unsupervised learning for computer vision, contrast-based methods consistently outperform alternative approaches, and recent research has shown that they also confer significant advantages in supervised learning. Notably, contrastive learning exhibits lower sensitivity to hyperparameters than cross-entropy-based classification approaches, allowing for improved performance with less intensive fine-tuning. Figure 4 provides a graphic illustration of a contrast-based classification method. In the contrastive approach, each image is passed through a feature extractor to obtain its feature map, and the same is done for the test image. A distance metric then measures the distance between the test image's feature map and those of the other images, and the test image is assigned to the class whose feature map is most similar to its own.
Design Idea. The contrast-based classification method classifies images by utilizing the distances between feature maps and selecting the minimum distance as the classification criterion. Previous approaches typically utilized fixed-distance metrics, such as Euclidean or cosine similarity distances for distance calculations. However, the efficacy of traditional feature map distance metrics cannot be considered entirely reliable. Since feature map distances do not always conform to traditional distance measures, we propose a new relational module that integrates an effective few-shot learning metric into the domain generalization framework. This approach ensures more accurate distance measurements. Figure 5 illustrates the distance and loss calculations performed by the relational module.
The relation network replaces traditional measures with a neural network to compute the distance between feature maps. Similar to conventional methods, it takes two feature maps and outputs a relation score, which reflects whether the feature maps come from images of the same class. A score of 1 implies that the images belong to the same class, whereas a score of 0 suggests they are from different classes. The network is trained by minimizing the loss function, which measures the difference between the predicted relation score and the actual score (1 or 0). The gradient of this loss is then back-propagated through the network, updating both the relation network and the feature extraction layers.
To implement the relationship network, the two feature maps are concatenated and fed into the neural network to produce a relationship score. To classify a new feature map, the relationship score must be computed between this feature map and the feature maps representing each class. This approach allows the network to determine the closest match. We refer to this type of network, which integrates relationship modules and employs comparison-based classification, as a relationship classifier.
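A minimal PyTorch sketch of such a relation module is given below: two feature vectors are concatenated and mapped to a score in [0, 1]. The layer sizes and the 2048-dimensional features are assumptions for illustration, not the exact configuration used in the paper.

```python
import torch
import torch.nn as nn

class RelationModule(nn.Module):
    """Scores how likely two feature vectors come from the same class."""
    def __init__(self, feature_dim=2048, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * feature_dim, hidden_dim),   # concatenated feature pair
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),                             # relation score in [0, 1]
        )

    def forward(self, feat_a, feat_b):
        return self.net(torch.cat([feat_a, feat_b], dim=-1)).squeeze(-1)

# Classification by comparison: assign the query to the class whose reference
# feature yields the highest relation score.
relation = RelationModule()
query = torch.randn(1, 2048)
class_refs = torch.randn(9, 2048)                     # one reference per class
scores = relation(query.expand(9, -1), class_refs)    # shape (9,)
predicted_class = scores.argmax().item()
```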

3.4. Design of FSN

Design Idea. FSN is based on the assumption that features from different domains can be converted into one another as long as a pattern between them can be found. The design idea comes from the sources shown in Figure 6. Since every label in the source domain label space corresponds to an actual working condition, and the load of that working condition is the domain number, the number is physically meaningful in this scenario. There is a straightforward linear mapping between these numbers, represented as 0, 1, 2, and 3, which are evenly distributed along the number line shown on the right side of Figure 1. Domain numbers 0, 1, and 2 represent the source domains, corresponding to the data generated by the bearing under 0 HorsePower (HP) load, 1 HP load, and 2 HP load, respectively. These data sets belong to three distinct domains, which together comprise the source domain data. Domain number 3 represents the target domain, associated with the data generated by the bearing under 3 HP load. On the left side of Figure 1, these domain numbers are mapped to their respective domain feature spaces. These feature spaces are linearly ordered according to the domain labels, indicating a systematic pattern among them.
Network Structure. FSN consists of a feature extractor, a feature shift network, a relation classifier, and a label classifier. Both the feature extractor and the feature shift network are implemented with ResNet50 [28]. Because our domain generalization task is classification, we chose cross-entropy as the loss function of the network. The feature shift network receives the output feature of the feature extractor as input and outputs a feature of the same form, which is expected to be mapped into the feature space of the adjacent domain. The relation classifier is implemented in the same way as in Section 3.3.
The FSN structure is shown in Figure 7. During training, the input data are organized in a 1-to-9 form: one sample from source domain (a + 1) and nine samples from source domain a. The nine samples cover the n classes within domain a, where n denotes the number of classes in the task. In training, domain a may be domain 0 or 1, and the corresponding domain a + 1 is then domain 1 or 2. The data from domain a are turned into features by the ResNet feature extractor, while the features extracted from the domain a + 1 data additionally pass through the feature shift network to obtain a shifted output. This shifted output is expected to align, through the relation module, with the features of the domain a samples that carry the same label, i.e., to produce a relation score of 1. Ultimately, when domain 3 is treated as domain a + 1, domain 2 is treated as domain a. At this stage, the data from domain 3 undergo both feature extraction and feature shifting. Relation scores are computed by comparing these features with those of each class in domain 2, and the class with the highest relation score is taken as the predicted category of the target domain data.
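The sketch below assembles these components and shows the inference path for the target domain (extract, shift, then compare with the class features of the adjacent source domain). Module sizes are assumptions, and a small MLP stands in for the paper's ResNet50-based feature shift network.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class FSN(nn.Module):
    """Sketch of the FSN pieces: feature extractor, feature shift network,
    relation module, and fully connected (label) classifier."""
    def __init__(self, num_classes=9, feature_dim=2048):
        super().__init__()
        backbone = resnet50(weights=None)
        backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2,
                                   padding=3, bias=False)   # grayscale input
        backbone.fc = nn.Identity()                          # 2048-d features
        self.extractor = backbone
        # Stand-in for the shift network (the paper implements it with ResNet50).
        self.shift = nn.Sequential(
            nn.Linear(feature_dim, feature_dim), nn.ReLU(inplace=True),
            nn.Linear(feature_dim, feature_dim),
        )
        self.relation = nn.Sequential(
            nn.Linear(2 * feature_dim, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 1), nn.Sigmoid(),
        )
        self.classifier = nn.Linear(feature_dim, num_classes)  # FC label head

    def infer_target(self, x_target, class_refs):
        """Shift target-domain features toward the adjacent source domain, then
        pick the class whose reference gives the highest relation score."""
        feat = self.shift(self.extractor(x_target))             # (B, 2048)
        b, c = feat.size(0), class_refs.size(0)
        pairs = torch.cat([feat.unsqueeze(1).expand(b, c, -1),
                           class_refs.unsqueeze(0).expand(b, c, -1)], dim=-1)
        scores = self.relation(pairs).squeeze(-1)               # (B, num_classes)
        return scores.argmax(dim=1)

model = FSN()
x = torch.randn(4, 1, 28, 28)      # target-domain grayscale samples
refs = torch.randn(9, 2048)        # one reference feature per class (domain 2)
print(model.infer_target(x, refs))
```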

4. Experiment

We conducted experiments with the mixed classifier and FSN on the CWRU dataset and with FSN on the rotated MNIST dataset to verify the effectiveness of the proposed approach for LD domain generalization.

4.1. Evaluation in CWRU Bearing Dataset

Setup. This study utilizes the bearing dataset released by Case Western Reserve University, commonly known as the CWRU bearing dataset. It is an open-source collection that has been widely applied in fault diagnosis and analysis. For more information, refer to Appendix A.
In the CWRU experiment, we use the bearing load as the domain criterion. The model is trained on multiple source domains to classify nine fault modes and is then applied to the target domain data for a generalization test. Each fault mode still contains about 600 samples, and each domain contains data for the nine fault modes. Data corresponding to loads of 0 HP, 1 HP, and 2 HP are designated as the three source domains, labeled with domain numbers 0, 1, and 2, respectively. Data with a load of 3 HP, labeled as domain number 3, serve as the target domain, which acts as the test set for evaluation.
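A sketch of how the data could be organized for this split is shown below; the nested-dictionary layout and variable names are our assumptions, not the actual CWRU file structure.

```python
import numpy as np

# Hypothetical layout: samples[load][fault] is an array of 28x28 grayscale
# images for one fault mode under one load; this is NOT the raw CWRU layout.
SOURCE_LOADS = [0, 1, 2]   # domains 0-2: training (0 HP, 1 HP, 2 HP)
TARGET_LOAD = 3            # domain 3: generalization test set (3 HP)
NUM_FAULTS = 9

def build_split(samples):
    """Collect (image, fault_label, domain_label) triples per split."""
    train, test = [], []
    for load, per_fault in samples.items():
        for fault, images in per_fault.items():
            triples = [(img, fault, load) for img in images]
            (train if load in SOURCE_LOADS else test).extend(triples)
    return train, test

# Toy example with random data standing in for the ~600 images per fault mode.
samples = {load: {f: np.random.rand(600, 28, 28) for f in range(NUM_FAULTS)}
           for load in SOURCE_LOADS + [TARGET_LOAD]}
train, test = build_split(samples)
print(len(train), len(test))   # 3*9*600 source samples, 9*600 target samples
```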
The number of epochs is set to 20, with 1200 iterations per epoch. Every 50 iterations, the model is tested on the test set to track the generalization performance in real time. To reduce the randomness of the final results, the accuracies of the last 30 tests are stored in a list, and their average is reported.
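The reported figure is therefore a running mean over the most recent 30 test accuracies, which could be tracked with a bounded buffer such as the one sketched here.

```python
from collections import deque

last_30 = deque(maxlen=30)   # keeps only the most recent 30 test accuracies

def record(accuracy):
    """Store one test accuracy and return the mean of the stored values."""
    last_30.append(accuracy)
    return sum(last_30) / len(last_30)

# Called every 50 iterations after evaluating on the target domain.
for acc in [0.80, 0.82, 0.81, 0.84]:
    reported = record(acc)
print(f"reported accuracy = {reported:.4f}")
```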
Training Details. Before the experiment, we first describe the training process of the FSN. The complete training process of the FSN is shown in Figure 8. The FSN model is trained in stages to achieve better performance (i.e., relational classifier → fully connected classifier → FSN), and the training process of each stage is shown in Algorithms 1–3.
Algorithm 1 Trains the feature extractor and the relation classifier.
Input: x_0, x_1, …, x_9 // x_1, …, x_9 are grayscale images with labels 0–8 from the source domain
Output: θ_f, θ_r // the parameters of the feature extractor and the relation module, respectively
 1: x_0, x_1, …, x_9 = get()
 2: feature_0 = G_f(x_0)
 3: for i = 1 … 9 do
 4:     feature_i = G_f(x_i)
 5:     feature_{i,0} = concatenate(feature_i, feature_0)
 6:     relation_score = G_r(feature_{i,0}) // G_r denotes the relation module
 7:     if label(x_0) != label(x_i) then
 8:         loss += loss_func(relation_score, 0)
 9:     else
10:         loss += loss_func(relation_score, 1)
11:     end if
12: end for
In the first stage of training the FSN, Algorithm 1 employs a relational classifier to perform classification training. This step is crucial for acquiring the initial parameters of both the feature extractor and the relation module. By using a relational classifier, the network learns to distinguish between different fault modes based on their relational scores.
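A PyTorch sketch of one such stage-one step is given below. It reuses the relation-module interface sketched earlier in Section 3.3, pairs one query x0 with nine same-domain exemplars, and uses binary cross-entropy as a stand-in for loss_func; these choices are our assumptions rather than the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def stage1_step(extractor, relation, optimizer, x0, y0, exemplars, exemplar_labels):
    """One Algorithm-1 step: score the query x0 against nine exemplars from the
    same source domain and push the relation score toward 1 for the matching
    class and toward 0 otherwise."""
    optimizer.zero_grad()
    feat0 = extractor(x0)                               # (1, D) query feature
    feats = extractor(exemplars)                        # (9, D) class exemplars
    scores = relation(feats, feat0.expand_as(feats))    # (9,) scores in [0, 1]
    targets = (exemplar_labels == y0).float()           # 1 if same class else 0
    loss = F.binary_cross_entropy(scores, targets)
    loss.backward()                                     # updates G_f and G_r
    optimizer.step()
    return loss.item()
```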
Algorithm 2 represents the second stage of FSN training. At this stage, a fully connected classifier is integrated into the model. The training process now involves updating the parameters of the entire model, including the feature extractor, the relation module, and the newly added fully connected classifier. This integration allows the model to refine its feature representations and improve classification accuracy.
Algorithm 2 Trains the fully connected classifier.
Input: x_0, x_1, …, x_9, θ_f, θ_r
Output: θ_f, θ_r, θ_c // θ_c is the parameters of the fully connected classifier
 1: x_0, x_1, …, x_9 = get()
 2: feature_0 = G_f(x_0)
 3: loss += loss_func(G_c(feature_0), label(x_0))
 4: for i = 1 … 9 do
 5:     feature_i = G_f(x_i)
 6:     feature_{i,0} = concatenate(feature_i, feature_0)
 7:     relation_score = G_r(feature_{i,0})
 8:     if label(x_0) != label(x_i) then
 9:         loss += loss_func(relation_score, 0)
10:     else
11:         loss += loss_func(relation_score, 1)
12:     end if
13: end for
Algorithm 3 Trains the feature shift network.
Input: x_0, x_1, …, x_9, θ_f, θ_r, θ_c
Output: θ_f, θ_r, θ_c, θ_s // θ_s is the parameters of the feature shift network
 1: x_0, x_1, …, x_9 = get_different_domain() // x_0 from domain n; x_1, …, x_9 from domain n−1
 2: feature_0 = G_f(x_0)
 3: feature_shift = G_s(feature_0) // G_s is the feature shift network
 4: for i = 1 … 9 do
 5:     feature_i = G_f(x_i)
 6:     feature_{i,shift} = concatenate(feature_i, feature_shift)
 7:     relation_score = G_r(feature_{i,shift})
 8:     if label(x_0) != label(x_i) then
 9:         loss += loss_func(relation_score, 0)
10:     else
11:         loss += loss_func(relation_score, 1)
12:     end if
13: end for
14: loss += loss_func(G_c(feature_{i,shift}), label(x_0))
In the third stage, outlined by Algorithm 3, the training primarily focuses on the FSN. A key aspect to note in this stage is the relationship between the data samples: x_1, …, x_9 not only correspond to labels 0 to 8 but also have domain numbers that are always one less than that of x_0. This configuration encourages the alignment of features from higher domains with those from lower domains that share the same label. This alignment is critical for enhancing the model's ability to generalize across domains with different load conditions.
In general, FSN aims to achieve better results through the staged training process described above. During training, the network parameters trained in the previous stages are used as the input for the subsequent stages, meaning that the networks used in later stages are initialized by the earlier ones. By learning the feature shifts 1 → 0 and 2 → 1 on the source domains, we expect to obtain a feature shift network that can realize the shift 3 → 2 from target domain 3 features to source domain 2 features, thereby achieving generalization.
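For concreteness, the sketch below mirrors one Algorithm-3 step with the module interfaces used in the earlier sketches: the query comes from domain a + 1, the nine exemplars from domain a, and the shifted query feature is trained to align with the same-label exemplar. The binary cross-entropy relation loss and the choice of classifying the shifted feature with the FC head are our assumptions.

```python
import torch
import torch.nn.functional as F

def stage3_step(extractor, shift_net, relation, classifier, optimizer,
                x_hi, y_hi, exemplars_lo, exemplar_labels_lo):
    """One Algorithm-3 step: x_hi is from domain a+1, the exemplars from
    domain a. The shift network learns to map the a+1 feature into the
    feature space of domain a."""
    optimizer.zero_grad()
    shifted = shift_net(extractor(x_hi))                      # (1, D), shifted a+1 -> a
    feats_lo = extractor(exemplars_lo)                        # (9, D)
    scores = relation(feats_lo, shifted.expand_as(feats_lo))  # (9,) relation scores
    targets = (exemplar_labels_lo == y_hi).float()
    loss = F.binary_cross_entropy(scores, targets)
    loss = loss + F.cross_entropy(classifier(shifted), y_hi.view(1))
    loss.backward()
    optimizer.step()
    return loss.item()
```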
Result of the Relational Classifier. The Domain Adversarial Neural Network (DANN) is a classical domain generalization model based on domain-adversarial training [29]. In this paper, a relational classifier is introduced into DANN, giving it two different classification methods; we call this new model the Mixed Classifier Domain Adversarial Neural Network (MCDANN). To build the mixed classifier, we integrate the relation network used in few-shot learning into DANN, which improves the generalization ability of the model. We compare MCDANN with several classic models on the CWRU dataset. For domain division, the load serves as the basis: loads of 0 HP, 1 HP, 2 HP, and 3 HP are assigned to domains 0, 1, 2, and 3. The model is trained using domains 0, 1, and 2 as the source domains, while domain 3 is reserved as the target domain to test the model's ability to generalize to unseen data. The classical models employed in the comparison fall into two categories: models trained on one-dimensional signals and models trained on images. For the former, we selected the classical models SVM and 1DCNN; for the latter, we mainly compare with methods based on 2DCNNs. In our experiments, ResNet is used as the underlying model for all 2DCNN implementations.
To facilitate representation, we use the following notations in Table 1: FC denotes the fully connected classifier, relation denotes the relational classifier, and DA denotes the domain adversarial method implemented on the source domains with a classifier and a Gradient Reversal Layer (GRL) [30,31]. If a model comprises two classifiers, the generalization accuracy on the target domain is calculated separately with each classifier during testing.
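For reference, a standard implementation of the gradient reversal layer used by such DA models [30,31] is sketched below; this is the common construction, not necessarily the exact code used in our experiments.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambda in the
    backward pass, so the feature extractor is trained to confuse the domain
    classifier attached after this layer."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Usage: features -> grad_reverse -> domain classifier; the domain loss then
# pushes the extractor toward domain-invariant features.
features = torch.randn(8, 2048, requires_grad=True)
reversed_features = grad_reverse(features, lambd=0.5)
reversed_features.sum().backward()
print(features.grad[0, :3])   # gradients are scaled by -0.5
```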
Table 1 shows that models 2 to 6 combine a common feature extractor with a classifier. Among them, models 1, 2, 3, and 4 are trained on the source domain data and then directly evaluated on the target domain, whereas models 5, 6, and 7 use adversarial training. The 6th model applies both a fully connected classifier and a relational classifier based on a relation module, so its generalization test produces two accuracies.
The results from Models 4 and 5 indicate that the generalization performance of domain adversarial models using fully connected classifiers is comparable to that of models using relational classifiers with relation modules. However, the accuracy on the target domain declines when domain adversarial training is applied to a purely relational classifier, as demonstrated by Models 4 and 6. Intriguingly, a generalization test conducted at each parameter update reveals that adversarial training of the relational classifier alone is associated with a gradual decrease in generalization accuracy. Therefore, it is necessary to augment the relational classifier with a fully connected classifier to achieve meaningful results.
The adversarial model using fully connected (FC) and relational classifiers achieves accuracy rates of 82.6% and 81%, respectively. This improvement is observed in both FC and relational comparison classifiers. The feature extractor is responsible for this improvement in generalization performance as it extracts features with better generalization capabilities.
Existing Model Contrast of FSN. After verifying the improvement brought by the mixed classifier, we compared FSN with other classical models used for domain generalization.
The empirical risk minimization-based method (ERM) is first used for comparison. Although many methods have been proposed for domain generalization, which are intuitively reasonable and technically feasible, most of them can only achieve a small performance improvement [32,33,34]. Therefore, in this paper, the ERM method is used as one of the methods for comparison with the FSN network.
On the basis of the above, in order to prove that FSN takes advantage of the fact that domain labels have actual physical meaning, both dual-source and multi-source experiments are carried out. Since FSN needs to train a network that shifts higher-domain features to the nearest lower-domain features, a single-source setting is not possible for FSN, which requires at least two domains with adjacent labels. Moreover, because the features need to be shifted into the feature space of the adjacent domain during generalization, FSN must use domain 1 and domain 2 for the dual-source experiment. To facilitate comparison, all models are trained on domains 1 and 2 in the dual-source setting and then tested on domain 3 for generalization. In the multi-source experiment, all source domains 0 to 2 are used during training, and domain generalization is tested on target domain 3. In addition, all methods are trained with the relational classifier introduced in Section 3.3 to improve generalization.
Figure 9 shows the generalization results on domain 3 for several ways of using multiple source domains. As can be seen from the figure, the multi-source models generalize significantly better than the dual-source models. In the dual-source experiment, FSN performs poorly because the feature shift network is trained only on the shift from source domain 2 to 1, and the lack of domains prevents it from learning a good feature shift. In the multi-source experiment, FSN achieves the best generalization performance: it effectively utilizes the information carried by the domain label of the target domain and identifies the relationships between features in adjacent domains.
Figure 9 also shows, for each model, the generalization performance obtained with the FC classifier and with the relational classifier. The multi-source setting is clearly better than the dual-source setting, and FSN shows the largest performance improvement when moving from dual-source to multi-source domains.
In addition, we compared FSN with several of the latest domain generalization techniques. Table 2 demonstrates that FSN outperforms these methods, largely due to its effective utilization of the LD information. As detailed in the training setup, we selected domain 3 as the target domain. This choice is crucial in Algorithm 3, as it promotes the alignment of features from higher domains with those from lower domains sharing the same label during training.
Visual Analysis. Figure 10 shows that FSN still achieves 100% classification accuracy on most of the failure modes. For the B021 and IR014 faults (for more information, refer to Table A1), the error of the FSN is substantial: only 48.64% and 54.05% of the samples are correctly classified, respectively. Among the IR014 faults, 27.02% are identified as B021 faults and 18.91% as IR021 faults. For the B021 fault, 40.54% of the data are identified as IR014 faults.
Figure 11 illustrates the use of the t-SNE method for visualizing data distributions. The failure modes are denoted by their corresponding numbers, as listed in Table A1. When the test set data are visualized in a grayscale image format, many data points of the same class cluster together, exhibiting initial signs of locality in the feature space. For example, data points from the same class tend to form coherent groups. Specifically, data associated with faults 1 and 8 show a central and cohesive distribution pattern. However, other fault types exhibit more dispersed distributions. For example, data points for fault 3 are scattered across the upper region of the plot, while fault 2 forms a semi-circular distribution around fault 1. Additionally, faults 7 and 0 display a near-uniform mixed distribution. Notably, in subsequent experiments, faults 1 and 8 achieve 100% classification accuracy when the model is applied to the test set. This high accuracy not only indicates that faults 1 and 8 are particularly distinguishable, but it also underscores the effectiveness of the t-SNE plot in reflecting the underlying data distribution patterns.
In addition, Figure 11 shows that the clusters of faults 4 and 8 are close to each other, indicating that these two fault classes may be difficult to separate in classification. This is confirmed in the confusion matrix, where 20% of fault 4 samples are classified as fault 8. Nearly 25% of the fault 8 samples are incorrectly classified as fault 4, which is the only class with which fault 8 is confused.
The feature output by the feature extractor of FSN is represented in Figure 12a, and the feature output of the features in the figure after the feature shift network is shown in Figure 12b. As shown, after the shift, the distribution of the features from different domains becomes more aligned. Notably, the intensity or concentration of these features before and after the shift appears to be roughly equivalent. However, when these shifted features are used for classification in the testing phase, the results demonstrate significant improvement compared to the classification results using the unshifted features. This observation suggests that the FSN effectively aligns the target domain feature space more closely with the source domain feature space.
In general, the features from the unseen target domain, as processed by the trained FSN, exhibit a more concentrated distribution both before and after the feature shift. This concentration theoretically supports that the FSN method can achieve robust domain generalization performance, serving as a strong baseline for domain generalization tasks.

4.2. Discussion on the Rotated MNIST Datasets

We design and develop the Feature Shift Network (FSN) for Load-Domain (LD) domain generalization, a novel domain generalization task. Compared to traditional domain generalization tasks, LD domain generalization presents two unique features: first, the label numbers correspond to specific physical meanings; second, the domain number of the target domain is known. Besides the CWRU bearing dataset used in this study, we identify that several other datasets also exhibit these characteristics. To demonstrate the versatility and superior performance of FSN in this unique context, we extend our generalization experiments to include the MNIST dataset. This addition highlights FSN’s excellent performance in scenarios where these specific characteristics are present.
Setup. In the domain generalization experiment on the rotated MNIST dataset, domains 0–2 are used as the source domains, with corresponding rotation angles of 0, 30, and 60 degrees. Domain 3, with a corresponding rotation angle of 90 degrees, is used as the target domain. The training procedure is the same as in the CWRU experiment: FSN is trained on the source domains, and the model is then evaluated on the target domain to test the generalization effect.
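The rotated domains could be constructed as sketched below with torchvision; the angle table and function names are our assumptions about the benchmark construction, not the exact code used.

```python
import torch
from torchvision import datasets, transforms
from torchvision.transforms import functional as TF

ANGLES = {0: 0, 1: 30, 2: 60, 3: 90}   # domain number -> rotation angle (degrees)

def make_rotated_domain(domain_id, train=True, root="./data"):
    """Return (rotated_images, labels) for one rotation domain of MNIST."""
    base = datasets.MNIST(root, train=train, download=True,
                          transform=transforms.ToTensor())
    angle = ANGLES[domain_id]
    images = torch.stack([TF.rotate(img, angle) for img, _ in base])
    labels = torch.tensor([label for _, label in base])
    return images, labels

# Domains 0-2 (0, 30, 60 degrees) form the source set; domain 3 (90 degrees)
# is held out as the target domain, mirroring the CWRU protocol.
source = [make_rotated_domain(d) for d in (0, 1, 2)]
target_images, target_labels = make_rotated_domain(3, train=False)
```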
Comparison with Classical Models. Table 3 compares FSN with several other models. KDA, UB, DICA, SCA, MATE, ERM, DANN, DeepC, DeepN, and CIDDG all come from Li et al. [39]. In addition, we include several newer models, e.g., DIfEX [35], VREx [36], GroupDRO [37], and ANDMask [38], in this comparison. The results show that our proposed FSN (using the FC classifier for inference) achieves a generalization accuracy 1.5% higher than the other methods in this scenario, which demonstrates the effectiveness of FSN in the LD domain generalization task.

5. Conclusions and Future Work

This paper proposes LD domain generalization as a specific instance of domain generalization, makes full use of the particularity of the bearing dataset used for fault diagnosis, and implements FSN for this scenario. The domains in LD domain generalization correspond to an actual physical quantity (horsepower). FSN learns the relationship between these domains and, finally, shifts the data features of the target domain into the source domain for classification. Experimental results demonstrate that MCDANN (DANN with the mixed classifier) surpasses the original DANN in generalization accuracy on the CWRU dataset, achieving accuracy rates of 82.6% and 81.0% with the FC and relational classifiers, respectively. Experiments also show that FSN outperforms the comparison models' generalization performance by 1% to 2% on the CWRU dataset and reaches 85.22% accuracy on the rotated MNIST dataset.
Nevertheless, the application of FSN exhibits certain limitations. Addressing these constraints is a key aspect of future research efforts.
  • Firstly, FSN is only tested on the CWRU bearing dataset and the rotated MNIST dataset. The model needs to be tested on additional datasets with similar characteristics to validate its efficiency.
  • Secondly, the proposed FSN requires at least three source domains, and the domain number is required to correspond to the actual physical information. In order to make other datasets meet this requirement, the datasets need to be selected carefully, and the appropriate domain partition method should be selected.
  • Lastly, under the premise that the domain number corresponds to actual physical information, the source domain numbering needs to be consecutive during the training process and can only be generalized relatively well from the target domain to the adjacent source domains. Better methods for discontinuous domain generalization need to be explored.
Addressing and mitigating these constraints will be the focus of our future work.

Author Contributions

Funding acquisition, H.C.; methodology, H.C.; software, E.Z., Y.J. and L.S.; validation, E.Z., Y.J. and L.S.; formal analysis, H.C., L.S. and E.Z.; investigation, E.Z., Y.J. and L.S.; resources, H.C.; data curation, E.Z. and Y.J.; writing—original draft, E.Z. and Y.J.; supervision, H.C.; project administration, H.C. and L.S.; writing—review and editing, E.Z. and Y.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the China National Key R&D Program during the 13th Five-Year Plan Period (Grant No. 2018YFB1700405).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: https://engineering.case.edu/bearingdatacenter/download-data-file (accessed on 12 June 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Dataset Details

The dataset used in this paper is the bearing dataset from Case Western Reserve University (CWRU), commonly referred to as the CWRU bearing dataset. It is an open-source dataset, created and released by Case Western Reserve University, and has been widely applied in fault diagnosis and analysis. To facilitate the experiments, this paper takes the load as the domain (0–3 HP, four domains in total). Under each load, three fault sizes are selected for each of the three fault locations: ball, inner race, and the 6 o'clock position of the outer race. Therefore, there are nine fault modes under each load. Data acquired at a sampling frequency of 12,000 Hz were chosen because some data at other sampling frequencies are missing.
Each load (domain) contains nine failure modes with different fault locations and sizes. Each load condition is numbered directly by the HP value of the load (0 to 3); the fault mode numbering is detailed in Table A1.
Table A1. CWRU bearing dataset.
Fault Number | Fault Location | Fault Size (inches) | Load 0 HP | Load 1 HP | Load 2 HP | Load 3 HP
0 | Ball | 0.007 | B007_0 | B007_1 | B007_2 | B007_3
1 | Ball | 0.014 | B014_0 | B014_1 | B014_2 | B014_3
2 | Ball | 0.021 | B021_0 | B021_1 | B021_2 | B021_3
3 | Inner race | 0.007 | IR007_0 | IR007_1 | IR007_2 | IR007_3
4 | Inner race | 0.014 | IR014_0 | IR014_1 | IR014_2 | IR014_3
5 | Inner race | 0.021 | IR021_0 | IR021_1 | IR021_2 | IR021_3
6 | 6 o'clock of outer race | 0.007 | OR007_0 | OR007_1 | OR007_2 | OR007_3
7 | 6 o'clock of outer race | 0.014 | OR014_0 | OR014_1 | OR014_2 | OR014_3
8 | 6 o'clock of outer race | 0.021 | OR021_0 | OR021_1 | OR021_2 | OR021_3
For bearing signal processing, the signal is first normalized; then an overlapping sampling window with a size of 784 and a stride of 200 is applied, and each one-dimensional segment is converted into a grayscale image. In this way, there are about 600 grayscale images of size 28 × 28 for each fault mode in each domain. Taking the fault data of domain 0 (generated under working condition 0) as an example, the corresponding grayscale images for each fault mode are shown in Figure A1.
It can be seen that the grayscale images of different fault modes under the same working condition are quite different. Taking the first fault mode, B007, under the four working conditions as an example, the grayscale images are shown in Figure A2. They show that even grayscale images of the same fault mode have different features if the working conditions are different.
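A sketch of this preprocessing (normalization, an overlapping window of length 784 with stride 200, and reshaping to 28 × 28) is given below; the min-max normalization is an assumption, since the paper does not specify the exact scheme.

```python
import numpy as np

WINDOW, STRIDE, SIDE = 784, 200, 28   # 784 = 28 * 28

def signal_to_grayscale_images(signal):
    """Normalize a 1-D vibration signal, cut it with an overlapping window of
    length 784 and stride 200, and reshape each segment to a 28x28 image."""
    signal = np.asarray(signal, dtype=np.float32)
    # Min-max normalization to [0, 1] (the exact scheme is an assumption).
    signal = (signal - signal.min()) / (signal.max() - signal.min() + 1e-12)
    n_segments = (len(signal) - WINDOW) // STRIDE + 1
    segments = np.stack([signal[i * STRIDE: i * STRIDE + WINDOW]
                         for i in range(n_segments)])
    return segments.reshape(-1, SIDE, SIDE)

# Example: a signal of ~120,000 points yields roughly 600 images per fault mode.
images = signal_to_grayscale_images(np.random.randn(120_584))
print(images.shape)   # (600, 28, 28)
```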
Figure A1. The wavelet time–frequency diagram of the nine fault modes under working condition 0. (a) B007_0. (b) B014_0. (c) B021_0. (d) IR007_0. (e) IR014_0. (f) IR021_0. (g) OR007_0. (h) OR014_0. (i) OR021_0.
Figure A2. The wavelet time–frequency diagram of fault mode B007 under the four working conditions. (a) B007_0. (b) B007_1. (c) B007_2. (d) B007_3.

References

  1. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
  2. Li, X.; Hu, Y.; Zheng, J.; Li, M.; Ma, W. Central moment discrepancy based domain adaptation for intelligent bearing fault diagnosis. Neurocomputing 2021, 429, 12–24.
  3. Cui, H.; Guan, Y.; Chen, H.; Deng, W. A novel advancing signal processing method based on coupled multi-stable stochastic resonance for fault detection. Appl. Sci. 2021, 11, 5385.
  4. Zhao, M.; Kang, M.; Tang, B.; Pecht, M. Multiple wavelet coefficients fusion in deep residual networks for fault diagnosis. IEEE Trans. Ind. Electron. 2018, 66, 4696–4706.
  5. Peng, Z.K.; Chu, F. Application of the wavelet transform in machine condition monitoring and fault diagnostics: A review with bibliography. Mech. Syst. Signal Process. 2004, 18, 199–221.
  6. Wang, D.; Peter, W.T.; Tsui, K.L. An enhanced Kurtogram method for fault diagnosis of rolling element bearings. Mech. Syst. Signal Process. 2013, 35, 176–199.
  7. Karlsson, S.; Yu, J.; Akay, M. Time-frequency analysis of myoelectric signals during dynamic contractions: A comparative study. IEEE Trans. Biomed. Eng. 2000, 47, 228–238.
  8. Chen, H.; Shi, L.; Zhou, S.; Yue, Y.; An, N. A Multi-Source Consistency Domain Adaptation Neural Network MCDANN for Fault Diagnosis. Appl. Sci. 2022, 12, 10113.
  9. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
  10. Fernández-Francos, D.; Martínez-Rego, D.; Fontenla-Romero, O.; Alonso-Betanzos, A. Automatic bearing fault diagnosis based on one-class ν-SVM. Comput. Ind. Eng. 2013, 64, 357–365.
  11. Aishwarya, M.; Brisilla, R. Design and Fault Diagnosis of Induction Motor Using ML-Based Algorithms for EV Application. IEEE Access 2023, 11, 34186–34197.
  12. Dutta, N.; Kaliannan, P.; Shanmugam, P. SVM Algorithm for Vibration Fault Diagnosis in Centrifugal Pump. Intell. Autom. Soft Comput. 2023, 35, 2997–3020.
  13. Janssens, O.; Slavkovikj, V.; Vervisch, B.; Stockman, K.; Loccufier, M.; Verstockt, S.; Van de Walle, R.; Van Hoecke, S. Convolutional neural network based fault detection for rotating machinery. J. Sound Vib. 2016, 377, 331–345.
  14. Guo, L.; Lei, Y.; Li, N.; Xing, S. Deep convolution feature learning for health indicator construction of bearings. In Proceedings of the 2017 Prognostics and System Health Management Conference, Harbin, China, 9–12 July 2017; pp. 1–6.
  15. Chen, Y.; Peng, G.; Xie, C.; Zhang, W.; Li, C.; Liu, S. ACDIN: Bridging the gap between artificial and real bearing damages for bearing fault diagnosis. Neurocomputing 2018, 294, 61–71.
  16. Qian, W.; Li, S.; Wang, J.; An, Z.; Jiang, X. An intelligent fault diagnosis framework for raw vibration signals: Adaptive overlapping convolutional neural network. Meas. Sci. Technol. 2018, 29, 095009.
  17. Abed, W.; Sharma, S.; Sutton, R.; Motwani, A. A robust bearing fault detection and diagnosis technique for brushless DC motors under non-stationary operating conditions. J. Control. Autom. Electr. Syst. 2015, 26, 241–254.
  18. Xie, Y.; Zhang, T. Imbalanced learning for fault diagnosis problem of rotating machinery based on generative adversarial networks. In Proceedings of the 2018 37th Chinese Control Conference (CCC), Wuhan, China, 25–27 July 2018; pp. 6017–6022.
  19. Blanchard, G.; Lee, G.; Scott, C. Generalizing from several related classification tasks to a new unlabeled sample. In Advances in Neural Information Processing Systems; Neural Information Processing Systems Foundation, Inc. (NeurIPS): La Jolla, CA, USA, 2011; Volume 24.
  20. Muandet, K.; Balduzzi, D.; Schölkopf, B. Domain generalization via invariant feature representation. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 10–18.
  21. Li, H.; Pan, S.J.; Wang, S.; Kot, A.C. Domain generalization with adversarial feature learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5400–5409.
  22. Motiian, S.; Piccirilli, M.; Adjeroh, D.A.; Doretto, G. Unified deep supervised domain adaptation and generalization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5715–5725.
  23. Fan, Q.; Segu, M.; Tai, Y.W.; Yu, F.; Tang, C.K.; Schiele, B.; Dai, D. Towards Robust Object Detection Invariant to Real-World Domain Shifts. In Proceedings of the Eleventh International Conference on Learning Representations, Vienna, Austria, 7–11 May 2022.
  24. Bai, G.; Ling, C.; Zhao, L. Temporal Domain Generalization with Drift-Aware Dynamic Neural Networks. In Proceedings of the Eleventh International Conference on Learning Representations, Vienna, Austria, 7–11 May 2022.
  25. Sung, F.; Yang, Y.; Zhang, L.; Xiang, T.; Torr, P.H.; Hospedales, T.M. Learning to compare: Relation network for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1199–1208.
  26. Hu, H.; Gu, J.; Zhang, Z.; Dai, J.; Wei, Y. Relation networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3588–3597.
  27. Hu, H.; Zhang, Z.; Xie, Z.; Lin, S. Local relation networks for image recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3464–3473.
  28. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  29. Sicilia, A.; Zhao, X.; Hwang, S.J. Domain adversarial neural networks for domain generalization: When it works and how to improve. Mach. Learn. 2023, 112, 2685–2721.
  30. Osumi, K.; Yamashita, T.; Fujiyoshi, H. Domain adaptation using a gradient reversal layer with instance weighting. In Proceedings of the 2019 16th International Conference on Machine Vision Applications (MVA), Tokyo, Japan, 27–31 May 2019; pp. 1–5.
  31. Ueda, M.; Kanda, K.; Miyao, J.; Miyamoto, S.; Nakano, Y.; Kurita, T. Invariant feature extraction for CNN classifier by using gradient reversal layer. In Proceedings of the 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Melbourne, Australia, 17–20 October 2021; pp. 851–856.
  32. Zhu, W.; Lu, L.; Xiao, J.; Han, M.; Luo, J.; Harrison, A.P. Localized adversarial domain generalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 7108–7118.
  33. Gulrajani, I.; Lopez-Paz, D. In search of lost domain generalization. arXiv 2020, arXiv:2007.01434.
  34. Koh, P.W.; Sagawa, S.; Marklund, H.; Xie, S.M.; Zhang, M.; Balsubramani, A.; Hu, W.; Yasunaga, M.; Phillips, R.L.; Gao, I.; et al. Wilds: A benchmark of in-the-wild distribution shifts. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 5637–5664.
  35. Lu, W.; Wang, J.; Li, H.; Chen, Y.; Xie, X. Domain-invariant feature exploration for domain generalization. arXiv 2022, arXiv:2207.12020.
  36. Krueger, D.; Caballero, E.; Jacobsen, J.H.; Zhang, A.; Binas, J.; Zhang, D.; Le Priol, R.; Courville, A. Out-of-distribution generalization via risk extrapolation (rex). In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 5815–5826.
  37. Sagawa, S.; Koh, P.W.; Hashimoto, T.B.; Liang, P. Distributionally robust neural networks for group shifts: On the importance of regularization for worst-case generalization. arXiv 2019, arXiv:1911.08731.
  38. Parascandolo, G.; Neitz, A.; Orvieto, A.; Gresele, L.; Schölkopf, B. Learning explanations that are hard to vary. arXiv 2020, arXiv:2009.00329.
  39. Li, Y.; Tian, X.; Gong, M.; Liu, Y.; Liu, T.; Zhang, K.; Tao, D. Deep domain generalization via conditional invariant adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 624–639.
Figure 1. Domain label with or without physical information. (a) For these four styles, the domain labels contain no information except that they can be used to distinguish between different domains; (b) for these four load conditions, the domain labels carry physical information (the load value) in addition to distinguishing between different domains.
Figure 2. Mapping relations for domain-specific generalization. Rectangles and circles represent domain labels and domains, respectively; different colors indicate different domains and their corresponding domain labels.
Figure 3. An example of the label and domain feature space. Stars and circles in the same domain represent different classes of fault.
Figure 4. Relational classifier.
Figure 5. Output and loss of relational classifiers.
Figure 6. Design idea of FSN.
Figure 7. FSN structure.
Figure 8. The training process of FSN.
Figure 9. Model generalization effect comparison.
Figure 10. Generalization results of FSN. Darker regions indicate higher generalization accuracy.
Figure 11. t-SNE distribution of grayscale features.
Figure 12. t-SNE visualization of the data. (a) Model generalization results before the feature shift; (b) model generalization results after applying the adversarial method.
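As a reproducibility aid for feature visualizations like those in Figures 11 and 12, the sketch below shows one common way to project extracted features with t-SNE and color them by class. The feature array, label array, perplexity value, and plot styling are illustrative assumptions, not taken from the paper.

```python
# Minimal t-SNE visualization sketch.
# Assumed inputs: `features` is an (N, D) NumPy array of extracted features,
# `labels` is an (N,) array of class indices; here both are random placeholders.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(300, 64))      # placeholder feature vectors
labels = rng.integers(0, 4, size=300)      # placeholder class labels

# Project the high-dimensional features to 2-D for plotting.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)

plt.figure(figsize=(5, 4))
scatter = plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, cmap="tab10", s=10)
plt.legend(*scatter.legend_elements(), title="class", loc="best")
plt.title("t-SNE projection of features")
plt.tight_layout()
plt.show()
```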
Table 1. Experimental comparison of models.
Model Number | Model | Classifier in Training (FC / Relation / None) | Generalization Accuracy/% (FC / Relation)
1 | SVM | None | 73.0
2 | 1DCNN + FC | FC | 74.3
3 | 2DCNN + FC | FC | 79.1
4 | 2DCNN + Relation | Relation | 78.8
5 | 2DCNN + FC + DA (DANN) | FC | 80.1
6 | 2DCNN + Relation + DA | Relation | 65
7 | 2DCNN + FC + Relation + DA + DA (MCDANN) | FC + Relation | 82.6 * / 81 *
* The bold numbers indicate the top accuracy, obtained with the mixed classifier.
Table 2. Generalization performance of each model on the CWRU dataset.
Models | ERM | DANN | DIFEX [35] | VREx [36] | GroupDRO [37] | ANDMask [38] | FSN
Accuracy/% | 81.03 | 82.6 | 83.54 | 82.96 | 80.1 | 78.95 | 84.1 *
* The bold number indicates that FSN achieves the top accuracy in the generalization task on the CWRU dataset.
Table 3. Generalization accuracy of each model on the rotated MNIST dataset.
Models | KDA | UB | DICA | SCA | MATE | ERM | DANN | DeepC | DeepN | CIDDG | DIFEX | VREx | GroupDRO | ANDMask | FSN
Accuracy/% | 72.81 | 69.39 | 72.05 | 73.43 | 78.34 | 79.56 | 82.95 | 80.08 | 83.99 | 84 | 76.28 | 82.96 | 83.75 | 82.83 | 85.22 *
* The bold number indicates that FSN achieves the top accuracy in the generalization task on the rotated MNIST dataset.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

