Research on High-Speed Train Bearing Fault Diagnosis Method Based on Domain-Adversarial Transfer Learning

Zou, Yingyong; Zhao, Wenzhuo; Liu, Tao; Zhang, Xingkui; Shi, Yaochen

doi:10.3390/app14198666

Open AccessArticle

Research on High-Speed Train Bearing Fault Diagnosis Method Based on Domain-Adversarial Transfer Learning

by

Yingyong Zou

,

Wenzhuo Zhao

,

Tao Liu

,

Xingkui Zhang

and

Yaochen Shi

^*

School of Mechanical and Vehicle Engineering, Changchun University, Changchun 130022, China

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(19), 8666; https://doi.org/10.3390/app14198666

Submission received: 25 June 2024 / Revised: 7 September 2024 / Accepted: 17 September 2024 / Published: 26 September 2024

(This article belongs to the Collection Bearing Fault Detection and Diagnosis)

Download

Browse Figures

Versions Notes

Abstract

:

Traditional bearing fault diagnosis methods struggle to effectively extract distinctive, domain-invariable characterizations from one-dimensional vibration signals of high-speed train (HST) bearings under variable load conditions. A deep migration fault diagnosis method based on the combination of a domain-adversarial network and signal reconstruction unit (CRU) is proposed for this purpose. The feature extraction module, which includes a one-dimensional convolutional (Cov1d) layer, a normalization layer, a ReLU activation function, and a max-pooling layer, is integrated with the CRU to form a feature extractor capable of learning key fault-related features. Additionally, the fault identification module and domain discrimination module utilize a combination of fully connected layers and dropout to reduce model parameters and mitigate the risk of overfitting. It is experimentally validated on two sets of bearing datasets, and the results show that the performance of the proposed method is better than other diagnostic methods under cross-load conditions, and it can be used as an effective cross-load bearing fault diagnosis method.

Keywords:

domain-adversarial; channel reconstruction; transfer learning; fault diagnosis

1. Introduction

China has the longest and most complex high-speed rail system in the world [1]. The advantages of rail traffic, such as high economy and transport prices, contribute to higher demand and increased traffic intensity, as well as to higher loads and speeds of trains, which ultimately results in the rapid deterioration of this infrastructure [2,3]. High-speed trains (HSTs) are often considered one of the best modes of transportation, and China has the longest and most complex high-speed railroad system, with HSTs operating at a maximum speed of up to 380 km/h [4]. At such high speeds, bearings, as important parts of the transmission system, are subjected to complex alternating loads, which increase the wear rate of the bearings and reduce their service life [5]. They are an important part of HSTs, and their health situation is greatly related to the operation of the whole train.

The majority of the existing research on bearing fault diagnosis primarily focuses on the vibration signals generated by bearing impacts, which contain information concerning the healthy situation of the bearings. Analysis of the vibration signals and the information concerning the healthy situation of the bearings is crucial for the diagnosis of bearing faults in high-speed trains. With the proposition of machine learning methods, more and more academics are studying the correlated intelligent diagnosis algorithms and have successfully produced a variety of effective fault diagnosis algorithms based on vibration acceleration signals. Conventional intelligent diagnostic approaches have often used a mixture of signal processing measures and classification methods, using signal processing measures that have a well-established theoretical foundation, such as variational modal decomposition (VMD) [6], empirical modal decomposition (EMD) [7], and empirical wavelet transform (EWT) [8], to obtain signal features. These features are then input into classification methods, such as support vector machines (SVMs), extreme learning machines (ELMs) [9], and back-propagation networks (BPNNs) [10], to recognize the bearing fault state. Although these methods can classify and identify the bearing fault state, the accuracy of its diagnosis is strongly linked to the fault features extracted by the conventional signal processing methods in the previous period. Feature extraction using traditional signal processing methods mainly relies on personal experience, and it is difficult to sufficiently extract the correlated information from the vibration signal. This results in low-quality feature extraction, thus affecting the accuracy of the final fault identification.

Zhang et al. [11] combined the spatial discard regularization method with separated convolution to extract features from the original signals, achieving effective differentiation of the original signals of bearings in different states. Liu et al. [12] used improved one-dimensional and two-dimensional convolutional neural networks to achieve intelligent recognition of the bearing states, demonstrating high recognition accuracy. Che et al. [13] combined a deep belief network and convolutional neural network to extract relevant information from grayscale images and time series signals and used the fused deep learning model to recognize the fault types. More adaptable intelligent fault diagnosis approaches based on deep learning [14,15,16] can be used to feature vibration data automatically. However, their excellent performance depends on massive amounts of well-labeled training data and requires that the training data and the test data fulfill the condition of being identically distributed. In practical engineering work, owing to the effects of complex working environments, it is difficult to obtain enough data with markers of the same distribution, resulting in the inability to effectively apply all kinds of intelligent fault diagnosis measures based on deep learning.

In an intelligent fault diagnosis approach that is based on migration learning, the data for model training and testing do not need to have the same probability distribution. This can be accomplished by exploiting the source domain with the help of technical methods of migration to accomplish the tasks in the target domain, especially for data samples where there are no or few markers in the target domain [17]. Recently, on the basis of various deep neural networks, scholars have proposed several migration learning methods for rotating machinery fault diagnosis, and experimental validation has been carried out on various types of datasets collected under different operating conditions [18,19]. Domain adaptation is one of the migration learning methods that can be used to automatically align feature distributions in the source and target domains for deep feature extraction by using distribution difference metric functions, such as the maximum mean difference (MMD), multinomial kernel maximum mean difference (MK-MMD), and Wasserstein distance, which in turn realizes the migration from the source domain to the target domain to accomplish the task demands of the target domain. Li et al. [20] added the multiple maximum mean difference (MK-MMD) to a deep transfer network model to reduce the distributional difference of the data in the two domains and identified the faults of the target domains. Wan et al. [21] utilized the multiple kernel maximum mean discrepancy to adjust the marginal distribution and the conditional distribution in the multidomain discriminators of the two domains to achieve extraction of the domain-invariable features and complete cross-domain fault identification. Guo et al. [22] embedded the maximum mean difference into a fully connected layer in a convolutional neural network, thus enabling domain-invariant feature extraction to achieve pattern recognition of the faults. Wen et al. [23] added an adversarial mechanism in combination with an autoencoder to achieve cross-domain diagnosis from experimental bearings to the actual wheelset bearings used in the test models. Zhang et al. [24] presented an improved domain-adversarial neural network with improved multi-feature fusion to achieve the classification of bearing health status. Wu et al. [25] combined the adversarial mechanism and maximum mean discrepancy with a weight-sharing convolutional neural network to achieve fault identification under different operating conditions.

In summary, there have been many researchers who have investigated deep migration learning methods from various perspectives to improve their performance for bearing fault state recognition. However, compared to conventional bearings, the forces on rolling bearings in high-speed trains are more complex and variable. Based on the conditions of variable loads and existing theories, a migration model for fault diagnosis, which incorporates an adversarial mechanism and a channel reconstruction unit (CRU), is established in this study. In the process of establishing the model, we analyze not only the influence of each module in the feature extractor on the performance of feature extraction but also the effect of the structure of the fault recognition module and domain discrimination module on the diagnosis accuracy. Finally, the method was experimentally analyzed and found to have good performance in identifying bearing fault states under different load conditions.

The rest of the paper is organized as follows. Section 2 provides an introduction to the theory of domain-adversarial neural networks (DANNs), Section 3 describes the migration learning fault diagnosis framework, and Section 4 validates the performance of the method through experiments. Section 5 summarizes the paper.

2. Basic Theory

The characterization of data and their edge distributions are represented by

X

and

P (X)

. Thus,

D = {X, P (X)}

represents a domain. Two domains that are not the same indicate differences in the characterization or edge distributions.

D^{s} = {X^{s}, P (X^{s})}

denotes the source domain, while

D^{t} = {X^{t}, P (X^{t})}

denotes the target domain. These two domains comprise bearing vibration data under various loads with the same labels but distinct edge distributions,

P (X^{s}) \neq P (X^{t})

. The purpose of transfer learning is to aid in accomplishing tasks in the target domain by leveraging messages from the source domain. The domain-adversarial neural network (DANN) [26] is one of the classical methods of transfer learning.

Domain-adversarial neural networks (DANNs) mainly include a feature extractor

G_{F}

, a domain discriminant module

G_{D},

and a fault recognition module

G_{C}

. The feature extractor primarily performs deep feature extraction on the input data, while the fault identification module is responsible for identifying faults in the target domain test data samples after the feature extraction. The domain discriminant module serves as a binary classifier with an added gradient inversion layer.

X

represents the input data, and the characterization extractor

G_{F}

learns the features of the data samples by updating the parameter

θ_{F}

, which can be denoted as

G_{F} (X; θ_{F})

. The features learned by the extractor

G_{F}

are then inputted into the fault recognition module

G_{C}

for fault recognition, represented as

G_{C} (G_{F} (X; θ_{F}); θ_{C})

. The loss function is expressed as Equation (1).

L_{C}^{i} (θ_{F}, θ_{C}) = L_{C} (G_{C} (G_{F} (X^{i}; θ_{F}); θ_{C}), C^{i})

(1)

where

X^{i}

denotes the

i^{t h}

sample,

C^{i}

denotes the fault category of the

i^{t h}

sample, and

θ_{F}

and

θ_{C}

represent the parameters of the feature extractors

G_{F}

and the fault recognition module

G_{C}

. The domain discrimination module

G_{D}

is utilized to discriminate the faults from that domain, and its loss function is given in Equation (2).

L_{D}^{i} (θ_{F}, θ_{D}) = L_{D} (G_{D} (G_{F} (X^{i}; θ_{F}); θ_{D}), D^{i})

(2)

where

θ_{D}

represents the parameters of the domain discriminative module

G_{D},

and

D^{i}

denotes the label of the domain. The loss function is shown in Equation (3).

L = L_{C} (D_{s}, C_{s}) - λ L_{D} (D_{s}, D_{t})

(3)

where

D_{s}

and

C_{s}

denote the samples and their corresponding true labels,

D_{t}

denotes the samples in the target domain, and

λ

is the equilibrium parameter. The total quantity of samples is represented by

N

.

N_{s}

and

N_{t}

represent the quantity of samples in the source and target domains, respectively. The total loss function is shown in Equation (4).

L (θ_{F}, θ_{C}, θ_{D}) = \frac{1}{N_{s}} \sum_{i = 1}^{N_{s}} L (θ_{F}, θ_{C}) - λ [\frac{1}{N_{s}} \sum_{i = 1}^{N_{s}} L_{D}^{i} (θ_{F}, θ_{D}) + \frac{1}{N_{t}} \sum_{i = N_{s}}^{N} L_{D}^{i} (θ_{F}, θ_{D})]

(4)

3. Fault Diagnosis Transfer Model

To enable bearing fault diagnosis under different loading conditions, this paper proposes a novel transfer learning method by combining the domain-adversarial neural network with the channel reconstruction unit module (CRU). Initially, the signal undergoes preprocessing, converting the original signal into a one-dimensional tensor, which is then fed into the feature extractor. Subsequently, domain-invariable features are extracted through adversarial learning to facilitate modal recognition for unlabeled target domain samples. The model structure is depicted in Figure 1, and additional details are described below.

3.1. Feature Extractor

The feature extractor comprises five feature extraction modules and five channel reconstruction modules. Each feature extraction module includes a one-dimensional convolutional layer, a normalization layer, and a maximum pooling layer, as illustrated in Figure 2. The parameters of each feature extraction module are detailed in Table 1.

The space formed by the source domain samples and the target domain samples is

X_{D}^{i} = \{X_{s}^{i}, X_{t}^{i}\} \in ϑ^{l \times (N_{s} + N_{t})}

, where

l

is the data length for every smple. The convolution operation on the input data is expressed as Equation (5).

X_{D, y_{c o v}}^{i} = f (X_{D, (y - 1) o u t}^{i} * ω_{y_{c o v}} + b_{y_{c o v}})

(5)

where

X_{D, (y - 1) o u t}^{i}

represents the output of the last feature extraction module,

X_{D, y_{c o v}}^{i}

denotes the output of the current yth module,

{ω_{y_{c o v}}, b_{y_{c o v}}}

denotes the training parameters for the weights and biases in the convolutional computation, and

f (\cdot)

denotes the linear rectifier unit (ReLU) activation function. The input to the first feature extraction module is a data sample that has undergone signal preprocessing.

Then, the samples after the convolution operation are input into the batch normalization (BN) layer for a normalization operation to prevent the order-of-magnitude difference of the input variables from affecting the operation results too much. The BN operation process is shown in Equation (6).

B N (X_{D, y_{c o v}}^{i}) = 2 * \frac{X_{D, y_{c o v}}^{i} - X_{m i n}}{X_{m a x} - X_{m i n}} - 1

(6)

where

{X_{m a x}, X_{m i n}}

shows the maximum and minimum values in the operational data. Following normalization, the maximum pooling operation is applied on the normalized

B N (X_{D, y_{c o v}}^{i})

to reduce the model parameters and the operation cost. The maximum pooling operation is shown in Equation (7).

p (B N (X_{D, y_{c o v}}^{i})) = \max \{{B N (X_{D, y_{c o v}}^{i})}^{q}| s (r - 1) + 1 \leq q \leq s r}

(7)

where

s

represents the length of the segment, and

r

is the starting point of the pooling position.

3.2. Basic Theory of CRU

Following feature extraction by the yth feature extraction module, the output is then input into the channel reconstruction unit (CRU). The CRU depicted in Figure 3 was proposed by Li et al. [27]. Unlike standard convolution, the CRU employs splitting, transforming, and reconstruction strategies.

The first step is the splitting strategy, which divides the pass into two parts, the

α C

and

(1 - α) C

channels, as depicted in Figure 3. Here,

0 < α < 1

is the splitting ratio. The channels of feature mapping are subsequently compressed using

1 \times 1

convolution to improve the efficiency of computation. Following splitting, the feature channel is squeezed, and the squeezing ratio

τ

is 2. Following splitting and squeezing, the feature

X = p (B N (X_{D, y_{c o v}}^{i}))

from the feature module is divided into the upper portion

X_{u p}

and the lower portion

X_{l o w}

, and then enters into the transformation stage, where

X_{u p}

is inputted into the upper transformation stage. In the upper transformation stage, group-by-group convolution (GWC) [28] and point-by-point convolution (PWC) are used instead of a convolution operation to extract deep information. The

k \times 1 G

WC and

1 \times 1 P

WC operations are performed on the upper part of the output. After that, the outputs are summed to form the merged feature

Y_{1}

. The upper transform stage can be expressed by Equation (8).

Y_{1} = G X_{u p} + P_{1} X_{u p}

(8)

where

G \in R^{\frac{α C}{g τ} \times k \times 1 \times c}

and

P_{1} \in R^{\frac{α C}{τ} \times 1 \times 1 \times c}

are the relevant parameters of GWC and PWC.

X_{l o w}

is input into the lower transform stage to generate features with shallow hidden details using the

1 \times 1 P

WC operation. This serves as a complement to the rich features generated earlier. Utilizing

X_{l o w}

allows for the generation of additional features without incurring extra costs. Finally, the newly generated features and the reused features are concatenated to form the feature

Y_{2}

. The lower transformation stage formula is shown in Equation (9):

Y_{2} = P_{2} X_{l o w} \cup X_{l o w}

(9)

where

P_{2} \in R^{\frac{(1 - α) C}{τ} \times 1 \times 1 \times (1 - \frac{1 - α}{τ}) c}

represents the relevant parameters of the lower input PWC.

After the transformation, the output features

Y_{1}

,

Y_{2}

of the upper and lower transformation stages are fused using the simplified Sknet method [29], as depicted in the fusion section in Figure 3. Global average pooling is applied to collect the global spatial message for

S_{m}

, which is computed as in Equation (10).

S_{m} = P o o l i n g (Y_{m}) = \frac{1}{L} \sum_{i = 1}^{L} Y_{c} (i), m = 1,2

(10)

β_{1} = \frac{e^{s_{1}}}{e^{s_{1 +}} e^{s_{2}}}, β_{2} = \frac{e^{s_{2}}}{e^{s_{1 +}} e^{s_{2}}}, β_{1} + β_{2} = 1

(11)

The global up-channel descriptions

S_{1}

and global down-channel descriptions

S_{2}

are then stacked together, and a channelized soft attention operation is used to generate the feature importance vectors

β_{1}

and

β_{2}

. Finally, the up-channel descriptions

S_{1}

are combined with the down-channel descriptions

S_{2}

by combining the upper feature

Y_{1}

and lower feature

Y_{2}

by channel to obtain the channelized refinement feature

Y

, as shown in Equations (11) and (12).

Y = β_{1} Y_{1} + β_{2} Y_{2}

(12)

The result of the final processing of the features input from the feature module by the channel reconstruction unit can be expressed as Equation (13).

X_{y, o u t}^{i} = C R U [p (B N (X_{D, y_{c o v}}^{i}))]

(13)

3.3. Fault Recognition Module

The output of the feature extractor is

X_{D, f e a}^{i} = X_{y, o u t}^{i} (y = 5)

, which contains feature information learned from both domains.

X_{s, f e a}^{i}

represents the features learned from source domains that contain useful knowledge. The fault identification module consists of two fully connected layers, a

D r o p o u t

layer and the output layer. The zth fully connected layer processes

X_{D, f e a}^{i},

as shown in Equation (14).

X_{F C, s, z}^{i} = h [X_{s, f e a}^{i}; θ^{F C}] = {ρ [ω^{F C} * X}_{s, f e a}^{i} + b^{F C}]

(14)

where

θ^{F C} = {ω^{F C}, b^{F C}}

represents the parameters of the fully connected layer. The dropout layer’s process for the second fully connected layer

X_{F C, z}^{i} (z = 2)

can be expressed as Equation (15).

X_{s, D r o p o u t}^{i} = D r o p o u t (X_{F C, s, z}^{i}, v)

(15)

where

v

is the deactivation rate, which is taken as 0.5 in this paper. The softmax function is used as the output layer to identify the class of faults. The softmax function can be expressed as Equation (16).

S o f t m a x (C_{s}^{i} = j| X_{s, D r o p o u t}^{i}) = \frac{e^{X_{s, D r o p o u t}^{i}}}{\sum_{j = 1}^{k} e^{X_{s, D r o p o u t}^{i}}} (j = 0,1 \cdot \cdot \cdot 9)

(16)

where

c_{s}^{i}

denotes the fault label recognized by the output layer. The loss function of the fault recognition module is expressed as Equation (17).

L_{C} = - \frac{1}{N_{s}} \sum_{i = 1}^{N_{s}} \sum_{j = 1}^{k} I [C^{i} = k] l o g \frac{e^{X_{s, D r o p o u t}^{i}}}{\sum_{j = 1}^{k} e^{X_{s, D r o p o u t}^{i}}}

(17)

where

k

is the quantity of fault categories in the data sample, and

I (\cdot)

represents the indicator function.

3.4. Domain Discriminator Module

The structure of the domain discrimination module

G_{D}

mirrors that of the fault recognition module, but the domain discrimination module is used to differentiate the domain labels of the samples by the features

X_{D, f e a}^{i}

output from the feature extractor. The domain class of the sample is judged after first performing a convolution operation on the feature

X_{D, f e a}^{i}

through two fully-connected layers, as in Equation (14), then processed through a dropout layer, as in Equation (15), and finally, processed through an output layer. The output can be expressed as Equation (18).

D_{o u t} = \frac{1}{1 + e^{e^{X_{s, D r o p o u t}^{i}}}}

(18)

When the output layer cannot discriminate the samples from that domain, its learned features are domain-invariant features. Equation (19) is the loss function of the domain discrimination module

G_{D}

.

L_{D}^{i} = t_{i} l o g D_{o u t} (X_{D}^{i}) + (1 - t_{i}) \log (1 - D_{o u t} (X_{D}^{i}))

(19)

where the domain label is “0” or “1”, and

D_{o u t} (X_{D}^{i})

is the prediction label.

3.5. Model Loss Function

In the domain-adversarial neural network’s adversarial training, the total loss function is expressed as Equation (20).

L (θ_{F}^{*}, θ_{C}^{*}, θ_{D}^{*}) = \min_{θ_{F}, θ_{C}, θ_{D}} L_{C} (θ_{F}, θ_{C}) - λ L_{D} (θ_{F}, θ_{D})

(20)

where the equilibrium parameter

λ

varies dynamically with the number of iterations. The expression is shown in Equation (20).

λ_{p g} = \frac{2}{1 + e^{- 10 p g}} - 1

(21)

where

p g

represents the relative process of training iterations, which is the ratio of the current iterations to the total iterations.

4. Experiments

4.1. Case Western Reserve University (CWRU) Bearing Experiment Data Analysis

The CWRU’s bearing dataset has been extensively utilized by researchers to validate various diagnostic methods. For the purposes of this section, the sampling frequency of the system chosen to collect the bearing vibration acceleration data was 48 kHz, the motor speed was 1797 rpm, and the load sizes applied to the bearings in the experiment were 735 N, 1470 N, and 2205 N. The bearings exhibited four health states under each load: normal condition (N), inner ring failure (IN), outer ring failure (OF), and roller failure (RF). Each failure state of the bearing included three different damage levels in addition to the healthy state: 0.178 mm, 0.356 mm, and 0.533 mm. Two of the three different load conditions were chosen as the source domain dataset and the target domain dataset. A total of 2000 samples were randomly chosen from the sample data associated with each health state to serve as the training and testing sample sets for the two domains. The total quantity of samples for training and testing was 40,000. The data setup of the experimental operation is summarized in Table 2, while the dataset under each load condition is presented in Table 3. Figure 4 illustrates the dataset for the 735 N load condition under normal conditions, along with the time and frequency domain images of the signals for each fault condition, with a damage level of 0.178 mm.

4.1.1. Effects of Individual Modules on Experimental Test Results

In order to demonstrate the performance of the fault diagnostic model in this paper, the test results of methods lacking adversarial mechanisms and channel reconstruction units and this paper’s method were visualized using the t-distributed stochastic neighbor embedding (t-SNE) algorithm. The visualization results of the migration task from Domain B to Domain A are depicted in Figure 5. In each category, there was a notable distribution of samples. However, as shown in Figure 5a, when the adversarial mechanism was absent, a more severe cross-mixing phenomenon occurred between categories due to differences in the data distribution. The model that lacked the adversarial mechanism performed poorly in recognizing the test samples in the target domain. As shown in Figure 5b, each category of samples had a large distance between them, and confounding occurred between sample class 7 and sample class 0. However, when deep features were extracted using this paper’s method, the method was better able to aggregate the samples in each health state compared to the previous two methods.

As illustrated in the confusion matrix in Figure 6, it can be seen that due to the lack of an adversarial mechanism, the accuracy of each category shown in Figure 6a was relatively low, especially for category 4, which had an accuracy of only 45%. Figure 6b,c depict that the adversarial module could effectively extract consistent features from the two domains and improve the accuracy of bearing fault recognition. Additionally, comparing Figure 6b,c, the accuracy of nearly every category reached 100% after introducing the channel reconstruction unit in the feature extractor. This indicates that deeper features associated with faults can be effectively learned after the introduction of the channel unit, reflecting the model’s superiority. The results of other migration tasks using the three methods are shown in Table 4.

4.1.2. Impact of FC Layers on Experimental Test Results

Throughout the model, besides the feature extractor, the performance of the algorithm is also affected by the parameters in the fault identification module and the domain discrimination module. To investigate the effect of fully connected layers on the algorithm’s performance, experiments were conducted to assess the classification accuracy with different quantities of fully connected layers over a certain number of iterations. As shown in Table 5, it was found that the classification accuracy was higher in the structure that used two fully connected (FC) layers, a dropout layer, and an output layer. This is because a lower number of fully connected layers in the classifier reduces the model’s expressive ability, adversely affecting its performance and failing to achieve the desired classification effect. On the other hand, a higher number of layers increases the model’s computational cost and can easily lead to overfitting, negatively impacting the classifier’s effectiveness. Based on this, the fault recognition module and domain discrimination module are structured with two fully connected layers, a dropout layer, and an output layer.

4.1.3. Comparative Experimental Analysis with Other Methods

The proposed method in this paper was analyzed using the Case Western Reserve University bearing dataset, as shown in Table 6. The number of samples per run condition was 40,000, and there were 2000 training samples and 2000 test samples included in each state. This paper’s module was also compared with three other methods, including stacked self-encoder (SDAE)+ joint geometrical and statistical alignment (JGSA) [30], deep domain confusion (DDC) [31], and a domain-adversarial convolutional neural network (DACNN) [32]. For the SDAE + JGSA method, there are 3000 samples for each operating condition, and every health state contains 300 samples. The algorithm first extracts the deep features of the image using the SDAE algorithm and then reduces the feature distribution differences using the JGSA algorithm.

As the results in Table 6 show, the first two methods utilize SDAE and CNN to join the relevant joint geometrical and statistical alignment (JGSA) and domain confusion loss function to align the distribution differences. The highest diagnostic accuracy can be 98.3%. Both the DACNN and the present method introduce the adversarial mechanism, thus realizing the extraction of domain-invariant features. However, compared to the DACNN, the present study effectively improved the model to extract deeper features. As expressed in the table, each task of this paper’s method had a better performance than the DACNN method, which shows that the introduction of the CRU in the domain-adversarial network had a significant effect on increasing the performance of diagnosing bearing faults.

4.2. Analysis of Experimental Data of Bearing Failure Simulation on Bearing Life Prediction Test Bench

In the previous section, the bearing datasets A, B, and C of the CWRU had loads corresponding to 735 N, 1470 N and 2205 N at a speed of 1797 rpm. The above experiments simulated the application performance of the models with various loads at the same speed. For the purpose of further testing the generalization performance of the models, the same simulation conditions were used in this section. The four data samples were selected for experimental validation of the model.

4.2.1. LY-GZ-02 Experimental Bench Data Description

The LZ-GZ-02 bearing experimental bench model was selected to simulate faulty bearing working conditions under normal working conditions. Vibration acceleration sensors and data acquisition devices were used to collect the vibration data under each faulty condition of the bearing. Figure 7 illustrates the relevant parts of the experimental setup, which mainly consisted of hydraulic cylinders, an operating table, a data collector, and other devices.

Bearings in four different states of health were each operated under three different load conditions at the same rotational speed. The bearing health states included normal (N), outer ring failure (OF), inner ring failure (IF), and rolling element failure (RF). As shown in Table 7, the loads were 0 N, 1000 N, and 2000 N. The sampling frequency of the experimental acquisition device was 20 KHz. For each type of load, the samples for each bearing state were 1000, and the sample length was 1024. Time–frequency images of the vibration acceleration signals of the bearings at 1000 rpm, with loads of 1000 N under the four states, are shown in Figure 8.

4.2.2. Experimental Comparisons

The TCA, JDA, and DACNN methods are validated with the performance of the proposed method presented herein. TCA utilizes kernel functions to map the number of samples into a high-dimensional regenerative kernel Hibert space, which narrows down the differences in the distribution of the pre-existing data and preserves the domain-invariable features. In TCA, the radial basis function is chosen as the kernel function of TCA. JDA (joint distribution adaptation) is a traditional transfer learning method, which extracts domain-invariable features and predicts the target domain by maximizing the difference in the distribution of the categories of the source and target domains for the mapping of each category. In JDA, radial basis functions are selected. The DACNN is also a migration learning model that learns domain-invariant features using a CNN with adversarial mechanisms.

As illustrated in Table 8, the diagnostic accuracies of the TCA method and the JDA method between the three load conditions were 76.1% and 75.4%, which indicates pure kernel space mapping and two domain categories. It was difficult for the spatial mapping between them to obtain valuable features under various loading conditions. Compared to the present method, the DACNN method failed to regard the significance and uniqueness of the features of the recognition process in the feature learning process, and its recognition accuracy was lower. This paper’s method further extracted the deep features in the samples by adding the channel reconstruction unit to the feature extractor species. Accordingly, the accuracy of the test between condition B and condition C reached 100%, and the mean accuracy of the three conditions was 99.4%, which is clearly better than that of the other methods.

In the results shown in Figure 9, A and B are the source and target domains, respectively. As depicted in Figure 9a,b, the feature categories learned by the TCA and JDA methods did not really perform effective alignment of the distributional differences, and most of the samples labeled 3 (outer-ring faults) were incorrectly predicted, which did not allow for effective feature identification and sample classification. While the method shown in Figure 9c improved the overall effect, the adversarial training, in a certain sense, made the network effectively learn the domain-invariable features, which positively affected the fault classification module in the prediction of the test samples. However, this paper’s model was more capable of feature learning compared to the DACNN, indicating that it can more obviously recognize the various health states of the target domain bearings. The results of the above analysis show that this model is superior to the other three models, and the combination of the adversarial mechanism and the channel reconstruction unit can significantly improve the fault diagnosis performance of the bearings under different loading conditions.

5. Conclusions

In this paper, a migration learning model combining a domain-adversarial network and CRU is proposed for high-speed train bearing fault diagnosis under variable load conditions. The model has three advantages. Firstly, normalization is added to the feature extraction module to reduce the influence of individual difference samples on the network model, and the RELU activation function is used to improve the speed of the model operation. Secondly the influence of each module in the feature extractor on the performance of the algorithm was analyzed through experiments, and it was finally determined that combining the feature extraction module with the CRU strengthens the model’s ability to learn domain-invariant features. Finally, by experimentally analyzing the impact of the quantity of fully connected layers on the performance of the algorithm, it was determined that the ideal structure of the fault identification module and the domain discrimination module consists of two fully connected layers combined with the dropout function, which reduces the model parameters and the risk of model overfitting. In addition to this, the present method was compared with existing intelligent diagnosis methods to verify its good performance. However, this method only investigates fault diagnoses under different load conditions using the same equipment. Therefore, the next step is to study the migration fault diagnosis method for different types of bearings in different machines and the same equipment in high-speed trains, so as to better monitor the health status of high-speed train bearings.

Author Contributions

Conceptualization, Y.Z. and W.Z.; methodology, W.Z.; software, W.Z.; validation, W.Z., T.L. and X.Z.; data curation, W.Z., T.L. and X.Z; writing—original draft preparation, W.Z.; writing—review and editing, Y.Z. and Y.S.; visualization, W.Z.; supervision, Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Jilin Provincial Department of Science and Technology, grant number 20230101208JC.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset used in Experiment 1 of this paper can be found at the following link: https://engineering.case.edu/bearingdatacenter/welcome, accessed on 16 September 2024. The data from Experiment 2 in this study are available upon request from the corresponding author. For privacy reasons, these data are not publicly available.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Brkić, R.; Adamović, Ž.; Bukvić, M. Modeling of reliability and availability of data transmission in railway system. Adv. Eng. Lett. 2022, 1, 136–141. [Google Scholar] [CrossRef]
Lazović, T.; Marinković, A.; Atanasovska, I.; Sedak, M.; Stojanović, B. From innovation to standardization—A century of rolling bearing life formula. Machines 2024, 12, 444. [Google Scholar] [CrossRef]
Vuković, V.; Kovalevskyy, S. Development of an innovative technical solution for the application of segmental managan inserts on the wear surface of the clamp of the tamping railway machines. Adv. Eng. Lett. 2024, 3, 42–51. [Google Scholar] [CrossRef]
Hu, W.; Xin, G.; Wu, J.; An, G.; Li, Y.; Feng, K.; Antoni, J. Vibration-based bearing fault diagnosis of high-speed trains: A literature review. High-Speed Railw. 2023, 1, 219–223. [Google Scholar] [CrossRef]
Hou, D.; Qi, H.; Li, D.; Wang, C.; Han, D.; Luo, H.; Peng, C. High-speed train wheel set bearing fault diagnosis and prog-nostics: Research on acoustic emission detection mechanism. Mech. Syst. Signal Process. 2022, 179, 109325. [Google Scholar] [CrossRef]
Dragomiretskiy, K.; Zosso, D. Variational mode decomposition. IEEE Trans. Signal Process. 2014, 62, 531–544. [Google Scholar] [CrossRef]
Ali, J.B.; Fnaiech, N.; Saidi, L.; Chebel-Morello, B.; Fnaiech, F. Application of empirical mode decomposition and artificial neural network for automatic bearing fault diagnosis based on vibration signals. Appl. Acoust. 2015, 89, 16–27. [Google Scholar]
Zhang, Q.; Ding, J.; Zhao, W. An adaptive boundary determination method for empirical wavelet transform and its application in wheelset-bearing fault detection in high-speed trains. Measurement 2021, 171, 108746. [Google Scholar] [CrossRef]
He, C.; Wu, T.; Gu, R.; Jin, Z.; Ma, R.; Qu, H. Rolling bearing fault diagnosis based on composite multiscale permutation entropy and reverse cognitive fruit fly optimization algorithm—Extreme learning machine. Measurement 2021, 173, 108636. [Google Scholar] [CrossRef]
Li, J.; Yao, X.; Wang, X.; Yu, Q.; Zhang, Y. Multiscale local features learning based on BP neural network for rolling bearing intelligent fault diagnosis. Measurement 2020, 153, 107419. [Google Scholar] [CrossRef]
Zhang, J.; Kong, X.; Li, X.; Hu, Z.; Cheng, L.; Yu, M. Fault diagnosis of bearings based on deep separable convolutional neural network and spatial dropout. Chin. J. Aeronaut. 2022, 35, 301–312. [Google Scholar] [CrossRef]
Liu, X.; Sun, W.; Li, H.; Hussain, Z.; Liu, A. The method of rolling bearing fault diagnosis based on multi-domain supervised learning of convolution neural network. Energies 2022, 15, 4614. [Google Scholar] [CrossRef]
Che, C.; Wang, H.; Ni, X.; Lin, R. Hybrid multimodal fusion with deep learning for rolling bearing fault diagnosis. Measurement 2021, 173, 108655. [Google Scholar] [CrossRef]
Zhu, Z.; Lei, Y.; Qi, G.; Chai, Y.; Mazur, N.; An, Y.; Huang, X. A review of the application of deep learning in intelligent fault diagnosis of rotating machinery. Measurement 2023, 206, 112346. [Google Scholar] [CrossRef]
Zhang, S.; Zhang, S.; Wang, B.; Habetler, T.G. Deep learning algorithms for bearing fault diagnostics—A comprehensive review. IEEE Access 2020, 8, 29857–29881. [Google Scholar] [CrossRef]
Saufi, S.R.; Ahmad, Z.A.B.; Leong, M.S.; Lim, M.H. Challenges and opportunities of deep learning models for machinery fault detection and diagnosis: A review. IEEE Access 2019, 7, 122644–122662. [Google Scholar] [CrossRef]
Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359. [Google Scholar] [CrossRef]
Misbah, L.; Lee, C.K.M.; Keung, K.L. Fault diagnosis in rotating machines based on transfer learning: Literature review. Knowl. Based Syst. 2024, 283, 111158. [Google Scholar] [CrossRef]
Hakim, M.; Omran, A.A.B.; Ahmed, A.N.; Al-Waily, M.; Abdellatif, A. A systematic review of rolling bearing fault diagnoses based on deep learning and transfer learning: Taxonomy, overview, application, open challenges, weaknesses and recommendations. Ain Shams Eng. J. 2023, 14, 101945. [Google Scholar] [CrossRef]
Li, J.; Ye, Z.; Gao, J.; Meng, Z.; Tong, K.; Yu, S. Fault transfer diagnosis of rolling bearings across different devices via multi-domain information fusion and multi-kernel maximum mean discrepancy. Appl. Soft Comput. 2024, 159, 111620. [Google Scholar] [CrossRef]
Wan, L.; Li, Y.; Chen, K.; Gong, K.; Li, C. A novel deep convolution multi-adversarial domain adaptation model for rolling bearing fault diagnosis. Measurement 2022, 191, 110752. [Google Scholar] [CrossRef]
Guo, L.; Lei, Y.; Xing, S.; Yan, T.; Li, N. Deep Convolutional Transfer Learning Network: A New Method for Intelligent Fault Diagnosis of Machines With Unlabeled Data. IEEE Trans. Ind. Electron. 2019, 66, 7316–7325. [Google Scholar] [CrossRef]
Wen, H.; Guo, W.; Li, X. A novel deep clustering network using multi-representation autoencoder and adversarial learning for large cross-domain fault diagnosis of rolling bearings. Expert Syst. Appl. 2023, 225, 120066. [Google Scholar] [CrossRef]
Zhang, D.; Zhang, L. A multi-feature fusion-based domain adversarial neural network for fault diagnosis of rotating machinery. Measurement 2022, 200, 111576. [Google Scholar] [CrossRef]
Wu, Y.; Zhao, R.; Ma, H.; He, Q.; Du, S.; Wu, J. Adversarial domain adaptation convolutional neural network for intelligent recognition of bearing faults. Measurement 2022, 195, 111150. [Google Scholar] [CrossRef]
Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F. Domain-adversarial training of neural net-works. J. Mach. Learn. Res 2017, 17, 1996–2030. [Google Scholar]
Li, J.; Wen, Y.; He, L. SCConv: Spatial and Channel Reconstruction Convolution for Feature Redundancy. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 6153–6162. [Google Scholar]
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
Li, X.; Wang, W.; Hu, X.; Yang, J. Selective kernel networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 510–519. [Google Scholar]
Dong, S.; He, K.; Tang, B. The fault diagnosis method of rolling bearing under variable working conditions based on deep transfer learning. J. Braz. Soc. Mech. Sci. Eng. 2020, 42, 585. [Google Scholar] [CrossRef]
Tzeng, E.; Hoffman, J.; Zhang, N.; Saenko, K.; Darrell, T. Deep domain confusion: Maximizing for domain invariance. arXiv 2014, arXiv:1412.3474. [Google Scholar]
Han, T.; Liu, C.; Yang, W.; Jiang, D. A novel adversarial learning framework in deep convolutional neural network for intelligent diagnosis of mechanical faults. Knowl. Based Syst. 2019, 165, 474–487. [Google Scholar] [CrossRef]

Figure 1. Structure of transfer learning fault diagnosis model.

Figure 2. Structure of feature extractor module.

Figure 3. Structure of CRU.

Figure 4. Time-domain and frequency-domain images of each health state of the Skf6205 bearing.

Figure 5. Visualization of t-SNE for each experimental result.

Figure 6. Confusion matrix for each experimental result.

Figure 7. LY-GZ-02 laboratory bench.

Figure 8. Time-domain and frequency-domain images of four health states of Skf6007 bearings.

Figure 9. Confusion matrix for each experimental method.

Table 1. Structural parameters of the characterization module.

Module Sequence	Structural Parameters
First feature extraction module	Cov1d(64 × 1, 64) + BN(64)+ ReLU+ Maxpool(2, 2)
Second feature extraction module	Cov1d(3 × 1, 32) + BN(32) + ReLU + Maxpool(2, 2)
Third feature extraction module	Cov1d(3 × 1, 64) + BN(64) + ReLU+ Maxpool(2, 2)
Fourth feature extraction module	Cov1d(3 × 1, 32) + BN(64) + ReLU + Maxpool(2, 2)
Fifth feature extraction module	Cov1d(3 × 1, 64) + BN(64) + ReLU + Maxpool(2, 2)

Table 2. Dataset descriptions of different operating conditions.

Condition	Rotation Speed (rpm)	Load	Sample Size	Sample Length
A	1797	735 N	40,000	1024
B	1797	1470 N	40,000	1024
C	1797	2205 N	40,000	1024

Table 3. Dataset settings.

Fault Type	Degree of Damage	Sample Size	Label
N	None	2000	9
IF	0.178 mm	2000	8
	0.356 mm	2000	7
	0.533 mm	2000	6
OF	0.178 mm	2000	5
	0.356 mm	2000	4
	0.533 mm	2000	3
RF	0.178 mm	2000	2
	0.356 mm	2000	1
	0.533 mm	2000	0

Table 4. Test results of each method for each task.

Task	No DA	No CRU	Proposed Method
A-B	82.5%	97.2%	99.1%
A-C	87.1%	97.3%	99.4%
B-A	85.3%	98.3%	99.6%
B-C	81.7%	94.58%	98.6%
C-A	85.4%	96.4%	99.1%
C-B	83.2%	95.8%	99%

Table 5. Experimental results for different FC layer counts under the A-C migration task.

Structure	Epoch	Accuracy
One FC layers + dropout + output	1000	83.5%
Two FC layers + dropout + output	1000	99.4%
Three FC layers + dropout + output	1000	93.2%

Table 6. Testing results of different methods.

Method	A-B	A-C	B-A	B-C	C-A	C-B	Average Accuracy
SDAE + JGSA [30]	93.4%	90.1%	90%	93.8%	91.2%	94.7%	92.2%
DDC [31]	98.4%	95.7%	97.7%	98.3%	88.5%	96.7%	95.9%
DACNN [32]	97.3%	98.3%	96.2%	98.1%	98.1%	98.4%	97.7%
Proposed method	99.1%	99.4%	99.6%	98.6%	99.1%	99%	99.3%

Table 7. Descriptions of the datasets.

Condition	Fault Type	Label	Sample Size	Load
A	RF	0	1000	0 N
	IF	1	1000
	N	2	1000
	OF	3	1000
B	RF	0	1000	1000 N
	IF	1	1000
	N	2	1000
	OF	3	1000
C	RF	0	1000	2000 N
	IF	1	1000
	N	2	1000
	OF	3	1000

Table 8. Test results of LY-GZ-02 test bench data.

Method	A-B	A-C	B-A	B-C	C-A	C-B	Average Accuracy
TCA	76.9%	75.1%	74.2%	77.5%	74.4%	78.3%	76.1%
JDA	75%	74.4%	76.3%	76.5%	73.3%	76.7%	75.4%
DACNN	95.8%	96.7%	94.5%	97.5%	94.5%	97.3%	96.1%
Proposed method	99.1%	99%	99.3%	100%	99.1%	100%	99.4%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zou, Y.; Zhao, W.; Liu, T.; Zhang, X.; Shi, Y. Research on High-Speed Train Bearing Fault Diagnosis Method Based on Domain-Adversarial Transfer Learning. Appl. Sci. 2024, 14, 8666. https://doi.org/10.3390/app14198666

AMA Style

Zou Y, Zhao W, Liu T, Zhang X, Shi Y. Research on High-Speed Train Bearing Fault Diagnosis Method Based on Domain-Adversarial Transfer Learning. Applied Sciences. 2024; 14(19):8666. https://doi.org/10.3390/app14198666

Chicago/Turabian Style

Zou, Yingyong, Wenzhuo Zhao, Tao Liu, Xingkui Zhang, and Yaochen Shi. 2024. "Research on High-Speed Train Bearing Fault Diagnosis Method Based on Domain-Adversarial Transfer Learning" Applied Sciences 14, no. 19: 8666. https://doi.org/10.3390/app14198666

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Research on High-Speed Train Bearing Fault Diagnosis Method Based on Domain-Adversarial Transfer Learning

Abstract

1. Introduction

2. Basic Theory

3. Fault Diagnosis Transfer Model

3.1. Feature Extractor

3.2. Basic Theory of CRU

3.3. Fault Recognition Module

3.4. Domain Discriminator Module

3.5. Model Loss Function

4. Experiments

4.1. Case Western Reserve University (CWRU) Bearing Experiment Data Analysis

4.1.1. Effects of Individual Modules on Experimental Test Results

4.1.2. Impact of FC Layers on Experimental Test Results

4.1.3. Comparative Experimental Analysis with Other Methods

4.2. Analysis of Experimental Data of Bearing Failure Simulation on Bearing Life Prediction Test Bench

4.2.1. LY-GZ-02 Experimental Bench Data Description

4.2.2. Experimental Comparisons

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI