Article

Rotating Machinery Fault Diagnosis with Limited Multisensor Fusion Samples by Fused Attention-Guided Wasserstein GAN

Wenlong Fu, Ke Yang, Bin Wen, Yahui Shan, Shuai Li and Bo Zheng

1 College of Electrical Engineering and New Energy, China Three Gorges University, Yichang 443002, China
2 Hubei Provincial Key Laboratory for Operation and Control of Cascaded Hydropower Station, China Three Gorges University, Yichang 443002, China
3 Wuhan Second Ship Design and Research Institute, Wuhan 430064, China
* Authors to whom correspondence should be addressed.
Symmetry 2024, 16(3), 285; https://doi.org/10.3390/sym16030285
Submission received: 24 January 2024 / Revised: 23 February 2024 / Accepted: 27 February 2024 / Published: 1 March 2024

Abstract

Rotating machinery is vital equipment in modern industry, and its health state influences the production process and equipment safety. However, rotating machinery generally operates in a normal state most of the time, which results in limited fault data and thus greatly constrains the performance of intelligent fault diagnosis methods. To solve this problem, this paper proposes a novel fault diagnosis method for rotating machinery with limited multisensor fusion samples based on the fused attention-guided Wasserstein generative adversarial network (WGAN). Firstly, the dimensionality of the collected multisensor data is reduced to three channels by principal component analysis, the one-dimensional data of each channel are then converted into a two-dimensional pixel matrix, and RGB images are obtained by fusing the three-channel two-dimensional images. Subsequently, the limited RGB samples are augmented into sufficient samples utilizing the fused attention-guided WGAN combined with the gradient penalty (FAWGAN-GP). Lastly, the augmented samples are applied to train a residual convolutional neural network for fault diagnosis. The effectiveness of the proposed method is demonstrated by two case studies. When the training samples per class are 50, 35, 25, and 15 on the KAT-bearing dataset, the average classification accuracy is 99.9%, 99.65%, 99.6%, and 98.7%, respectively. Meanwhile, ablation experiments on the WT gearbox dataset show that multisensor fusion and the fused attention mechanism yield average improvements of 1.51% and 1.09%, respectively.

1. Introduction

As an indispensable part of mechanical engineering, rotating machinery has been widely applied in industry, transport, energy, and many other fields [1], playing a significant role in energy conversion and transfer, power output, motion control, etc. Rotating machinery often operates under complex conditions with high temperatures and intense impacts [2]. Gears and bearings, as the key components of rotating machinery, are prone to failures, resulting in production losses or posing risks to personnel. Therefore, effective fault diagnosis of rotating machinery, including gears and bearings, is one of the important means to improve reliability and safety.
The fault diagnosis of rotating machinery currently falls into two categories of methods: model-based ones [3] and data-driven ones [4]. Model-based methods require establishing a dynamic model and prior knowledge of the mechanical equipment. However, as the design or operating conditions change, model-based methods may no longer be effective, and diagnosis accuracy decreases. In addition, model-based methods require complex parameter adjustment and correction, which makes them difficult to apply when real-time performance is required. In contrast, traditional data-driven machine learning algorithms, such as support vector machines [5], extreme learning machines [6] and random forests [7], are simpler and more effective at fitting data. Nevertheless, due to their shallow model structure, these algorithms struggle to extract deeper features from the data.
In recent years, with the development of artificial intelligence, data-driven methods based on deep learning, such as the convolutional neural network (CNN), long short-term memory (LSTM), deep autoencoder (DAE) and deep belief network (DBN), have been widely applied in fault diagnosis because of their ability to mine deeper features from data. CNN is widely employed in fault diagnosis due to its powerful learning ability and nonlinear feature extraction performance. He et al. [8] proposed an innovative deep triple-stream CNN based on the wavelet weight initialization method and the balanced dynamic adaptive thresholding algorithm for fault diagnosis. Xie et al. [9] employed multisensor signals and a deep residual CNN to realize fault diagnosis of rotating machinery. LSTM can extract time-series dependencies from collected signals. An et al. [10] proposed a periodic sparse attention and LSTM approach for rolling bearing fault diagnosis, which utilized LSTM to capture long-term dependent features in fault information and reduce the computational complexity of time series analysis. Considering the excellent anti-interference performance of DAE, Jia et al. [11] effectively extracted deep features under a strong noise background by combining stacked DAE optimized by variational mode decomposition, continuous wavelet transform (CWT) and the sparrow search algorithm. To take advantage of the deep extraction capability of DBN, Xing et al. [12] introduced a distribution-invariant DBN to capture unchanged distribution features. Although the deep learning models mentioned above have demonstrated satisfying advancement in the field of fault diagnosis, some challenges still need to be solved. On the one hand, rotating machinery usually operates in a healthy state for most of the time, with the transition to a faulty state being a prolonged process. On the other hand, running rotating machinery in a damaged state for a long time would lead to serious consequences and is impermissible. Consequently, it is imperative to address the challenges associated with limited samples in the fault diagnosis of rotating machinery [13].
As a novel approach, a generative adversarial network (GAN) was proposed by Goodfellow et al. [14] to tackle the challenge of limited data. With the rapid development of the intelligent fault diagnosis technique, variants of GAN have become widely employed in fault diagnosis to address the problem of insufficient samples and to improve model performance. To generate high-quality signals with global interactions and local dependencies, Gao et al. [15] proposed an integrated convolutional transformer GAN to augment limited data. Fan et al. [16] proposed a full-attention mechanism with Wasserstein GAN (WGAN), which integrated gradient normalization into the discriminator to prevent gradient explosion. Yang et al. [17] combined conditional GAN with two-dimensional CNN to realize data augmentation and fault diagnosis. To realize unsupervised fault diagnosis with limited samples, a rolling bearing fault diagnosis method based on quick self-attention convolutional GAN was proposed in the literature [18]. Additionally, Zhong et al. [19] proposed a bearing fault diagnosis method based on residual factorized hierarchical search-based GAN to augment CWT time-frequency images. Li et al. [20] introduced a novel supervised model based on the modified auxiliary classifier GAN, which enhanced the compatibility of classification and discrimination through the addition of independent classifiers, ultimately generating higher-quality multimode samples more efficiently. Although the preceding studies have introduced effective solutions for addressing the challenge of limited fault samples, they mainly adopt a single sensor to achieve fault analysis, overlooking the complementary information derived from multisensor analysis [21]. To facilitate monitoring the operational status of rotating equipment, multisensors are typically deployed to collect data in practical industries. Building upon the aforementioned analyses, a novel fault diagnosis approach for rotating machinery with limited multisensor fusion samples is proposed in this paper based on fused attention-guided WGAN. In the proposed approach, fused attention-guided WGAN is combined with gradient penalty (FAWGAN-GP) to proficiently generate high-quality multisensor fusion samples. Meanwhile, gradient penalty (GP) is introduced to solve the problems of gradient explosion and training instability in WGAN. Moreover, residual CNN is employed to achieve high classification accuracy. The primary contributions of this paper are delineated as follows.
(1)
The fused attention (FA) mechanism is proposed, which employs a weighted fusion approach to integrate features extracted using a lightweight channel attention mechanism and the improved self-attention mechanism module, facilitating the extraction of important features from global, local and channel levels to enhance the quality of the generated samples.
(2)
The generator in the basic WGAN model is enhanced by adding the L1 loss function, which aims to make generated samples closer to real samples. Meanwhile, the GP term is incorporated into the discriminator to address the issues of gradient vanishing or gradient explosion in the training process. Furthermore, the data augmentation model FAWGAN-GP is proposed by introducing FA into both the generator and discriminator.
(3)
To address the limitation posed by single-sensor incomplete information and the challenge of limited fault samples, a novel fault diagnosis method for rotating machinery with limited multisensor fusion samples is proposed based on the proposed model FAWGAN-GP.
The rest of this paper is organized as follows. The theoretical background of multisensor fusion, WGAN and self-attention mechanism is briefly introduced in Section 2. Section 3 depicts the proposed method and the whole research framework. The effectiveness of the proposed method is demonstrated by two case studies in Section 4. Finally, the conclusions are given in Section 5.

2. Basic Theory

2.1. Multisensor Fusion

In modern industry, to mitigate the risk of information collection failure caused by a single sensor, multisensors are commonly installed in machinery to monitor its operating status. Therefore, multisensor-based fault diagnosis methods have become a hot research topic in recent years. In general, multisensor fusion [22] is divided into three main types: data-level fusion [23], feature-level fusion [24] and decision-level fusion [25].
From the data perspective, data-level fusion involves the optimal combination of data from multisensors. Data-level fusion aims to achieve high accuracy performance while minimizing data loss. However, the data-level fusion model lacks error correction capability and may exhibit poor performance when dealing with asymmetric sensors or diverse types of sensors.
A feature-level fusion model is designed to reduce the dimensionality of large-scale data and is particularly suitable for real-time fault diagnosis. Feature-level fusion offers greater flexibility and adaptability than data-level fusion, enabling it to effectively handle situations involving asymmetric sensors or diverse types of sensors.
Decision-level fusion integrates decision results from multiple fault diagnosis models to achieve multisensor fusion.

2.2. WGAN

The fundamental structure of GAN is illustrated in Figure 1. Typical GAN comprises a generator and discriminator, where the generator, denoted as G, generates samples that approximate the real data distribution, and the discriminator, denoted as D, is responsible for distinguishing whether input samples originate from real data distribution or generative distribution.
Training instability is a common problem with the original GAN, which is attributed to GAN applying the Jensen–Shannon (JS) divergence to measure the difference between the distribution of real data and the distribution of generated data. However, the JS divergence has non-differentiable points, which can result in model collapse and gradient explosion when training GAN. To address this issue, the earth mover (EM) distance is applied in WGAN [26] to measure the difference between the distributions of the generated data and the real data. The EM distance is defined as follows.
$$W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}\left[ \left\| x - y \right\| \right]$$
where x and y represent samples drawn from the real sample distribution Pr and the generated sample distribution Pg, respectively, and Π(Pr, Pg) denotes the set of all joint distributions γ(x, y) whose marginals are Pr and Pg.
The aforementioned equation fundamentally aims to minimize the distance between x and y. However, identifying the infimum of the equation presents a considerable challenge. To address this problem, the Kantorovich–Rubinstein duality principle is utilized to recast the measurement of distance through a functional representation. Whereafter, an approximation of the Wasserstein distance is derived by imposing constraints on the network weight parameters to confine functional representation within a predetermined range. Therefore, the optimization goal of the complete WGAN loss function can be succinctly expressed as follows.
$$\min_{G} \max_{D \in \mathcal{D}} V(D, G) = \mathbb{E}_{x \sim P_r}\left[ D(x) \right] - \mathbb{E}_{z \sim P_z}\left[ D(G(z)) \right]$$
When training the discriminator D, the objective Ex~Pr[D(x)] − Ez~Pz[D(G(z))] is maximized; when training the generator G, the objective −Ez~Pz[D(G(z))] is minimized.
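To make the two objectives concrete, the following minimal TensorFlow sketch expresses them as loss functions to be minimized by each network; the function names and tensor arguments are illustrative assumptions rather than the authors' code.

```python
import tensorflow as tf

def critic_loss(real_scores: tf.Tensor, fake_scores: tf.Tensor) -> tf.Tensor:
    # D maximizes E[D(x)] - E[D(G(z))]; minimizing the negative is equivalent.
    return tf.reduce_mean(fake_scores) - tf.reduce_mean(real_scores)

def generator_loss(fake_scores: tf.Tensor) -> tf.Tensor:
    # G minimizes -E[D(G(z))].
    return -tf.reduce_mean(fake_scores)
```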

2.3. Self-Attention Mechanism

A self-attention mechanism [27] integrates both intrinsic and extrinsic information to enhance the precision of local feature representation. The processes of self-attention mainly contain three steps. Initially, original signals pass through three 1 × 1 convolutional branches f(x), g(x) and h(x), respectively, to compute the hidden feature of each input feature vector.
$$f(x) = W_f(x), \quad g(x) = W_g(x), \quad h(x) = W_h(x)$$
where $W_f$, $W_g$ and $W_h$ denote the corresponding convolution operations.
Subsequently, an attention matrix is generated with a softmax function with the result of the dot product between f(x)T and g(x), which can be expressed as follows.
$$s_{ij} = f(x_i)^{T} g(x_j)$$
$$\beta_{ij} = \frac{\exp(s_{ij})}{\sum_{i=1}^{N} \exp(s_{ij})}$$
where sij denotes the result of multiplying the ith pixel feature and the jth pixel feature, and βij represents the normalized attention matrix for sij.
Lastly, the hidden feature h(xi) corresponding to each input is weighted and summed to obtain the self-attention mechanism output.
$$y_{SA} = \sum_{i=1}^{N} \beta_{ij} h(x_i)$$
where ySA is the output of the self-attention mechanism.
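For illustration, the following is a minimal SAGAN-style self-attention layer in TensorFlow that follows the three steps above (1 × 1 convolutional branches, softmax attention matrix, weighted sum). The class name, the channel reduction factor of 8 and the tensor shapes are assumptions made for this sketch, not the paper's implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

class SelfAttention2D(layers.Layer):
    """Minimal self-attention over a 2-D feature map (illustrative sketch)."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.f = layers.Conv2D(channels // reduction, 1)  # query branch f(x)
        self.g = layers.Conv2D(channels // reduction, 1)  # key branch g(x)
        self.h = layers.Conv2D(channels, 1)               # value branch h(x)

    def call(self, x):
        hgt, wid, c = x.shape[1], x.shape[2], x.shape[3]
        n = hgt * wid
        q = tf.reshape(self.f(x), (-1, n, self.f.filters))  # (B, N, C/r)
        k = tf.reshape(self.g(x), (-1, n, self.g.filters))  # (B, N, C/r)
        v = tf.reshape(self.h(x), (-1, n, c))                # (B, N, C)
        s = tf.matmul(q, k, transpose_b=True)                # s_ij = f(x_i)^T g(x_j)
        beta = tf.nn.softmax(s, axis=-1)                     # normalized attention matrix
        y = tf.matmul(beta, v)                               # weighted sum of h(x_i)
        return tf.reshape(y, (-1, hgt, wid, c))              # y_SA reshaped to the input size

# Example: apply the layer to a random 16 x 16 x 64 feature map
feat = tf.random.normal((2, 16, 16, 64))
print(SelfAttention2D(64)(feat).shape)  # (2, 16, 16, 64)
```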

3. The Proposed Method

3.1. Generative Model Based on WGAN-GP

Although WGAN improves training stability, it relies on a weight-clipping strategy to force the discriminator to satisfy the Lipschitz constraint, which sometimes leads to poor-quality generated samples. Meanwhile, because the boundary value after weight clipping cannot be controlled, WGAN may still be confronted with gradient explosion or gradient vanishing. To improve the quality of the generated samples and solve the above problems, this paper introduces the gradient penalty (GP) [28] into the discriminator of WGAN instead of weight clipping. GP not only improves training stability but also satisfies the Lipschitz continuity condition. The gradient penalty and the loss function of the discriminator are defined as follows.
$$GP = \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\left[ \left( \left\| \nabla_{\hat{x}} D(\hat{x}) \right\|_2 - 1 \right)^2 \right]$$
$$L_D = \mathbb{E}_{x \sim P_r(x)}\left[ D(x) \right] - \mathbb{E}_{z \sim P_z(z)}\left[ D(G(z)) \right] + \lambda\, GP$$
where $\lVert \cdot \rVert_2$ represents the L2 norm, $\nabla$ is the gradient operator and $\lambda$ is the coefficient of the gradient penalty term, which is set to 10. $\hat{x}$ is obtained by randomly interpolating between samples from the real data distribution $P_r$ and the generated data distribution $P_z$, calculated as $\hat{x} = \varepsilon x + (1 - \varepsilon)\tilde{x}$, where $x \sim P_r$, $\tilde{x} \sim P_z$ and $\varepsilon$ is sampled from a uniform distribution on [0, 1].
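For reference, a minimal sketch of how this penalty can be computed with tf.GradientTape is shown below; the `discriminator` callable and the image-shaped inputs are assumptions of the sketch.

```python
import tensorflow as tf

def gradient_penalty(discriminator, real_images, fake_images, lam=10.0):
    """Sketch of the gradient penalty term (lambda = 10 as stated above)."""
    batch = tf.shape(real_images)[0]
    eps = tf.random.uniform([batch, 1, 1, 1], 0.0, 1.0)       # epsilon ~ U[0, 1]
    x_hat = eps * real_images + (1.0 - eps) * fake_images     # random interpolation
    with tf.GradientTape() as tape:
        tape.watch(x_hat)
        d_hat = discriminator(x_hat, training=True)
    grads = tape.gradient(d_hat, x_hat)                       # gradient of D(x_hat) w.r.t. x_hat
    norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-12)
    return lam * tf.reduce_mean(tf.square(norm - 1.0))        # lambda * E[(||grad||_2 - 1)^2]
```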
In addition, L1 loss is incorporated into the generator of WGAN-GP in this paper, aiming to enhance the quality of the generated data by calculating the absolute value of the error between the real data and the generated data. L1 loss and the loss function of the generator are defined as follows.
$$L_1 = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - G(z)_i \right|$$
$$L_G = -\mathbb{E}_{z \sim P_z(z)}\left[ D(G(z)) \right] + \varphi L_1$$
where z denotes the random noise, y is the real sample and φ is the weight coefficient of the L1 loss function, which is set to 100.
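A corresponding sketch of the generator objective with the added L1 term (φ = 100 as stated above) is given below; pairing the real and generated batches element-wise for the L1 term is an assumption of the sketch.

```python
import tensorflow as tf

def generator_loss_with_l1(fake_scores, real_batch, fake_batch, phi=100.0):
    """Sketch of the generator objective with the added L1 term."""
    wgan_term = -tf.reduce_mean(fake_scores)                   # -E[D(G(z))]
    l1_term = tf.reduce_mean(tf.abs(real_batch - fake_batch))  # mean absolute error between real and generated samples
    return wgan_term + phi * l1_term
```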

3.2. The FA Mechanism Module

As shown in Figure 2, the proposed FA mechanism module consists of three parts, containing the lightweight channel attention mechanism, the improved self-attention mechanism module and the fusion operation.

3.2.1. Lightweight Channel Attention Mechanism

The upper part of the proposed FA mechanism is constituted by a lightweight channel attention mechanism. Firstly, global information is extracted from the spatial dimension by applying a global average pooling (GAP) layer to the input. Subsequently, a fully connected (FC) layer with a scaling factor is utilized for downscaling, accompanied by the application of a rectified linear unit (ReLU) activation function. Afterward, the FC layer is applied to restore the original dimension, and the output is given to the sigmoid activation function. Ultimately, to obtain two-dimensional features, the acquired features are reshaped to match the dimension of the input.
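A minimal Keras sketch of this squeeze-and-excitation-style channel attention is given below; the reduction factor of 8 is an assumed value, since the scaling factor is not specified here, and the function name is illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers

def lightweight_channel_attention(x, reduction=8):
    """GAP -> FC (downscale) + ReLU -> FC (restore) + sigmoid -> reshape -> reweight."""
    channels = x.shape[-1]
    w = layers.GlobalAveragePooling2D()(x)                          # global spatial information
    w = layers.Dense(channels // reduction, activation="relu")(w)   # downscaling FC + ReLU
    w = layers.Dense(channels, activation="sigmoid")(w)             # restore dimension + sigmoid
    w = layers.Reshape((1, 1, channels))(w)                         # reshape to match the input
    return x * w                                                    # channel-wise reweighted feature

# Example on a random 16 x 16 x 64 feature map
feat = tf.random.normal((2, 16, 16, 64))
print(lightweight_channel_attention(feat).shape)  # (2, 16, 16, 64)
```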

3.2.2. The Improved Self-Attention Mechanism Module

The lower part of the proposed FA mechanism is the improved self-attention mechanism module. Firstly, the three convolutional layers f(x), g(x) and h(x) are adjusted by changing the convolutional kernel size from 1 × 1 to k × k, which expands the receptive field of the convolutional layers for feature extraction. These modified convolutional layers process the input to obtain the attention feature yo, following the same calculation as the self-attention mechanism. Passing yo through a k × k convolutional layer O(x) yields the output oj of the convolution operation O(x):
$$o_j = W_o(y_o)$$
where $W_o$ denotes the convolution operation.
Finally, referring to the idea of residual connections, the original input is added to the generated attention value oj. The final output yISA of the improved self-attention mechanism module is described as follows.
$$y_{ISA} = \gamma o_j + x_{in}$$
where γ is a trainable parameter initialized to 0, and xin is the original input.
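The following sketch assembles the pieces of this module (the k × k convolutional branches, the output convolution O(x) and the γ-scaled residual connection). The kernel size k = 3, the channel reduction factor and the class name are assumptions made for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers

class ImprovedSelfAttention(layers.Layer):
    """Sketch of the improved self-attention module with k x k convolutions."""
    def __init__(self, channels: int, k: int = 3, reduction: int = 8):
        super().__init__()
        self.f = layers.Conv2D(channels // reduction, k, padding="same")
        self.g = layers.Conv2D(channels // reduction, k, padding="same")
        self.h = layers.Conv2D(channels, k, padding="same")
        self.o = layers.Conv2D(channels, k, padding="same")   # O(x) applied to y_o
        self.gamma = self.add_weight(name="gamma", shape=(), initializer="zeros",
                                     trainable=True)          # gamma initialized to 0

    def call(self, x):
        hgt, wid = x.shape[1], x.shape[2]
        n = hgt * wid
        q = tf.reshape(self.f(x), (-1, n, self.f.filters))
        key = tf.reshape(self.g(x), (-1, n, self.g.filters))
        v = tf.reshape(self.h(x), (-1, n, self.h.filters))
        beta = tf.nn.softmax(tf.matmul(q, key, transpose_b=True), axis=-1)
        y_o = tf.reshape(tf.matmul(beta, v), (-1, hgt, wid, self.h.filters))  # attention feature y_o
        o_j = self.o(y_o)                                     # o_j = W_o(y_o)
        return self.gamma * o_j + x                           # y_ISA = gamma * o_j + x_in

# Example usage on a random 16 x 16 x 64 feature map
feat = tf.random.normal((2, 16, 16, 64))
print(ImprovedSelfAttention(64)(feat).shape)  # (2, 16, 16, 64)
```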

3.2.3. Integration Operation

To effectively fuse the features extracted by the two designed attention mechanisms, the direct weighted connection is implemented by applying the following equation.
$$f(R_i, Y_i) = \alpha R_i + (1 - \alpha) Y_i$$
where i is the index of the feature, Y denotes the output of the lightweight channel attention mechanism and R denotes the output of the improved self-attention mechanism module. α is a trainable parameter with α ∈ (0, 1).
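As a small sketch, this fusion can be implemented as a layer holding a single trainable parameter; constraining α to (0, 1) through a sigmoid is an assumption of the sketch, since the constraint mechanism is not specified here.

```python
import tensorflow as tf

class FusedAttentionMerge(tf.keras.layers.Layer):
    """Weighted fusion of the two attention outputs R and Y (illustrative sketch)."""
    def __init__(self):
        super().__init__()
        self.alpha_logit = self.add_weight(name="alpha_logit", shape=(),
                                           initializer="zeros", trainable=True)

    def call(self, r, y):
        alpha = tf.sigmoid(self.alpha_logit)   # trainable alpha kept in (0, 1)
        return alpha * r + (1.0 - alpha) * y
```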

3.3. FAWGAN-GP-Based Data Augmentation Approach

Considering that an attention module applied to the intermediate- and high-level feature maps exhibits superior performance compared to its application to the low-level feature maps [29], the proposed FA mechanism is integrated into the intermediate and high levels of both the generator and discriminator, as shown in Figure 3; the resulting model is referred to as FAWGAN-GP. The network parameters and structure of the proposed FAWGAN-GP are presented in Table 1. The discriminator and generator are both four-layer CNNs, and the generator is designed with residual blocks to improve its generation ability. The generator and discriminator of FAWGAN-GP both utilize an Adam optimizer with β1 and β2 set to 0.5 and 0.999, respectively. The learning rates are set to 1 × 10−4 for the generator and 5 × 10−4 for the discriminator. The number of training epochs is set to 2500, and the batch size is set to 32. Moreover, the generator applies the Gaussian error linear unit [30] (GELU) activation function, while the discriminator employs the leaky rectified linear unit (LReLU). GELU, a commonly applied activation function in recent years, provides smoother gradients and is defined as follows.
$$\mathrm{GELU}(x) = 0.5x\left( 1 + \tanh\left[ \sqrt{2/\pi}\left( x + 0.044715x^{3} \right) \right] \right)$$
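For reference, a direct NumPy implementation of this tanh approximation:

```python
import numpy as np

def gelu(x):
    """tanh approximation of GELU, matching the expression above."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

print(gelu(np.array([-1.0, 0.0, 1.0])))  # approximately [-0.159, 0.0, 0.841]
```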
To address the problem of limited samples in real industry, limited samples with multisensor information are augmented by utilizing the constructed FAWGAN-GP model. The proposed FAWGAN-GP model enhances the quality of the generated samples by facilitating the extraction of important features at the global, local and channel levels. Meanwhile, the L1 loss function in the FAWGAN-GP model drives the generated samples closer to real samples. In addition, to ensure a stable training process, the GP term is introduced into the proposed FAWGAN-GP to prevent gradient explosion.

3.4. Overall Framework of Fault Diagnosis

To address the limitation posed by single-sensor incomplete information and the challenge of limited fault samples, a novel fault diagnosis framework for rotating machinery with multisensor fusion samples is proposed in this paper based on FAWGAN-GP. As depicted in Figure 4, it can be divided into three main parts: data collection and processing, data augmentation by FAWGAN-GP and fault diagnosis, encompassing the following steps.
Step 1: Various state quantities of rotating machinery, such as vibration signals, displacement signals and current signals, are collected with multisensors. Meanwhile, to address the issue of varying sampling frequencies among multisensors, a polyphase anti-aliasing filter is utilized to resample the signals to the highest sampling frequency. The multisensor signals can be represented as follows.
$$X_{raw} = \left[ x_{ij} \right] \in \mathbb{R}^{n \times m}$$
where Xraw is a matrix of n rows and m columns, n represents the signal length of each sensor and m represents the number of sensors.
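As an illustration of the resampling in Step 1, SciPy's polyphase resampler can upsample a lower-rate channel to the highest sampling rate; the signal lengths and sampling rates below are hypothetical stand-ins, not values taken from the paper.

```python
import numpy as np
from scipy.signal import resample_poly

# Upsample a hypothetical 4 kHz channel to 64 kHz (ratio 16:1) with a
# polyphase anti-aliasing filter so that all channels share the highest rate.
fs_low, fs_high = 4000, 64000
x_low = np.random.randn(fs_low * 4)                        # 4 s of a stand-in 4 kHz signal
x_up = resample_poly(x_low, up=fs_high // fs_low, down=1)  # length becomes 4 s at 64 kHz
print(len(x_up))                                           # 256000 samples
```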
Step 2: To mitigate the impact of irrelevant data, principal component analysis (PCA) is applied to retain the top three principal components of the collected multisensor signals. The process of dimensionality reduction by PCA can be represented as follows.
$$X_{pca} = X_{raw} W^{T} = \left[ x_{ij} \right] \in \mathbb{R}^{n \times 3}$$
where XpcaRn×3 is the three-channel feature after dimensionality reduction, and WTRm×3 is the linear transformation matrix.
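A minimal sketch of this reduction with scikit-learn's PCA is shown below; the random matrix stands in for a real multisensor record and the sensor count is an arbitrary example.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical multisensor record: n samples x m sensors (already resampled to a common rate).
n, m = 65536, 7
x_raw = np.random.randn(n, m)

pca = PCA(n_components=3)            # keep the top three principal components
x_pca = pca.fit_transform(x_raw)     # shape (n, 3): the three fused channels
print(x_pca.shape, pca.explained_variance_ratio_)
```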
Step 3: The split points for the three-channel features are randomly selected from a uniform distribution over [1, n − l × l], ensuring that the chosen split points do not overlap. Subsequently, a signal segment of length l × l is extracted from each channel starting at its corresponding split point.
Step 4: The segmented signals from the three-channel features are transformed into two-dimensional pixel matrices and then normalized to values between 0 and 255. This process is defined as follows.
$$PM_j(a, b) = \frac{L_j\left( (a - 1) \times l + b \right) - \min(L_j)}{\max(L_j) - \min(L_j)} \times 255, \quad j = 1, 2, 3$$
where $L_j$ denotes the segmented signal of the jth channel and $(a, b)$ indexes the pixel position.
Step 5: The three normalized matrices are fused into a RGB pixel matrix. The RGB pixel matrix can be expressed as follows.
$$RGBM(a, b, C_f) = \left( PM_j(a, b), C_f \right)$$
where Cf = j = 1, 2, 3.
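The sketch below walks through Steps 3–5 for one sample: it cuts an l × l segment from each of the three channels, scales it to the range 0–255, and stacks the three pixel matrices as an RGB image. Setting l = 64 is an assumption made to match the 64 × 64 × 3 input size in Table 1, and the split point is taken as a given argument.

```python
import numpy as np

def channels_to_rgb(x_pca: np.ndarray, start: int, l: int = 64) -> np.ndarray:
    """Sketch of Steps 3-5: segment each channel, scale to 0-255, stack as RGB."""
    rgb = np.zeros((l, l, 3), dtype=np.uint8)
    for j in range(3):
        seg = x_pca[start:start + l * l, j]                      # segment from the chosen split point
        pm = (seg - seg.min()) / (seg.max() - seg.min()) * 255   # normalize to [0, 255]
        rgb[:, :, j] = pm.reshape(l, l).astype(np.uint8)         # 2-D pixel matrix per channel
    return rgb

# Example with random data standing in for the PCA output
x_pca = np.random.randn(65536, 3)
img = channels_to_rgb(x_pca, start=1000)
print(img.shape)  # (64, 64, 3)
```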
Step 6: A small number of training samples and a certain number of test samples are randomly created from the multisensor RGB images.
Step 7: The proposed FAWGAN-GP model is employed to train samples and generate sufficient samples to augment the original training samples.
Step 8: The augmented training dataset is employed to train the residual CNN classifier with which the test samples are classified.
The network parameters and structure of the residual classifier in this paper are shown in Table 2. The residual classifier utilizes an Adam optimizer, and β1 and β2 are set to 0.5 and 0.999, respectively. The learning rate of Adam is set to 5 × 10−5. The batch size and training epoch are set to 40 and 70, respectively.
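A minimal sketch of this training configuration is shown below; the residual CNN itself is omitted, and the sparse categorical cross-entropy loss is an assumption, since the classification loss is not stated in the text.

```python
import tensorflow as tf

def compile_and_train(model: tf.keras.Model, x_train, y_train):
    """Train the residual CNN classifier with the settings listed above (sketch)."""
    optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5, beta_1=0.5, beta_2=0.999)
    model.compile(optimizer=optimizer,
                  loss="sparse_categorical_crossentropy",  # assumed classification loss
                  metrics=["accuracy"])
    return model.fit(x_train, y_train, batch_size=40, epochs=70)
```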

4. Case Studies

All experiments were implemented in Python 3.7 with TensorFlow 2.2 on a Windows 10 64-bit system equipped with a 13th Gen Intel(R) Core(TM) i5-13490F CPU at 2.50 GHz, 32 GB of RAM and an NVIDIA GeForce RTX 3060 Ti GPU.

4.1. Case 1: KAT Bearing Dataset

The experimental data of Case 1 are provided by the Konstruktions- und Antriebstechnik (KAT) Data Center at the University of Paderborn, Germany [31]. As shown in Figure 5, the test platform collects vibration signals and the radial force on the bearings, as well as the load torque and two-phase motor current signals at the bearing location. It is worth noting that the sampling frequency of the sensors for radial force and load torque is 4 kHz, while the sampling frequency of the remaining sensors is 64 kHz. Additionally, the KAT-bearing dataset captures data from four different operating conditions. In this paper, data from the condition with a rotation speed of 900 rpm, a load torque of 0.7 N·m and a radial force of 1000 N were gathered. The dataset includes four health states: healthy (NO), inner ring fault (IRF), outer ring fault (ORF) and a combination of inner and outer ring faults (IORF).
The specific experimental information for Case 1 is shown in Table 3. The proposed FAWGAN-GP model was trained by datasets A, B, C and D, respectively. The generated samples for each class were set as A1, B1, C1 and D1, respectively. At the same time, test datasets A2, B2, C2 and D2 with 100 real samples in each class were applied for evaluation. The proposed FAWGAN-GP model utilized dataset A to generate the RGB images shown in Figure 6. It is evident from Figure 6 that the proposed FAWGAN-GP model can effectively generate diverse and clear samples.
To evaluate the effectiveness of the proposed approach, the classifier was trained on the combined datasets A + A1, B + B1, C + C1 and D + D1, respectively. Subsequently, the classifier was tested on datasets A2, B2, C2 and D2, respectively. The resulting confusion matrices are shown in Figure 7, which indicates that the classification accuracy for the four datasets was 100%, 99.5%, 99.5% and 98.5%, respectively. Remarkably, despite the availability of only 15 training samples per class, high classification accuracy was still attained. Furthermore, the feature-level evaluation technique t-SNE was employed to visualize the two-dimensional feature distribution of both the generated samples and the real samples. Initially, a sufficient number of real samples were utilized to obtain a pre-trained classifier. Subsequently, the samples applied to train FAWGAN-GP, combined with the generated samples in a 1:1 ratio, were input into the pre-trained classifier for testing. The results for the four datasets are displayed in Figure 8. Although a few misclassifications were observed, the overall clustering agreement between the generated samples and real samples is commendable, which effectively validates the performance of the proposed method.
To verify the effectiveness of the proposed FAWGAN-GP, comparative experiments were conducted applying four related models, including deep convolution GAN (DCGAN), auxiliary classifier GAN (ACGAN), modified ACGAN (MACGAN) [20] and WGAN with gradient penalty (WGAN-GP). All five models were trained by utilizing datasets A, B, C and D, respectively. Meanwhile, dataset B1 was generated for each class, and dataset B2 with 100 real samples for each class was tested. It should be emphasized that the network parameters, network layers, epoch, batch size and learning rate applied for the models of comparison were identical to the proposed FAWGAN-GP. All fault diagnosis experiments were run 10 times and the average result was used to measure the model performance. Table 4 presents the average classification accuracy of five models on four train datasets. When training dataset A, the accuracy of the FAWGAN-GP model was 1.725%, 14.9%, 11.65% and 0.7% higher than the other four models, respectively. As the number of training samples decreased, the gap between the accuracy of the FAWGAN-GP model and that of the four comparative models gradually widened. Specifically, when training dataset D, the accuracy of the FAWGAN-GP model improved by 4.45%, 53.2%, 51.7% and 1.2%, respectively. Furthermore, although ACGAN and MACGAN were capable of generating multi-mode samples, the performance of ACGAN and MACGAN was influenced by the number of training samples, resulting in lower diagnostic accuracy compared to the proposed model. Additionally, the proposed model could achieve higher classification accuracy compared to DCGAN and WGAN-GP and was more stable than the other four comparative models based on the variance of classification accuracy.
To assess the impact of the number of generated samples on classification accuracy, the proposed method and WGAN-GP were trained on dataset D, and datasets A1, B1, C1 and D1 were generated, respectively. Subsequently, the residual CNN classifier was trained by applying datasets D + A1, D + B1, D + C1 and D + D1, respectively, and A2, B2, C2 and D2 were tested, respectively. The average results from 10 tests are presented in Figure 9. The results suggest that the classification accuracy is influenced by the number of generated samples. Additionally, the proposed model demonstrated a relatively minimal decline in accuracy compared to WGAN-GP when the number of generated samples was reduced.

4.2. Case 2: WT Gearbox Dataset

The WT planetary gearbox dataset was collected by Beijing Jiaotong University [32]. As shown in Figure 10, the test platform mainly contains four basic components: the motor, planetary gearbox, stator gearbox and load device. The four planetary gears rotate around the sun gear, and the structure is shown in Figure 11a. The sun gear has five health statuses, as shown in Figure 11b–f. The WT dataset is collected by multisensors for each health status, including the vibration signals in the horizontal and vertical directions, as well as the encoder data of the planetary gearbox input shaft. The sampling frequency of all collected data is 48 kHz. In addition, the WT dataset contains data for eight velocity conditions for each health status. In this paper, the multisensor signals under the 20 Hz velocity condition were gathered for the experiments, and the RGB images were obtained by applying the same multisensor fusion steps as described above. The specific experimental information of Case 2 is presented in Table 5.
To explore the influence of the learning rates and the GP term coefficient on the proposed FAWGAN-GP, the learning rates of the generator and discriminator were selected from the group {(G, D): (0.0001, 0.0001), (0.0001, 0.0005), (0.0005, 0.0001), (0.001, 0.001)}, and the GP term coefficient was selected from the group {1, 10, 20, 50}. The residual CNN classifier was trained on dataset C + C1, and dataset C2 was used for testing. Each experiment was run 10 times, and the average classification accuracy is displayed in Figure 12. The results show that the hyperparameters selected in this paper achieve the highest classification accuracy.
Four ablation experiments were designed to verify the individual effectiveness of multisensors and the FA mechanism. Experiment 1 applied single-sensor samples without the FA mechanism, and experiment 2 added the FA mechanism to experiment 1. Experiment 3 applied multisensor samples without the FA mechanism, and experiment 4 added the FA mechanism to experiment 3. Therefore, the comparison between experiment 1 and experiment 3 focuses on the influence of sensor quantity on classification accuracy, while the comparison between experiment 3 and experiment 4 investigates the impact of the FA mechanism on classification accuracy when utilizing multisensors. The models of the four experiments were trained on datasets A, B, C and D to generate datasets A1, B1, C1 and D1, respectively. The residual CNN classifier was trained on A + A1, B + B1, C + C1 and D + D1, respectively, and datasets A2, B2, C2 and D2 were employed for testing. All classification experiments were run 10 times, and the average classification accuracies of the four experiments are presented in Figure 13. Based on the results of experiments 1 and 3, employing multisensors yields an average improvement of 1.51% in accuracy compared to applying a single sensor. Furthermore, the results of experiments 3 and 4 indicate an average increase of 1.09% in accuracy from incorporating the FA mechanism. Hence, the effectiveness of the proposed method is validated.
To verify the influence of the classifier on the experimental results, the residual CNN classifier was replaced with four different CNN classifiers: CNN, VGG16, AlexNet and ResNet18. It is worth noting that the network parameters, batch size, learning rate and epoch for all comparison classifiers were the same as those of the residual CNN in this paper. The augmented dataset C + C1 was divided into a training set and a validation set with a ratio of 4:1, and the dot–line plots of validation accuracy versus epoch for the different classifiers are shown in Figure 14. Meanwhile, the test classification accuracy on dataset C2 for CNN, VGG16, AlexNet, ResNet18 and the residual CNN was 87%, 94%, 96.4%, 86.2% and 97.6%, respectively. The residual CNN classifier achieved the highest accuracy, exceeding the other classifiers by 10.6%, 3.6%, 1.2% and 11.4%, respectively. Although the accuracy of VGG16, AlexNet and ResNet18 initially rose faster than that of the residual CNN classifier, their stability and later improvement were worse.

5. Conclusions

Focusing on the issue posed by incomplete single-sensor information in fault diagnosis with limited samples, a novel intelligent fault diagnosis method for rotating machinery with limited multisensor fusion samples is proposed in this paper. Firstly, the dimensionality of the collected multisensor data is reduced to three-channel features by PCA, and the segmented signals of each channel are transformed into two-dimensional images; RGB images are then obtained by fusing the three-channel two-dimensional images. Subsequently, the limited multisensor fusion samples are augmented by applying the proposed FAWGAN-GP. Eventually, the augmented dataset is employed to train the residual CNN classifier for fault diagnosis.
Through experimentation on both the KAT-bearing dataset and WT gearbox dataset, the effectiveness of the proposed method was validated from multiple aspects as well as contrastive analysis. The average classification accuracy achieved was 99.9%, 99.65%, 99.6% and 98.7%, respectively, when training samples per class were 50, 35, 25 and 15, respectively, on the KAT-bearing dataset. Additionally, to validate the individual impact of multisensors, the FA mechanism and the residual CNN classifier on classification accuracy, four ablation experiments and the experiment of comparison with different classifiers were designed to demonstrate the positive contributions of the individual component to the fault diagnosis of rotating machinery. Multisensor fusion and the FA mechanism had an average improvement of 1.51% and 1.09%, respectively, in the diagnostic accuracy of the WT gearbox dataset, and the residual CNN classifier achieved the highest classification accuracy compared with other different CNN classifiers.
Although the proposed fault diagnosis method effectively enhances diagnosis performance and addresses the limited-sample problem encountered in real-world industry, the diagnostic accuracy still needs to be improved under conditions with extremely limited samples. Meanwhile, the proposed method only considers the problem of limited samples, whereas rotating machinery also faces the combined challenge of limited samples and variable working conditions. Therefore, future work will be conducted to address these issues.

Author Contributions

Conceptualization and review of this manuscript, W.F.; methodology, software, experiments and writing original draft, K.Y.; supervision and project management, B.W. and Y.S.; software, S.L. and B.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Natural Science Foundation of Hubei Province of China (No. 2022CFB935) and Open Fund of Hubei Key Laboratory for Operation and Control of Cascaded Hydropower Station (No. 2022KJX10).

Data Availability Statement

The KAT-bearing dataset and WT gearbox dataset used in this paper are public datasets, available from the Konstruktions- und Antriebstechnik (KAT) data center of Universität Paderborn (uni-paderborn.de) and from the Liudd-BJUT/WT-planetary-gearbox-dataset repository (github.com), respectively.

Conflicts of Interest

The authors declare no competing interests.

References

1. Yu, G.; Wu, P.; Lv, Z.; Hou, J.; Ma, B.; Han, Y. Few-shot fault diagnosis method of rotating machinery using novel MCGM based CNN. IEEE Trans. Industr. Inform. 2023, 19, 10944–10955.
2. Zhang, J.; Zhang, Q.; He, X.; Sun, G.; Zhou, D. Compound-fault diagnosis of rotating machinery: A fused imbalance learning method. IEEE Trans. Control Syst. Technol. 2021, 29, 1462–1474.
3. Gao, Z.; Cecati, C.; Ding, S. A survey of fault diagnosis and fault-tolerant techniques—Part I: Fault diagnosis with model-based and signal-based approaches. IEEE Trans. Ind. Electron. 2015, 62, 3757–3767.
4. Liao, W.; Fu, W.; Yang, K.; Tan, C.; Huang, Y. Multi-scale residual neural network with enhanced gated recurrent unit for fault diagnosis of rolling bearing. Meas. Sci. Technol. 2024, 35, 056114.
5. Wang, M.; Cao, H.; Ai, Z.; Zhang, Q. Fault diagnosis of ship ballast water system based on support vector machine optimized by improved sparrow search algorithm. IEEE Access 2024, 12, 17045–17057.
6. Zhao, X.; Jia, M.; Bin, J.; Wang, T.; Liu, Z. Multiple-order graphical deep extreme learning machine for unsupervised fault diagnosis of rolling bearing. IEEE Trans. Instrum. Meas. 2021, 70, 3506012.
7. Sun, Y.; Zhang, H.; Zhao, T.; Zou, Z.; Shen, B.; Yang, L. A new convolutional neural network with random forest method for hydrogen sensor fault diagnosis. IEEE Access 2020, 8, 85421–85430.
8. He, C.; Shi, H.; Si, J.; Li, J. Physics-informed interpretable wavelet weight initialization and balanced dynamic adaptive threshold for intelligent fault diagnosis of rolling bearings. J. Manuf. Syst. 2023, 70, 579–592.
9. Xie, T.; Huang, X.; Choi, S. Intelligent mechanical fault diagnosis using multisensor fusion and convolution neural network. IEEE Trans. Industr. Inform. 2022, 18, 3213–3223.
10. An, Y.; Zhang, K.; Liu, Q.; Chai, Y.; Huang, X. Rolling bearing fault diagnosis method based on periodic sparse attention and LSTM. IEEE Sens. J. 2022, 22, 12044–12053.
11. Jia, N.; Cheng, Y.; Liu, Y.; Tian, Y. Intelligent fault diagnosis of rotating machines based on wavelet time-frequency diagram and optimized stacked denoising auto-encoder. IEEE Sens. J. 2022, 22, 17139–17150.
12. Xing, S.; Lei, Y.; Wang, S.; Jia, F. Distribution-invariant deep belief network for intelligent fault diagnosis of machines under new working conditions. IEEE Trans. Ind. Electron. 2021, 68, 2617–2625.
13. Hou, X.; Du, F.; Huang, K.; Qiu, J.; Zhong, X. A current-based fault diagnosis method for rotating machinery with limited training samples. IEEE Trans. Instrum. Meas. 2023, 72, 3530414.
14. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems 27, Montreal, QC, Canada, 8–13 December 2014.
15. Gao, H.; Zhang, X.; Gao, X.; Li, F.; Han, H. ICoT-GAN: Integrated convolutional transformer GAN for rolling bearings fault diagnosis under limited data condition. IEEE Trans. Instrum. Meas. 2023, 72, 3515114.
16. Fan, J.; Yuan, X.; Miao, Z.; Sun, Z.; Mei, X.; Zhou, F. Full attention Wasserstein GAN with gradient normalization for fault diagnosis under imbalanced data. IEEE Trans. Instrum. Meas. 2022, 71, 3517516.
17. Yang, J.; Liu, J.; Xie, J.; Wang, C.; Ding, T. Conditional GAN and 2-D CNN for bearing fault diagnosis with small samples. IEEE Trans. Instrum. Meas. 2021, 70, 3525712.
18. Wan, W.; He, S.; Chen, J.; Li, A.; Feng, Y. QSCGAN: An un-supervised quick self-attention convolutional GAN for LRE bearing fault diagnosis under limited label-lacked data. IEEE Trans. Instrum. Meas. 2021, 70, 3527816.
19. Zhong, Z.; Liu, H.; Mao, W.; Xie, X.; Hao, W.; Cui, Y. Imbalanced bearing fault diagnosis based on RFH-GAN and PSA-DRSN. IEEE Access 2023, 11, 131926–131938.
20. Li, W.; Zhong, X.; Shao, H.; Cai, B.; Yang, X. Multi-mode data augmentation and fault diagnosis of rotating machinery using modified ACGAN designed with new framework. Adv. Eng. Inform. 2022, 52, 101552.
21. Huo, Z.; Martínez-García, M.; Zhang, Y.; Shu, L. A multisensor information fusion method for high-reliability fault diagnosis of rotating machinery. IEEE Trans. Instrum. Meas. 2022, 71, 3500412.
22. Ma, M.; Sun, C.; Chen, X. Deep coupling autoencoder for fault diagnosis with multimodal sensory data. IEEE Trans. Industr. Inform. 2018, 14, 1137–1145.
23. Chen, H.; Hu, N.; Cheng, Z.; Zhang, L.; Zhang, Y. A deep convolutional neural network based fusion method of two-direction vibration signal data for health state identification of planetary gearboxes. Measurement 2019, 146, 268–278.
24. Tian, J.; Xie, G.; Zhang, X.; Nie, X.; Li, L.; Zhang, S. Fault diagnosis with robustness and lightweight synergy under noisy environment. IEEE Sens. J. 2023, 23, 16351–16362.
25. Pan, Y.; An, R.; Fu, D.; Zheng, Z.; Yang, Z. Unsupervised fault detection with a decision fusion method based on Bayesian in the pumping unit. IEEE Sens. J. 2021, 21, 21829–21838.
26. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; Volume 70, pp. 214–223.
27. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.; Kaiser, L. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30, Long Beach, CA, USA, 4–9 December 2017.
28. Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A. Improved training of Wasserstein GANs. In Proceedings of the Advances in Neural Information Processing Systems 30, Long Beach, CA, USA, 4–9 December 2017.
29. Jia, Z.; Liu, Z.; Vong, C.; Pecht, M. A rotating machinery fault diagnosis method based on feature learning of thermal images. IEEE Access 2019, 7, 12348–12359.
30. Hendrycks, D.; Gimpel, K. Gaussian error linear units (GELUs). arXiv 2016, arXiv:1606.08415.
31. Lessmeier, C.; Kimotho, J.K.; Zimmer, D.; Sextro, W. Condition monitoring of bearing damage in electromechanical drive systems by using motor current signals of electric motors: A benchmark data set for data-driven classification. In Proceedings of the PHM Society European Conference, Bilbao, Spain, 5–8 July 2016; Volume 3.
32. Liu, D.; Cui, L.; Cheng, W. A review on deep learning in planetary gearbox health state recognition: Methods, applications, and dataset publication. Meas. Sci. Technol. 2023, 35, 012002.
Figure 1. The basic structure of GAN.
Figure 2. The structure of the proposed FA mechanism.
Figure 3. The structure of the proposed FAWGAN-GP.
Figure 4. The proposed overall framework diagram.
Figure 5. KAT test platform.
Figure 6. Generated samples by training dataset A.
Figure 7. Confusion matrix for four datasets in the KAT.
Figure 8. T-SNE for four datasets in the KAT.
Figure 9. Comparison of the number of generated samples.
Figure 10. WT test platform.
Figure 11. (a) Internal structure of planetary gearbox. (b) Healthy (NO), (c) broken tooth (BT), (d) worn gear (WR), (e) root crack (RC), (f) missing tooth (MT).
Figure 12. Hyperparameters of learning rates and GP term coefficient experiments.
Figure 13. Ablation experiment of four datasets.
Figure 14. Comparative results of different classifiers.
Table 1. The network parameters and structure of the proposed FAWGAN-GP.

| Network | Layer | Operation | Strides | Activation Function |
| --- | --- | --- | --- | --- |
| Discriminator | Input (64 × 64 × 3) | | | |
| | Conv2D (4 × 4 × 32) | None | 2 × 2 | LReLU |
| | Conv2D (4 × 4 × 64) | None | 2 × 2 | LReLU |
| | FA (2 × 2 × 64) | None | 1 × 1 | LReLU |
| | Conv2D (4 × 4 × 128) | None | 2 × 2 | LReLU |
| | FA (2 × 2 × 128) | None | 1 × 1 | LReLU |
| | Conv2D (4 × 4 × 256) | None | 2 × 2 | LReLU |
| | Dense (1) | Flatten | None | |
| Generator | Input (100) | | | |
| | Dense (4 × 4 × 512) | Reshape | | GELU |
| | RB (4 × 4 × 128) | BN | 2 × 2 | GELU |
| | RB (4 × 4 × 64) | BN | 2 × 2 | GELU |
| | FA (2 × 2 × 64) | BN | 1 × 1 | LReLU |
| | RB (4 × 4 × 32) | BN | 2 × 2 | GELU |
| | FA (2 × 2 × 32) | BN | 1 × 1 | LReLU |
| | Conv2DTranspose (4 × 4 × 3) | BN | 2 × 2 | Tanh |
| Residual Block (RB) | Conv2DTranspose (4 × 4) | BN | 2 × 2 | GELU |
| | Conv2DTranspose (4 × 4) | BN | 1 × 1 | GELU |
| | Conv2DTranspose (4 × 4) | Add | 2 × 2 | None |
Table 2. The network parameters and structure of the classifier.

| Network | Layer | Operation | Map | Strides | Activation Function |
| --- | --- | --- | --- | --- | --- |
| Classifier | Conv2D (3 × 3) | BN | 16 | 2 × 2 | LReLU |
| | RB1 | BN | 16 | 1 × 1 | LReLU |
| | RB1 | BN | 16 | 1 × 1 | LReLU |
| | RB2 | BN | 32 | 2 × 2 | LReLU |
| | RB2 | BN | 64 | 2 × 2 | LReLU |
| | AveragePooling2D | 8 | 64 | N/A | N/A |
| | Flatten | N/A | N/A | N/A | LReLU |
| | Dense | Class_num | N/A | N/A | Softmax |
| Residual Block1 (RB1) | Conv2D (3 × 3) | BN | 16 | 1 × 1 | LReLU |
| | Conv2D (3 × 3) | BN | 16 | 1 × 1 | N/A |
| | N/A | Add | N/A | N/A | N/A |
| Residual Block2 (RB2) | Conv2D (3 × 3) | BN | 32/64 | 2 × 2 | LReLU |
| | Conv2D (3 × 3) | BN | 32/64 | 1 × 1 | LReLU |
| | Conv2D (3 × 3) | Add | 32/64 | 2 × 2 | LReLU |
| | Conv2D (3 × 3) | BN | 32/64 | 1 × 1 | LReLU |
| | Conv2D (3 × 3) | BN | 32/64 | 1 × 1 | LReLU |
| | N/A | Add | N/A | N/A | N/A |
Table 3. Dataset with different training samples and generative samples in Case 1.

| Train Dataset (A/B/C/D) | Generative Samples (A1/B1/C1/D1) | Test Dataset (A2/B2/C2/D2) | Health Status | Class Labels |
| --- | --- | --- | --- | --- |
| 50/35/25/15 | 250/150/100/50 | 100/100/100/100 | IORF | 0 |
| 50/35/25/15 | 250/150/100/50 | 100/100/100/100 | NO | 1 |
| 50/35/25/15 | 250/150/100/50 | 100/100/100/100 | IRF | 2 |
| 50/35/25/15 | 250/150/100/50 | 100/100/100/100 | ORF | 3 |
Table 4. Comparison of classification accuracy of different models (accuracy of diagnosis, %).

| Models | Dataset A | Dataset B | Dataset C | Dataset D |
| --- | --- | --- | --- | --- |
| DCGAN | 98.175 ± 0.500 | 97.750 ± 0.889 | 96.500 ± 0.833 | 94.250 ± 1.445 |
| ACGAN | 85.000 ± 2.861 | 70.250 ± 4.236 | 54.500 ± 3.778 | 45.500 ± 8.444 |
| MACGAN | 88.250 ± 1.264 | 71.750 ± 1.375 | 53.750 ± 2.111 | 47.000 ± 1.556 |
| WGAN-GP | 99.200 ± 0.317 | 98.225 ± 0.756 | 97.900 ± 1.000 | 97.500 ± 0.792 |
| FAWGAN-GP | 99.900 ± 0.031 | 99.650 ± 0.114 | 99.600 ± 0.128 | 98.700 ± 0.372 |
Table 5. Dataset with different training samples in Case 2.

| Train Dataset (A/B/C/D) | Generative Samples (A1/B1/C1/D1) | Test Dataset (A2/B2/C2/D2) | Health Status | Class Labels |
| --- | --- | --- | --- | --- |
| 75/65/50/35 | 150/150/150/150 | 100/100/100/100 | BT | 0 |
| 75/65/50/35 | 150/150/150/150 | 100/100/100/100 | NO | 1 |
| 75/65/50/35 | 150/150/150/150 | 100/100/100/100 | MT | 2 |
| 75/65/50/35 | 150/150/150/150 | 100/100/100/100 | RC | 3 |
| 75/65/50/35 | 150/150/150/150 | 100/100/100/100 | WR | 4 |