**1. Introduction**

As an indispensable component of rotating machines, a bearing's health status directly affects, and can even determine, the service life of the equipment. In practice, however, bearings usually work under extreme and harsh conditions, which makes them prone to faults [1]. Timely and accurate fault diagnosis is therefore crucial to reduce maintenance costs and avoid serious accidents.

**Citation:** Ruan, D.; Song, X.; Gühmann, C.; Yan, J. Collaborative Optimization of CNN and GAN for Bearing Fault Diagnosis under Unbalanced Datasets. *Lubricants* **2021**, *9*, 105. https://doi.org/10.3390/lubricants9100105

Received: 23 August 2021; Accepted: 8 October 2021; Published: 15 October 2021

In recent years, data-driven fault diagnosis has been attracting increasing attention from both academia and industry. Among the various data-driven methods, the Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) network are the most widely used, owing to their powerful abilities in complex feature extraction and nonlinear mapping. The CNN was first employed for bearing fault diagnosis by O. Janssens in 2016 [2], and since then many improvements have been proposed to enhance its performance, such as the 1D-CNN, 2D-CNN, multiscale CNN, and adaptive CNN [3–6]. Russell Sabir adopted the LSTM for bearing fault diagnosis based on the motor current signal and obtained a classification accuracy of 96% [7]. L. Yu and D. Qiu proposed the stacked LSTM and the bidirectional LSTM, respectively, and both obtained an accuracy of more than 99% on bearing fault diagnosis [8,9]. H. Pan combined a 1D-CNN and a 1D-LSTM into a unified structure by feeding the CNN's output into the LSTM, achieving a test accuracy of up to 99.6% [10].

Although many sound results have been reported in deep learning-based fault diagnosis, many challenges remain. For example, all the studies mentioned above assume that plenty of high-quality data are available for network training. In many applications, however, the available historical or experimental data are very limited, or the provided data are severely unbalanced; for example, the sample size for some fault classes is far smaller than for the others. Either insufficient or unbalanced data causes a serious performance reduction in deep networks. According to D. Xiao's work, when the training samples were reduced from 1000 to 150, the CNN's accuracy declined correspondingly from 97.2% to 83.9% [11]. When the imbalance ratio increased from 2:1 to 40:1, the classification accuracy for the outer-ring fault based on the GAN-SAE dropped sharply from 97.79% to 20.95% [12].

To address this problem, scholars have proposed diverse methods. Oversampling was proposed first, in which direct replication is used to generate additional samples for the labels that have very few [13,14]. Although this method is simple and efficient, it easily causes overfitting, since no new information is incorporated. As another promising method for data generation, the GAN has already been used to generate new samples in fault diagnosis. Both W. Zhang and S. Shao employed a GAN to learn the mapping between a noise distribution and actual machinery vibration data in order to expand the available dataset; the results confirmed that the diagnosis accuracy could be improved once the imbalanced data were augmented by the GAN [15,16]. However, when building and evaluating a GAN, published studies focus only on the overall similarity between the generated data and the original data, which inevitably causes problems in data quality. A small loss in the general GAN only means that the generated data are highly similar to the original signal; it does not guarantee that the generated signal has captured the important characteristics of the original. When generating additional samples for unbalanced fault-diagnosis datasets, it is important to ensure that a generated sample carries the same, or nearly the same, fault information as the original one, in both the time and frequency domains. For this reason, an improved GAN is proposed in this paper and applied to generate samples for an unbalanced experimental dataset, which is further used in CNN-based fault diagnosis.

The main innovations of this paper are as follows: (1) A GAN is combined with a CNN, and the two networks are optimized cooperatively. The GAN generates a more balanced dataset for the CNN, and the CNN evaluates the quality of the GAN's generated data, so each network contributes to the other's performance improvement. (2) Fault-characterization information is used to improve the general entropy-based loss function of the GAN. The amplitude and frequency errors in the envelope spectrum between the experimental and generated samples are taken as a correction term in the GAN's loss function, enabling the GAN to produce samples with higher fidelity that carry more fault information.

The remaining part of this paper is organized as follows. Section 2 details the theory and methodology of the GAN, CNN, and loss function improvement. Section 3 describes the test bench and experimental dataset. Section 4 discusses and analyzes the results. Section 5 concludes the whole paper.

#### **2. Methodology**

#### *2.1. Theory of the GAN*

A GAN generates new data without any prior knowledge of the probability density function of the original data. It consists mainly of a generator and a discriminator. The discriminator determines whether a sample comes from the original or the generated dataset, while the generator tries to produce data so similar to the original that the discriminator can hardly make the right decision. In the general GAN, the loss functions of the generator and discriminator are defined in Equations (1) and (2), respectively [15]:

$$L\_G = -\frac{1}{K} \sum\_{i=1}^{K} \log \left( D \left( \mathbf{x}\_{fake}^{i} \right) \right), \tag{1}$$

$$L\_D = -\frac{1}{J} \sum\_{m=1}^{J} \log\left(D\left(\mathbf{x}\_{real}^{m}\right)\right) - \frac{1}{K} \sum\_{i=1}^{K} \log\left(1 - D\left(\mathbf{x}\_{fake}^{i}\right)\right), \tag{2}$$

where *J* is the number of real samples and *K* is the number of generated samples. *x<sup>m</sup><sub>real</sub>* represents a data sample from the real training dataset, and *x<sup>i</sup><sub>fake</sub>* denotes a data sample from the GAN generator. *D*(*x<sup>m</sup><sub>real</sub>*) designates the output of the discriminator *D* for the input sample *x<sup>m</sup><sub>real</sub>*.

Based on the loss functions *L<sub>G</sub>* and *L<sub>D</sub>*, the GAN can be trained as a min-max two-player game until the global optimum, *D*(*x<sub>real</sub>*) = *D*(*x<sub>fake</sub>*) = 0.5, is reached. This indicates that the data from the generator are so similar to the real data that the discriminator cannot tell the difference.
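As a concrete illustration, Equations (1) and (2) and their value at the equilibrium point can be sketched in a few lines of NumPy (a minimal sketch; the function names and batch values are ours, not from the paper):

```python
import numpy as np

def generator_loss(d_fake):
    """Eq. (1): L_G = -(1/K) * sum_i log(D(x_fake_i))."""
    return -np.mean(np.log(d_fake))

def discriminator_loss(d_real, d_fake):
    """Eq. (2): binary cross-entropy over the real and generated batches."""
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

# At the global optimum D(x_real) = D(x_fake) = 0.5:
d_half = np.full(8, 0.5)
print(generator_loss(d_half))             # log(2) ~ 0.693
print(discriminator_loss(d_half, d_half)) # 2*log(2) ~ 1.386
```

Note that at the optimum the discriminator loss equals 2 log 2, the well-known equilibrium value of the original min-max game.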

#### *2.2. Fault Data Generation Based on GAN and CNN*

The direct task of the GAN is to generate more samples for the labels with limited measurements. The ultimate goal, however, is to improve the performance of data-driven fault-diagnosis methods on imbalanced datasets. It is therefore reasonable to take the final fault-diagnosis result into consideration when constructing the GAN, so that the generated data can indeed sharpen the algorithm's fault-diagnosis ability. In this paper, a CNN is selected as a representative data-driven fault-diagnosis method, and the diagnosis task focuses on fault classification, so its performance is evaluated by the cross-entropy, as shown in Equation (3). The CNN's result is introduced as a correction term into the GAN's generator loss function, as formulated in Equation (4):

$$L\_{CNN} = -\sum\_{i=1}^{N} x\_i \log(p\_i), \tag{3}$$

$$L\_{G'} = L\_G + \beta L\_{CNN}, \tag{4}$$

where *N* is the number of bearing fault types; *x<sub>i</sub>* = 1 if the input sample belongs to bearing fault type *i*, and *x<sub>i</sub>* = 0 otherwise. *p<sub>i</sub>* is the output of the softmax function, representing the probability that the input data belong to bearing fault type *i*. The formulation of *p<sub>i</sub>* is given in Equation (5), and it satisfies ∑<sup>N</sup><sub>i=1</sub> *p<sub>i</sub>* = 1 [17]. *β* is a scale factor that keeps the loss functions of the GAN and the CNN in the same range.

$$p\_i = \frac{e^{a\_i}}{\sum\_{j=1}^{N} e^{a\_j}}. \tag{5}$$
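Equations (3)–(5) can be checked numerically with a short NumPy sketch (the logits and the value of `beta` are hypothetical, chosen only for illustration):

```python
import numpy as np

def softmax(a):
    """Eq. (5): p_i = exp(a_i) / sum_j exp(a_j), shifted for numerical stability."""
    e = np.exp(a - np.max(a))
    return e / e.sum()

def cnn_loss(x_onehot, logits):
    """Eq. (3): L_CNN = -sum_i x_i * log(p_i) for a one-hot label vector."""
    return -np.sum(x_onehot * np.log(softmax(logits)))

def corrected_generator_loss(l_g, l_cnn, beta):
    """Eq. (4): L_G' = L_G + beta * L_CNN."""
    return l_g + beta * l_cnn

logits = np.array([2.0, 0.5, -1.0])  # hypothetical CNN outputs for 3 fault types
p = softmax(logits)
assert abs(p.sum() - 1.0) < 1e-12    # probabilities sum to 1, as required
```

A uniform prediction over two classes yields *L<sub>CNN</sub>* = log 2, the usual sanity check for a cross-entropy implementation.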

#### *2.3. Improvement of Loss Function with Envelope Spectrum*

The general GAN can produce data with high similarity to the original measurement, as stated in the previous subsection, and in theory the data fidelity can be further improved when a CNN is employed to collaboratively optimize the GAN. Until now, however, all the data points in a sample have been treated equally, the GAN's target being to keep the generated data as similar to the original as possible. In fault diagnosis, some data points carry more information than others. For example, once a fault occurs on a certain component, such as the outer ring, the inner ring, or the balls, the corresponding fault characteristic frequencies (*FCF*) appear in the acceleration spectrum. Compared with the overall similarity, the frequencies and amplitudes at the fault characteristic frequencies contain much more information about the bearing health condition. Therefore, the error in amplitudes and frequencies between the original and generated signals at the fault characteristic frequencies is defined as another correction term in the frequency domain, as follows:

$$L\_{frequency} = \sum\_{i=1}^{N} \left( \left| M\_{real}^i - M\_{fake}^i \right| + \left| F\_{real}^i - F\_{fake}^i \right| \right), \tag{6}$$

where *N* denotes the maximum order of the *FCF* (*N* = 5 in this study). *M<sup>i</sup><sub>real</sub>* and *M<sup>i</sup><sub>fake</sub>* stand for the i-th order *FCF* amplitudes of the real and generated samples, and *F<sup>i</sup><sub>real</sub>* and *F<sup>i</sup><sub>fake</sub>* represent the i-th order *FCF* frequencies of the real and generated samples. In addition, because frequency and amplitude have different value ranges, the most widely used normalization method, MinMaxScaler [18], is applied to scale the amplitudes and frequencies up to the 5th-order *FCF* into the range [0, 1].
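Assuming the first *N* FCF amplitudes and frequencies have already been extracted from the envelope spectra, Equation (6) with min-max scaling could look as follows. This is only a sketch: the paper does not specify exactly how values are grouped for normalization, so here the real and generated values of each quantity are scaled together so that they share one scale.

```python
import numpy as np

def minmax(v):
    """Scale a vector to [0, 1], mimicking MinMaxScaler for one quantity."""
    v = np.asarray(v, dtype=float)
    rng = v.max() - v.min()
    return (v - v.min()) / rng if rng > 0 else np.zeros_like(v)

def l_frequency(m_real, m_fake, f_real, f_fake):
    """Eq. (6): summed absolute amplitude and frequency errors over the
    first N FCF orders, after min-max normalization of each quantity."""
    m = minmax(np.concatenate([m_real, m_fake]))  # amplitudes on one scale
    f = minmax(np.concatenate([f_real, f_fake]))  # frequencies on one scale
    n = len(m_real)
    return np.sum(np.abs(m[:n] - m[n:]) + np.abs(f[:n] - f[n:]))
```

By construction, *L<sub>frequency</sub>* is zero when the generated sample reproduces the real FCF peaks exactly, and grows with either amplitude or frequency mismatch.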

Finally, *L<sub>frequency</sub>* is combined with *L<sub>CNN</sub>* to construct the final loss function of the GAN's generator. As shown in Equation (7), the sum of *L<sub>CNN</sub>* and *L<sub>frequency</sub>* is taken as a modification term in the general GAN's loss function *L<sub>G</sub>*, ensuring that the data generated by the GAN have a high overall similarity and, at the same time, capture the important information in detail. *α* is a weight factor.

$$L\_{G''} = L\_G + \alpha \left( L\_{CNN} + L\_{frequency} \right). \tag{7}$$

To obtain *L<sub>frequency</sub>*, the first step is to calculate the theoretical *FCF*. The XJTU-SY dataset [19], introduced in the following section, includes only three kinds of faults, namely the outer race fault, the inner race fault, and the cage fault. The theoretical *FCFs* for these three fault types are the *BPFO* (Ball Passing Frequency on the Outer race), *BPFI* (Ball Passing Frequency on the Inner race), and *FTF* (Fundamental Train Frequency), respectively. Their formulations are as follows [20]:

$$BPFO = \frac{nf\_s}{2} \left( 1 - \frac{d}{D} \cos \alpha \right), \tag{8}$$

$$BPFI = \frac{nf\_s}{2} \left( 1 + \frac{d}{D} \cos \alpha \right),\tag{9}$$

$$FTF = \frac{f\_s}{2} \left( 1 - \frac{d}{D} \cos \alpha \right), \tag{10}$$

where *n* is the number of rolling elements, *f<sub>s</sub>* is the shaft frequency, *d* is the ball diameter, *D* is the pitch diameter, and *α* is the bearing contact angle.
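Equations (8)–(10) translate directly into code; the bearing geometry values below are hypothetical and are not taken from the XJTU-SY test bench:

```python
import math

def theoretical_fcf(n, fs, d, D, alpha_deg):
    """Theoretical fault characteristic frequencies, Eqs. (8)-(10).
    n: number of rolling elements, fs: shaft frequency [Hz],
    d: ball diameter, D: pitch diameter, alpha_deg: contact angle [deg]."""
    ratio = (d / D) * math.cos(math.radians(alpha_deg))
    bpfo = n * fs / 2.0 * (1.0 - ratio)  # outer-race fault, Eq. (8)
    bpfi = n * fs / 2.0 * (1.0 + ratio)  # inner-race fault, Eq. (9)
    ftf = fs / 2.0 * (1.0 - ratio)       # cage fault, Eq. (10)
    return bpfo, bpfi, ftf

# hypothetical bearing: 8 balls, 35 Hz shaft, d = 7.92 mm, D = 34.55 mm, 0 deg
bpfo, bpfi, ftf = theoretical_fcf(8, 35.0, 7.92, 34.55, 0.0)
```

Two identities follow directly from the formulas and serve as a quick sanity check: *BPFO* = *n* · *FTF* and *BPFO* + *BPFI* = *n* · *f<sub>s</sub>*.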

After calculating the theoretical *FCF*, the second step is to locate the actual *FCF* around the corresponding theoretical value. The actual *FCF* is affected by many factors, such as the shaft speed, external load, friction coefficient, raceway groove curvature, and defect size [21,22]; therefore, in most cases, a bias exists between the theoretical and the actual *FCF*. Besides, some harmonics of the *FCF*, influenced by the modulation of other vibrations, may not be detectable on the test bench [22]. Thus, in this paper, the i-th order actual *FCF* is determined as the maximum peak in the interval [0.95, 1.05] × *FCF<sub>1st</sub>* × *i*, where *FCF<sub>1st</sub>* is the first-order theoretical *FCF* and *i* is the current frequency order. The actual *FCFs* of both the real measurement sample and the generated sample are determined by the above two steps. Once the actual *FCF* is identified, *L<sub>frequency</sub>* can be obtained from Equation (6).
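The peak search in this second step can be sketched as follows (a sketch under our own naming: `freqs` and `env_spec` are assumed to be the frequency axis and magnitude of the envelope spectrum, already computed elsewhere):

```python
import numpy as np

def actual_fcf(freqs, env_spec, fcf_1st, order, tol=0.05):
    """Locate the i-th order actual FCF as the maximum envelope-spectrum
    peak inside [0.95, 1.05] * FCF_1st * i (tol = 0.05 as in the paper)."""
    center = fcf_1st * order
    mask = (freqs >= (1.0 - tol) * center) & (freqs <= (1.0 + tol) * center)
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return None, None  # no spectral bins fall inside the search window
    peak = idx[np.argmax(env_spec[idx])]
    return freqs[peak], env_spec[peak]
```

Running this for orders *i* = 1, …, 5 on both the real and the generated sample yields the amplitude/frequency pairs that enter Equation (6).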
