*2.4. Collaborative Training Mechanism of the GAN and CNN*

Once the modification of the GAN loss function has been determined, the next step is to train the GAN in cooperation with a CNN. The collaborative training process is shown in Figure 1. In general, the GAN provides a more balanced dataset so that the CNN can improve its fault diagnosis accuracy, while the CNN evaluates the GAN's generated dataset and feeds its fault classification result back as a correction term in the generator's loss function to improve the GAN's data-generation quality. Under this collaborative training structure, the performance of both the CNN and the GAN can be enhanced. Specifically, as shown in Figure 1, the CNN is first built on the unbalanced dataset, so its classification error is expected to be high. Meanwhile, the discriminator and the generator of the GAN are established. Initially, the generator performs poorly, and the generated samples are not yet similar to the original ones. The next step is to optimize the CNN and GAN collaboratively. During the optimization process, the GAN's generator learns to generate samples similar to the original signal. The newly generated samples are immediately added to the training dataset of the CNN so that the dataset imbalance is reduced. The optimization process stops when the Nash equilibrium is reached, which is defined as *D*(*x<sub>real</sub>*) = *D*(*x<sub>fake</sub>*) = 0.5. Lastly, the GAN's generator is used to extend the original dataset, and the CNN is fine-tuned with the extended dataset. The architecture of the GAN proposed in this paper is detailed in Figure 2. Tables 1 and 2 summarize the hyperparameters of the GAN and CNN, respectively.
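The stopping rule of the collaborative training can be sketched as a small check on the discriminator outputs. The following Python snippet is only an illustration: the function name `reached_equilibrium`, the averaging over a batch, and the tolerance `tol` are assumptions introduced here, since in practice the outputs will only approach 0.5 rather than equal it exactly.

```python
def reached_equilibrium(d_real, d_fake, tol=0.05):
    """Return True when the mean discriminator outputs on the real and the
    generated batch are both within `tol` of 0.5.

    `tol` and the batch averaging are illustrative assumptions, not values
    taken from the study.
    """
    if not d_real or not d_fake:
        raise ValueError("both batches must be non-empty")
    mean_real = sum(d_real) / len(d_real)
    mean_fake = sum(d_fake) / len(d_fake)
    return abs(mean_real - 0.5) <= tol and abs(mean_fake - 0.5) <= tol


# Hypothetical discriminator outputs early in training:
# real and generated samples are still easy to tell apart.
early = reached_equilibrium([0.95, 0.90, 0.97], [0.05, 0.10, 0.02])

# Hypothetical outputs near equilibrium:
# the discriminator can no longer separate the two batches.
late = reached_equilibrium([0.52, 0.49, 0.51], [0.48, 0.50, 0.53])

print(early, late)
```

A tolerance band is used because an exact equality test on floating-point outputs would almost never be satisfied.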

**Figure 1.** Collaborative training structure of the GAN and CNN.

**Figure 2.** Architecture of generator and discriminator in the GAN.

**Table 1.** Hyperparameters of the GAN.


**Table 2.** Hyperparameters of the CNN.


#### **3. Experimental Dataset**

*3.1. Introduction of Bearing Test Bench and Dataset*

The experimental data for validation come from the Xi'an Jiaotong University (XJTU-SY) bearing test bench [19]. As shown in Figure 3, the bearing accelerated life test bench consists of an alternating current induction motor, a motor speed controller, a supporting shaft, a supporting bearing, a hydraulic loading system, and a test bearing. The test bearing type is LDK UER204, and its basic parameters are summarized in Table 3. The bearings are operated under 3 different conditions, as specified in the first column of Table 4, where *fs* stands for the shaft frequency and *Fr* for the radial loading force. Both the axial and radial accelerations are measured at a sampling frequency of 25.6 kHz; one measurement is taken every 1 min, and each measurement lasts for 1.28 s. Under each condition, 5 bearings are tested, for example bearings 1\_1–1\_5 under condition 1. As each test bearing has a different lifetime, the number of measurement samples varies from bearing to bearing.

**Figure 3.** XJTU-SY experimental setup [19].

**Table 3.** Specifications of bearing parameters.


Due to the inherent micro-anisotropy and the different working conditions, the lifetime and failure location differ from one test bearing to another. There are 3 single-fault types in total, namely the outer race fault, the inner race fault, and the cage fault. Moreover, two datasets, bearing 1\_5 and bearing 3\_2, contain measurements of compound faults. To simplify the labeling process, only single faults are considered in this paper. As summarized in Table 4, the total number of samples is large enough for CNN training. However, the dataset is extremely unbalanced. For most test bearings under all 3 conditions, the failure occurs on the outer race, with very limited samples for the inner race and the cage.


**Table 4.** Data specification of XJTU-SY bearing dataset.

*3.2. Data Preprocessing*

The XJTU-SY bearing dataset records the bearing acceleration over the whole life cycle. The test bench runs continuously until the acceleration amplitude exceeds 10 × *A<sub>normal</sub>*, which is defined as the failure point. Here, *A<sub>normal</sub>* is the maximum amplitude of the horizontal or vertical vibration signals when the bearing runs in the normal operating stage. The fault location in Table 4 is the position where the fault is found when the bearing finally fails. In order to extract sufficient measurement data for fault classification while maintaining correct labels, the signals with an acceleration amplitude between 2 × *A<sub>normal</sub>* and 10 × *A<sub>normal</sub>* are regarded as fault signals, as shown in Figure 4. All the measurement samples in the fault period are labeled with the corresponding final failure position: 1 for the cage fault, 2 for the inner race fault, and 3 for the outer race fault.
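The labeling rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `label_fault_period`, the use of one peak amplitude per measurement sample, and the inclusive treatment of both thresholds are choices made here and are not specified in the text. The label values 1, 2, and 3 are those given above.

```python
# Label values as defined in the text.
LABELS = {"cage": 1, "inner_race": 2, "outer_race": 3}


def label_fault_period(amplitudes, a_normal, fault, low=2.0, high=10.0):
    """Return (sample_index, label) pairs for the samples in the fault period.

    amplitudes: one peak acceleration amplitude per measurement sample
                (how this per-sample value is obtained is an assumption).
    a_normal:   maximum amplitude during the normal operating stage.
    fault:      final failure position of the bearing, a key of LABELS.
    """
    label = LABELS[fault]
    lower, upper = low * a_normal, high * a_normal
    # Both bounds are treated as inclusive (an assumption).
    return [(i, label) for i, a in enumerate(amplitudes) if lower <= a <= upper]


# Hypothetical run-to-failure amplitudes, one value per 1-min measurement.
amps = [0.8, 1.0, 0.9, 1.5, 2.3, 4.1, 7.6, 9.9, 12.4]
labeled = label_fault_period(amps, a_normal=1.0, fault="inner_race")
print(labeled)
```

Samples below 2 × *A<sub>normal</sub>* are left unlabeled, and samples beyond the failure point at 10 × *A<sub>normal</sub>* are excluded.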

**Figure 4.** Complete life cycle of bearing 1\_1.

After the valid source data and labels have been prepared, the next step is data preprocessing. First, the original measurement is denoised by 3-level wavelet decomposition, with *Symlet*4 as the mother wavelet. After noise cancellation of the high-frequency components, the data are normalized by z-score. Finally, the normalized data are transformed from 1D to 2D: the acceleration series is sliced into fragments of equal length, which are then stacked row by row to build a matrix, as illustrated in Figure 5. Each sample contains a total of 32,768 data points. Since 181 × 181 = 32,761 is the largest square number not exceeding 32,768, the size of the 2D matrix is set to 181 × 181, and the reshaped 2D matrix is fed into the GAN and CNN as an image. All the work in this study is conducted in MATLAB Deep Network Designer.
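The normalization and reshaping steps can be illustrated with a short Python sketch. The study itself was carried out in MATLAB, so this is not the authors' code. The wavelet denoising step is omitted, and the helper names `zscore` and `to_square_image`, the use of the sample standard deviation, and the choice to drop the surplus points from the end of the series are assumptions made for illustration.

```python
import math
import statistics

FS_HZ = 25_600      # sampling frequency given in the text
DURATION_S = 1.28   # duration of one measurement given in the text
SIDE = 181          # side length of the 2D matrix given in the text


def zscore(x):
    """Standardize a series to zero mean and unit standard deviation."""
    mean = statistics.fmean(x)
    std = statistics.stdev(x)  # assumption: sample standard deviation
    if std == 0:
        raise ValueError("constant series cannot be standardized")
    return [(v - mean) / std for v in x]


def to_square_image(x, side=SIDE):
    """Slice a 1D series into `side` fragments of length `side`, stacked row by row.

    Points beyond side * side are dropped from the end of the series
    (an assumption; the text does not say which points are discarded).
    """
    if len(x) < side * side:
        raise ValueError("series is too short for the requested matrix size")
    return [x[r * side:(r + 1) * side] for r in range(side)]


n_points = round(FS_HZ * DURATION_S)  # points per measurement sample

# Hypothetical signal standing in for a denoised acceleration measurement.
signal = [math.sin(2 * math.pi * 35.0 * i / FS_HZ) + 0.1 * (i % 7)
          for i in range(n_points)]

normalized = zscore(signal)
image = to_square_image(normalized)
print(n_points, len(image), len(image[0]))
```

With these settings, each row of the matrix holds 181 consecutive sampling points, so neighboring rows are neighboring time fragments.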

**Figure 5.** Illustration of data reshape.

#### **4. Results and Analysis**

*4.1. Fault Data Generation Based on Optimized GAN*

According to Table 5, there are significantly more samples for the outer race fault than for the inner race fault and the cage fault. Consequently, generating more samples for the inner race fault and the cage fault is essential to reduce the dataset imbalance. It should be noted that the inner race fault samples consist of data from bearing 2\_1, bearing 3\_3, and bearing 3\_4, while the cage fault samples consist of data from bearing 1\_4 and bearing 2\_3. This means that, for both the inner race and cage faults, the measurement samples are collected under different working conditions, which define different data distributions. Furthermore, each test bearing has totally different aging dynamics, which can be deduced from their full life cycle trajectories [19]. As a result, the GANs for these datasets need to be trained individually. Bearing 1\_4 has only one sample and is hence not usable for fault diagnosis. In total, 4 GANs need to be established, one each for bearing 2\_1, bearing 2\_3, bearing 3\_3, and bearing 3\_4.


**Table 5.** Sample size of different fault types.

The data samples generated by a general GAN and an optimized GAN are illustrated in Figure 6 and compared with the original ones after normalization. Specifically, Figure 6(a1) shows the original signal of a measurement sample from bearing 2\_1, Figure 6(a2) the corresponding sample generated by the general GAN, and Figure 6(a3) the sample generated by the optimized GAN. Likewise, Figure 6(b1–b3) shows the results for bearing 2\_3, and Figure 6(c1–c3) those for bearing 3\_3. Taking the inner race fault of bearing 2\_1 as an example, both GANs produce samples with high similarity to the original ones in the time domain, and even the peaks are accurately reproduced. It can further be noticed that the optimized GAN generates a much more accurate peak amplitude than the general GAN. In order to evaluate the GAN's data-generation quality in the time domain, every sample is regarded as a vector *x* (*x* ∈ ℝ<sup>*D*</sup>), and every sampling point *x<sub>i</sub>* as an element of the vector.

The similarity between the generated sample and the original one can be measured by the angle between two corresponding vectors. Therefore, cosine similarity is adopted as a time domain similarity metric, which is defined as follows:

$$\cos \theta = \frac{\vec{m} \cdot \vec{n}}{|\vec{m}| \cdot |\vec{n}| }\tag{11}$$

where $\vec{m}$ and $\vec{n}$ stand for the acceleration series from the original measurement and the generated sample, respectively, with $\vec{m} = \{x_1, x_2, \cdots, x_L\}$ and $\vec{n} = \{x'_1, x'_2, \cdots, x'_L\}$. $|\vec{m}|$ and $|\vec{n}|$ denote the 2-norms of $\vec{m}$ and $\vec{n}$, respectively.
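Equation (11) can be evaluated directly on two equal-length series. The following Python sketch is an illustration only: the function name `cosine_similarity` and the test signals are hypothetical. It also shows how the metric reacts to two kinds of deviation: a uniform rescaling of the amplitude leaves the value at 1, whereas a time lag of a quarter period drives it to approximately 0.

```python
import math


def cosine_similarity(m, n):
    """Cosine of the angle between two equal-length series, as in Equation (11)."""
    if len(m) != len(n):
        raise ValueError("series must have the same length")
    dot = sum(a * b for a, b in zip(m, n))
    norm_m = math.sqrt(sum(a * a for a in m))
    norm_n = math.sqrt(sum(b * b for b in n))
    if norm_m == 0.0 or norm_n == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_m * norm_n)


# Hypothetical test signal: 10 periods of a sine wave over 1000 points.
L = 1000
original = [math.sin(2 * math.pi * 10 * i / L) for i in range(L)]

# Same shape with half the amplitude.
scaled = [0.5 * v for v in original]

# Same shape delayed by 25 points, i.e., a quarter of the 100-point period.
lagged = [math.sin(2 * math.pi * 10 * (i - 25) / L) for i in range(L)]

sim_scaled = cosine_similarity(original, scaled)
sim_lagged = cosine_similarity(original, lagged)
print(round(sim_scaled, 3), round(sim_lagged, 3))
```

Because the metric compares the series point by point, a small misalignment in time can lower the score considerably even when the waveform shape is reproduced well.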

The cosine similarity results are summarized in Table 6. For all 3 cases, the sample generated by the optimized GAN has a higher cosine similarity to the original one than that produced by the general GAN, which demonstrates the superiority of the optimized GAN in high-quality data generation. The relatively small values of the cosine similarity can be explained by the fact that the acceleration values vary within a wide range of [−5, 5] and the signal length is up to 32,761, which means that any difference in acceleration amplitude or direction, or any time lag between corresponding points, leads to a large cumulative deviation. Besides, treating the acceleration signal as a 1D vector may not be appropriate when it contains so many elements. This needs further exploration in the future, for example by using the Fréchet distance in place of the cosine similarity [23].

**Figure 6.** Comparison between original sample and generated sample in time domain; (**a**), (**b**) and (**c**) represent bearing 2\_1, bearing 2\_3 and bearing 3\_3 respectively, while (**1**), (**2**) and (**3**) represent the original sample, general GAN and optimized GAN respectively.


Apart from the overall similarity in the time domain, the signal characteristics in the frequency domain are equally or even more important for fault diagnosis. In this study, the envelope spectrum is computed for the original and generated samples. As only the 1st to 5th *FCFs* are considered in this study, the signal is first filtered by a low-pass filter with a cutoff of 1000 Hz, and then the envelope spectrum is extracted by the Hilbert transform and the Fast Fourier Transform. The results are displayed in Figures 7–9. Take Figure 7 as an example, which gives the envelope spectrum of bearing 2\_1: the black line is the result of the original measurement, the blue line stands for the sample generated by the general GAN, and the red line for the sample from the optimized GAN. The theoretical *BPFI* is indicated by the green dashed line. The envelope spectrum of the samples generated by the optimized GAN is similar to the original one and clearly differs from that of the samples generated by the general GAN, especially in the amplitudes at the actual fault characteristic frequencies. Two locally enlarged views in Figure 7 show that the amplitude of the sample generated by the optimized GAN is much closer to that of the original sample than the amplitude of the sample from the general GAN. The same holds for the inner race fault of bearing 3\_3 and the cage fault of bearing 2\_3, which confirms that the optimized GAN effectively helps the generated signals capture more accurate fault characteristics in the frequency domain. As for the peaks other than the fault characteristic ones, especially for the inner race fault, most of them are caused by modulation from the shaft frequency and its harmonics, which is consistent with previous research [24]. Additionally, the deviation between the actual *FCFs* and the corresponding theoretical values can be explained by several factors, such as the frequency resolution of 0.7814 Hz, the occurrence of rolling element sliding, and the transient contact angles under high external load.
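The envelope spectrum procedure (low-pass filtering, Hilbert transform, Fourier transform) can be sketched in dependency-free Python. This is an illustration under several assumptions and not the authors' MATLAB implementation: the filter is an ideal brick-wall filter applied in the frequency domain, since the filter type is not stated in the text; a naive discrete Fourier transform replaces the FFT, which yields the same values but is only practical for short signals; and the synthetic test signal and all function names are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform, a stand-in for an FFT."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def low_pass(x, fs, cutoff):
    """Brick-wall low-pass filter: zero every bin above `cutoff` (assumed filter type)."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        if min(k, n - k) * fs / n > cutoff:
            spectrum[k] = 0j
    return [v.real for v in dft(spectrum, inverse=True)]


def envelope(x):
    """Magnitude of the analytic signal (frequency-domain Hilbert transform, even N)."""
    n = len(x)
    spectrum = dft(x)
    for k in range(1, n):
        if k < n // 2:
            spectrum[k] *= 2   # double the positive frequencies
        elif k > n // 2:
            spectrum[k] = 0j   # discard the negative frequencies
    return [abs(v) for v in dft(spectrum, inverse=True)]


def envelope_spectrum(x, fs, cutoff=1000.0):
    """Return (frequencies, amplitudes) of the one-sided envelope spectrum."""
    env = envelope(low_pass(x, fs, cutoff))
    mean = sum(env) / len(env)
    spectrum = dft([v - mean for v in env])  # remove the DC offset first
    n = len(env)
    freqs = [k * fs / n for k in range(n // 2)]
    amps = [2 * abs(spectrum[k]) / n for k in range(n // 2)]
    return freqs, amps


# Synthetic signal (all values hypothetical): a 500 Hz carrier modulated at
# 40 Hz, plus a 1200 Hz component that the low-pass filter should remove.
FS, N = 2560.0, 256
sig = [(1 + 0.5 * math.cos(2 * math.pi * 40 * i / FS))
       * math.cos(2 * math.pi * 500 * i / FS)
       + 0.3 * math.cos(2 * math.pi * 1200 * i / FS)
       for i in range(N)]

freqs, amps = envelope_spectrum(sig, FS)
peak = max(range(len(amps)), key=amps.__getitem__)
print(freqs[peak], round(amps[peak], 3))
```

In this synthetic case the dominant envelope line appears at the 40 Hz modulation frequency, which plays the role of a fault characteristic frequency. The bin spacing of such a spectrum is the sampling frequency divided by the number of points; for the 32,761-point samples recorded at 25.6 kHz this gives 25,600/32,761 ≈ 0.7814 Hz, which matches the frequency resolution quoted above.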

**Figure 7.** Envelope spectrum comparison: inner race fault of bearing 2\_1.

**Figure 8.** Envelope spectrum comparison: inner race fault of bearing 3\_3.

**Figure 9.** Envelope spectrum comparison: cage fault of bearing 2\_3.

Tables 7–9 summarize the sample frequencies and amplitudes at the corresponding *FCF* and its harmonics, as well as the relative error percentages of these two features between the generated and original samples. The comparison in Table 7 shows that, for all the 1st–5th order *BPFI*s, the frequencies and amplitudes of the samples generated by the optimized GAN are much closer to the original ones than those of the samples produced by the general GAN. For the sample generated by the optimized GAN, the frequency error percentage at all five orders of *BPFI* is zero, whereas the sample generated by the general GAN does not fully capture the actual *BPFI* of the original sample, although the deviation is only 0.34% and occurs only at the 5th order *BPFI*. In terms of the amplitudes at the *BPFI* orders, the optimized GAN shows a much clearer advantage over the general one. The amplitude errors at all 5 orders of *BPFI* for the samples generated by the optimized GAN are much smaller than those for the general GAN. Take the 2nd *BPFI* as an example: the actual amplitude of the original sample is 0.062, while the corresponding amplitudes of the samples from the general GAN and the optimized GAN are 0.023 and 0.047, respectively, so the relative amplitude error drops from 62.0% to 23.8%. The above analysis confirms that the modification term *L<sub>frequency</sub>* in the GAN's generator loss function enables the GAN to capture the fault information in the frequency domain. The same conclusion can also be drawn from the results in Tables 8 and 9.
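The relative error percentage used in this comparison can be sketched as follows. The definition |generated − original| / |original| × 100 and the function name `relative_error_percent` are assumptions, as the text does not spell out the formula. Applied to the three-decimal amplitudes quoted above for the 2nd *BPFI*, it gives roughly 62.9% and 24.2%; the small difference from the reported 62.0% and 23.8% presumably arises because the reported values were computed from unrounded amplitudes.

```python
def relative_error_percent(generated, original):
    """Relative error of a generated feature with respect to the original, in percent.

    The formula is an assumed definition; the text does not state it explicitly.
    """
    if original == 0:
        raise ValueError("original value must be non-zero")
    return abs(generated - original) / abs(original) * 100.0


# Rounded amplitudes at the 2nd BPFI quoted in the text.
original_amp = 0.062
err_general = relative_error_percent(0.023, original_amp)
err_optimized = relative_error_percent(0.047, original_amp)
print(f"{err_general:.1f}% {err_optimized:.1f}%")
```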




**Table 8.** Amplitudes and frequencies of bearing 3\_3 at 1st–5th *BPFI*.

**Table 9.** Amplitudes and frequencies of bearing 2\_3 at 1st–5th *FTF*.


In summary, the data generation results show that both the general GAN and the optimized GAN can generate samples similar to the original ones. However, the samples generated by the optimized GAN have a higher similarity to the original ones than those generated by the general GAN, especially at the *FCF* and its harmonics in the frequency domain. More specifically, data generation for one fault type under different working conditions, such as bearing 2\_1 and bearing 3\_3, shows that the optimized GAN method can be applied to bearings under different working conditions. Furthermore, the results for bearing 2\_1 (inner race fault) and bearing 2\_3 (cage fault) demonstrate that the optimized GAN method adapts to bearings with different defect types.
