*3.3. Procedures of Proposed Fusion Domain-Adaptation CNN*

This section summarizes the main procedures of the proposed fusion domain-adaptation CNN (FDACNN), which are shown in Figure 1 and described as follows.

1. Collect infrared thermal images and raw vibration signals from the gearbox under study under different working conditions, and divide them into labeled source-domain samples and unlabeled target-domain samples.
2. Convert the raw time-domain signals into frequency-domain signals and squared envelope spectra, and arrange each of them into a matrix.
3. Fuse the three RGB channels of the infrared thermal image with the two matrices (frequency domain and squared envelope spectrum) to obtain 5-channel fusion samples.
4. Train the FDACNN model on the 5-channel fusion samples by adversarial training.
5. Test the performance of the trained FDACNN model on the remaining target-domain samples.
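Steps 1 and 5 imply that each cross-domain task pairs a labeled source condition with an unlabeled target condition whose held-out samples are used only for evaluation. The sketch below builds one such task. It assumes the fused samples for each load are stored in a hypothetical dictionary keyed by load name (e.g., "L0", "L100"); the function name, dictionary layout and the use of a single random split per load are illustrative assumptions, not the authors' code.

```python
import numpy as np


def make_transfer_task(data, source_load, target_load, n_train, seed=0):
    """Build one hypothetical cross-load transfer task.

    data: dict mapping a load name to (x, y), where x has shape
          (N, 5, 64, 32) (fused samples) and y has shape (N,) (health-state labels).
    Source training samples keep their labels; target training samples are
    returned WITHOUT labels; target test samples keep labels for evaluation only.
    """
    rng = np.random.default_rng(seed)
    xs, ys = data[source_load]
    xt, yt = data[target_load]

    # Labeled source-domain training samples.
    src_idx = rng.permutation(len(xs))[:n_train]

    # Disjoint target-domain split: unlabeled training part, labeled test part.
    perm = rng.permutation(len(xt))
    tr_idx, te_idx = perm[:n_train], perm[n_train:]

    return {
        "source_x": xs[src_idx],
        "source_y": ys[src_idx],
        "target_x": xt[tr_idx],  # target labels deliberately dropped
        "test_x": xt[te_idx],
        "test_y": yt[te_idx],
    }
```

For example, a task resembling T5 would call `make_transfer_task(data, "L0", "L100", n_train=...)`, with `n_train` set to the per-load training size reported in the dataset description.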

#### **4. Experimental Results and Discussion**

#### *4.1. Dataset Descriptions*

The test data come from a compound gear-fault experiment performed on a helical gearbox, the Spectra Quest Mechanical Failure Simulator (MFS), in a laboratory at Northwestern Polytechnical University [33,34]. The experimental system and the layout of the test rig are shown in Figure 2a,b, respectively. The system mainly consists of an AC motor, two gearboxes and a generator. An infrared camera fixed in front of gearbox 1 collects the infrared thermal images; its detailed parameters are listed in Table 1. Vibration signals are collected by an accelerometer mounted on the surface of gearbox 1. The sampling frequency is 12.8 kHz, and the motor speed is 3000 rpm. Two lubricating oils, EP 320 and EP 100, are available in this experiment; their viscosities at 40 °C are 320 cSt and 100 cSt, respectively. The EP 320 lubricant is used in this article.

Five health states are considered in this study: a normal state, two structural fault states (TB 50 and TB 100) and two non-structural fault states (OS 1500 and OS 2000). "TB 50" and "TB 100" denote 50% and 100% tooth breakage of the driving gear, respectively. Relative to a baseline oil volume of 2600 mL, "OS 1500" and "OS 2000" denote the removal of 1100 mL and 600 mL of oil from gearbox 1 (GB1), leaving 1500 mL and 2000 mL, respectively. Vibration signals and infrared images were collected under four loads of 0%, 30%, 70% and 100% (L0, L30, L70 and L100). For each load and health state, the vibration signals were divided into 800 samples of 2048 data points each; 480 samples were randomly selected for testing, and the remaining 320 were used for training. Similarly, 480 infrared images were used for training and the other 320 for testing. Each infrared thermal image is 64 × 32 pixels. The details of the dataset are listed in Table 2.
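The segmentation described above can be sketched as follows, together with the time span each sample covers. The sketch assumes the 800 samples are cut as non-overlapping windows from one continuous record per load and health state (the text does not state whether windows overlap), and the function names are hypothetical. At 12.8 kHz, a 2048-point sample lasts 0.16 s, which at 3000 rpm (50 Hz) covers 8 motor revolutions; 2048 = 64 × 32, so one sample fills exactly one 64 × 32 channel.

```python
import numpy as np

FS_HZ = 12_800     # sampling frequency (from the text)
N_POINTS = 2048    # data points per sample (from the text)
MOTOR_RPM = 3000   # motor speed (from the text)


def segment(record, n_samples=800, n_points=N_POINTS):
    """Cut a 1-D record into non-overlapping windows (assumed, not stated)."""
    needed = n_samples * n_points
    if record.size < needed:
        raise ValueError(f"record has {record.size} points, need {needed}")
    return record[:needed].reshape(n_samples, n_points)


def random_split(samples, n_test, seed=0):
    """Randomly pick n_test samples for testing; the rest are for training."""
    idx = np.random.default_rng(seed).permutation(len(samples))
    return samples[idx[n_test:]], samples[idx[:n_test]]  # (train, test)


window_s = N_POINTS / FS_HZ                      # 0.16 s per sample
revs_per_sample = window_s * MOTOR_RPM / 60.0    # 8 motor revolutions per sample
record_s = 800 * N_POINTS / FS_HZ                # 128 s of signal per state and load
```

With `n_test=480`, `random_split` reproduces the 480/320 proportions stated for the vibration samples.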

#### *4.2. Implementation Details*

First, a data-level fusion strategy is used to fuse the infrared thermal images and vibration signals. The raw samples measured in the five health states are transformed by the FFT to obtain frequency-domain signals, and into CS2 sequences via the squared envelope spectrum. Figure 3 shows the three kinds of signal waveforms for each health state under the L0 load: the left column shows the measured raw signals, and the middle and right columns show the corresponding spectral distributions and squared envelope spectra, respectively. The time-domain and frequency-domain characteristics of the different health states are similar and difficult to distinguish. The frequency-domain signals and CS2 sequences are then arranged into a 2 × 64 × 32 format.
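A minimal sketch of this transformation is given below, assuming the squared envelope is computed from the FFT-based analytic signal (the same construction used by `scipy.signal.hilbert`) and that both channels use the two-sided magnitude spectrum, because a one-sided spectrum of a 2048-point signal has only 1025 bins and would not fill a 64 × 32 matrix. The exact arrangement and any normalization used by the authors are not stated, so the function names and these choices are illustrative assumptions.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT: keep DC (and Nyquist), double positive bins."""
    n = x.size
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def spectral_channels(x, shape=(64, 32)):
    """Return a (2, 64, 32) array: FFT magnitude and squared envelope spectrum."""
    n = x.size
    if n != shape[0] * shape[1]:
        raise ValueError(f"expected {shape[0] * shape[1]} points, got {n}")
    spectrum = np.abs(np.fft.fft(x)) / n            # frequency-domain channel
    env2 = np.abs(analytic_signal(x)) ** 2          # squared envelope
    ses = np.abs(np.fft.fft(env2 - env2.mean())) / n  # squared envelope spectrum
    return np.stack([spectrum.reshape(shape), ses.reshape(shape)])
```

For an amplitude-modulated test tone (1250 Hz carrier, 50 Hz modulation, 12.8 kHz sampling), the largest bin of the first channel falls at the carrier frequency and that of the second channel at the modulation frequency, which is the fault-related information the squared envelope spectrum is meant to expose.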

**Figure 2.** The gearbox fault simulator system: (**a**) the experimental test rig; (**b**) the layout of the test rig.


**Table 1.** The detailed parameters of the infrared camera.

**Table 2.** Five health states of the gearbox.


**Figure 3.** Raw signals, spectral distribution and squared envelope spectrum of different health states. (**a**) Normal; (**b**) TB 50; (**c**) TB 100; (**d**) OS 1500; (**e**) OS 2000.

Each infrared thermal image has three RGB channels, i.e., a 3 × 64 × 32 format. Figure 4 shows the infrared thermal images collected for each health state under L0. The images of the normal, OS 1500 and OS 2000 states are relatively similar to one another, as are those of TB 50 and TB 100, and it remains very difficult to distinguish the individual health states visually. The RGB channels of the infrared thermal image (3 × 64 × 32) are combined with the frequency-domain signal and CS2 (2 × 64 × 32) to obtain 5-channel fusion samples (5 × 64 × 32), which are used to train DACNN.
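The channel-level fusion can be sketched as a concatenation along the channel axis. The per-channel min-max scaling is an assumption added here so that 8-bit pixel values and small spectral magnitudes share a common range; the text does not say whether or how the channels are normalized, and the helper names are hypothetical. Images loaded in height × width × channel order must first be transposed to channel-first order.

```python
import numpy as np


def minmax(a, eps=1e-12):
    """Scale each channel of a (C, H, W) array to [0, 1] (assumed normalization)."""
    lo = a.min(axis=(1, 2), keepdims=True)
    hi = a.max(axis=(1, 2), keepdims=True)
    return (a - lo) / (hi - lo + eps)


def fuse(image_rgb, spectral):
    """Stack RGB (3, 64, 32) and spectral (2, 64, 32) channels into (5, 64, 32)."""
    if image_rgb.shape != (3, 64, 32) or spectral.shape != (2, 64, 32):
        raise ValueError("expected shapes (3, 64, 32) and (2, 64, 32)")
    return np.concatenate(
        [minmax(image_rgb.astype(np.float64)), minmax(spectral.astype(np.float64))],
        axis=0,
    )
```

For an image read as a (64, 32, 3) array, `np.transpose(img, (2, 0, 1))` gives the (3, 64, 32) layout expected by `fuse`.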

**Figure 4.** Infrared thermal images of different health states.

In DACNN, a feature extractor, a domain discriminator and a state classifier are constructed; their structures are listed in Table 3. DACNN is trained adversarially on the fusion data. To assess the robustness of the proposed method, multiple test tasks are designed; their settings and results are listed in Table 4. The proposed method performs well on all five test tasks, reaching 100.00% accuracy in T1, T3 and T4. In task T5, which transfers from load L0 to L100 and therefore spans the largest load difference, the proposed method still reaches 96.67% accuracy. This suggests that fusing infrared thermal images with vibration signals is effective for cross-domain fault diagnosis and that the result is relatively robust.
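The text states only that DACNN is trained adversarially. One common way to do this is the DANN-style gradient reversal layer, sketched below under that assumption: the state classifier minimizes the source classification loss, the domain discriminator minimizes the domain loss, and the feature extractor receives the domain gradient multiplied by -λ, so it descends L_cls - λ·L_dom. To stay self-contained, the CNNs of Table 3 are replaced by a single tanh layer and linear heads with hand-written gradients; all names, the λ schedule (taken from the DANN paper) and the learning rate are illustrative assumptions.

```python
import numpy as np


def dann_losses_and_grads(W, V, u, Xs, ys, Xt, lam=1.0):
    """Forward/backward pass of a tiny DANN-style model (hypothetical stand-in).

    Feature extractor F = tanh(X @ W); state classifier softmax(F @ V) on source
    only; domain discriminator sigmoid(F @ u) with source = 0, target = 1.
    """
    ns = Xs.shape[0]
    X = np.vstack([Xs, Xt])
    d = np.concatenate([np.zeros(ns), np.ones(Xt.shape[0])])
    F = np.tanh(X @ W)
    Fs = F[:ns]

    # State classifier: softmax cross-entropy on labeled source samples.
    logits = Fs @ V
    logits = logits - logits.max(axis=1, keepdims=True)
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    l_cls = -log_p[np.arange(ns), ys].mean()
    g_logits = np.exp(log_p)
    g_logits[np.arange(ns), ys] -= 1.0
    g_logits /= ns

    # Domain discriminator: binary cross-entropy on source + target samples.
    z = F @ u
    l_dom = np.mean(np.logaddexp(0.0, z) - d * z)
    g_z = (1.0 / (1.0 + np.exp(-z)) - d) / X.shape[0]

    g_V = Fs.T @ g_logits
    g_u = F.T @ g_z

    # Gradient reversal: the domain gradient reaching the extractor is scaled by -lam.
    g_F = np.zeros_like(F)
    g_F[:ns] = g_logits @ V.T
    g_F -= lam * np.outer(g_z, u)
    g_W = X.T @ (g_F * (1.0 - F ** 2))
    return l_cls, l_dom, {"W": g_W, "V": g_V, "u": g_u}


def grl_lambda(progress, gamma=10.0):
    """DANN schedule: lambda grows from 0 to about 1 as training progress goes 0 -> 1."""
    return 2.0 / (1.0 + np.exp(-gamma * progress)) - 1.0


def train_step(params, Xs, ys, Xt, lam, lr=0.1):
    """One plain gradient-descent step on all three parts."""
    _, _, g = dann_losses_and_grads(params["W"], params["V"], params["u"], Xs, ys, Xt, lam)
    return {k: params[k] - lr * g[k] for k in params}
```

Because the reversal only flips the sign of one gradient path, a single descent step serves both players of the minimax game; this is the design choice that makes gradient reversal convenient compared with alternating generator/discriminator updates.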

**Table 3.** The structures of the feature extractor, domain discriminator and state classifier.


**Table 4.** Results of different test tasks.

