#### *4.1. Dataset Description*

The IEEE PHM Challenge 2012 dataset was collected on the PRONOSTIA test platform, shown in Figure 5, on which accelerated degradation experiments were conducted to obtain run-to-failure data within a few hours. The PRONOSTIA platform is composed of three parts: a rotating part, a load part, and a data collection part. The rotating part is driven by a 250 W motor. To accelerate degradation, the load part applies a 4000 N load to the rolling bearing. Vibration signals were collected using an accelerometer placed in the horizontal direction. The sampling frequency was 25.6 kHz, and a record was saved every 10 s. In total, seventeen bearings were run to failure to collect whole-life degradation data under three working conditions. The details of the working conditions are given in Table 1.

In this experiment, the seven bearings under the first working condition (i.e., Bearing1\_1 to Bearing1\_7) were selected as the source domain data. Bearing 2 and Bearing 3 under the second working condition (i.e., Bearing2\_2 and Bearing2\_3) were taken as the offline data in the target domain, and Bearing 1 and Bearing 4 under the second working condition (i.e., Bearing2\_1 and Bearing2\_4) were taken as the target bearings to be tested in the target domain.

**Figure 5.** PRONOSTIA test platform [34].

**Table 1.** Description of the three working conditions in the IEEE Prognostics and Health Management (PHM) Challenge 2012 dataset.


#### *4.2. Results of State Assessment*

In this section, the results of the state assessment are provided. Taking Bearing1\_1 as an example, HHT was first run to obtain the marginal spectrum data of this bearing, and the first 500 samples were then used to train an SVDD model. The Gaussian radial basis function (RBF) kernel was adopted, and the regularization parameter and kernel parameter of SVDD were set to 1 and 0.001, respectively. Feeding the HHT spectrum data sequentially into the trained SVDD model yields the state assessment results. Table 2 shows the periods of the normal state and the fault state for all seven bearings under the first working condition. These results will be used as the label information for training a DTDA model in the next section.
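The assessment step can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: scikit-learn's `OneClassSVM` stands in for SVDD (with an RBF kernel the two formulations yield equivalent boundaries), and the function name `assess_states`, the value of `nu`, and the synthetic data replacing the HHT marginal spectra are all assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def assess_states(spectra, n_train=500, gamma=0.001, nu=0.05):
    """Label every sample as normal (+1) or fault (-1).

    The one-class model is fitted on the first `n_train` samples only,
    which are assumed to come from the normal state. `nu` is an
    illustrative choice and is not taken from the paper.
    """
    model = OneClassSVM(kernel="rbf", gamma=gamma, nu=nu)
    model.fit(spectra[:n_train])
    return model.predict(spectra)


# Synthetic stand-in for marginal spectrum data: 800 normal samples
# followed by 200 samples whose distribution has shifted.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(800, 16))
faulty = rng.normal(6.0, 1.0, size=(200, 16))
spectra = np.vstack([normal, faulty])

labels = assess_states(spectra)
print("fault ratio, first 500 samples:", np.mean(labels[:500] == -1))
print("fault ratio, last 200 samples: ", np.mean(labels[800:] == -1))
```

The sequence of labels produced in this way plays the role of the normal/fault periods reported in Table 2.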

#### *4.3. Results of Online Detection*

In this section, Bearing2\_1 and Bearing2\_4 under the second working condition are chosen as the target bearings to evaluate the effectiveness of the proposed method. Specifically, these two bearings have quite different degradation trends and noise levels in the normal state. Bearing2\_1 has a long period of slow degradation, while Bearing2\_4 has no apparent early fault state and quickly evolves to the fast degradation state. Therefore, these two bearings are believed to be representative enough to provide a comprehensive evaluation.


**Table 2.** State assessment results of the IEEE PHM Challenge 2012 dataset.

4.3.1. Results of Bearing2\_1

First, Figure 6 visualizes the feature distribution after domain adaptation by DTDA. Here, two bearings in the source domain (Bearing1\_2 and Bearing1\_3) and two bearings in the target domain (Bearing2\_2 and Bearing2\_3) are chosen. For comparison, Figure 6 also shows the feature distribution obtained with the deep autoencoder (DAE) without domain adaptation. PCA is used for visualization. As shown in Figure 6b, before domain adaptation, the feature distributions of the bearings in the source domain (red and blue points) and in the target domain (purple and green points) differ considerably, which indicates that different working conditions have different data distribution characteristics. After domain adaptation by the DTDA, however, the feature distributions of the two domains tend to be consistent, as shown in Figure 6a. These results demonstrate that the DTDA model can effectively extract a domain-invariant feature representation across different working conditions.

**Figure 6.** Feature distribution of the four bearings under the first and second working conditions extracted by (**a**) the DTDA and (**b**) the deep autoencoder (DAE). Here, PCA is used for visualization.

Second, Figure 7 provides the results of early fault online detection on Bearing2\_1. To make the results more reliable, the occurrence of an early fault is defined as the location of five successive anomalous samples. Any anomaly before this location is counted as a false alarm. For straightforward comparison, Figure 7 also reports the HI sequence built by the method proposed in Section 3.4 and the root mean square (RMS) curve. As shown in Figure 7a, an early fault occurs at Sample 162 with only four false alarms. As shown in Figure 7b, the trend of the HI sequence is basically consistent with Figure 7a, which indicates that the HI sequence can be used to evaluate the reliability of the detection results. The RMS curve, a widely used indicator of the degradation trend, only begins to rise slowly at Sample 180, lagging by nearly 20 samples. This comparison demonstrates that the domain-invariant feature representation extracted by the DTDA model has better discriminative ability.
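The detection rule described above can be sketched as follows. The function name `locate_early_fault` is hypothetical, and the sketch assumes that the fault location is the first sample of the five-sample run, which the text does not state explicitly. Labels follow the convention of Figure 7a (1 for normal, −1 for fault), and indices are zero-based.

```python
import numpy as np


def locate_early_fault(labels, run_length=5):
    """Find the first run of `run_length` successive anomalous labels (-1).

    Returns (onset, false_alarms), where `onset` is the index of the first
    sample of that run (assumed convention) and `false_alarms` counts the
    anomalous labels before it. `onset` is None if no such run exists.
    """
    labels = np.asarray(labels)
    run = 0
    for i, label in enumerate(labels):
        run = run + 1 if label == -1 else 0
        if run == run_length:
            onset = i - run_length + 1
            return onset, int(np.sum(labels[:onset] == -1))
    return None, int(np.sum(labels == -1))


# Toy label sequence: isolated anomalies at indices 3, 7, and 8,
# followed by a persistent fault starting at index 12.
labels = np.ones(20, dtype=int)
labels[[3, 7, 8]] = -1
labels[12:] = -1

onset, false_alarms = locate_early_fault(labels)
print(onset, false_alarms)  # 12 3
```

Requiring a run of successive anomalies trades a short detection delay for robustness against isolated outliers in the normal state.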

**Figure 7.** Online anomaly detection results on Bearing2\_1 using (**a**) the proposed method, (**b**) the constructed health indicator (HI) by PCA, (**c**) the constructed HI by DAE, and (**d**) the RMS curve. Here, the label "1" in Subfigure (**a**) indicates the normal state; "−1" indicates the fault state.

The effectiveness of the obtained HI sequence is further analyzed. In Section 3.4, PCA is used to reduce the degradation features to a one-dimensional component, which serves as the HI. To test the effectiveness of PCA in HI construction, the DAE is also used in place of PCA to build an HI sequence, as shown in Figure 7c. Specifically, after the domain-invariant feature representation has been extracted, the features of Bearing2\_2 and Bearing2\_3 in the target domain are generated. Then, a DAE model with one-dimensional output is trained using the feature set of these two bearings. Finally, the online features of Bearing2\_1 are fed directly into the trained DAE model to get a one-dimensional output, i.e., the expected HI sequence. The two HI sequences built by PCA and the DAE are nearly identical in shape, and the location of the early fault is almost the same. This indicates that the common features obtained by the DTDA represent the degradation trend well, so that both PCA and the DAE can easily extract a representative component from the features to build the HI. Still, the HI built by PCA is slightly more sensitive to the early fault. Training a DAE model generally requires a sufficient amount of data, so the fewer samples available in the online stage may cause over-fitting. Moreover, the DAE is trained by a gradient descent algorithm, which has a higher computational cost than PCA. Taking these points together, PCA is believed to be more suitable for HI construction than the DAE.
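The PCA-based HI construction can be illustrated with a short sketch. It assumes that PCA is fitted on the offline target-domain features and then applied to the online features, mirroring the DAE procedure described above; the details of Section 3.4 may differ. The function name `build_hi` and the synthetic features are hypothetical, and the sign of a principal component is arbitrary, so the HI may need to be flipped to increase with degradation.

```python
import numpy as np
from sklearn.decomposition import PCA


def build_hi(offline_features, online_features):
    """Project online features onto the first principal component
    learned from offline features (assumed fitting strategy)."""
    pca = PCA(n_components=1)
    pca.fit(offline_features)
    return pca.transform(online_features).ravel()


rng = np.random.default_rng(0)
direction = np.ones(8) / np.sqrt(8)  # hypothetical degradation direction


def synthetic_features(n):
    """Features that drift along `direction`, plus isotropic noise."""
    drift = np.linspace(0.0, 20.0, n)
    noise = rng.normal(0.0, 0.5, size=(n, 8))
    return np.outer(drift, direction) + noise, drift


offline, _ = synthetic_features(300)            # stand-in for Bearing2_2 / Bearing2_3
online, online_drift = synthetic_features(200)  # stand-in for Bearing2_1

hi = build_hi(offline, online)
print("|corr(HI, drift)| =", abs(np.corrcoef(hi, online_drift)[0, 1]))
```

Because PCA has a closed-form solution, this step adds little computational cost compared with training a DAE by gradient descent.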

In Figure 7d, the RMS curve fluctuates drastically in the initial part, even though the bearing is still in the normal state. This is mainly caused by irregular vibration due to running-in, assembly errors, etc., and not by an early fault. If the features are not representative (as with RMS), there will be many false alarms in the normal state. In contrast to the RMS curve, the HI sequence has almost no irregular fluctuations in the normal state, which shows that the DTDA model can extract fault features that are robust to such fluctuations. Moreover, the HI sequence has an obvious upward trend after the location of the early fault, while the RMS curve stays flat for a long period. The features extracted by the DTDA are therefore more sensitive to the early fault than the RMS feature. As a result, the DTDA model can generate deep features with better discriminative ability, which helps to improve the performance of early fault detection.
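For reference, an RMS curve of the kind used in this comparison can be computed with one value per recorded vibration snapshot. The sketch below is illustrative only: the function name `rms_per_record` is hypothetical, and the record length of 2560 points (0.1 s at 25.6 kHz) is an assumption about the recording setup that is not stated in this section.

```python
import numpy as np


def rms_per_record(records):
    """Root mean square of each record (one row per 10 s recording)."""
    records = np.asarray(records, dtype=float)
    return np.sqrt(np.mean(records ** 2, axis=1))


fs = 25_600          # sampling frequency in Hz (from the dataset description)
n_points = 2_560     # assumed record length: 0.1 s of signal
t = np.arange(n_points) / fs

# Two synthetic records: 100 Hz sine waves with amplitudes 1 and 2.
records = np.stack([
    1.0 * np.sin(2 * np.pi * 100 * t),
    2.0 * np.sin(2 * np.pi * 100 * t),
])

rms = rms_per_record(records)
print(rms)  # close to [0.7071, 1.4142], i.e., amplitude / sqrt(2)
```

RMS summarizes only the overall signal energy, which is consistent with its sensitivity to running-in fluctuations and its late response to an early fault.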

To further analyze the comparative advantage of the proposed method, the DTDA trained in the offline stage is used to generate the online features of Bearing2\_1, as shown in Figure 8. Note that the labels used for visualization in Figure 8 correspond to the results in Figure 7a. The features of the two states are almost linearly separable, which indicates that the features extracted by the DTDA are discriminative for the early fault and well suited to online detection.

**Figure 8.** Online features of the target bearing Bearing2\_1 in the IEEE PHM Challenge 2012 dataset.
