4.3.2. Results of Bearing2\_4

First, as in Figure 7, Figure 9 shows the online detection results for Bearing2\_4. This bearing enters the early fault state at Sample 744, with no false alarms. The trends of the HI sequence and the RMS curve are consistent with the detection results in Figure 9a, as shown in the dotted frame. Moreover, compared with the RMS curve, the HI sequence shows no obvious fluctuation, which supports the effectiveness of the proposed method in online early fault detection.

**Figure 9.** Online anomaly detection results for Bearing2\_4 using (**a**) the proposed method, (**b**) the constructed HI, and (**c**) the RMS curve. Here, the label "1" in Subfigure (**a**) indicates the normal state; "−1" indicates the fault state.
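For reference, an RMS curve such as the one in Subfigure (**c**) is conventionally obtained by computing the root mean square of each vibration snapshot. The following is a minimal sketch under that assumption; the function names `rms` and `rms_curve` are illustrative and do not come from the paper.

```python
import math
from typing import List, Sequence


def rms(snapshot: Sequence[float]) -> float:
    """Root mean square of one vibration snapshot (standard definition)."""
    return math.sqrt(sum(v * v for v in snapshot) / len(snapshot))


def rms_curve(snapshots: Sequence[Sequence[float]]) -> List[float]:
    """One RMS value per snapshot, in acquisition order."""
    return [rms(s) for s in snapshots]


# Synthetic snapshots with growing amplitude (not bearing data).
snapshots = [[1.0, -1.0, 1.0, -1.0], [3.0, -4.0, 3.0, -4.0]]
print(rms_curve(snapshots))
```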

Figure 10 further shows the online features of Bearing2\_4, which were generated using the domain-invariant feature representation extracted by the DTDA. The separability of these features is even more obvious than that of the online features of Bearing2\_1 shown in Figure 8. This is caused by the degradation process of Bearing2\_4, in which the bearing transitions directly from the normal state to the fault state. In this scenario, the fault state data are easy to distinguish from the normal state data. Figure 10 again indicates that the proposed method can effectively recognize the normal state and the early fault state.

#### *4.4. Comparative Results with State-of-the-Art Methods*

In this section, nine state-of-the-art methods of bearing fault detection are introduced for a comprehensive comparison. These nine methods include one typical signal analysis method (Method 1), five anomaly detection methods without transfer learning (Methods 2–6), and three anomaly detection methods with transfer learning (Methods 7–9). For simplicity, the proposed method is named DTDA.

Following [16], two evaluation metrics are employed: (1) the detection location, which is the index (number) of the signal snapshot at which the fault is detected; (2) the number of false alarms, which is the number of anomalies reported before the detection location. The comparative results are reported in Table 3.
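The two metrics can be illustrated with a short sketch that operates on a per-snapshot label sequence, using the "1" (normal) and "−1" (fault) convention of Figure 9a. The rule used here to locate the fault (the first snapshot that starts a run of `min_run` consecutive fault labels) is an assumption made for illustration, since the exact rule is not stated in this section; `evaluate_detection` and `min_run` are hypothetical names.

```python
from typing import Optional, Sequence, Tuple

NORMAL, FAULT = 1, -1  # label convention taken from Figure 9a


def evaluate_detection(
    labels: Sequence[int], min_run: int = 5
) -> Tuple[Optional[int], int]:
    """Return (detection_location, false_alarms) for a label sequence.

    Assumption: the detection location is the 1-based index of the first
    snapshot that starts `min_run` consecutive FAULT labels.
    """
    location = None
    for start in range(len(labels) - min_run + 1):
        if all(label == FAULT for label in labels[start:start + min_run]):
            location = start + 1  # convert to a 1-based snapshot number
            break
    # False alarms are FAULT labels raised before the detection location
    # (all FAULT labels if no sustained fault was found).
    end = len(labels) if location is None else location - 1
    false_alarms = sum(1 for label in labels[:end] if label == FAULT)
    return location, false_alarms


# Synthetic example: two isolated alarms, then a sustained fault
# starting at snapshot 11.
labels = [1, 1, -1, 1, 1, 1, -1, 1, 1, 1] + [-1] * 8
print(evaluate_detection(labels))  # (11, 2)
```

A larger `min_run` makes the detection location more robust to isolated alarms, at the cost of confirming the fault slightly later.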

From Table 3, the proposed method DTDA obtains the earliest detection location and almost the lowest number of false alarms. Although RD-DTL and SDFM have a lower number of false alarms than the DTDA, the detection location of these two methods is relatively late. It is worth noting that the detection locations of all ten methods on Bearing2\_4 are not much different. This is because the bearing evolves quickly from the normal state to the fast degradation state, with a very short period of early fault. Since the data of the fast degradation state are quite different from the normal state data, all methods can detect faults at the location of the state change. However, the number of false alarms produced by different methods on Bearing2\_4 is not the same. Some methods like iFOREST and the local outlier factor (LOF) produce too many false alarms. Moreover, bandwidth empirical mode decomposition-adaptive multi-scale morphological analysis (BEMD-AMMA) has no false alarm since it utilizes signal analysis to conduct fault detection by observing the fault frequency.

**Figure 10.** Online features of the target Bearing2\_4. Here, PCA is used for visualization.

**Table 3.** Comparative results of the proposed method with nine state-of-the-art methods. An earlier detection location and a lower number of false alarms indicate better performance. BEMD-AMMA, bandwidth empirical mode decomposition-adaptive multi-scale morphological analysis; LOF, local outlier factor; TCA, transfer component analysis.


Here, a detailed analysis of the comparative results is listed as follows:

(1) Comparison with BEMD-AMMA:

BEMD-AMMA can be viewed as a state-of-the-art signal analysis-based method for bearing early fault detection. This method first uses the bandwidth empirical mode decomposition (BEMD) to reconstruct the raw vibration signal and then utilizes an adaptive multi-scale morphological analysis (AMMA) algorithm to demodulate the reconstructed signal to obtain time-domain signals. Finally, a fault is declared if the fault characteristic frequency can be observed. To calculate the fault characteristic frequency, this method has to know the various parameters of the target bearing and working condition in advance, a requirement that is difficult to satisfy in the online scenario. Moreover, the characteristic frequency appears only after a fault has evolved to a certain degree, so the detection location is delayed. In contrast, benefiting from the sensitivity of the early fault features extracted by the DTDA model, the proposed method can detect fault occurrence at an earlier location.

(2) Comparison with LOF:

The LOF is a typical anomaly detection algorithm based on sample density. In this experiment, the first 100 samples at the start of the online stage were used to calculate the LOF values, and the largest value was selected as the alarm threshold. The parameter *K* in the LOF was set to 10. From Table 3, the detection location of the LOF is later than that of the proposed method, while the number of false alarms is much larger. This is because the normal state of a bearing may have unexpected irregular fluctuations. Moreover, when normal state data are used to train the LOF, the threshold value tends to be relatively large, resulting in a late detection location.
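The thresholding scheme described above can be sketched as follows. This is a minimal pure-Python LOF written for illustration, not the implementation used in the experiments: it takes exactly *K* neighbours (ties at the *K*-distance are not expanded), uses Euclidean distance, and runs on synthetic two-dimensional Gaussian features rather than bearing data. The class name `SimpleLOF` and its methods are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def knn(query: Point, ref: List[Point], k: int, skip: int = -1) -> List[Tuple[float, int]]:
    """Return the k nearest (distance, index) pairs in `ref`, ignoring index `skip`."""
    dists = [(math.dist(query, p), i) for i, p in enumerate(ref) if i != skip]
    return sorted(dists)[:k]


class SimpleLOF:
    """Minimal LOF fitted on a reference set of normal-state feature vectors."""

    def __init__(self, ref: List[Point], k: int = 10):
        self.ref, self.k = ref, k
        self.neigh = [knn(p, ref, k, skip=i) for i, p in enumerate(ref)]
        self.k_dist = [n[-1][0] for n in self.neigh]  # distance to k-th neighbour
        self.lrd = [self._lrd(n) for n in self.neigh]
        # Alarm threshold: the largest LOF value among the reference samples.
        self.threshold = max(self._lof(n, d) for n, d in zip(self.neigh, self.lrd))

    def _lrd(self, neigh: List[Tuple[float, int]]) -> float:
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(self.k_dist[j], d) for d, j in neigh]
        return len(reach) / max(sum(reach), 1e-12)

    def _lof(self, neigh: List[Tuple[float, int]], lrd: float) -> float:
        # Ratio of the neighbours' mean density to the sample's own density.
        return sum(self.lrd[j] for _, j in neigh) / (len(neigh) * lrd)

    def score(self, x: Point) -> float:
        neigh = knn(x, self.ref, self.k)
        return self._lof(neigh, self._lrd(neigh))

    def is_anomaly(self, x: Point) -> bool:
        return self.score(x) > self.threshold


# Synthetic stand-in for the first 100 online samples (assumed normal).
random.seed(0)
reference = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100)]
lof = SimpleLOF(reference, k=10)
print(lof.is_anomaly((50.0, 50.0)))
```

Because the threshold is the maximum LOF value over the reference window, any irregular fluctuation inside that window raises the threshold, which is consistent with the late detection discussed above.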

(3) Comparison with iFOREST:

iFOREST is also a typical anomaly detection algorithm, which adopts a random segmentation strategy. iFOREST isolates samples by repeated random segmentation, and the samples with a shorter path are viewed as anomalies. In this experiment, the number of trees was set to 100. From Table 3, the detection location of iFOREST is much delayed for Bearing2\_1, and too many false alarms appear. This is because with online samples arriving sequentially, the segmentation number continues to increase. Consequently, the detection performance is not stable.
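The path-length idea can be sketched in a few lines. This is a simplified illustration, not the configuration used in the experiments: instead of storing trees, it draws a fresh random isolation path for the query in each of the 100 "trees", and it runs on synthetic two-dimensional data. The subsample size of 64 and the names `c_factor`, `path_length`, and `anomaly_score` are assumptions introduced here.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]
EULER_GAMMA = 0.5772156649


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def path_length(x: Point, data: List[Point], rng: random.Random,
                depth: int, max_depth: int) -> float:
    """Depth at which x is isolated by random axis-aligned splits of `data`."""
    if depth >= max_depth or len(data) <= 1:
        return depth + c_factor(len(data))
    dim = rng.randrange(len(x))
    lo = min(p[dim] for p in data)
    hi = max(p[dim] for p in data)
    if lo == hi:
        return depth + c_factor(len(data))
    split = rng.uniform(lo, hi)
    # Keep only the points that fall on the same side of the split as x.
    side = [p for p in data if (p[dim] < split) == (x[dim] < split)]
    return path_length(x, side, rng, depth + 1, max_depth)


def anomaly_score(x: Point, train: List[Point], n_trees: int = 100,
                  subsample: int = 64, seed: int = 0) -> float:
    """Score in (0, 1); values closer to 1 mean shorter paths, i.e. more anomalous."""
    rng = random.Random(seed)
    psi = min(subsample, len(train))
    max_depth = math.ceil(math.log2(psi))
    total = 0.0
    for _ in range(n_trees):
        sample = rng.sample(train, psi)
        total += path_length(x, sample, rng, 0, max_depth)
    return 2.0 ** (-(total / n_trees) / c_factor(psi))


# Synthetic normal-state features (not bearing data).
data_rng = random.Random(1)
train = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
far = anomaly_score((8.0, 8.0), train)
center = anomaly_score((0.0, 0.0), train)
print(far > center)
```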

(4) Comparison with SRD:

SRD is a state-of-the-art early fault detection algorithm based on probability density estimation. In the offline stage, this method directly uses the original signals of multiple working conditions to establish a global model and determine the alarm limit. The online signals are fed into this model to get the detection results. However, this method does not take into account the difference between the bearing data under different working conditions. The setting of the alarm limit is also too subjective. Therefore, the detection location was very much delayed for Bearing2\_1. In contrast, the proposed method does not need to manually set a threshold for detection, and the DTDA model can effectively transfer fault information between different working conditions to improve the detection performance.

(5) Comparison with SDFM:

SDFM is a state-of-the-art online detection method for bearing early fault. This method employs a sliding window to determine the location of early fault occurrence by means of DAE features. The DAE network structure was set to [800, 512, 10], and the size of the sliding window was set to 100. Benefiting from the sliding window, this method had no false alarm, but the detection location was slightly delayed. The main reason is that this method is not a transfer learning method and relies heavily on the amount of offline training data. If the amount of training data is insufficient, the early fault information cannot be extracted completely. In contrast, the proposed method can borrow data from offline working conditions to supplement early fault information for online detection.

(6) Comparison with S4VM-SODRMB:

This method is also a state-of-the-art online detection method for bearing early fault. This method only needs online data to update model training in an unsupervised learning architecture. In this experiment, the first 100 samples of online data were accumulated to extract the DAE features and then train an initial SVM model. The sequentially collected data batch was used to update the SVM model successively. The radius-margin upper bound of leave-one-out error was then utilized to calculate an index for online detection. In Table 3, the detection location of this method was much delayed for Bearing2\_1. The reason is that only a small amount of online data was used to train an initial model. Once the data contain irregular fluctuation or colored noise, the initial model will be biased, and the detection results will deteriorate. In contrast, the proposed method can utilize offline data to facilitate online detection. The transfer learning technique in the DTDA model guarantees the effective use of early fault information from offline data.

(7) Comparison with TCA + SVDD:

Transfer component analysis (TCA) is a widely used transfer learning algorithm that minimizes the MMD distance between different domains. In this experiment, TCA was first run to conduct domain adaptation between the available data from offline and online working conditions. Then, the common features were used to train an SVDD model by using the available training data from the online working condition. This SVDD model was used to recognize anomalies in the online stage. In this experiment, the regularization parameter and kernel parameter were set to 10 and 1, respectively. From Table 3, it is clear that the detection location of TCA + SVDD was much delayed for Bearing2\_1, and the number of false alarms became larger. The reason is that TCA conducts domain adaptation with a shallow model, while such domain adaptation is in a single mode. In contrast, the proposed method not only conducts dual domain adaptation, but also extracts common temporal information of early fault. Therefore, the proposed method can provide more representative features for online detection.
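To make the MMD distance concrete, the sketch below computes the standard biased empirical estimate of the squared MMD between two sample sets with an RBF kernel. It only illustrates the quantity that TCA seeks to reduce and does not implement TCA itself; the kernel width `gamma` and the function names are assumptions introduced for this example.

```python
import math
from typing import Sequence

Point = Sequence[float]


def rbf(x: Point, y: Point, gamma: float) -> float:
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(source: Sequence[Point], target: Sequence[Point], gamma: float = 1.0) -> float:
    """Biased empirical estimate of the squared MMD between two sample sets."""

    def mean_k(a: Sequence[Point], b: Sequence[Point]) -> float:
        return sum(rbf(x, y, gamma) for x in a for y in b) / (len(a) * len(b))

    return mean_k(source, source) + mean_k(target, target) - 2.0 * mean_k(source, target)


# Toy domains: the target is the source shifted by 3 along each axis.
source = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
target = [(x + 3.0, y + 3.0) for x, y in source]
print(mmd2(source, source), mmd2(source, target))
```

The estimate is zero when both sets coincide and grows as the two distributions move apart, which is why it serves as a domain discrepancy measure.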

(8) Comparison with RD-DTL and OD-DTL:

RD-DTL and OD-DTL are both recent state-of-the-art early fault online detection methods based on deep transfer learning. RD-DTL first uses a robust auto-encoder to determine the periods of different degradation states and then constructs an MMD-based DAE network to extract common features for the normal state, followed by an SVM model for recognition. This method focuses on the robustness of the online detection model. OD-DTL utilizes a VGG-16 network pre-trained on the ImageNet dataset to fine-tune a deep neural network for bearing online detection. This method only conducts model-level domain adaptation, not considering the feature transfer. Therefore, the performance of fault detection is limited. From Table 3, these two methods obtain results similar to those of the proposed method. RD-DTL even obtains zero false alarms. However, the proposed method obtains an earlier detection location than both. The main reason is that the proposed method conducts dual domain adaptation with temporal information. Therefore, the online features by DTDA are more sensitive to early fault.

In summary, the proposed DTDA model can achieve dual domain adaptation at the feature level, which can facilitate the transfer of fault information between different working conditions. Moreover, the DTDA utilizes the TCN as the feature extractor to extract the temporal information of the degradation process, which can improve the representative ability of the online features for early fault. Therefore, the proposed method is more applicable to the online detection of bearing early fault.

Another problem of online detection is computational time. The proposed method needs to train the DTDA model in the offline stage and then directly inputs the sequentially collected online data into the model to recognize the fault occurrence. The dual domain adaptation by the DTDA provides a domain-invariant feature representation with a better discriminative ability for the online task. The offline model training is computationally expensive, since the adversarial training of the DTDA is an iterative process. However, no additional training time is required in the online stage. The classification on an online sample by the trained DTDA model is almost a linear operation, so the time for recognizing an online sample is very short. For this reason, the corresponding time data are not provided in Table 3.

#### **5. Conclusions**

Online detection of bearing early fault is an application-oriented fault detection task with significant practical value. This paper proposes a new online detection method of early fault based on deep dual temporal domain adaptation. This method adopts deep domain adaptation with temporal information to extract a domain-invariant feature representation with stronger discriminative ability. Employing this representation as the channel of information transfer, the proposed method can improve the detection robustness and accuracy in the online scenario while producing fewer false alarms. This method can directly tackle whole-life degradation data, with no need to manually mark fault data in advance. Therefore, this method is more applicable to the online detection of early fault, and the idea of this paper can be extended to different objects.

In future work, an attention mechanism will be introduced into domain adversarial training to improve the effect of domain adaptation for time series data. In addition, this paper focuses on the anomaly detection problem for a bearing across different working conditions. How to achieve online transfer learning across different machinery and extract common features from multiple sources remains an interesting open problem.

**Author Contributions:** Conceptualization, W.M. and B.S.; methodology, W.M.; software, L.W.; validation, B.S. and L.W.; formal analysis, W.M.; investigation, W.M.; resources, B.S.; data curation, W.M.; writing, original draft preparation, B.S.; writing, review and editing, B.S.; visualization, B.S.; supervision, L.W.; funding acquisition, W.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China Grant Number U1704158, Henan Province Technologies R & D project of China Grant Number 212102210103 and 202102210361, DOE Key Scientific Research Project of Henan Province Grant Number 20A520039, and the funding scheme of University Young Core Instructor in Henan Province Grant Number 2019GGJS279.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are openly available in PCoE Datasets at https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/, reference number [34].

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Entropy* Editorial Office E-mail: entropy@mdpi.com www.mdpi.com/journal/entropy

ISBN 978-3-0365-3209-7