1. Introduction
Rotating machines (RM) are widely used in intelligent equipment such as computerized numerical control (CNC) machines, aircraft engines, and wind turbines. When RM fails or stops, the result is economic loss and, in some cases, the shutdown of facilities. Most RM failures originate in typical components such as bearings, gears, and motors, which operate in complex environments and exhibit different fault classes. Timely and accurate fault diagnosis for these components can therefore reduce unnecessary malfunctions and downtime, which is essential for improving the reliability and safety of RM.
In general, fault diagnosis methods fall into two groups: model-driven methods and data-driven methods. Model-driven methods analyze the fault mechanism based on experience and prior knowledge [1]. In contrast, with the widespread development of deep learning (DL) in multiple research fields [2,3], extensive research has been conducted on data-driven fault diagnosis combined with signal processing technology, which can achieve end-to-end fault diagnosis without requiring extensive expertise [4,5]. Furthermore, fault diagnosis based on contactless sensing data has begun to be studied. Li et al. [6] were the first to provide a contactless health monitoring and fault diagnosis method that collects visual data on vibration through event-based cameras. However, data-driven methods, as represented by DL, require large amounts of labeled data that are balanced across classes. Such data are difficult to collect because RM usually operates normally in modern manufacturing [7]. Imbalanced samples can cause the model to learn excessively from healthy samples, resulting in “underreporting” of faults, which reduces the accuracy and reliability of the model and threatens the production safety of the enterprise. Therefore, it is of practical importance to explore methods for fault diagnosis of RM in the presence of imbalanced data.
Numerous methods and strategies have been developed for dealing with imbalanced data from RM, and they are generally divided into model-based and data-based methods. Model-based methods learn features from the imbalanced samples by constructing an algorithm model. Li et al. [8] constructed a cost-sensitive multi-decision tree algorithm, which increases the cost of misclassifying samples from minority classes and makes the model more sensitive to minority class data. Sun et al. [9] proposed an automatic imbalance diagnosis method based on a Bayesian optimizer that optimizes the parameters of oversampling models and classifier models through a hierarchical parameter space, completing diagnostic tasks under various imbalance ratios. Currently, designing a model structure to enhance its feature extraction ability is a more intelligent method. Wang [10] proposed a normalized softmax loss with adaptive angle margin to supervise neural networks that learn from imbalanced data. However, formulating the cost strategy and improving the model’s ability to learn features both remain difficult. Data-based methods mainly refer to resampling techniques, including under-sampling methods (USM) for majority-class samples and over-sampling methods (OSM) for minority-class samples, all designed to balance the class distribution [11]. Tang et al. [12] used extreme gradient boosting feature selection and an improved whale optimization random forest to diagnose the fault of a wind turbine gearbox by under-sampling the normal data. Although USM eliminates the influence of imbalanced data on the model to some extent, some feature information from the normal data is lost in the process. In contrast, OSM is more commonly used because it expands the minority-class samples based on the existing data. Zhang et al. [13] proposed a weighted minority OSM and used an improved deep auto-encoder (AE) as the backbone of feature extraction, which can avoid generating incorrect or unnecessary samples. Wei et al. [14] used k-nearest neighbors to filter out noisy points from OSM-generated samples and extended the treatment from multiple binary-class imbalances to multi-class imbalances for RM. In traditional OSM, such as the synthetic minority over-sampling technique (SMOTE) [15], the generated pseudo-samples generalize poorly and contain some noisy points. Although the improved OSMs above overcome these problems, they still cannot learn the features of the sample distribution automatically. Moreover, the generative adversarial network (GAN) [16] has been widely used for imbalanced data because it can compensate for the imbalance by generating pseudo-samples. Mao et al. [17] used the spectrum data of bearing vibration signals to generate minority-class samples with a GAN and then used a stacked denoising AE to perform fault diagnosis. Zhao et al. [18] improved the accuracy and diversity of the generated data by using an improved GAN, which combined AE and an online sample filter, and then introduced an additional classifier trained on 2D images obtained from the bearing vibration signal by wavelet transform. Zareapoor et al. [19] proposed the minority oversampling generative adversarial network (MoGAN), which not only produces high-quality minority-class patterns but also enables the discrimination of pseudo-patterns. The samples generated by GAN and its derivatives follow the same distribution as the original samples, so this approach is still limited by the quality of the original samples.
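To make the over-sampling idea concrete, the following minimal sketch interpolates between a minority sample and one of its nearest minority neighbours, which is the core step of SMOTE-style methods. It is an illustration only, not the implementation used in any of the cited works; the function name `smote_like_oversample` and the parameters `k` and `seed` are assumptions introduced here.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples
    and their k nearest minority neighbours (SMOTE-style sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        # Rank the other minority samples by Euclidean distance to `base`.
        others = sorted(
            (s for j, s in enumerate(minority) if j != idx),
            key=lambda s: math.dist(base, s),
        )
        partner = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        # The new point lies on the segment between base and partner.
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic
```

Because every synthetic point lies on a segment between two existing samples, the generated data cannot leave the region already covered by the minority class, which is one way to see why such pseudo-samples add little new information and inherit any noise in the originals.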
Recently, deep transfer learning (DTL) has been used in the fault diagnosis of RM to overcome the overdependence on the original samples [20]; it works by transferring knowledge from the source domain (SD) to the target domain (TD). In general, DTL can be divided into three patterns: instance-based transfer, feature-based transfer, and parameter-based transfer (PTL) [21]. The first two methods assume that the samples or learned features of SD and TD have a similar distribution. Liu et al. [22] proposed selective multiple instance transfer learning, which measures the correlation between tasks in the source and target domains by investigating the similarity of features between two tasks. This method addresses the problem of secure knowledge transfer in multi-instance learning. Wang et al. [23] constructed a domain-adaptive transfer learning network that minimizes the maximum mean discrepancy between source and target domains to reduce the marginal distribution bias. For these methods, it is difficult to develop an algorithm that generalizes well enough to reduce the feature differences between domains. In comparison, PTL has a wider range of applications and preserves user data privacy and security. In the PTL strategy, the DL feature extraction backbone is first trained on SD to obtain a pre-trained backbone, and this pre-trained backbone, with its trained weight parameters, is then applied to TD. No data are shared during knowledge transfer. Zhang et al. [24] proposed federated transfer learning based on prior distribution, which achieves local fault diagnosis for multiple users by uploading local models and downloading global models. Chen et al. [25] used a one-dimensional convolutional neural network (CNN) as the feature extraction backbone to implement parameter transfer to bearing and motor datasets. Wen et al. [26] used time-domain images of the bearing vibration signal as input to train a ResNet-50 model pre-trained on ImageNet [27]. Although CNN and its numerous variants have achieved great success in PTL, they cannot focus on the most important features because convolutional layers are limited to treating features uniformly. Following its success in natural language processing (NLP) and computer vision (CV), the vanilla transformer [28] has begun to be used in DTL as a backbone for feature extraction. Pei et al. [29] used a vanilla transformer as a feature extraction backbone and CNN as a classifier to improve fault diagnosis from multiple classes to a few classes on bearing and gearbox datasets, respectively. In current research, however, the vanilla transformer has been more successful in fault diagnosis with balanced data. Ding [30] combined time-frequency signal analysis with vanilla transformers to achieve fault diagnosis of bearing datasets by mining important features in time-frequency maps. Tang [31] used a vision transformer (ViT) [32] to perform preliminary diagnosis on time-frequency maps of different frequency bands and fused the sub-results through soft voting to obtain the final diagnostic decision. Moreover, PTL-based fault diagnosis is usually performed between different operating states of the same RM component; it has not been feasible across different RM components because of the large distance between domains.
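The "distance between domains" mentioned above can be quantified with the maximum mean discrepancy (MMD), the quantity minimized in feature-based transfer. The sketch below computes the standard biased estimate of the squared MMD with a Gaussian (RBF) kernel. It is an illustrative example rather than the procedure of any cited work; the function names and the kernel width `gamma` are assumptions introduced here.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of the squared MMD between two sets of feature vectors."""

    def mean_kernel(xs, ys):
        total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (
        mean_kernel(source, source)
        + mean_kernel(target, target)
        - 2.0 * mean_kernel(source, target)
    )
```

A value near zero indicates that the two feature sets are hard to tell apart under the chosen kernel, whereas features from very different components would be expected to give a much larger value.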
The above methods are mainly designed for a specific RM component, and no single method is applicable to most RM components. Therefore, this work explores a paradigm that can apply fault diagnosis to multiple RM components based on imbalanced data. First, the synchrosqueezed wavelet transform (SWT) [33] is further improved in this work to compress the frequency scale of samples and obtain time-frequency characteristics of different samples under complex working conditions. Second, a hierarchical window transformer pipeline governed by a dynamic seesaw loss (HWT-SS) is designed to improve the feature extraction capability for imbalanced samples. The proposed methods are verified on two bearing datasets, one gearbox dataset, and one motor dataset. In addition, the model visualizes attention as a weighted sum of key features using Grad-CAM [34], which improves the interpretability of the model. The main contributions of this work are as follows:
- (1) The improved SWT performs scale compression in the frequency dimension and normalizes the amplitude energy of the frequency. Thus, the difference between different components is reduced, and the most important features are represented more intensively in the time-frequency plane.
- (2) A novel transformer-based pipeline (HWT-SS) uses the hierarchical window transformer (HWT) as a backbone. The seesaw loss function is applied to realize the dynamic equilibrium of different classes of samples in the training process.
- (3) Cross-component transfer learning (THWT-SS), evaluated on four datasets with multiple imbalance ratios, effectively improves the accuracy and robustness of RM fault diagnosis with imbalanced data.
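As a rough illustration of how a seesaw loss rebalances classes during training, the sketch below follows the commonly published formulation, in which the softmax term of each negative class is rescaled by a mitigation factor (driven by running class counts) and a compensation factor (driven by the predicted probabilities). It should not be read as the exact dynamic variant used in HWT-SS; the function name, the per-sample interface, and the exponents `p` and `q` are assumptions made for this example.

```python
import math


def seesaw_loss(logits, label, class_counts, p=0.8, q=2.0):
    """Seesaw-style loss for a single sample.

    logits       -- raw class scores z_j
    label        -- index i of the true class
    class_counts -- running number of training samples seen per class
    p, q         -- exponents of the mitigation and compensation factors
                    (illustrative values, assumed for this sketch)
    """
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]  # shifted for numerical stability
    n_i = max(class_counts[label], 1)
    weighted = []
    for j, e in enumerate(exps):
        if j == label:
            weighted.append(e)
            continue
        n_j = max(class_counts[j], 1)
        # Mitigation: a frequent true class penalizes a rarer class j less.
        mitigation = (n_j / n_i) ** p if n_j < n_i else 1.0
        # Compensation: restore the penalty when class j is wrongly favoured.
        ratio = math.exp(logits[j] - logits[label])
        compensation = ratio ** q if ratio > 1.0 else 1.0
        weighted.append(mitigation * compensation * e)
    return -math.log(exps[label] / sum(weighted))
```

With equal class counts and a correctly ranked true class, both factors equal one and the expression reduces to ordinary cross-entropy. The counts in `class_counts` would be updated as training proceeds, so the balance between classes shifts dynamically rather than being fixed in advance.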
The remainder of this paper is organized as follows. Section 2 describes the background theory of the transformer backbone. Section 3 details the proposed method. Section 4 presents the experimental details of the proposed method using four datasets. Finally, Section 5 presents the conclusions.
5. Conclusions
In this paper, we proposed THWT-SS, which applies PTL to HWT-SS, to achieve fault diagnosis of RM with imbalanced data. THWT-SS has the following features: (1) PTL is applied across various RM components to address a practical problem of RM, and the improved SWT is adopted to improve the signal feature expression in the time-frequency domain and reduce the feature difference between domains. (2) HWT-SS adopts a hierarchical window transformer as the feature extraction backbone and dynamic seesaw loss as the loss function, which improves the feature extraction ability and reduces the impact of imbalanced data.
The advantages of the model are verified using two public and two self-generated datasets. In Case 1, the average accuracy of HWT-SS under the extreme imbalance condition increased by 3.75%, 11.29%, and 14.9% compared to HWT-CE, ResNet, and VGG, respectively. In Case 2, the highest diagnostic accuracy of THWT-SS can reach more than 99.06% through transfer learning between different component datasets under an extremely imbalanced ratio condition. The comparisons with benchmark models and published methods show that THWT-SS can address the problem of imbalanced RM data through cross-component transfer learning. However, the proposed model still requires a small number of fault samples to complete fault diagnosis. In future work, we will further improve the model to handle more extreme imbalance and explore its application in domains other than RM, such as electrical appliances.