*3.1. Weighted Subdomain Adaptation Network*

To achieve efficient partial transfer fault diagnosis, we design a novel weighted subdomain adaptation network (WSAN). The structure of the proposed model is presented in Figure 3. The feature generator *G* is a deep one-dimensional convolutional neural network (1D-CNN) that is expected to extract domain-invariant deep features. The auxiliary classifier *CA* is trained adversarially to obtain class-level weights for the source samples. Once the class-level weights are acquired, weighted subdomain adaptation can be carried out in the activation layers of the classifier *C* based on WLMMD. The objective function can be written as:

$$\begin{aligned} F_{0}(\theta_{G},\theta_{C}) &= \frac{1}{n_{s}} \sum_{i=1}^{n_{s}} L_{c}\big(G(\mathbf{x}_{si}),\mathbf{y}_{si}\big) \\ &- \frac{\lambda_{0}}{n_{s} + n_{t}} \sum_{\mathbf{x}_{i} \in \mathcal{D}_{s} \cup \mathcal{D}_{t}} L_{d}\big(D(G(\mathbf{x}_{i})), d_{i}\big) \\ &+ \gamma_{0}\, \hat{d}_{l}(\mathcal{D}_{s},\mathcal{D}_{t}) \end{aligned} \tag{7}$$

where *λ*<sub>0</sub> and *γ*<sub>0</sub> are penalty coefficients, *y<sub>si</sub>* and *d<sub>i</sub>* are the source sample label and the domain label, and *L<sub>c</sub>* and *L<sub>d</sub>* denote the condition prediction loss and the domain discrimination loss, respectively.
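The discrepancy term *d̂<sub>l</sub>* in Equation (7) is the WLMMD used by the proposed method. As a simplified stand-in, the sketch below estimates a plain (unweighted) squared MMD between two feature batches with a Gaussian kernel; the function names, the fixed bandwidth, and the toy features are our own assumptions, not the paper's implementation.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=4.0):
    # Pairwise Gaussian (RBF) kernel matrix between rows of a and b.
    # sigma is a hand-picked bandwidth for this toy example.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(xs, xt, sigma=4.0):
    # Biased squared-MMD estimate between source and target feature batches.
    return (gaussian_kernel(xs, xs, sigma).mean()
            - 2.0 * gaussian_kernel(xs, xt, sigma).mean()
            + gaussian_kernel(xt, xt, sigma).mean())

rng = np.random.default_rng(0)
fs = rng.normal(0.0, 1.0, (64, 16))   # toy "source" deep features
ft = rng.normal(0.5, 1.0, (64, 16))   # toy shifted "target" deep features
print(mmd2(fs, fs) < mmd2(fs, ft))    # True: the shifted batch is farther
```

In the full WLMMD, each source sample would additionally carry the class-level weight discussed in Section 3.2, concentrating the alignment on shared subdomains.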

**Figure 3.** The structural composition of the proposed model.

#### *3.2. Obtaining Class-Level Weights via Adversarial Training*

Owing to the asymmetry of the fault classes between the two domains, samples of redundant classes in the source domain may cause negative transfer. These redundant subdomains must therefore be identified so as to block classification knowledge that is unfavorable to the recognition of target samples. Inspired by generative adversarial networks (GANs), we set up an auxiliary classifier *CA* to play a mini-max game with the feature generator. Specifically, source inputs *xs* are labeled 1 and target inputs *xt* are labeled 0; after multiple layers of feature extraction, the feature generator *G* narrows the domain shift so that the classifier cannot distinguish the true origin of an input sample, while the auxiliary classifier is trained to assign the correct label. The objective of the adversarial training can be defined as:

$$\min_{G}\max_{C_{A}} \mathcal{L}(C_{A},G) = \frac{1}{n_{s}}\sum_{i=1}^{n_{s}}\log\big(C_{A}(G(\mathbf{x}_{si}))\big) + \frac{1}{n_{t}}\sum_{j=1}^{n_{t}}\log\big(1-C_{A}(G(\mathbf{x}_{tj}))\big) \tag{8}$$
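Equation (8) can be evaluated directly on the auxiliary classifier's outputs. The minimal sketch below (function names and toy outputs are our own) shows the mini-max roles: a *CA* whose outputs match the domain labels scores higher than a confused one, which is exactly what the generator works against.

```python
import numpy as np

def adversarial_loss(ca_src, ca_tgt):
    # Equation (8): the auxiliary classifier C_A maximises this value
    # (source -> 1, target -> 0), while the generator G minimises it.
    ca_src, ca_tgt = np.asarray(ca_src), np.asarray(ca_tgt)
    return np.log(ca_src).mean() + np.log(1.0 - ca_tgt).mean()

# A discriminating C_A (outputs near the domain labels) scores higher
# than a confused one (outputs near 0.5 for every sample).
sharp = adversarial_loss([0.9, 0.95], [0.1, 0.05])
blurred = adversarial_loss([0.5, 0.5], [0.5, 0.5])
print(sharp > blurred)   # True
```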

During training, the distribution discrepancy between the deep features of the shared fault types is narrowed, so the auxiliary classifier becomes unable to distinguish samples of these types and outputs values close to 0, whereas its output for source-only samples remains close to 1. The aim of adversarial training is thus to learn the relative importance of source samples: outlier samples should be assigned relatively small weights. The weight function is therefore inversely related to *CA*(*G*(*x*)) and can be defined as:

$$w_{c}(\mathbf{x}) = 1 - C_{A}(G(\mathbf{x})) \tag{9}$$
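Averaging the per-sample weights of Equation (9) within each source class yields the class-level weights *w<sub>cj</sub>* used below. A minimal NumPy sketch, assuming the weights are normalised so the largest class weight is 1 (the normalisation scheme and all names are our assumptions):

```python
import numpy as np

def class_weights(ca_outputs, labels, num_classes):
    # Per-sample importance w(x) = 1 - C_A(G(x)) (Equation (9)),
    # averaged within each source class, then rescaled so the
    # largest class weight equals 1.
    w = 1.0 - np.asarray(ca_outputs)
    wc = np.array([w[labels == j].mean() for j in range(num_classes)])
    return wc / wc.max()

# Toy example: class 2 looks source-only (C_A output near 1),
# so its class-level weight is pushed towards zero.
ca = np.array([0.1, 0.2, 0.15, 0.9, 0.95, 0.88])
y  = np.array([0,   0,   1,    2,   2,    2])
weights = class_weights(ca, y, 3)
print(weights.round(2))
```

Down-weighting class 2 in this way suppresses both its classification loss and its contribution to the distribution alignment, which is the mechanism that avoids negative transfer from redundant subdomains.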

After obtaining the class weights of the source samples, the overall objective can be rewritten as:

$$\begin{aligned} F(\theta_{G}, \theta_{C}) &= \frac{1}{n_{s}} \sum_{j} \sum_{\mathbf{x}_{si} \in \mathcal{D}_{s}^{j}} w_{cj}\, L_{c}\big(G(\mathbf{x}_{si}), \mathbf{y}_{si}\big) \\ &+ \frac{\lambda_{1}}{n_{s}} \sum_{j} \sum_{\mathbf{x}_{si} \in \mathcal{D}_{s}^{j}} w_{cj} \log\big(C_{A}(G(\mathbf{x}_{si}))\big) \\ &+ \frac{\lambda_{2}}{n_{t}} \sum_{j=1}^{n_{t}} \log\big(1 - C_{A}(G(\mathbf{x}_{tj}))\big) \\ &+ \gamma\, \hat{d}_{l}(\mathcal{D}_{s}, \mathcal{D}_{t}) \end{aligned} \tag{10}$$

$$F(\theta_{C_{A}}) = -\frac{1}{n_{s}} \sum_{j} \sum_{\mathbf{x}_{si} \in \mathcal{D}_{s}^{j}} w_{cj} \log\big(C_{A}(G(\mathbf{x}_{si}))\big) - \frac{\lambda}{n_{t}} \sum_{j=1}^{n_{t}} \log\big(1 - C_{A}(G(\mathbf{x}_{tj}))\big) \tag{11}$$

where *w<sub>cj</sub>* and *D<sub>s</sub><sup>j</sup>* denote the weight and the sample set of the *j*-th source class, *y<sub>si</sub>* is the source sample label, and *λ*<sub>1</sub>, *λ*<sub>2</sub>, *λ*, and *γ* are penalty coefficients.
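The first term of Equation (10) is a class-weighted classification loss over the source batch. A minimal sketch (function name, toy probabilities, and labels are our own) showing how *w<sub>cj</sub>* ≈ 0 removes a source-only class from the gradient:

```python
import numpy as np

def weighted_source_loss(log_probs, labels, class_w):
    # Class-weighted cross-entropy as in the first term of Equation (10):
    # each sample's negative log-likelihood is scaled by the weight of
    # its class, so outlier classes (weight near 0) contribute nothing.
    n = len(labels)
    nll = -log_probs[np.arange(n), labels]
    return float((class_w[labels] * nll).mean())

# Toy batch: three samples, three classes; class 2 is treated as an
# outlier class (weight 0), so its sample is effectively dropped.
logp = np.log(np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.8, 0.1],
                        [0.2, 0.2, 0.6]]))
y = np.array([0, 1, 2])
w = np.array([1.0, 1.0, 0.0])
loss = weighted_source_loss(logp, y, w)
```

The weighted adversarial terms of Equations (10) and (11) apply the same per-class scaling to the auxiliary classifier's log-outputs, and the two objectives are optimised alternately in the usual adversarial fashion.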

#### **4. Experiments**

*4.1. Dataset Introduction*

The proposed framework is verified on datasets collected in our laboratory to validate its performance in partial transfer fault diagnosis. Figure 4a shows the experimental equipment. The platform consists of a motor, two balancing rotors, two bearing seats, a planetary gearbox, and a magnetic brake for controlling the load. Vibration sensors are installed on fixed holders at both ends of the gearbox, and the sampling frequency is 25.6 kHz.

**Figure 4.** Representation for rotating machinery fault diagnosis test bed in our laboratory (**a**) and different types of bearing fault (**b**) and gear fault (**c**). Red box indicates the location of damage.

(1) Bearing fault dataset

Five health conditions are involved in the bearing fault dataset, namely, normal, inner ring fault, outer ring fault, rolling element fault, and a combined fault of the rolling element and outer ring. The fault parts are shown in Figure 4b. Each fault type has two damage sizes, specifically, 0.2 and 0.4 mm. Thus, the bearing fault dataset contains samples of nine health types, namely, NC, IF1, IF2, OF1, OF2, RF1, RF2, RO1, and RO2. Four datasets are obtained under different loads, specifically, L1 (80 N), L2 (60 N), L3 (40 N), and L4 (20 N), with the motor speed fixed at 2000 r/min.

(2) Gear fault dataset

As shown in Figure 4c, the gear fault dataset contains samples of seven health types, namely, normal condition (NC), sun gear fracture (SF), sun gear pitting (SP), sun gear wear (SW), planet gear fracture (PF), planet gear pitting (PP), and planet gear wear (PW). We collected three datasets at different rotational speeds (without load), specifically, S1 (2200 r/min), S2 (2000 r/min), and S3 (1800 r/min).

Each health type contains 500 samples; thus, the bearing and gear datasets contain 4500 and 3500 samples, respectively. To give full play to the feature-extraction and weight-learning ability of the proposed method, 40% of the samples were used for training and the remainder for testing.
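The sample counts above follow directly from the per-class size and the split ratio; a quick arithmetic check (whether the 40% split is drawn per class or per dataset is not stated, so the per-dataset totals below are the only firm numbers):

```python
# Dataset sizes implied by Section 4.1: 500 samples per health type,
# 9 bearing classes, 7 gear classes, and a 40% training split.
per_class, train_frac = 500, 0.4
bearing_classes, gear_classes = 9, 7

bearing_total = bearing_classes * per_class        # 4500
gear_total = gear_classes * per_class              # 3500
bearing_train = round(bearing_total * train_frac)  # 1800
gear_train = round(gear_total * train_frac)        # 1400
print(bearing_total, gear_total, bearing_train, gear_train)
```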
