### 2.1. Proposed Framework

The frame structure of the proposed approach is shown in Figure 2.

The proposed framework consists of two modules: a feature extractor G and a classifier C. The feature extractor G is composed of four fully connected layers whose output dimensions are 512, 128, 64, and 16, respectively. The classifier C is a two-layer Softmax classifier used to diagnose the fault type of each sample.

The original source-domain time-domain signal is transformed by FFT and fed into the feature extractor G, which extracts the features of the source-domain fault samples. The classifier C then classifies the extracted fault features, and the feature center of each fault type is gradually extracted from the source-domain feature signals. After the model's initial pre-classification, small perturbations containing each sample's own features are repeatedly added to the features of the target-domain fault samples, and the distance of each perturbed feature from the feature centers is measured. After many rounds of learning and training, the model achieves fault diagnosis of the target-domain samples. The training process of the model is described in detail in Section 3.2. To improve diagnostic performance and generalization ability, the model introduces the following optimization objectives.
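As a concrete sketch of this pipeline, the pure-Python code below mirrors the stated dimensions: a four-layer fully connected extractor G producing 512-, 128-, 64-, and 16-dimensional features, followed by a two-layer Softmax classifier C. The input size (1024 spectral bins standing in for the FFT magnitude spectrum), the number of fault classes, the random initialization, and all function names are illustrative assumptions, not taken from the paper.

```python
import math
import random

LAYER_DIMS = [1024, 512, 128, 64, 16]  # assumed input size -> G's 4 layer widths
NUM_CLASSES = 4                         # assumed number of fault types

def linear(x, w, b):
    """y = Wx + b for plain Python lists."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(z):
    m = max(z)                           # subtract max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def init(n_out, n_in, rng):
    """Random (w, b) pair for one fully connected layer."""
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
         for _ in range(n_out)]
    return w, [0.0] * n_out

rng = random.Random(0)
# Feature extractor G: four fully connected layers (512, 128, 64, 16 dims).
G = [init(o, i, rng) for i, o in zip(LAYER_DIMS, LAYER_DIMS[1:])]
# Classifier C: two layers ending in a softmax over the fault types.
C = [init(16, 16, rng), init(NUM_CLASSES, 16, rng)]

def extract_features(spectrum):
    h = spectrum
    for w, b in G:
        h = relu(linear(h, w, b))
    return h                             # 16-dim fault feature

def classify(feature):
    h = relu(linear(feature, *C[0]))
    return softmax(linear(h, *C[1]))     # probability per fault type

spectrum = [rng.random() for _ in range(LAYER_DIMS[0])]  # stand-in FFT input
probs = classify(extract_features(spectrum))
```

The classifier's output is a probability vector over the fault types; training (Section 3.2) would fit the weights, which are random here.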

**Figure 2.** Framework diagram of the proposed method.

#### 2.1.1. Classification Loss

Minimizing the classification error of source domain samples is the first optimization goal of the proposed framework. The classifier learns classification knowledge from the labeled samples in the source domain. The standard Softmax regression loss is selected as the objective function [24]. The specific function formula and explanation are as follows:

$$L_{C} = -\frac{1}{m} \left[ \sum_{i=1}^{m} \left( 1 - y^{(i)} \right) \log \left( 1 - h_{\theta} \left( \mathbf{x}^{(i)} \right) \right) + \sum_{i=1}^{m} y^{(i)} \log h_{\theta} \left( \mathbf{x}^{(i)} \right) \right] \tag{1}$$

where *x*(*i*) and *y*(*i*) represent the input signal of the *i*-th sample and its corresponding label, respectively, *m* is the number of samples, and *hθ*(*x*(*i*)) is the predicted probability of the fault types for the *i*-th sample.
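Read in the binary form shown in Eq. (1), the loss can be sketched as below; the function name and list-based interface are assumptions (in the multi-class case, *hθ* returns one probability per fault type and the *y* log *h* term is summed over classes).

```python
import math

def classification_loss(y, h):
    """Eq. (1): average cross-entropy between labels y[i] in {0, 1}
    and predicted probabilities h[i] = h_theta(x[i])."""
    m = len(y)
    return -(1.0 / m) * sum(
        (1 - yi) * math.log(1 - hi) + yi * math.log(hi)
        for yi, hi in zip(y, h))
```

For example, two samples predicted with probability 0.9 for their true labels give a loss of −log 0.9 ≈ 0.105.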

#### 2.1.2. Feature Loss

Feature loss is used to correct the error loss caused by discarding useless fault type features in the process of extracting feature centers. Feature loss can be expressed as the absolute difference between the feature extracted from the source domain and the feature center generated by the learning process. The function formula [25] is as follows:

$$L_F(\mathbf{x}_c, \mathbf{o}_c) = \frac{1}{C} \sum_{c=1}^{C} \left| f(\mathbf{x}_c) - f(\mathbf{o}_c) \right| \tag{2}$$

where *xc* and *oc* are the extracted feature and the learned feature center of the *c*-th fault class, respectively, and *C* is the number of fault classes.
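Taking |·| in Eq. (2) as an element-wise absolute difference summed over the feature dimensions (an L1 reading; this interpretation and the names below are assumptions), the feature loss can be sketched as:

```python
def feature_loss(features, centers):
    """Eq. (2): mean, over the C fault classes, of the L1 distance between
    the extracted feature f(x_c) of class c and its feature center f(o_c).
    features[c] and centers[c] are the already-extracted feature vectors."""
    C = len(features)
    return (1.0 / C) * sum(
        sum(abs(a - b) for a, b in zip(xc, oc))
        for xc, oc in zip(features, centers))
```

Minimizing this quantity pulls each class's extracted feature toward its feature center, compensating for information discarded while the centers are formed.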
