**2. *T*<sup>2</sup> Statistics Fault Detection**

During the operation of complex equipment or systems, the observed state can generally be divided into an intrinsic part and an extrinsic part. The intrinsic part represents the main working state of the system and typically exhibits trend, monotonicity, and periodicity. The extrinsic part represents system noise, which is characterized by zero mean, high-frequency vibration, and statistical stability. The intrinsic part can be described by the state equation of the system. When a fault occurs in the intrinsic part, its symptoms are relatively pronounced, and the corresponding fault detection methods are relatively mature. For high-frequency vibration signals, however, an incipient fault is often hidden in the extrinsic part, where it is easily masked by noise. Therefore, the observed data must be analyzed in greater depth.

#### *2.1. Signal Decomposition*

In the initial stage of operation, the equipment runs unstably and causes large data fluctuations, which not only strongly affect the system trend but also distort the statistical characteristics of the data. Therefore, the data must be truncated to remove the unstable signals [9]. Let *t*<sub>1</sub>, *t*<sub>2</sub>, ··· , *t<sub>m</sub>* denote the sampling times that remain after the nonstationary period is removed; the following *m* observations are then obtained:

$$\mathbf{Y} = [y(t_1), y(t_2), \dots, y(t_m)].\tag{1}$$

Each sample *y*(*t<sub>i</sub>*) contains *n* features, written componentwise as

$$y(t_i) = \left[y_1(t_i), y_2(t_i), \dots, y_n(t_i)\right]^{\mathrm{T}}, \quad i = 1, 2, \dots, m. \tag{2}$$

Then, the data *Y* can be decomposed into

$$
\mathbf{Y} = \hat{\mathbf{Y}} + \mathbf{R},
\tag{3}
$$

where *Ŷ* denotes the intrinsic part, which captures the trend, and *R* denotes the extrinsic part, which consists of observation noise and fault data.

The intrinsic part is composed of multiple signals. Selecting an appropriate set of basis functions *f*(*t*) = [*f*<sub>1</sub>(*t*), *f*<sub>2</sub>(*t*), ··· , *f<sub>s</sub>*(*t*)]<sup>T</sup> helps to describe it. Evaluating the basis functions at all *m* sampling times, the nonlinear data *Y* are modeled as

$$[y(t_1), y(t_2), \dots, y(t_m)] = \begin{bmatrix} \beta_1^{(1)} & \beta_2^{(1)} & \cdots & \beta_s^{(1)} \\ \beta_1^{(2)} & \beta_2^{(2)} & \cdots & \beta_s^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_1^{(n)} & \beta_2^{(n)} & \cdots & \beta_s^{(n)} \end{bmatrix} \begin{bmatrix} f_1(t_1) & f_1(t_2) & \cdots & f_1(t_m) \\ f_2(t_1) & f_2(t_2) & \cdots & f_2(t_m) \\ \vdots & \vdots & \ddots & \vdots \\ f_s(t_1) & f_s(t_2) & \cdots & f_s(t_m) \end{bmatrix} . \tag{4}$$

Define

$$\mathbf{F} \stackrel{\Delta}{=} \begin{bmatrix} f_1(t_1) & f_1(t_2) & \cdots & f_1(t_m) \\ f_2(t_1) & f_2(t_2) & \cdots & f_2(t_m) \\ \vdots & \vdots & \ddots & \vdots \\ f_s(t_1) & f_s(t_2) & \cdots & f_s(t_m) \end{bmatrix}, \quad \boldsymbol{\beta} \stackrel{\Delta}{=} \begin{bmatrix} \beta_1^{(1)} & \beta_2^{(1)} & \cdots & \beta_s^{(1)} \\ \beta_1^{(2)} & \beta_2^{(2)} & \cdots & \beta_s^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_1^{(n)} & \beta_2^{(n)} & \cdots & \beta_s^{(n)} \end{bmatrix}. \tag{5}$$

Then, Equation (4) can be expressed as

$$
\mathbf{Y} = \boldsymbol{\beta}\mathbf{F}.\tag{6}
$$

Thus, provided that *FF*<sup>T</sup> is invertible, the efficient (least-squares) estimator of *β* is

$$
\hat{\boldsymbol{\beta}} = \mathbf{Y} \mathbf{F}^{\mathrm{T}} \left( \mathbf{F} \mathbf{F}^{\mathrm{T}} \right)^{-1} . \tag{7}
$$
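
As a brief sketch of where this estimator comes from, assume the coefficients are chosen to minimize the squared Frobenius norm of the fitting residual (this criterion is an assumption made here for illustration; the text does not state it explicitly). Setting the gradient with respect to the coefficient matrix to zero gives the normal equations, whose solution is the expression in Equation (7):

$$
\min_{\boldsymbol{\beta}} \left\| \mathbf{Y} - \boldsymbol{\beta}\mathbf{F} \right\|_F^2
\;\Longrightarrow\;
-2\left( \mathbf{Y} - \boldsymbol{\beta}\mathbf{F} \right)\mathbf{F}^{\mathrm{T}} = \mathbf{0}
\;\Longrightarrow\;
\boldsymbol{\beta}\,\mathbf{F}\mathbf{F}^{\mathrm{T}} = \mathbf{Y}\mathbf{F}^{\mathrm{T}} .
$$

The last step to Equation (7) requires the *s* × *s* matrix *FF*<sup>T</sup> to be invertible, which holds when the *s* basis functions are linearly independent over the *m* sampling times.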

Using Equations (3) and (7), the signal can be decomposed into

$$\begin{cases} \hat{\mathbf{Y}} = \hat{\boldsymbol{\beta}} \mathbf{F} = \mathbf{Y} \mathbf{F}^{\mathrm{T}} \left( \mathbf{F} \mathbf{F}^{\mathrm{T}} \right)^{-1} \mathbf{F} \\ \mathbf{R} = \mathbf{Y} - \hat{\mathbf{Y}} = \mathbf{Y} \left( \mathbf{I} - \mathbf{F}^{\mathrm{T}} \left( \mathbf{F} \mathbf{F}^{\mathrm{T}} \right)^{-1} \mathbf{F} \right) \end{cases} \tag{8}$$
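
The decomposition in Equation (8) can be illustrated numerically. The following is a minimal NumPy sketch, not the implementation used in this paper: the function name `decompose`, the polynomial basis, the coefficient values, and the noise level are all hypothetical choices made for the example. It uses `numpy.linalg.lstsq` rather than forming the inverse explicitly, which gives the same result as Equation (7) when *FF*<sup>T</sup> is invertible.

```python
import numpy as np


def decompose(Y, F):
    """Split Y (n x m) into an intrinsic part Y_hat and an extrinsic part R.

    F is the (s x m) matrix of basis functions evaluated at the sampling times.
    Hypothetical helper written for illustration only.
    """
    # Solve beta @ F ~= Y in the least-squares sense by transposing:
    # F.T (m x s) @ beta.T (s x n) ~= Y.T (m x n).
    beta_T, *_ = np.linalg.lstsq(F.T, Y.T, rcond=None)
    beta_hat = beta_T.T          # (n x s) coefficient estimate, cf. Eq. (7)
    Y_hat = beta_hat @ F         # intrinsic part, cf. Eq. (8)
    R = Y - Y_hat                # extrinsic part (residual), cf. Eq. (8)
    return beta_hat, Y_hat, R


# Synthetic data: n = 2 features, m = 200 samples, s = 3 basis functions.
rng = np.random.default_rng(0)
m = 200
t = np.linspace(0.0, 1.0, m)
F = np.vstack([np.ones(m), t, t**2])             # assumed polynomial basis
beta_true = np.array([[1.0, 2.0, -1.0],
                      [0.5, -1.0, 3.0]])         # assumed coefficients
Y = beta_true @ F + 0.01 * rng.standard_normal((2, m))

beta_hat, Y_hat, R = decompose(Y, F)
```

By construction the residual is orthogonal to every basis row, so `R @ F.T` is zero up to rounding error; any fault signature that cannot be expressed in the chosen basis therefore remains in `R`.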

The choice of basis functions is a topic worthy of discussion in its own right and depends on prior knowledge of the practical application scenario; since it is not the focus of this paper, it is not covered here.

**Remark 1.** *Bearing data are generally stable and periodic. Therefore, the Fourier transform is usually used to extract periodic features, rather than more complex basis functions such as polynomial or wavelet basis functions.*
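
One way to read this remark in terms of the matrix *F* is to fill its rows with a constant and with cosine and sine terms at the dominant frequencies. The sketch below takes that reading as an assumption. The helper `fourier_basis`, the 1 kHz sampling rate, and the 5 Hz and 10 Hz components are hypothetical values chosen for the example, not taken from the bearing data.

```python
import numpy as np


def fourier_basis(t, freqs):
    """Return a (1 + 2 * len(freqs)) x m matrix of Fourier basis rows.

    Row 0 is constant; each frequency (in Hz) adds a cosine row and a sine row.
    Hypothetical helper written for illustration only.
    """
    t = np.asarray(t, dtype=float)
    rows = [np.ones_like(t)]
    for f in freqs:
        rows.append(np.cos(2.0 * np.pi * f * t))
        rows.append(np.sin(2.0 * np.pi * f * t))
    return np.vstack(rows)


# One second of a noise-free periodic signal sampled at an assumed 1 kHz.
t = np.arange(1000) / 1000.0
F = fourier_basis(t, freqs=[5.0, 10.0])
y = 0.3 + 2.0 * np.cos(2.0 * np.pi * 5.0 * t) - 1.0 * np.sin(2.0 * np.pi * 10.0 * t)

# Coefficient estimate in the form of Eq. (7), here for a single feature.
coef = y @ F.T @ np.linalg.inv(F @ F.T)
```

In this noise-free case the estimate recovers the generating coefficients, ordered as constant, cos 5 Hz, sin 5 Hz, cos 10 Hz, sin 10 Hz. With real data the frequencies would first have to be identified, for example from a spectrum.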
