#### *2.2. T*<sup>2</sup> *Statistics Detection*

For simplicity, denote *ri* = *r*(*ti*), *i* = 1, 2, ··· , *m*. According to Equation (8), the training data after signal decomposition are *R* = [*r*1, *r*2, ··· , *rm*], which is generally regarded as a sample of a normal random vector with expectation **0**, so that

$$r\_i \sim N(\mathbf{0}, \Sigma),\tag{9}$$

where **Σ** denotes the total covariance matrix. When the covariance matrix **Σ** is unknown, the unbiased estimation is given by

$$
\hat{\boldsymbol{\Sigma}} = \frac{\mathbf{R}\mathbf{R}^\mathrm{T}}{m - 1}.\tag{10}
$$

Let *Z* = [*z*1, *z*2, ··· , *zp*] be the data in the test window; the sample mean *z***¯** is

$$\overline{\mathbf{z}} = \frac{1}{p} \sum\_{i=1}^{p} \mathbf{z}\_i. \tag{11}$$

Then, *z***¯** is still normally distributed, with

$$
\bar{\mathbf{z}} \sim \mathcal{N}\left(\mathbf{0}, \frac{1}{p}\boldsymbol{\Sigma}\right). \tag{12}
$$

The *T*<sup>2</sup> statistics can be constructed as

$$T^2 = p\,\bar{\mathbf{z}}^\mathrm{T}\hat{\boldsymbol{\Sigma}}^{-1}\bar{\mathbf{z}}.\tag{13}$$

Solomons and Hotelling [17] report that the distribution of the *T*<sup>2</sup> statistic satisfies

$$\frac{m-n}{n(m-1)}T^2 = \frac{p(m-n)}{n(m-1)}\bar{\mathbf{z}}^\mathrm{T}\hat{\boldsymbol{\Sigma}}^{-1}\bar{\mathbf{z}} \sim F(n, m-n).\tag{14}$$

Therefore, if the significance level is *α*, we obtain

$$\frac{m-n}{n(m-1)}T^2 = \frac{p(m-n)}{n(m-1)}\bar{\mathbf{z}}^\mathrm{T}\hat{\boldsymbol{\Sigma}}^{-1}\bar{\mathbf{z}} < F\_\alpha(n, m-n). \tag{15}$$

If Equation (15) holds, the testing data *Z* and the training data *R* are considered to come from the same mode; otherwise, they are considered to come from different modes. The false alarm rate of this criterion is *α*.
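The detection criterion in Equations (10)–(15) can be sketched numerically. The following is an illustrative Python sketch (not the authors' implementation; function and variable names are ours), assuming zero-mean training data stored column-wise and using `scipy.stats.f` for the F quantile:

```python
import numpy as np
from scipy import stats

def t2_detect(R, Z, alpha=0.05):
    """Hotelling T^2 test of whether the window Z matches the
    zero-mean normal mode described by the training matrix R.

    R : (n, m) training data, columns are samples r_i
    Z : (n, p) test window, columns are samples z_i
    Returns (T2, threshold, same_mode).
    """
    n, m = R.shape
    p = Z.shape[1]
    Sigma_hat = R @ R.T / (m - 1)                        # Eq. (10), zero-mean data
    z_bar = Z.mean(axis=1)                               # Eq. (11)
    T2 = p * z_bar @ np.linalg.solve(Sigma_hat, z_bar)   # Eq. (13)
    # Eqs. (14)-(15): (m - n) / (n (m - 1)) * T2 ~ F(n, m - n) under H0
    threshold = stats.f.ppf(1 - alpha, n, m - n) * n * (m - 1) / (m - n)
    return T2, threshold, T2 < threshold

rng = np.random.default_rng(0)
n, m, p = 3, 200, 20
R = rng.standard_normal((n, m))                 # training data (normal mode)
Z_ok = rng.standard_normal((n, p))              # window from the same mode
Z_fault = rng.standard_normal((n, p)) + 2.0     # window with a mean shift

T2_ok, thr, same_ok = t2_detect(R, Z_ok)
T2_fault, _, same_fault = t2_detect(R, Z_fault)
print(T2_ok, T2_fault, thr)    # the shifted window far exceeds the threshold
```

The mean-shifted window produces a *T*<sup>2</sup> value far above the F threshold and is flagged as a different mode.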

#### **3. Optimal Kernel Density Estimation**

Section 2 introduced the fault detection method based on the *T*<sup>2</sup> statistic, including the signal decomposition technique and the corresponding detection criterion. However, this method assumes that the data follow a normal distribution, whereas the actual observation data may not satisfy this hypothesis, so the discriminant performance of the *T*<sup>2</sup> statistic may fail to meet the design requirements. In addition, the statistic tests the data only through the intrinsic part *Y***ˆ** and the covariance matrix **Σˆ**; these two attributes are not sufficient to describe all statistical characteristics of the system. When an incipient fault is submerged in data noise, it is easily missed. In this study, a KDE method for multidimensional data is constructed to describe the probability and statistical characteristics of the data more accurately.

#### *3.1. Optimal Bandwidth Theorem*

For observed data, the frequency histogram can display the statistical characteristics directly. However, the frequency histogram is a discrete statistical method: the number of histogram intervals is difficult to choose and, more importantly, the discretization complicates subsequent data processing. To overcome these limitations, the KDE method was proposed. It is a nonparametric method that estimates the population probability density directly from the sampled data.

For any point *x* ∈ **R**<sup>*n*</sup>, assume that the probability density of a certain mode is *f*(*x*). The kernel density estimate of *f*(*x*) is built from the sampling data *R* = [*r*1, *r*2, ··· , *rm*] in Section 2.1. As reported in Rao [18], the estimation formula is

$$\hat{f}\_K(\mathbf{x}, h\_m) = \frac{1}{m h\_m^n} \sum\_{i=1}^m K\left(\frac{r\_i - \mathbf{x}}{h\_m}\right),\tag{16}$$

where *m*, *n*, *K*(·), and *hm* denote the number of samples, the dimension of the samples, the kernel function, and the bandwidth, respectively.

For convenience in the following discussions, when no ambiguity arises, define

$$\begin{cases} \hat{f}\_K(\mathbf{x}) \triangleq \hat{f}\_K(\mathbf{x}, h\_m) \\ \int g(\mathbf{x})d\mathbf{x} \triangleq \int\_{\mathbf{x}\in\mathbb{R}^n} g(\mathbf{x})d\mathbf{x} \end{cases} \tag{17}$$

The kernel function *K*(·) satisfies $\int K(\mathbf{x})d\mathbf{x} = 1$; therefore, $\int K\left(\frac{\mathbf{r}\_i-\mathbf{x}}{h\_m}\right)d\mathbf{x} = h\_m^n$, that is, $\int \hat{f}\_K(\mathbf{x})d\mathbf{x} = 1$. Thus, $\hat{f}\_K(\mathbf{x})$ satisfies positivity, continuity, and normalization, so it is reasonable to use it as a kernel density estimate. The Gaussian kernel function is a good choice, given by

$$\mathcal{K}(\mathbf{x}) = (2\pi)^{-n/2} e^{-\left(\mathbf{x}^\mathrm \mathbf{T} \mathbf{x}\right)/2} \tag{18}$$
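As a concrete illustration of Equations (16) and (18), the following sketch (function names are ours, not from the paper) evaluates the Gaussian-kernel estimator and checks the normalization property $\int \hat{f}\_K(\mathbf{x})d\mathbf{x} = 1$ on one-dimensional data:

```python
import numpy as np

def gaussian_kernel(u):
    """Eq. (18): K(x) = (2*pi)^(-n/2) * exp(-x^T x / 2); u has shape (..., n)."""
    n = u.shape[-1]
    return (2 * np.pi) ** (-n / 2) * np.exp(-0.5 * np.sum(u * u, axis=-1))

def kde(x, R, h):
    """Eq. (16): f_hat(x) = 1/(m h^n) * sum_i K((r_i - x) / h).

    x : (n,) query point;  R : (m, n) sample matrix, rows r_i;  h : bandwidth.
    """
    m, n = R.shape
    return gaussian_kernel((R - x) / h).sum() / (m * h**n)

# normalization check: the estimate should integrate to (approximately) 1
rng = np.random.default_rng(1)
R = rng.standard_normal((400, 1))                # 1-D samples, n = 1
grid = np.linspace(-6.0, 6.0, 601)
dx = grid[1] - grid[0]
vals = np.array([kde(np.array([g]), R, 0.4) for g in grid])
integral = vals.sum() * dx
print(integral)                                  # close to 1
```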

In this study, the performance of the kernel density estimator is characterized by the mean integrated squared error (MISE):

$$\text{MISE}\left(\hat{f}\_{\mathbf{K}}(\mathbf{x})\right) = \int \mathbb{E}\left[\hat{f}\_{\mathbf{K}}(\mathbf{x}) - f(\mathbf{x})\right]^2 d\mathbf{x} \tag{19}$$

Rao [18] shows that the estimate $\hat{f}\_K(\mathbf{x})$ is not sensitive to the choice of the kernel function *K*(·); that is, the MISE obtained with different kernel functions is almost the same, which is reflected in the subsequent derivation. In contrast, the MISE depends strongly on the bandwidth *hm*. If *hm* is too small, the estimate $\hat{f}\_K(\mathbf{x})$ takes an irregular shape because of increased randomness; if *hm* is too large, the estimate is oversmoothed and shows insufficient detail.
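The bandwidth sensitivity described above can be reproduced numerically. The sketch below (an illustrative experiment, not from the paper) compares the integrated squared error of a one-dimensional Gaussian-kernel estimate for an undersized, a moderate, and an oversized bandwidth:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 500
R = rng.standard_normal(m)                       # samples from N(0, 1)
grid = np.linspace(-5.0, 5.0, 401)
dx = grid[1] - grid[0]
true_f = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)

def kde_1d(h):
    # Eq. (16) with the Gaussian kernel, n = 1, evaluated on the whole grid
    u = (R[None, :] - grid[:, None]) / h
    return np.exp(-u**2 / 2).sum(axis=1) / (m * h * np.sqrt(2 * np.pi))

def ise(h):
    # integrated squared error of the estimate against the true density
    return ((kde_1d(h) - true_f) ** 2).sum() * dx

errs = {h: ise(h) for h in (0.02, 0.35, 3.0)}
print(errs)   # the moderate bandwidth gives the smallest error
```

The tiny bandwidth produces a spiky, high-variance estimate and the huge one an oversmoothed, high-bias estimate; both inflate the error relative to the moderate choice.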

The optimal bandwidth formula is provided in the following theorem, and it is one of the key theoretical results of this study.

**Theorem 1.** *For any n-dimensional probability density function f*(·) *and any kernel function K*(·) *with a symmetric form, if* $\hat{f}\_K(\cdot)$ *in Equation (16) is used to estimate f*(·) *and the function* $\mathrm{tr}\left(\frac{\partial^2 f(\mathbf{x})}{\partial\mathbf{x}\partial\mathbf{x}^\mathrm{T}}\right)^2$ *is integrable with respect to* **x**, *then when the* MISE *of* $\hat{f}\_K(\cdot)$ *in Equation (19) is minimized, the bandwidth hm satisfies*

$$h\_m = \left(\frac{md\_K^2}{n^3 c\_K} \int tr \left(\frac{\partial^2 f(\mathbf{x})}{\partial \mathbf{x} \partial \mathbf{x}^\mathrm{T}}\right)^2 d\mathbf{x}\right)^{-1/\left(n+4\right)}\tag{20}$$

*where cK and dK are two constant values given by*

$$\begin{cases} c\_K = \int K^2(\mathbf{x})d\mathbf{x} \\ d\_K = \int \mathbf{x}^\mathrm{T} \mathbf{x} K(\mathbf{x})d\mathbf{x} \end{cases} \tag{21}$$

*Equation (20) is called the optimal bandwidth formula and hm denotes the optimal bandwidth.*

A detailed proof of this theorem is given below.

**Proof.** It can be proved that the following two equations hold

$$\begin{cases} \mathrm{E}\left[\hat{f}\_K(\mathbf{x})\right] = \int K(\mathbf{u}) f(\mathbf{x} + h\_m \mathbf{u}) d\mathbf{u} \\ \mathrm{E}\left[\hat{f}\_K^2(\mathbf{x})\right] = \frac{\int K^2(\mathbf{u}) f(\mathbf{x} + h\_m \mathbf{u}) d\mathbf{u}}{mh\_m^n} + \frac{(m - 1)\left(\int K(\mathbf{u}) f(\mathbf{x} + h\_m \mathbf{u}) d\mathbf{u}\right)^2}{m} \end{cases} \tag{22}$$

In fact,

$$\begin{split} \mathrm{E}\left[\hat{f}\_K(\mathbf{x})\right] &= \int \cdots \int \prod\_{i=1}^{m} f(\mathbf{r}\_i) \frac{1}{mh\_m^n} \sum\_{i=1}^{m} K\left(\frac{\mathbf{r}\_i - \mathbf{x}}{h\_m}\right) d\mathbf{r}\_m \cdots d\mathbf{r}\_1 \\ &= \frac{1}{mh\_m^n} \sum\_{i=1}^{m} \int f(\mathbf{r}) K\left(\frac{\mathbf{r} - \mathbf{x}}{h\_m}\right) d\mathbf{r} \\ &= \int f(\mathbf{x} + h\_m \mathbf{u}) K(\mathbf{u}) d\mathbf{u}. \end{split} \tag{23}$$

In addition,

$$\begin{split} \mathrm{E}\left[\hat{f}\_K^2(\mathbf{x})\right] &= \int \cdots \int \prod\_{i=1}^{m} f(\mathbf{r}\_i) \left((mh\_m^n)^{-1} \sum\_{i=1}^{m} K\left(\frac{\mathbf{r}\_i - \mathbf{x}}{h\_m}\right)\right)^2 d\mathbf{r}\_1 \cdots d\mathbf{r}\_m \\ &= (mh\_m^n)^{-2} \int \cdots \int \prod\_{i=1}^{m} f(\mathbf{r}\_i) \left(\sum\_{i=1}^{m} K^2\left(\frac{\mathbf{r}\_i - \mathbf{x}}{h\_m}\right) + \sum\_{i \neq j} K\left(\frac{\mathbf{r}\_i - \mathbf{x}}{h\_m}\right) K\left(\frac{\mathbf{r}\_j - \mathbf{x}}{h\_m}\right)\right) d\mathbf{r}\_1 \cdots d\mathbf{r}\_m \\ &= (mh\_m^n)^{-2} \left(m \int f(\mathbf{r}) K^2\left(\frac{\mathbf{r} - \mathbf{x}}{h\_m}\right) d\mathbf{r} + m(m-1) \left(\int f(\mathbf{r}) K\left(\frac{\mathbf{r} - \mathbf{x}}{h\_m}\right) d\mathbf{r}\right)^2\right) \\ &= (mh\_m^n)^{-2} \left(mh\_m^n \int K^2(\mathbf{u}) f(\mathbf{x} + h\_m \mathbf{u}) d\mathbf{u} + m(m-1) \left(h\_m^n \int f(\mathbf{x} + h\_m \mathbf{u}) K(\mathbf{u}) d\mathbf{u}\right)^2\right). \end{split} \tag{24}$$

From Equation (23),

$$E\left[\hat{f}\_{\mathbf{K}}(\mathbf{x})\right] - f(\mathbf{x}) = \frac{h\_{\mathrm{m}}^{2}}{2} \int \mathbf{u}^{\mathrm{T}} \left(\frac{\partial^{2}f(\mathbf{x} + \theta h\_{\mathrm{m}}\mathbf{u})}{\partial \mathbf{x} \partial \mathbf{x}^{\mathrm{T}}}\right) \mathbf{u} K(\mathbf{u}) d\mathbf{u},\tag{25}$$

where *θ* represents a constant value between 0 and 1. According to Equations (23) and (24),

$$E\left[\hat{f}\_K^2(\mathbf{x})\right] - \left(E\left[\hat{f}\_K(\mathbf{x})\right]\right)^2 = \frac{\int K^2(\mathbf{u})f(\mathbf{x} + h\_m\mathbf{u})d\mathbf{u}}{mh\_m^n} - \frac{\left(\int K(\mathbf{u})f(\mathbf{x} + h\_m\mathbf{u})d\mathbf{u}\right)^2}{m}.\tag{26}$$

According to Equations (25) and (26), the following equation holds.

$$\begin{split} \mathrm{E}\left[\hat{f}\_K(\mathbf{x}) - f(\mathbf{x})\right]^2 &= \mathrm{E}\left[\hat{f}\_K^2(\mathbf{x})\right] - \left(\mathrm{E}\left[\hat{f}\_K(\mathbf{x})\right]\right)^2 + \left(\mathrm{E}\left[\hat{f}\_K(\mathbf{x})\right] - f(\mathbf{x})\right)^2 \\ &= \frac{\int K^2(\mathbf{u})f(\mathbf{x} + h\_m\mathbf{u})d\mathbf{u}}{mh\_m^n} - \frac{\left(\int K(\mathbf{u})f(\mathbf{x} + h\_m\mathbf{u})d\mathbf{u}\right)^2}{m} \\ &\quad + \left(\frac{1}{2}h\_m^2\int \mathbf{u}^\mathrm{T}\left(\frac{\partial^2 f(\mathbf{x} + \theta h\_m\mathbf{u})}{\partial\mathbf{x}\partial\mathbf{x}^\mathrm{T}}\right)\mathbf{u}K(\mathbf{u})d\mathbf{u}\right)^2 \end{split} \tag{27}$$
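The first line of Equation (27) is the usual variance plus squared-bias decomposition of the mean squared error, and it can be verified empirically. In the sketch below (illustrative, with our own variable names), the KDE is replicated over many independent sample sets at a fixed point, and the empirical MSE is compared with the empirical variance plus squared bias:

```python
import numpy as np

rng = np.random.default_rng(4)
m, h, x0 = 200, 0.4, 0.0
f_x0 = 1 / np.sqrt(2 * np.pi)        # true N(0, 1) density at x0

def kde_at(samples):
    # Eq. (16), n = 1, Gaussian kernel, evaluated at the single point x0
    u = (samples - x0) / h
    return np.exp(-u**2 / 2).sum() / (m * h * np.sqrt(2 * np.pi))

# replicate the estimator over many independent sample sets
est = np.array([kde_at(rng.standard_normal(m)) for _ in range(2000)])
mse = np.mean((est - f_x0) ** 2)
var = est.var()                       # population-style variance (ddof = 0)
bias2 = (est.mean() - f_x0) ** 2
print(mse, var + bias2)               # the two quantities coincide
```

With the population-style (`ddof = 0`) sample moments, the decomposition holds as an algebraic identity, so the two printed values agree up to floating-point rounding.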

To facilitate the subsequent reasoning, the following theorem is given.

**Theorem 2.** *For any matrix* **Φ***, K*(·) *is a kernel density function with symmetric form; then,*

$$\int \mathbf{x}^{\mathsf{T}} \boldsymbol{\Phi} \mathbf{x} \boldsymbol{K}(\mathbf{x}) d\mathbf{x} = \frac{\mathrm{tr}(\boldsymbol{\Phi})}{n} \int \mathbf{x}^{\mathsf{T}} \mathbf{x} \boldsymbol{K}(\mathbf{x}) d\mathbf{x}.\tag{28}$$

**Proof.** If an odd function *g*(*x*) is integrable on **R**, then $\int\_{-\infty}^{\infty} g(x)dx = 0$. Similarly, it can be verified that a kernel function *K*(·) with a symmetric form satisfies

$$\int \cdots \cdots \int \sum\_{i \neq j} \Phi\_{ij} \mathbf{x}\_i \mathbf{x}\_j K(\mathbf{x}) dx\_1 \cdots dx\_n = 0. \tag{29}$$

Then,

$$\begin{split} &\int \mathbf{x}^{\mathsf{T}} \Phi \mathbf{x} K(\mathbf{x}) d\mathbf{x} = \int \cdots \cdots \int \mathbf{x}^{\mathsf{T}} \Phi \mathbf{x} K(\mathbf{x}) d\mathbf{x}\_{1} \cdots \cdots d\mathbf{x}\_{n} \\ &= \int \cdots \cdots \int \sum\_{i} \Phi\_{ii} \mathbf{x}\_{i}^{2} K(\mathbf{x}) d\mathbf{x}\_{1} \cdots d\mathbf{x}\_{n} + \int \cdots \cdots \int \sum\_{i \neq j} \Phi\_{ij} \mathbf{x}\_{i} \mathbf{x}\_{j} K(\mathbf{x}) d\mathbf{x}\_{1} \cdots d\mathbf{x}\_{n} \\ &= \text{tr}(\Phi) \int \cdots \cdots \int \mathbf{x}\_{1}^{2} K(\mathbf{x}) d\mathbf{x}\_{1} \cdots d\mathbf{x}\_{n} \\ &= \frac{\text{tr}(\Phi)}{n} \int \cdots \cdots \int \mathbf{x}^{\mathsf{T}} \mathbf{x} K(\mathbf{x}) d\mathbf{x}\_{1} \cdots d\mathbf{x}\_{n} \\ &= \frac{\text{tr}(\Phi)}{n} \int \mathbf{x}^{\mathsf{T}} \mathbf{x} K(\mathbf{x}) d\mathbf{x}. \end{split} \tag{30}$$

Thus, Theorem 2 is proved.
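Theorem 2 can be checked numerically for the Gaussian kernel of Equation (18), since integrals against that kernel are expectations under a standard normal vector. The following Monte Carlo sketch (an illustration, not a proof) confirms that both sides of Equation (28) approximate tr(**Φ**):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
A = rng.standard_normal((n, n))
Phi = A + A.T                        # an arbitrary symmetric test matrix

# integrals against the Gaussian kernel of Eq. (18) are expectations
# under a standard normal vector, so they can be estimated by sampling
x = rng.standard_normal((1_000_000, n))
lhs = np.mean(np.einsum('ij,jk,ik->i', x, Phi, x))        # int x^T Phi x K(x) dx
rhs = np.trace(Phi) / n * np.mean(np.sum(x * x, axis=1))  # tr(Phi)/n int x^T x K(x) dx
print(lhs, rhs, np.trace(Phi))       # all three agree to Monte Carlo accuracy
```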

For any unit-length vector *u* ∈ **R**<sup>*n*</sup>, the Taylor expansion can be used to obtain

$$\begin{cases} f(\mathbf{x} + h\_m \mathbf{u}) = f(\mathbf{x}) + h\_m \mathbf{u}^\mathsf{T} \nabla (f(\mathbf{x})) + o(h\_m) \\ \frac{\partial^2 f(\mathbf{x} + \theta h\_m \mathbf{u})}{\partial \mathbf{x}\_i \partial \mathbf{x}\_j} = \frac{\partial^2 f(\mathbf{x})}{\partial \mathbf{x}\_i \partial \mathbf{x}\_j} + \theta h\_m \mathbf{u}^\mathsf{T} \nabla \left(\frac{\partial^2 f(\mathbf{x})}{\partial \mathbf{x}\_i \partial \mathbf{x}\_j}\right) + o(h\_m) \end{cases} \tag{31}$$

If the bandwidth *hm* satisfies the condition

$$\begin{cases} \lim\_{m \to \infty} (h\_m) = 0, \\ \lim\_{m \to \infty} \left( \frac{1}{mh\_m^n} \right) = 0, \end{cases} \tag{32}$$

then, from Equations (22)–(32), we get

$$\mathbb{E}\left[\hat{f}\_{\mathbf{K}}(\mathbf{x}) - f(\mathbf{x})\right]^2 = \frac{c\_{\mathbf{K}}f(\mathbf{x})}{mh\_m^n} + o\left(\frac{1}{mh\_m^n}\right) + \frac{h\_m^4 d\_{\mathbf{K}}^2}{4n^2} \left(\text{tr}\left(\frac{\partial^2 f(\mathbf{x})}{\partial \mathbf{x} \partial \mathbf{x}^\mathbf{T}}\right)\right)^2 + o\left(h\_m^4\right). \tag{33}$$

In fact,

$$\begin{split} \mathrm{E}\left[\hat{f}\_K(\mathbf{x}) - f(\mathbf{x})\right]^2 &= \frac{\int K^2(\mathbf{u})f(\mathbf{x} + h\_m\mathbf{u})d\mathbf{u}}{mh\_m^n} - \frac{\left(\int K(\mathbf{u})f(\mathbf{x} + h\_m\mathbf{u})d\mathbf{u}\right)^2}{m} + \left(\frac{h\_m^2}{2}\int \mathbf{u}^\mathrm{T}\left(\frac{\partial^2 f(\mathbf{x} + \theta h\_m\mathbf{u})}{\partial\mathbf{x}\partial\mathbf{x}^\mathrm{T}}\right)\mathbf{u}K(\mathbf{u})d\mathbf{u}\right)^2 \\ &= \frac{c\_K f(\mathbf{x})}{mh\_m^n} + o\left(\frac{1}{mh\_m^n}\right) - \frac{f(\mathbf{x})^2}{m} + o\left(\frac{1}{m}\right) + \left(\frac{h\_m^2}{2n}\int \mathrm{tr}\left(\frac{\partial^2 f(\mathbf{x} + \theta h\_m\mathbf{u})}{\partial\mathbf{x}\partial\mathbf{x}^\mathrm{T}}\right)\mathbf{u}^\mathrm{T}\mathbf{u}K(\mathbf{u})d\mathbf{u}\right)^2 \\ &= \frac{c\_K f(\mathbf{x})}{mh\_m^n} + o\left(\frac{1}{mh\_m^n}\right) + \left(\frac{h\_m^2}{2n}\mathrm{tr}\left(\frac{\partial^2 f(\mathbf{x})}{\partial\mathbf{x}\partial\mathbf{x}^\mathrm{T}}\right)d\_K + o\left(h\_m^2\right)\right)^2 \\ &= \frac{c\_K f(\mathbf{x})}{mh\_m^n} + o\left(\frac{1}{mh\_m^n}\right) + \frac{h\_m^4 d\_K^2}{4n^2}\left(\mathrm{tr}\left(\frac{\partial^2 f(\mathbf{x})}{\partial\mathbf{x}\partial\mathbf{x}^\mathrm{T}}\right)\right)^2 + o\left(h\_m^4\right). \end{split} \tag{34}$$

Based on Equation (33), if $\mathrm{tr}\left(\frac{\partial^2 f(\mathbf{x})}{\partial\mathbf{x}\partial\mathbf{x}^\mathrm{T}}\right)^2$ is integrable, then

$$\begin{split} \text{MISE}\left(\hat{f}\_K(\mathbf{x})\right) &= \int \left(\frac{c\_K f(\mathbf{x})}{mh\_m^n} + \frac{h\_m^4}{4n^2} \left(d\_K\text{tr}\left(\frac{\partial^2 f(\mathbf{x})}{\partial\mathbf{x}\partial\mathbf{x}^\mathsf{T}}\right)\right)^2\right) d\mathbf{x} + o\left(\frac{1}{mh\_m^n}\right) + o\left(h\_m^4\right) \\ &= \frac{c\_K}{mh\_m^n} + \frac{h\_m^4 d\_K^2}{4n^2} \int \text{tr}\left(\frac{\partial^2 f(\mathbf{x})}{\partial\mathbf{x}\partial\mathbf{x}^\mathsf{T}}\right)^2 d\mathbf{x} + o\left(\frac{1}{mh\_m^n}\right) + o\left(h\_m^4\right). \end{split} \tag{35}$$

When $\mathrm{MISE}\left(\hat{f}\_K(\cdot)\right)$ is minimized, the derivative of Equation (35) with respect to *hm* is 0, which means

$$\frac{\partial\,\mathrm{MISE}\left(\hat{f}\_K(\mathbf{x})\right)}{\partial h\_m} = 0.\tag{36}$$

Thus, the optimal bandwidth *hm* in Theorem 1 is obtained as

$$h\_m = \left(\frac{md\_K^2}{n^3 c\_K} \int \text{tr}\left(\frac{\partial^2 f(\mathbf{x})}{\partial \mathbf{x} \partial \mathbf{x}^\mathrm{T}}\right)^2 d\mathbf{x}\right)^{-1/\left(n+4\right)}.\tag{37}$$

**Remark 2.** *When the number of samples m is determined, an appropriate bandwidth hm can be selected using Equation (20) to construct the KDE, which better fits the sample distribution. In Equation (20), the kernel function influences the bandwidth only through cK and dK; these constants are almost the same for different kernel functions, so the choice of kernel has only a slight effect on the final bandwidth.*
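As a sanity check of Equation (20) (taking $d\_K = \int \mathbf{x}^\mathrm{T}\mathbf{x}K(\mathbf{x})d\mathbf{x}$, as used in the derivation of Equation (34)): for *n* = 1, the Gaussian kernel ($c\_K = 1/(2\sqrt{\pi})$, $d\_K = 1$), and a standard normal density *f* ($\int (f'')^2 dx = 3/(8\sqrt{\pi})$), the formula reduces to $h\_m = (4/(3m))^{1/5}$, which is exactly Silverman's classical rule of thumb. A minimal sketch:

```python
import numpy as np

def optimal_h(m):
    """Eq. (20) specialized to n = 1, Gaussian kernel, f = N(0, 1).

    Assumed constants: c_K = 1/(2 sqrt(pi)), d_K = 1,
    and int (f'')^2 dx = 3/(8 sqrt(pi)) for the standard normal f.
    """
    c_K = 1 / (2 * np.sqrt(np.pi))
    curvature = 3 / (8 * np.sqrt(np.pi))
    return (m * 1.0 / c_K * curvature) ** (-1 / 5)   # d_K = 1, n^3 = 1

m = 1000
h = optimal_h(m)
silverman = (4 / (3 * m)) ** (1 / 5)   # classical 1-D rule of thumb
print(h, silverman)                    # identical up to rounding
```

In practice *f* is unknown, so such plug-in choices replace the curvature integral with an estimate from a reference distribution.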
