*3.2. Optimal Bandwidth Algorithm*

The optimal bandwidth formula is given by Equation (20). However, $f(\mathbf{x})$ in Equation (20) is unknown, and therefore $\int \text{tr}\left(\frac{\partial^2 f(\mathbf{x})}{\partial \mathbf{x} \partial \mathbf{x}^\mathrm{T}}\right)^2 d\mathbf{x}$ is also unknown. An approximate value of the bandwidth parameter $h\_m$ can be obtained by replacing $f(\mathbf{x})$ with the estimate $\hat{f}\_K(\mathbf{x})$ of Equation (16). Furthermore, an iterative algorithm can be used to calculate a more accurate bandwidth parameter. Theorem 3 shows that this algorithm is convergent.

**Theorem 3.** *For any n-dimensional probability density function $f(\cdot)$ and Gaussian kernel function $K(\cdot)$, if $\hat{f}\_K(\cdot)$ in Equation (16) is used to estimate $f(\cdot)$, then the iterative formula for $h\_m$ is obtained as*

$$h\_{m,k+1} = \left(\frac{md\_K^2}{n^3 c\_K} \int \text{tr}\left(\frac{\partial^2 \hat{f}\_K(\mathbf{x}, h\_{m,k})}{\partial \mathbf{x} \partial \mathbf{x}^\mathrm{T}}\right)^2 d\mathbf{x}\right)^{-1/(n+4)}\tag{38}$$

*and the iteration is convergent, where $h\_{m,k}$ is the value of $h\_m$ at the $k$-th iteration.*

**Proof.** For the Gaussian kernel function

$$K(\mathbf{u}) = (2\pi)^{-n/2} e^{-\left(\mathbf{u}^\mathrm{T}\mathbf{u}\right)/2}, \tag{39}$$

the quadratic form $\mathbf{u}^\mathrm{T}\mathbf{u}$ follows a $\chi^2$ distribution with $n$ degrees of freedom under the density $K$, and its expectation equals the degrees of freedom, so

$$d\_K = \int \mathbf{u}^\mathrm{T} \mathbf{u}\, K(\mathbf{u})\, d\mathbf{u} = n. \tag{40}$$

In addition,

$$c\_K = \int K^2(\mathbf{u})\, d\mathbf{u} = \int (2\pi)^{-n} e^{-\mathbf{u}^\mathrm{T}\mathbf{u}}\, d\mathbf{u} = \left(2\sqrt{\pi}\right)^{-n}. \tag{41}$$
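As an illustrative aside (not part of the original derivation), the closed forms in Equations (40) and (41) can be checked numerically by Monte Carlo integration, exploiting the fact that the Gaussian kernel of Equation (39) is itself a standard normal density:

```python
import numpy as np

rng = np.random.default_rng(0)

def check_dk_ck(n, num_samples=1_000_000):
    """Monte Carlo check of d_K = n (Eq. 40) and c_K = (2*sqrt(pi))**(-n) (Eq. 41)
    for the Gaussian kernel K(u) = (2*pi)**(-n/2) * exp(-u'u / 2)."""
    u = rng.standard_normal((num_samples, n))   # samples distributed as K itself
    sq = np.einsum("ij,ij->i", u, u)            # u'u for each sample
    d_K = sq.mean()                             # d_K = E_K[u'u]
    # c_K = int K(u)^2 du = E_K[K(u)], since K is also the sampling density
    c_K = ((2 * np.pi) ** (-n / 2) * np.exp(-sq / 2)).mean()
    return d_K, c_K

for n in (1, 2, 3):
    d_K, c_K = check_dk_ck(n)
    print(n, round(d_K, 3), round(c_K, 5), round((2 * np.sqrt(np.pi)) ** (-n), 5))
```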

Substituting Equations (39)–(41) into Equation (20) and substituting $\hat{f}\_K(\mathbf{x})$ of Equation (16) for $f(\mathbf{x})$, the iterative form for calculating $h\_m$ is obtained as

$$\begin{split} h\_{m,k+1} &= \left(\frac{n}{m}\right)^{1/(n+4)} \left(2\sqrt{\pi}\right)^{-n/(n+4)} \left(\int \text{tr}\left(\frac{\partial^2 \hat{f}\_K(\mathbf{x}, h\_{m,k})}{\partial \mathbf{x} \partial \mathbf{x}^\mathrm{T}}\right)^2 d\mathbf{x}\right)^{-1/(n+4)} \\ &= \left(mn h\_{m,k}^{2n}\right)^{1/(n+4)} \left(2\sqrt{\pi}\right)^{-n/(n+4)} \left(\int \text{tr}\left(\frac{\partial^2}{\partial \mathbf{x} \partial \mathbf{x}^\mathrm{T}} \left(\sum\_{i=1}^m K\left(\frac{\mathbf{r}\_i - \mathbf{x}}{h\_{m,k}}\right)\right)\right)^2 d\mathbf{x}\right)^{-1/(n+4)} \end{split} \tag{42}$$

To facilitate the subsequent reasoning, the following lemma is given.

**Lemma 1.** *For any functions $f\_1, f\_2, \dots, f\_n$, the inequality*

$$\int \left(f\_1 + f\_2 + \dots + f\_n\right)^2 d\mathbf{x} \le \int n \left(f\_1^2 + f\_2^2 + \dots + f\_n^2\right) d\mathbf{x} \tag{43}$$

*holds, with equality if and only if $f\_1(\mathbf{x}) = f\_2(\mathbf{x}) = \dots = f\_n(\mathbf{x})$ almost everywhere.*

**Proof.** In fact, for any functions $f\_1, f\_2, \dots, f\_n$, the Cauchy–Schwarz inequality gives, pointwise,

$$0 \le \left(f\_1(\mathbf{x}) + f\_2(\mathbf{x}) + \dots + f\_n(\mathbf{x})\right)^2 \le n\left(f\_1(\mathbf{x})^2 + f\_2(\mathbf{x})^2 + \dots + f\_n(\mathbf{x})^2\right). \tag{44}$$

Integrating both sides of Equation (44) yields

$$\int \left(f\_1 + f\_2 + \dots + f\_n\right)^2 d\mathbf{x} \le \int n \left(f\_1^2 + f\_2^2 + \dots + f\_n^2\right) d\mathbf{x}.\tag{45}$$

It is obvious that equality in Equation (43) holds exactly when $f\_1(\mathbf{x}) = f\_2(\mathbf{x}) = \dots = f\_n(\mathbf{x})$ almost everywhere.
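Lemma 1 can also be illustrated numerically; in the following sketch the test functions are chosen arbitrarily:

```python
import numpy as np

x = np.linspace(-3, 3, 2001)
dx = x[1] - x[0]
f1, f2 = np.sin(x), np.cos(x)                        # two arbitrary test functions
lhs = np.sum((f1 + f2) ** 2) * dx                    # left-hand side of Eq. (43)
rhs = 2 * np.sum(f1 ** 2 + f2 ** 2) * dx             # right-hand side, n = 2
print(lhs <= rhs)                                    # True: strict inequality here
g1 = g2 = np.exp(-x ** 2)                            # identical functions
print(np.isclose(np.sum((g1 + g2) ** 2) * dx,
                 2 * np.sum(g1 ** 2 + g2 ** 2) * dx))  # True: the equality case
```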

The second derivative of Equation (39) with respect to $x\_i$ is

$$\frac{\partial^2}{\partial x\_i \partial x\_i} K(\mathbf{x}) = (2\pi)^{-n/2} e^{-\left(\mathbf{x}^\mathrm{T}\mathbf{x}\right)/2} \left(x\_i^2 - 1\right). \tag{46}$$

Hence, with the change of variables $\mathbf{u} = (\mathbf{r}\_i - \mathbf{x})/h\_{m,k}$,

$$\begin{split} \int \left( \frac{\partial^2}{\partial x\_j \partial x\_j} K\left( \frac{\mathbf{r}\_i - \mathbf{x}}{h\_{m,k}} \right) \right)^2 d\mathbf{x} &= \int \left( \frac{\partial^2}{\partial x\_j \partial x\_j} \left( (2\pi)^{-n/2} e^{-(\mathbf{r}\_i - \mathbf{x})^\mathrm{T} (\mathbf{r}\_i - \mathbf{x})/\left(2h\_{m,k}^2\right)} \right) \right)^2 d\mathbf{x} \\ &= \frac{3}{4} \left(2\sqrt{\pi}\right)^{-n} h\_{m,k}^{n-4}. \end{split} \tag{47}$$
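As an illustrative check of Equation (47), consider the one-dimensional case ($n = 1$), where the right-hand side reduces to $\frac{3}{4}\left(2\sqrt{\pi}\right)^{-1} h\_{m,k}^{-3}$. The sketch below evaluates the left-hand side by quadrature; the choice $\mathbf{r}\_i = 0$ is arbitrary, since the value of the integral does not depend on it:

```python
import numpy as np
from scipy.integrate import quad

def squared_d2(h, r=0.0):
    """Integral of the squared second derivative of K((r - x)/h) for n = 1."""
    def d2K(x):
        u = (r - x) / h
        # d^2/dx^2 of (2*pi)^(-1/2) * exp(-u^2 / 2); the chain rule gives 1/h^2
        return (2 * np.pi) ** (-0.5) * np.exp(-u ** 2 / 2) * (u ** 2 - 1) / h ** 2
    value, _ = quad(lambda x: d2K(x) ** 2, -np.inf, np.inf)
    return value

h = 0.7
print(squared_d2(h))                              # numerical left-hand side
print(0.75 / (2 * np.sqrt(np.pi)) * h ** (-3))    # closed form from Eq. (47)
```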

From Lemma 1 and Equation (47), we obtain

$$\begin{split} \int \text{tr}\left(\frac{\partial^2}{\partial \mathbf{x} \partial \mathbf{x}^\mathrm{T}} \left(\sum\_{i=1}^m K\left(\frac{\mathbf{r}\_i - \mathbf{x}}{h\_{m,k}}\right)\right)\right)^2 d\mathbf{x} &\leq \int nm \sum\_{i,j} \left(\frac{\partial^2}{\partial x\_j \partial x\_j} K\left(\frac{\mathbf{r}\_i - \mathbf{x}}{h\_{m,k}}\right)\right)^2 d\mathbf{x} \\ &= \frac{3}{4} (nm)^2 \left(2\sqrt{\pi}\right)^{-n} h\_{m,k}^{n-4}. \end{split} \tag{48}$$

When $h\_{m,k}$ is sufficiently large, the terms $K\left(\frac{\mathbf{r}\_i - \mathbf{x}}{h\_{m,k}}\right)$ are almost identical everywhere, i.e., the equality in Equation (48) holds, so that

$$\begin{split} h\_{m,k+1} &= \left(mn h\_{m,k}^{2n}\right)^{1/(n+4)} \left(2\sqrt{\pi}\right)^{-n/(n+4)} \left(\frac{3}{4} (nm)^2 \left(2\sqrt{\pi}\right)^{-n} h\_{m,k}^{n-4}\right)^{-1/(n+4)} \\ &= h\_{m,k} \left(\frac{3}{4}nm\right)^{-1/(n+4)} < h\_{m,k}. \end{split} \tag{49}$$

When $h\_{m,k}$ is large, the iteration is therefore strictly decreasing; for example, with $n = 2$ and $m = 500$, the contraction factor in Equation (49) is $\left(\frac{3}{4} \times 2 \times 500\right)^{-1/6} \approx 0.33$. Because $h\_{m,k}$ is bounded below by zero, the algorithm converges.

In summary, the KDE method based on the optimal bandwidth is given in Algorithm 1, and the flowchart of the method is shown in Figure 1; an illustrative implementation sketch follows the figure.


**Figure 1.** Flowchart of KDE method based on optimal bandwidth.
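Algorithm 1 itself is not reproduced here, but a minimal one-dimensional sketch of the iteration in Equation (38) is given below. It assumes that $\hat{f}\_K$ in Equation (16) is the usual Gaussian KDE $\hat{f}\_K(x) = \frac{1}{mh}\sum\_{i=1}^m K\left(\frac{r\_i - x}{h}\right)$, and it approximates the integral of the squared trace term on a uniform grid; the initial bandwidth, grid width, and tolerance are illustrative choices, not values prescribed by the paper:

```python
import numpy as np

def kde_second_derivative(x_grid, r, h):
    """Second derivative of the Gaussian KDE (the tr-Hessian term for n = 1)."""
    m = len(r)
    u = (r[None, :] - x_grid[:, None]) / h            # shape (grid, m)
    K = (2 * np.pi) ** (-0.5) * np.exp(-u ** 2 / 2)
    return (K * (u ** 2 - 1)).sum(axis=1) / (m * h ** 3)

def optimal_bandwidth(r, h0=1.0, tol=1e-6, max_iter=100):
    """Fixed-point iteration of Eq. (38) for n = 1, with d_K = n = 1
    and c_K = 1 / (2*sqrt(pi)) from Eqs. (40) and (41)."""
    n, m = 1, len(r)
    c_K = 1.0 / (2.0 * np.sqrt(np.pi))
    x = np.linspace(r.min() - 5 * h0, r.max() + 5 * h0, 4001)
    dx = x[1] - x[0]
    h = h0
    for _ in range(max_iter):
        integral = np.sum(kde_second_derivative(x, r, h) ** 2) * dx
        h_new = (m / (n * c_K) * integral) ** (-1.0 / (n + 4))  # Eq. (38)
        if abs(h_new - h) < tol * h:
            return h_new
        h = h_new
    return h

rng = np.random.default_rng(1)
window = rng.standard_normal(500)   # toy data window
print(optimal_bandwidth(window))    # for Gaussian data, comparable to 1.06 * m**(-1/5)
```

For multidimensional data the same fixed-point structure applies, but the grid quadrature must be replaced by a more scalable estimate of the integral.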

#### **4. Fault Detection Method Based on JS Divergence Distribution**

In Section 3, we construct a multidimensional KDE method based on the optimal bandwidth; this method can accurately describe the density distribution of multidimensional data. JS divergence is used to measure the difference between distributions, and thus it can highlight the difference in the statistical characteristics of data from different modes.

#### *4.1. Mode Difference Index*

In Section 3, the probability density estimate of multidimensional data is obtained using the kernel function method, and the optimal bandwidth formula is derived. When a fault occurs, the state of the system inevitably changes, and the statistical characteristics of the system output change as well, thereby leading to significant changes in the density distribution of the observed data. For two groups of sample window data *R* and *Z*, the cross entropy $H(\mathbf{R}, \mathbf{Z})$ can be used to measure the difference between their distributions:

$$H(\mathbf{R}, \mathbf{Z}) = \int -\hat{f}\_{K, \mathbf{Z}}(\mathbf{x}) \log \left( \hat{f}\_{K, \mathbf{R}}(\mathbf{x}) \right) d\mathbf{x},\tag{50}$$

where $\hat{f}\_{K,\mathbf{R}}$ and $\hat{f}\_{K,\mathbf{Z}}$ denote the optimal KDEs of *R* and *Z*, respectively, calculated using Equation (16).

$H(\mathbf{R}, \mathbf{Z})$ does not satisfy the definition of a distance because it is not necessarily positive definite or symmetric; that is, $H(\mathbf{R}, \mathbf{Z}) < 0$ is possible, and in general $H(\mathbf{R}, \mathbf{Z}) \neq H(\mathbf{Z}, \mathbf{R})$.


The JS divergence $JS(\mathbf{R}, \mathbf{Z})$ was used as a measure of the distribution difference between *R* and *Z* in Zhang et al. [19] and Bruni et al. [20], as follows:

$$JS(\mathbf{R}, \mathbf{Z}) = \int \left( \hat{f}\_{K,\mathbf{R}} \log \hat{f}\_{K,\mathbf{R}} + \hat{f}\_{K,\mathbf{Z}} \log \hat{f}\_{K,\mathbf{Z}} - \left( \hat{f}\_{K,\mathbf{R}} + \hat{f}\_{K,\mathbf{Z}} \right) \log \left( \left( \hat{f}\_{K,\mathbf{R}} + \hat{f}\_{K,\mathbf{Z}} \right) / 2 \right) \right) d\mathbf{x}. \tag{51}$$

It is easy to verify that

$$\begin{cases} JS(\mathbf{R}, \mathbf{Z}) \ge 0 \\ JS(\mathbf{R}, \mathbf{Z}) = JS(\mathbf{Z}, \mathbf{R}) \end{cases} \tag{52}$$

In this paper, the JS divergence in Equation (51) is used to measure the distribution difference between the testing data *Z* and training data *R*, thereby realizing fault detection and isolation.
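As a minimal one-dimensional illustration of how Equations (16) and (51) combine (the window sizes, bandwidths, and distributions below are arbitrary toy choices, not values prescribed in this paper), the JS divergence between two data windows can be computed by grid quadrature:

```python
import numpy as np

def gaussian_kde(x_grid, data, h):
    """Gaussian KDE of a data window on a grid (n = 1), as in Eq. (16)."""
    u = (data[None, :] - x_grid[:, None]) / h
    return ((2 * np.pi) ** (-0.5) * np.exp(-u ** 2 / 2)).sum(axis=1) / (len(data) * h)

def js_divergence(r, z, h_r, h_z, num_points=4001):
    """JS divergence of Eq. (51) between windows r and z, by grid quadrature."""
    pad = 5 * max(h_r, h_z)
    x = np.linspace(min(r.min(), z.min()) - pad,
                    max(r.max(), z.max()) + pad, num_points)
    dx = x[1] - x[0]
    eps = 1e-300                                   # guards log(0) in the tails
    p = gaussian_kde(x, r, h_r) + eps
    q = gaussian_kde(x, z, h_z) + eps
    integrand = p * np.log(p) + q * np.log(q) - (p + q) * np.log((p + q) / 2)
    return np.sum(integrand) * dx

rng = np.random.default_rng(2)
R = rng.normal(0.0, 1.0, 400)                      # training window
Z = rng.normal(0.5, 1.2, 400)                      # testing window (mode change)
print(js_divergence(R, Z, h_r=0.3, h_z=0.3))       # > 0; symmetric in (R, Z)
print(js_divergence(R, R, h_r=0.3, h_z=0.3))       # approx. 0 for identical windows
```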
