### *2.1. CVA Monitoring Model*

A power system is a classical dynamic process [26,27], in which the measurement data exhibit clear trends over the sampling time. The measured three-phase electric field and current waveforms change with time, and the current data point is correlated with the historical samples. Therefore, it is more reasonable to apply a dynamic data analysis tool to extract the process features.

Canonical variate analysis (CVA) is an effective dynamic data analysis tool that has been applied to model identification and control in multivariate dynamic systems [28,29]. This paper introduces it to deal with power system data. For a given power transmission line, the data points under normal system operation have a fixed correlation along the time dimension. When a disturbance occurs, this correlation may be destroyed. By monitoring the correlation among the time series data, CVA can detect system disturbances effectively. When CVA is applied to data modeling, the training data are first divided into a historical data set and a future data set, and the CVA optimization problem is designed to find the maximum correlation between these two data sets to describe the dynamic features of the data. The algorithm details are as follows.

For the power system measurement data vector *xh* ∈ *Rm* at the *h*-th sampling instant, its corresponding historical data vector *ph* and future data vector *fh* are constructed as

$$\mathbf{p}\_h = [\mathbf{x}\_h^T, \mathbf{x}\_{h-1}^T, \dots, \mathbf{x}\_{h-l+1}^T]^T \in \mathbb{R}^M \tag{1}$$

$$\mathbf{f}\_h = [\mathbf{x}\_{h+1}^T, \mathbf{x}\_{h+2}^T, \dots, \mathbf{x}\_{h+l}^T]^T \in \mathbb{R}^M \tag{2}$$

where *M* = *m* × *l*, and *l* represents the time lag order.
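As a concrete illustration, the construction of Equations (1) and (2) can be sketched in a few lines of numpy (the function name and the 0-based indexing convention are our own; in the paper the index *h* is 1-based):

```python
import numpy as np

def past_future_vectors(X, h, l):
    """Build the historical vector p_h (Eq. 1) and future vector f_h
    (Eq. 2) from a data matrix X of shape (n, m). Here h is 0-based,
    so valid indices satisfy l - 1 <= h <= n - l - 1."""
    # p_h stacks x_h, x_{h-1}, ..., x_{h-l+1}, most recent sample first
    p_h = X[h - l + 1:h + 1][::-1].reshape(-1)
    # f_h stacks x_{h+1}, x_{h+2}, ..., x_{h+l}
    f_h = X[h + 1:h + l + 1].reshape(-1)
    return p_h, f_h  # both of length M = m * l
```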

Given the projection vectors *a* and *b*, the historical and future vectors are transformed into their respective projections *d* = *a*<sup>T</sup>*p<sub>h</sub>* and *v* = *b*<sup>T</sup>*f<sub>h</sub>*, which are called canonical variates. CVA optimizes the vector pair *a* and *b* so that the correlation between *d* and *v* is maximized. This can be described by the mathematical expression

$$\begin{cases} \max\_{a,b} \rho(d, v) = a^T \boldsymbol{\Sigma}\_{pf} \boldsymbol{b} \\ \text{s.t. } var(d) = a^T \boldsymbol{\Sigma}\_{pp} \boldsymbol{a} = 1 \\ \quad var(v) = b^T \boldsymbol{\Sigma}\_{ff} \boldsymbol{b} = 1 \end{cases} \tag{3}$$

where **Σ**<sub>*pf*</sub> represents the cross-covariance matrix of the historical and future data vectors, and **Σ**<sub>*pp*</sub>, **Σ**<sub>*ff*</sub> denote the covariance matrices of the historical and future data vectors, respectively.

Suppose that the training data set includes *n* samples as *X* = [*x*<sub>1</sub><sup>T</sup>, *x*<sub>2</sub><sup>T</sup>, ··· , *x<sub>n</sub>*<sup>T</sup>]<sup>T</sup> ∈ *R*<sup>*n*×*m*</sup>; then the historical and future data matrices can be expressed by

$$P = [p\_l^T, p\_{l+1}^T, \dots, p\_{n-l}^T]^T \in \mathbb{R}^{N \times M} \tag{4}$$

$$F = [f\_l^T, f\_{l+1}^T, \dots, f\_{n-l}^T]^T \in \mathbb{R}^{N \times M} \tag{5}$$

where *N* = *n* − 2*l* + 1 is the sample number of the historical and future data matrix. Then the covariance matrices defined in Equation (3) can be calculated by

$$
\Sigma\_{pf} = \frac{1}{N-1} \mathbf{P}^T \mathbf{F} \tag{6}
$$

$$
\Sigma\_{pp} = \frac{1}{N-1} \mathbf{P}^T \mathbf{P} \tag{7}
$$

$$
\Sigma\_{ff} = \frac{1}{N-1} \mathbf{F}^T \mathbf{F} \tag{8}
$$
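Stacking the historical and future vectors row by row gives the matrices of Equations (4) and (5), from which the covariance matrices of Equations (6)–(8) follow directly. A minimal numpy sketch (0-based indexing; the function name is ours):

```python
import numpy as np

def build_past_future(X, l):
    """Assemble the historical matrix P (Eq. 4) and future matrix F
    (Eq. 5) from the training set X of shape (n, m). Both matrices
    have N = n - 2l + 1 rows of length M = m * l."""
    n, _ = X.shape
    P = np.stack([X[h - l + 1:h + 1][::-1].reshape(-1)
                  for h in range(l - 1, n - l)])
    F = np.stack([X[h + 1:h + l + 1].reshape(-1)
                  for h in range(l - 1, n - l)])
    return P, F
```

The covariances of Equations (6)–(8) are then computed as, e.g., `P.T @ F / (N - 1)`.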

Solving the optimization problem described by Equation (3) leads to a singular value decomposition of the matrix **Ξ** = **Σ**<sub>*pp*</sub><sup>−1/2</sup>**Σ**<sub>*pf*</sub>**Σ**<sub>*ff*</sub><sup>−1/2</sup>, which is expressed by

$$
\Xi = \mathbf{U}\Lambda\mathbf{V}^T\tag{9}
$$

The solution of Equation (9) is further used to build a series of projection vectors *a<sub>i</sub>* and *b<sub>i</sub>* (1 ≤ *i* ≤ *M*), which are computed by

$$\mathbf{a}\_{i} = \boldsymbol{\Sigma}\_{pp}^{-1/2} \mathbf{U}(:, i), \; \mathbf{b}\_{i} = \boldsymbol{\Sigma}\_{ff}^{-1/2} \mathbf{V}(:, i), \tag{10}$$

where (:, *i*) represents the *i*-th column of the matrix. The vectors *a<sub>i</sub>* and *b<sub>i</sub>* are ordered by their correlation degree, given by the diagonal elements of matrix **Λ**, i.e., the correlation coefficients. The first *s* pairs of projection vectors {*a<sub>i</sub>*, *b<sub>i</sub>*, 1 ≤ *i* ≤ *s*} describe the stronger correlations and indicate the close relationship between the historical and the future data. Therefore, a projection matrix *A<sub>s</sub>* = [*a*<sub>1</sub> *a*<sub>2</sub> ··· *a<sub>s</sub>*] is defined to extract the canonical variate vector *d<sub>h</sub>* as

$$\mathbf{d}\_h = \mathbf{A}\_s^T \mathbf{p}\_h, \tag{11}$$

which describes the main dynamic features of process data. Here, *s* is determined so that the corresponding canonical variate vectors describe a cumulative percentage of 90% of correlation coefficients.
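The training steps of Equations (6)–(11), including the 90% rule for choosing *s*, might be sketched as follows (a numpy sketch under our own naming; the inverse square roots are computed through a symmetric eigendecomposition, with small eigenvalues clipped for numerical stability):

```python
import numpy as np

def cva_fit(P, F, cum_ratio=0.9):
    """Solve the CVA problem: whiten, SVD (Eq. 9), build the projection
    vectors (Eq. 10), and retain the first s directions covering
    cum_ratio of the correlation coefficients."""
    N = P.shape[0]
    Spf = P.T @ F / (N - 1)          # Eq. (6)
    Spp = P.T @ P / (N - 1)          # Eq. (7)
    Sff = F.T @ F / (N - 1)          # Eq. (8)

    def inv_sqrt(S, eps=1e-10):
        # symmetric inverse square root via eigendecomposition
        w, V = np.linalg.eigh(S)
        return V @ np.diag(np.clip(w, eps, None) ** -0.5) @ V.T

    Spp_is, Sff_is = inv_sqrt(Spp), inv_sqrt(Sff)
    U, sv, Vt = np.linalg.svd(Spp_is @ Spf @ Sff_is)   # Eq. (9)
    # keep enough pairs to reach the cumulative correlation ratio
    s = int(np.searchsorted(np.cumsum(sv) / sv.sum(), cum_ratio)) + 1
    A_s = Spp_is @ U[:, :s]          # columns a_1 ... a_s, Eq. (10)
    return A_s, sv, s
```

The canonical variate vector of Equation (11) is then `d_h = A_s.T @ p_h`, and the residual of Equation (12) uses `I - A_s @ A_s.T`.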

As *A<sub>s</sub>* only involves the first *s* projection directions, it cannot cover all the data information. The remaining information in the CVA model can be described by the CVA residual vector *e<sub>h</sub>* as

$$\mathbf{e}\_{h} = \left(\mathbf{I} - \mathbf{A}\_{s}\mathbf{A}\_{s}^T\right) \mathbf{p}\_{h} \tag{12}$$

Based on the canonical variate vector and the CVA residual vector, two monitoring statistics, *T*<sup>2</sup> and *Q*, are often used to judge the process state. The *T*<sup>2</sup> statistic describes the changes of the principal dynamic states, while the *Q* statistic monitors the changes of the residual information. For the *h*-th sample, the statistics are written as

$$T\_h^2 = d\_h^T d\_h \tag{13}$$

$$Q\_h = \mathbf{e}\_h^T \mathbf{e}\_h \tag{14}$$

In normal operation, these two statistics should satisfy *T*<sup>2</sup><sub>*h*</sub> ≤ *T*<sup>2</sup><sub>*lim*</sub> and *Q<sub>h</sub>* ≤ *Q<sub>lim</sub>*, where *T*<sup>2</sup><sub>*lim*</sub> and *Q<sub>lim</sub>* are the corresponding confidence limits. In some literature, the confidence limits of these two statistics are obtained by assuming a prior distribution [30]. However, these distribution assumptions are often difficult to satisfy. Therefore, this paper applies data-driven kernel density estimation to determine the confidence limits [31,32].
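As a rough sketch of the idea (not the exact estimator of [31,32]), a kernel density estimate of the training statistics can be integrated numerically and the confidence limit read off as, e.g., the 99% quantile; the bandwidth rule, grid size, and function name below are our own assumptions:

```python
import numpy as np

def kde_limit(stat, alpha=0.99):
    """Confidence limit as the alpha-quantile of a Gaussian kernel
    density estimate fitted to training values of a monitoring statistic."""
    stat = np.asarray(stat, dtype=float)
    n = stat.size
    h = 1.06 * stat.std() * n ** (-1 / 5)        # Silverman's rule
    grid = np.linspace(stat.min(), stat.max() + 3 * h, 512)
    # unnormalized Gaussian KDE evaluated on the grid, integrated to a CDF
    pdf = np.exp(-0.5 * ((grid[:, None] - stat[None, :]) / h) ** 2).sum(axis=1)
    cdf = np.cumsum(pdf) / pdf.sum()
    return grid[np.searchsorted(cdf, alpha)]
```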

### *2.2. CVAkNN Model Based on kNN Monitoring Index*

As the measurement data of power transmission systems have periodic fluctuation characteristics, the traditional CVA monitoring statistics *T*<sup>2</sup> and *Q* behave unsteadily under the periodic changes. In this case, disturbance detection that directly monitors the amplitudes of the statistics cannot discover disturbance signals effectively and may lead to a high missed detection rate.

In order to overcome this defect, this paper introduces *k*-nearest neighbor (*k*NN) analysis to enhance the basic monitoring statistics. *k*NN is an effective multimodal data analysis tool and does not depend on the amplitude changes before and after the disturbance. In the literature [33,34], *k*NN was introduced and adapted for real-time detection of system disturbances. By combining the CVA model and the *k*NN-based monitoring statistics, the improved method, called CVA*k*NN, has a stronger capability of dealing with the periodic oscillation property of the data. The main idea of CVA*k*NN is to first reconstruct the monitoring statistic in the phase space and then build the monitoring index based on the distance between the reconstructed statistic vector and its *k*-nearest neighbor.

Phase space reconstruction is an effective method for time series analysis. It regards a one-dimensional time series as the result of a nonlinear dynamic system motion and constructs phase vectors by re-arranging the time series. This theory has been successfully applied in chaotic time series prediction and equipment failure data analysis [35,36]. Here it is introduced to process the CVA monitoring statistics for the further *k*NN analysis.

For the training data set with *n* samples *x*1, *x*2, ... , *xn*, the corresponding statistics vectors are obtained by the CVA modeling as

$$T^2 = \begin{bmatrix} T\_l^2 & T\_{l+1}^2 & \cdots & T\_n^2 \end{bmatrix} \tag{15}$$

$$\mathbf{Q} = [\mathbf{Q}\_l \; \mathbf{Q}\_{l+1} \; \cdots \; \mathbf{Q}\_n] \tag{16}$$

Further, the phase reconstruction statistics matrix can be formulated as follows:

$$\mathbf{MT}^2 = \begin{bmatrix} T\_l^2 & \cdots & T\_{l+L-2}^2 & T\_{l+L-1}^2 \\ T\_{l+1}^2 & \cdots & T\_{l+L-1}^2 & T\_{l+L}^2 \\ \vdots & \vdots & \vdots & \vdots \\ T\_{n-L+1}^2 & \cdots & T\_{n-1}^2 & T\_n^2 \end{bmatrix} \tag{17}$$

$$\mathbf{M}\mathbf{Q} = \begin{bmatrix} Q\_l & \cdots & Q\_{l+L-2} & Q\_{l+L-1} \\ Q\_{l+1} & \cdots & Q\_{l+L-1} & Q\_{l+L} \\ \vdots & \vdots & \vdots & \vdots \\ Q\_{n-L+1} & \cdots & Q\_{n-1} & Q\_n \end{bmatrix} \tag{18}$$

where *L* is the embedding dimension defining the length of the reconstructed phase vector. Based on the results of the phase space reconstruction, the dynamic behavior of the statistics can be better described, which is conducive to the detection of power system disturbances.
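The reconstruction of Equations (17) and (18) is simply a sliding-window (Hankel-type) rearrangement of the statistic series, e.g. (function name is ours):

```python
import numpy as np

def phase_matrix(series, L):
    """Phase-space reconstruction (Eqs. 17-18): each row is a length-L
    window of the statistic series, shifted forward by one sample."""
    series = np.asarray(series, dtype=float)
    n = series.size
    return np.stack([series[i:i + L] for i in range(n - L + 1)])
```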

In the online monitoring stage, a new testing sample *xt* is collected at the *t*-th sampling instant. Then the monitoring statistics can be computed by applying Equations (13) and (14), and the reconstructed phase vectors are described as

$$\mathbf{NT}\_t^2 = \begin{bmatrix} T\_{t-L+1}^2 \ \cdots \ T\_{t-1}^2 \ T\_t^2 \end{bmatrix} \tag{19}$$

$$N\mathbf{Q}\_t = \begin{bmatrix} \mathbf{Q}\_{t-L+1} \ \cdots \ \mathbf{Q}\_{t-1} \ \mathbf{Q}\_t \end{bmatrix} \tag{20}$$

To determine whether the test data *x<sub>t</sub>* are normal, it is necessary to compare the similarity between *NT*<sup>2</sup><sub>*t*</sub>, *NQ<sub>t</sub>*, and the reconstructed statistics matrices in Equations (17) and (18). If the reconstructed statistics *NT*<sup>2</sup><sub>*t*</sub>, *NQ<sub>t</sub>* are strongly similar to one row of the training statistics matrices in Equations (17) and (18), then the test data *x<sub>t</sub>* describe the normal working condition. Otherwise, it means that some fault occurs in the power transmission system. Therefore, the key is how to perform this similarity comparison. This paper introduces *k*-nearest neighbor (*k*NN) analysis to construct a *k*NN-based distance measurement indicator: the statistical nearest neighbor distance (SNND).

The idea of SNND is to find the *k* nearest neighbors of the test vector in the given matrix data and to use the distance between the test vector and its *k*-th nearest neighbor as the disturbance detection criterion. The SNND index for *NT*<sup>2</sup><sub>*t*</sub> is defined as

$$DT\_t^2 = \left|\left| \mathbf{NT}\_t^2 - \mathbf{MT}^2(j\_k, :) \right|\right|, \tag{21}$$

where *MT*<sup>2</sup>(*j<sub>k</sub>*, :) represents the *j<sub>k</sub>*-th row of the *MT*<sup>2</sup> matrix, which corresponds to the *k*-th nearest neighbor of *NT*<sup>2</sup><sub>*t*</sub>, and ||·|| represents the L2 norm. By analogy, the SNND indicator of *NQ<sub>t</sub>* can be established as

$$DQ\_t = ||\mathbf{NQ}\_t - \mathbf{MQ}(j\_k, :)||. \tag{22}$$

Under normal operating conditions, the above two indicators should fluctuate within a relatively small range; that is, *DT*<sup>2</sup><sub>*t*</sub> ≤ *DT*<sup>2</sup><sub>*lim*</sub> and *DQ<sub>t</sub>* ≤ *DQ<sub>lim</sub>* for the normal running status. Once a threshold is exceeded, it means that there is a system disturbance. The thresholds can be obtained by the kernel density estimation method.
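Equations (21) and (22) can be computed directly: sort the Euclidean distances from the test phase vector to all training rows and take the *k*-th smallest (the function name and default *k* are our own):

```python
import numpy as np

def snnd(nv, M, k=3):
    """Statistical nearest neighbor distance (Eqs. 21-22): L2 distance
    from the test phase vector nv to its k-th nearest row of the
    training phase matrix M."""
    dists = np.linalg.norm(M - nv, axis=1)
    return np.sort(dists)[k - 1]
```

A disturbance is flagged when the returned distance exceeds the KDE-based threshold.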

### *2.3. SLCVA*k*NN Model Assisted by Statistical Local Analysis*

In the power transmission system, some weak disturbances are often difficult to detect, such as the high-impedance single-phase ground fault. When this kind of disturbance occurs, the changes reflected by the measured voltage and current variables are very small. Further, considering the influence of modeling error and process noise, this kind of disturbance may be concealed and viewed as normal process changes. Therefore, enhancing weak disturbance detection is of great value to ensure the safety of power transmission systems. In this paper, we integrate statistical local analysis (SLA) with CVA*k*NN and propose an improved SLCVA*k*NN monitoring model for better weak disturbance monitoring performance.

SLA was originally proposed by Basseville [37] for inspecting process parameter changes. In recent years, some researchers have introduced it into chemical process fault detection and demonstrated its effectiveness [38–40]. In this paper, we perform the statistical local analysis on the CVA model. Looking back at the CVA monitoring statistics in Equations (13) and (14), it can be seen that the statistics used to indicate the process status are composed of the canonical variate vector *d<sub>h</sub>* and the CVA residual vector *e<sub>h</sub>*. Therefore, to improve the weak disturbance monitoring capability of the CVA statistics, the vectors *d<sub>h</sub>* and *e<sub>h</sub>* must be given stronger disturbance sensitivity.

According to the statistical local analysis theory, given the system observation *z<sub>j</sub>* and the system parameter *ϑ*, a primary residual vector *ϕ*(*z<sub>j</sub>*, *ϑ*) can be defined for disturbance detection if it meets the following conditions [37,38]:

• *E*{*ϕ*(*z<sub>j</sub>*, *ϑ*)} = 0, if *ϑ* = *ϑ*<sub>0</sub>;

• *E*{*ϕ*(*z<sub>j</sub>*, *ϑ*)} ≠ 0, if *ϑ* ≠ *ϑ*<sub>0</sub>.

Here *ϑ*<sub>0</sub> represents the parameters under the normal condition.

By investigating the *i*-th element of the vector *d<sub>h</sub>*, denoted as *d<sub>h,i</sub>*, it is easily derived from Equation (11) that *d<sub>h,i</sub>* = *a*<sub>*i*</sub><sup>T</sup>*p<sub>h</sub>*. Naturally, the variance of *d<sub>h,i</sub>* can be computed as

$$E\left\{d\_{h,i}^{2}\right\} = a\_i^T E\left\{p\_h p\_h^T\right\} a\_i \tag{23}$$

For the statistical samples, *E*{*p<sub>h</sub>p<sub>h</sub>*<sup>T</sup>} is in fact equal to the covariance matrix **Σ**<sub>*pp*</sub>. Further combining the first constraint on the vector *a* in Equation (3), it is known that *a*<sub>*i*</sub><sup>T</sup>*E*{*p<sub>h</sub>p<sub>h</sub>*<sup>T</sup>}*a<sub>i</sub>* = 1. Therefore, we build the SLA primary residual of the canonical variate as

$$
\varphi\_{d\_{h,i}} = d\_{h,i}^2 - 1.\tag{24}
$$

which meets the condition *E*{*ϕ*<sub>*d*<sub>*h,i*</sub></sub>} = 0 under the normal condition.

Similarly, we analyze the variance of *e<sub>h,i</sub>* to obtain

$$E\{e\_{h,i}^2\} = A\_r(i,:)E\{p\_h p\_h^T\}A\_r(i,:)^T \tag{25}$$

where *A<sub>r</sub>* = *I* − *A<sub>s</sub>A<sub>s</sub>*<sup>T</sup> is the residual projection matrix in Equation (12). As *A<sub>r</sub>* can be obtained in the model training procedure, the above expression must be equal to a fixed value, which is denoted as *σ<sub>i</sub>* = *A<sub>r</sub>*(*i*, :)*E*{*p<sub>h</sub>p<sub>h</sub>*<sup>T</sup>}*A<sub>r</sub>*(*i*, :)<sup>T</sup>. Therefore, the SLA primary residual of the CVA residual can be built as

$$
\varphi\_{e\_{h,i}} = e\_{h,i}^2 - \sigma\_i. \tag{26}
$$

which meets the condition *E*{*ϕ*<sub>*e*<sub>*h,i*</sub></sub>} = 0 for the normal data.

For a more sensitive disturbance detection, the SLA improved residual is applied in a moving window with the width of *w*, which is expressed by

$$\psi\_{d\_{h,i}} = \begin{cases} \frac{1}{\sqrt{h}} \sum\_{j=1}^{h} \varphi\_{d\_{j,i}}, & h < w \\ \frac{1}{\sqrt{w}} \sum\_{j=h-w+1}^{h} \varphi\_{d\_{j,i}}, & h \ge w \end{cases} \tag{27}$$

$$\psi\_{e\_{h,i}} = \begin{cases} \frac{1}{\sqrt{h}} \sum\_{j=1}^{h} \varphi\_{e\_{j,i}}, & h < w \\ \frac{1}{\sqrt{w}} \sum\_{j=h-w+1}^{h} \varphi\_{e\_{j,i}}, & h \ge w \end{cases} \tag{28}$$

Up to now, we can obtain the SLA improved residual vectors *ψ*<sub>*d*,*h*</sub> = [*ψ*<sub>*d*<sub>*h*,1</sub></sub> *ψ*<sub>*d*<sub>*h*,2</sub></sub> ··· *ψ*<sub>*d*<sub>*h*,*s*</sub></sub>]<sup>T</sup> and *ψ*<sub>*e*,*h*</sub> = [*ψ*<sub>*e*<sub>*h*,1</sub></sub> *ψ*<sub>*e*<sub>*h*,2</sub></sub> ··· *ψ*<sub>*e*<sub>*h*,*M*</sub></sub>]<sup>T</sup>. These residual vectors replace the original CVA features *d<sub>h</sub>* and *e<sub>h</sub>*, so that the monitoring model is modified into the SLCVA*k*NN model.
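The windowed sums of Equations (27) and (28) can be sketched as follows, assuming the primary residuals for all training instants are stacked row-wise (names are ours):

```python
import numpy as np

def improved_residuals(phi, w):
    """SLA improved residuals (Eqs. 27-28): normalized moving-window
    sums of the primary residuals phi, shape (n, dim), window width w."""
    n = phi.shape[0]
    psi = np.empty_like(phi, dtype=float)
    for h in range(n):
        if h + 1 < w:          # start-up phase: use all samples so far
            psi[h] = phi[:h + 1].sum(axis=0) / np.sqrt(h + 1)
        else:                  # steady phase: full window of width w
            psi[h] = phi[h - w + 1:h + 1].sum(axis=0) / np.sqrt(w)
    return psi
```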

With the SLA improved residual vectors, the monitoring statistics are constructed as follows:

$$T\_h^2 = \psi\_{d,h}^T \psi\_{d,h} \tag{29}$$

$$Q\_h = \psi\_{e,h}^T \psi\_{e,h} \tag{30}$$

### **3. Disturbance Detection Procedure Based on SLCVA***k***NN**

Power transmission system disturbance detection based on the SLCVA*k*NN method is divided into two stages: the offline modeling stage and the online detection stage. The corresponding flowchart is shown in Figure 1.

Stage 1: offline modeling stage


Stage 2: online detection stage

1. Obtain the new online data *x<sub>t</sub>* and normalize it with the training data.


Here, it is pointed out that local neighborhood standardization (LNS) [41] may be used to enhance the traditional z-score standardization. Compared with the traditional z-score method, LNS is better able to deal with non-steady data with periodic oscillations.
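As a rough sketch of the LNS idea (the exact formulation is given in [41]; the function name and neighborhood size below are our own assumptions), each sample is standardized with the statistics of its nearest neighbors in the training set instead of the global mean and standard deviation:

```python
import numpy as np

def lns_normalize(x, X_train, k=10):
    """Local neighborhood standardization: scale the sample x by the
    mean and standard deviation of its k nearest training neighbors."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nbrs = X_train[np.argsort(dists)[:k]]
    # small epsilon guards against a zero local standard deviation
    return (x - nbrs.mean(axis=0)) / (nbrs.std(axis=0) + 1e-12)
```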

**Figure 1.** Flow chart of disturbance detection by SLCVA*k*NN.
