Article

Detection and Analysis of Multiple Events Based on High-Dimensional Factor Models in Power Grid

1 Department of Electrical Engineering, Center for Big Data and Artificial Intelligence, State Energy Smart Grid Research and Development Center, Shanghai Jiaotong University, Shanghai 200240, China
2 Department of Electrical and Computer Engineering, Tennessee Technological University, Cookeville, TN 38505, USA
* Author to whom correspondence should be addressed.
Energies 2019, 12(7), 1360; https://doi.org/10.3390/en12071360
Submission received: 21 March 2019 / Revised: 31 March 2019 / Accepted: 5 April 2019 / Published: 9 April 2019
(This article belongs to the Special Issue Data-Driven Methods in Modern Power Engineering)

Abstract:
Multiple event detection and analysis in real time is a challenge for a modern grid as its features are usually non-identifiable. This paper, based on high-dimensional factor models, proposes a data-driven approach to gain insight into the constituent components of a multiple event via the high-resolution phasor measurement unit (PMU) data, such that proper actions can be taken before any sporadic fault escalates to cascading blackouts. Under the framework of random matrix theory, the proposed approach maps the raw data into a high-dimensional space with two parts: (1) factors (spikes, mapping faults); (2) residuals (a bulk, mapping white/non-Gaussian noises or normal fluctuations). As for the factors, we employ their number as a spatial indicator to estimate the number of constituent components in a multiple event. Simultaneously, the autoregressive rate of the noises is utilized to measure the variation of the temporal correlation of the residuals for tracking the system movement. Taking the spatial-temporal correlation into account, this approach allows for detection, decomposition and temporal localization of multiple events. Case studies based on simulated data and real 34-PMU data verify the effectiveness of the proposed approach.

1. Introduction

A multiple event is composed of several constituent components that occur successively within a short time period in the power system, such as load fluctuations, oscillations and faults. For a large-scale power grid, multiple events can hardly be identified properly, as it is difficult to distinguish their features via the raw data. The uncertainty and non-transparency of multiple events pose a serious threat to the system and may lead to widespread blackouts, such as the August 2003 U.S. northeastern blackouts and the July 2012 India blackouts [1,2].
The development of data-driven algorithms [3,4] and the increasing deployment of synchronized phasor measurement units (PMUs) allow for real-time situational awareness of the system. However, it is not feasible to process the raw data in a handcrafted way, which motivates the need for multivariate statistical approaches to assist multi-event detection and analysis. For example, in [5], Rafferty utilized moving-window principal component analysis (PCA) for real-time multiple event detection and classification. In his approach, a threshold on the cumulative percentage of total variation is selected in advance and the number of principal components is then determined accordingly; the T² and Q statistics are derived from the resulting PCA model. In [6], sequential events can be effectively distinguished if their occurrence times are sufficiently far apart, so that the event signal can be segmented into signals from multiple single events. In [7], Wang proposed the nonnegative sparse event unmixing (NSEU) approach to extract each single event from the mixed multiple event in a small power grid, and then extended this work to large-scale power systems through cluster-based sparse coding in [8]. Other research on multiple event analysis is mainly model-based [9,10]; such methods use many power system parameters to model the power grid and depend strongly on the topology of the grid network.
Along the well-established research line of random matrix theory (RMT) [11,12,13,14], this paper proposes a novel statistical approach, namely high-dimensional factor models, for multiple event detection, decomposition and temporal localization in a modern grid. Given the characteristics of PMU measurements, we explore the spatial correlation (cross-correlation) and temporal correlation (autocorrelation) in the PMU data structure. The autocorrelation exists between measurements at adjacent sampling points, whereas the cross-correlation exists between different PMUs at different locations [15,16]. The proposed method extracts spatial-temporal information from the raw data in the form of factors (principal components) and residuals, respectively. The factors are related to certain events (anomaly signals or faults) occurring in a power grid, whereas the residuals are associated with white/non-Gaussian noises or normal fluctuations within the raw PMU measurements. Formulating time and space jointly is analytically very demanding. Thanks to free probability (a modern tool in random matrix theory), this analytical difficulty was first overcome in [17], and in recent years it has been well studied in economics [18] and statistics [19]. This paper, for the first time, extends that approach to the context of a power grid.
The main advantages of our approach are summarized as follows: (1) It takes advantage of unique properties of high-dimensional factor models: under reasonable assumptions on the noise distributions (white/non-Gaussian), the PMU data matrix can be exactly decomposed into a signal subspace and a noise subspace by minimizing the distance between two spectra. Then, the spatial correlation and the temporal correlation in the PMU data matrix are extracted from the signal subspace and the noise subspace, respectively, in the form of the number of factors and the autoregressive rate of residuals. (2) The number of factors can reflect the number of constituent components and their respective details (e.g., occurring timings) in a multiple event, so as to realize multi-event decomposition. (3) The autoregressive rate of residuals can extract the temporal correlation within noises or fluctuations, so as to track the system movement. It is noted that the proposed approach has the advantage of handling both white and non-Gaussian noises. (4) This approach is purely data-driven, requiring no knowledge of topologies or parameters of the power system, which makes it able to adapt to changes in topology. (5) It is practical for both on-line and off-line analysis using moving split-windows.
To the best of our knowledge, for the first time, an algorithm aiming at multiple event detection and analysis (in which successive constituent components occur within a short time period) for power systems based on random matrix theory has been proposed. The rest of this paper is organized as follows. In Section 2, the raw PMU data is processed into a normalized spatial-temporal matrix, and the main idea of high-dimensional factor models is introduced. In Section 3, the concrete formulation of high-dimensional factor models is given; the number of factors and the autocorrelation rate of residuals are estimated by minimizing the distance between two spectra. In Section 4, both simulated data and real 34-PMU data are utilized to validate the effectiveness of the proposed approach. Comparisons with other algorithms for feature selection, such as deep learning and principal component analysis (PCA), are included in Section 5. Conclusions of this research are given in Section 6.

2. Problem Formulation

In this section, we first construct the PMU data into a spatial-temporal matrix and normalize it. Then, we give the general idea of high-dimensional factor models.

2.1. Data Processing

In this paper, PMU measurements (e.g., voltage magnitude data, power flow data) are employed as status data for analysis. We consider N observation variables jointly over a duration of T sampling points, so the observation window is represented by a matrix of size N × T. At sampling time t (t ≥ T), the spatial-temporal matrix of PMU measurements X̂_t ∈ C^{N×T} is formed as
X̂_t = (x̂_{t−T+1}, x̂_{t−T+2}, …, x̂_t)
where x̂_t = (x̂_{1,t}, x̂_{2,t}, …, x̂_{N,t})^H is the N-dimensional vector measured at sampling time t. If we keep the last sampling time as the current time, real-time analysis can be conducted with the moving split-window.
Then, we convert the raw data matrix X̂_t obtained at each sampling time t into a standard non-Hermitian matrix X̃_t with the following normalization:
x̃_{i,j} = (x̂_{i,j} − μ(x̂_i)) × σ(x̃_i)/σ(x̂_i) + μ(x̃_i)
where x̂_i = (x̂_{i,1}, x̂_{i,2}, …, x̂_{i,T}), μ(x̃_i) = 0, σ(x̃_i) = 1, i = 1, 2, …, N and j = 1, 2, …, T. X̃_t is the normalized spatial-temporal matrix built at sampling time t.
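To make the row-wise normalization of Equation (2) concrete, a minimal NumPy sketch follows. The function name and the synthetic window data are our own illustration, not from the paper:

```python
import numpy as np

def normalize_window(X_raw):
    """Row-wise normalization per Equation (2): each PMU channel (row)
    of the N x T window is shifted and scaled to zero mean, unit variance."""
    mu = X_raw.mean(axis=1, keepdims=True)
    sigma = X_raw.std(axis=1, keepdims=True)
    sigma[sigma == 0] = 1.0           # guard against constant channels
    return (X_raw - mu) / sigma

# synthetic voltage magnitudes for one 118 x 250 split-window
rng = np.random.default_rng(0)
X_raw = 1.0 + 0.02 * rng.standard_normal((118, 250))
X_tilde = normalize_window(X_raw)
```

Since the target statistics are μ(x̃_i) = 0 and σ(x̃_i) = 1, the general affine map in Equation (2) collapses to a simple standardization of each row.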

2.2. High-Dimensional Factor Models

For a certain window, i.e., the one obtained at sampling time t, we aim to decompose the standard non-Hermitian matrix X̃_t, as given in Equation (2), into factors and residuals as follows:
X̃_t = L(p)F(p) + U(p)
where p is the number of factors, F(p) is the factor matrix, L(p) is the corresponding loading matrix, and U(p) is the residual matrix. Usually, only X̃_t is available, whereas L(p), F(p) and U(p) need to be estimated.
The factor model aims to simultaneously provide estimators of the number of factors and the autocorrelation rate of the residuals. By considering a minimum distance between the empirical spectral distribution ρ_real(p) and the theoretical spectral distribution ρ_model(b), the parameter-estimation problem is turned into a minimum-distance problem. The empirical distribution ρ_real(p), depending on the sampled data, is obtained as the empirical spectral distribution (ESD) of Σ_real(p) in Equation (6), and the theoretical one ρ_model(b), based on Sokhotsky's formula, is given as ρ_model(b) in Equation (10). Therefore, we turn the factor model estimation problem into a classical optimization:
(p̂, b̂) = arg min_{p,b} D(ρ_real(p), ρ_model(b))
where D is a spectral distance measure or loss function. The solution of this minimization problem gives the number of factors, p̂, and the parameter of the autocorrelation structure of the residuals, b̂. It is noted that the proposed approach considers the spatial-temporal correlation of the PMU data structure: it explores the spatial (cross-) correlation of the data by estimating the factors, and it measures the temporal correlation (autocorrelation) by analyzing the residuals obtained by subtracting the factors from the raw data.

3. High-Dimensional Factor Model Analysis

In this section, we further discuss the calculation of the high-dimensional factor models, so as to obtain the spatial indicator p and the temporal indicator b. In this process, the free random variable techniques proposed in Burda’s work [20] are used to calculate the modeled spectral density. Then, the Jensen-Shannon divergence is used to find the optimal parameter set (p̂, b̂).

3.1. Empirical Spectral Distribution

According to Equation (3), the p-level empirical residuals are generated as:
U(p) = X̃_t − L(p)F(p)
where F(p) is a p × T matrix of p factors, whose j-th row (j = 1, …, p) is the j-th principal component of X̃_t^T X̃_t, and L(p) is an N × p matrix of factor loadings, estimated by multivariate least-squares regression of X̃_t on F(p). Then the covariance matrix of the p-level residuals is obtained as
Σ_real(p) = (1/T) U(p) U(p)^T
The subscript “real” indicates that Σ_real(p) is obtained from real data. The idea behind this is simple: we keep subtracting factors until the bulk spectrum of the residuals from real data becomes close to that of the modeled residuals. The steps are summarized in Table 1.
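The factor subtraction of Equations (5) and (6) can be sketched as follows. This is our own illustration: the factors are taken as the leading right singular vectors of the window, so the least-squares loadings reduce to a projection because the factor rows are orthonormal.

```python
import numpy as np

def residual_eigenvalues(X, p):
    """Subtract p factors from the normalized N x T window X (Eq. 5) and
    return the eigenvalues of the p-level residual covariance (Eq. 6),
    in descending order."""
    # rows of F are the top-p principal components of X^T X
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    F = Vt[:p]                      # p x T factor matrix
    L = X @ F.T                     # N x p loadings (F rows are orthonormal)
    U = X - L @ F                   # p-level residuals, Eq. (5)
    Sigma = U @ U.T / X.shape[1]    # Sigma_real(p), Eq. (6)
    return np.linalg.eigvalsh(Sigma)[::-1]
```

Each subtracted factor removes one spike from the residual spectrum, leaving the bulk that is later matched against the modeled noise spectrum.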

3.2. Theoretical Spectral Distribution

Reference [21] provides a fundamental formulation to model noises. We assume there are cross- and auto-correlated structures within the noises such that U is represented as
U = A_N^{1/2} G B_T^{1/2}
where G is an N × T matrix with independent identically distributed (i.i.d.) Gaussian entries, and A_N and B_T are N × N and T × T symmetric non-negative definite matrices, representing cross- and auto-correlations, respectively. The theoretical spectral distribution with general A_N and B_T, however, is difficult to obtain because of the heavy computation required by the Stieltjes transform. To simplify the modeling, following Yeo’s work [17], we make two assumptions here.
  • The cross-correlations are effectively removed by subtracting p principal components, where p is the true number of factors, and the residual U has sufficiently negligible cross-correlations: A_N ≈ I_{N×N}.
  • The autocorrelations of U are exponentially decreasing. That is, B_T takes the form of exponential decay with respect to time lags: (B_T)_{i,j} = b^{|i−j|}, where |i − j| is the distance between time i and time j, and b < 1. This is equivalent to modeling the residual as an autoregressive model, i.e., the AR(1) process: U_{i,t} = b·U_{i,t−1} + ξ_{i,t}, where ξ_{i,t} ~ N(0, 1 − b²). When b = 0, the AR(1) process degenerates into Gaussian white noise.
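The AR(1) assumption above is easy to check by simulation. The sketch below (our own illustration, not the paper's code) confirms that innovations of variance 1 − b² yield unit stationary variance and lag-one autocorrelation b:

```python
import numpy as np

def simulate_ar1(N, T, b, seed=0):
    """Draw N independent AR(1) rows: U[i,t] = b*U[i,t-1] + xi[i,t],
    xi ~ N(0, 1 - b^2), started from the stationary distribution."""
    rng = np.random.default_rng(seed)
    U = np.empty((N, T))
    U[:, 0] = rng.standard_normal(N)          # stationary start, variance 1
    for t in range(1, T):
        U[:, t] = b * U[:, t - 1] + np.sqrt(1 - b**2) * rng.standard_normal(N)
    return U

U = simulate_ar1(N=200, T=2000, b=0.5)
```

The lag-k autocorrelation of each row is b^k, matching the exponential-decay structure assumed for B_T.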
As a result, the correlation structure of time-series residuals following autoregressive processes can be represented as ρ_model(b). This simplification makes the modeled spectral density much easier to calculate. In this paper, we calculate ρ_model(b) using the free random variable techniques proposed in [20]. For the autoregressive model U, using the moment generating function M_{Σmodel}(z) and its inverse relation to the N-transform, we derive the polynomial Equation (8), by solving which we obtain M_{Σmodel}(z). Then we calculate the Green’s function G_{Σmodel}(z) by Equation (9). The theoretical spectral density ρ_model(b) is reconstructed from the imaginary part of G_{Σmodel}(z) using Sokhotsky’s formula as Equation (10).
The steps of calculating ρ model ( b ) are summarized as follows. For the key concepts and derivation details of the calculation process, please refer to Appendix A.
Steps of calculating ρ model ( b ) :
(1) Solve the polynomial equation for M_{Σmodel}(z) (with a² = 1 − b², c = N/T, z = λ + iε):
a⁴c²M⁴ + 2a²c(a²c − (1 + b²)z)M³ + ((1 − b²)²z² − 2a²c(1 + b²)z + (c² − 1)a⁴)M² − 2a⁴M − a⁴ = 0
(2) Obtain the Green’s function G_{Σmodel}(z) from the moment generating function M_{Σmodel}(z):
G_{Σmodel}(z) = (M_{Σmodel}(z) + 1)/z,  for z ≠ 0
(3) Get the theoretical spectral density from the Green’s function G_{Σmodel}(z) by using Sokhotsky’s formula:
ρ_model(b)(λ) = −(1/π) lim_{ε→0⁺} Im G_{Σmodel}(λ + iε)
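The three steps can be prototyped numerically: for each λ on a grid we solve the quartic with `numpy.roots`, pick the physical (Stieltjes) branch, and read off the density via Sokhotsky's formula. This is our own sketch under the sign conventions above; as a sanity check, at b = 0 it should reduce to the Marchenko-Pastur law.

```python
import numpy as np

def rho_model(lam_grid, b, c, eps=1e-3):
    """Theoretical spectral density of the AR(1) residual covariance,
    via Equations (8)-(10): solve the quartic for M at z = lambda + i*eps,
    keep the branch with Im M < 0, and use Im G(lambda + i*eps) -> Im M / lambda
    as eps -> 0 (Sokhotsky's formula)."""
    a2 = 1.0 - b**2                   # a^2 in Equation (8)
    a4 = a2**2
    rho = np.empty(len(lam_grid))
    for k, lam in enumerate(lam_grid):
        z = lam + 1j * eps
        coeffs = [                    # quartic in M, highest power first
            a4 * c**2,
            2 * a2 * c * (a2 * c - (1 + b**2) * z),
            (1 - b**2)**2 * z**2 - 2 * a2 * c * (1 + b**2) * z + (c**2 - 1) * a4,
            -2 * a4,
            -a4,
        ]
        M = np.roots(coeffs)
        m = M[np.argmin(M.imag)]      # Stieltjes branch has Im M < 0
        rho[k] = max(0.0, -m.imag / (np.pi * lam))
    return rho
```

Taking the density from Im M rather than Im G suppresses the spurious near-real roots of the quartic, which would otherwise contaminate the estimate near λ ≈ 0 at finite ε.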

3.3. Distance Measure

The spectral distance measure D is used to find the optimal parameter set (p̂, b̂) by minimizing the distance between ρ_real(p) and ρ_model(b). Since the empirical spectrum contains spikes, a distance measure that is sensitive to the presence of spikes should be chosen. This paper uses the Jensen-Shannon divergence, a symmetric version of the Kullback-Leibler divergence, defined as follows:
D_JS(ρ_real ‖ ρ_model) = (1/2) D_KL(ρ_real ‖ ρ) + (1/2) D_KL(ρ_model ‖ ρ)
where ρ_real and ρ_model are probability densities, and ρ = (1/2)(ρ_real + ρ_model). Equations (12) and (13) give the Kullback-Leibler divergences used in Equation (11).
D_KL(ρ_real ‖ ρ) = Σ_i ρ_real,i log(ρ_real,i / ρ_i)
D_KL(ρ_model ‖ ρ) = Σ_i ρ_model,i log(ρ_model,i / ρ_i)
where ρ_i = ρ(ih) with eigenvalue grid size h. To deal with zero elements that may appear in ρ_i, we use:
ρ̃_i = α·ρ_i  if ρ_i > 0;  ρ̃_i = ε  if ρ_i = 0
where ε is a small enough positive number. Denoting the number of zero elements in ρ_i as num, α = 1 − num × ε. In this way, the probability density satisfies Σ_i ρ̃_i = 1. Note that the Kullback-Leibler distance becomes large if one density has a spike at a point where the other is almost zero, so it is sensitive to the presence of spikes.
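The distance computation of Equations (11)-(14) is straightforward to implement; the sketch below (function and variable names are ours) applies the zero-bin smoothing of Equation (14) before forming the two Kullback-Leibler terms:

```python
import numpy as np

def js_divergence(rho_real, rho_model, eps=1e-12):
    """Jensen-Shannon divergence between two discretized spectral densities,
    per Equations (11)-(13); Equation (14) replaces zero bins by eps and
    rescales the positive bins so each density still sums to one."""
    def smooth(r):
        r = np.asarray(r, dtype=float)
        num = np.count_nonzero(r == 0)
        return np.where(r > 0, (1.0 - num * eps) * r, eps)
    p, q = smooth(rho_real), smooth(rho_model)
    m = 0.5 * (p + q)
    kl = lambda x, y: float(np.sum(x * np.log(x / y)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Two completely disjoint spikes give the maximal value log 2, which is exactly what makes the measure sensitive to unmatched spikes in the spectrum.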
Based on Section 3.1, Section 3.2 and Section 3.3, the spectrum of the PMU data is decomposed into two parts: spikes and a bulk. As we subtract p factors from the data, the p spikes corresponding to the p largest eigenvalues are removed from the spectrum of the original data. Meanwhile, the parameter b controls the region of smaller eigenvalues. Together, the parameters p and b give an overall description of the PMU measurements.

4. Case Studies

In this section, the proposed method is tested with both simulated data from the IEEE 118-bus system and real 34-PMU data. The MATPOWER package is used to generate the simulated PMU data [22]. To mimic the normal fluctuations of the active load, noise is added so that the signal-to-noise ratio (SNR) is 22 dB. In these simulations, three kinds of events, i.e., a short-circuit fault, a disconnection fault, and a generator tripping event, are considered as constituent components of a multiple event, and the magnitudes of bus voltages are used as raw data for analysis. The topology of the IEEE 118-bus system is shown in Figure 1.

4.1. Case Study with Simulated Data

In this set of cases, the simulated data contain 118 voltage measurements with 700 sampling points. The size of the moving split-window is set as 118 × 250, i.e., X̂_t ∈ C^{118×250}. Then, Equation (4) is used in each split-window to estimate the parameters p and b as p̂ and b̂: p̂ for the number of constituent components in a multiple event, and b̂ for the autocorrelation structure of the residuals. The idea behind this is as follows. First, the raw voltage magnitude data in each split-window are normalized using Equation (2). Then, we keep subtracting different numbers of factors from the normalized voltage magnitude data using Equation (5) until the residuals become close to the autoregressive model with autoregressive rate b as calculated in Equation (8). The Jensen-Shannon divergence in Equation (11) is utilized to measure how close the two residual spectral distributions are to each other. The optimal parameter set (p̂, b̂) is finally obtained once the Jensen-Shannon divergence is minimized. Three cases are designed in this part to validate the proposed approach. In Section 4.1.1, Section 4.1.2 and Section 4.1.3, we set different numbers of constituent components in a multiple event, and the corresponding results are shown in Figure 2a–c. The experiments are repeated 20 times and the results are averaged. We need to point out that the estimation of p and b begins at t_s = 250 due to the length of the split-window. We amplify the value of b̂ thirty times to make it visible in Figure 2, i.e., b̂_0 = b̂ × 30.

4.1.1. Case 1—A Single Event

In Case 1, a single event, i.e., a short-circuit fault, is set at Bus 64, as shown in Table 2.
Figure 2a shows that:
  • During the sampling time t_s = 250–499 (250 = 1 (the beginning of the signal) + 250 (length of the split-window) − 1), p̂ and b̂ remain steady.
  • At t s = 500, p ^ starts to decline to around 1. Also, b ^ declines slightly.
Actually, as can be seen from Table 2, no events occur in the system while p̂ keeps steady. Right at t_s = 500, a short-circuit fault happens at Bus 64. Therefore, single event detection can be conducted with the proposed approach. Moreover, in this case, for the split-windows t ∈ [251, 500] to t ∈ [451, 700], there exists a single event (a short-circuit fault at Bus 64 at t_s = 500) and 1 factor, i.e., 1 event, p̂ ≈ 1.

4.1.2. Case 2—A Multiple Event with Two Constituent Components

In this case, a multiple event is set with two constituent components. One is a short-circuit fault at Bus 64, and the other is a disconnection fault at the line connecting Bus 23 and Bus 24, as shown in Table 3.
Figure 2b shows that:
  • During the sampling time t_s = 250–499, p̂ and b̂ remain steady, which is consistent with Table 3: no events occur.
  • At t_s = 500, p̂ starts to decline and then stays around 1 till t_s = 599. For the split-windows t ∈ [251, 500] to t ∈ [350, 599], there exists a single event (the short-circuit fault at Bus 64 at t_s = 500) and 1 factor, i.e., 1 event, p̂ ≈ 1.
  • At t_s = 600, p̂ starts to rise and then stays around 2. For the split-windows t ∈ [351, 600] to t ∈ [451, 700], there exist two constituent components (a short-circuit fault at Bus 64 at t_s = 500 and a disconnection fault at the line connecting Bus 23 and Bus 24 at t_s = 600) and 2 factors, i.e., 2 constituent components, p̂ ≈ 2.

4.1.3. Case 3—A Multiple Event with Three Constituent Components

In this case, a multiple event is set with three constituent components at Bus 64, the line connecting Bus 23 and Bus 24, and Bus 107, respectively, as shown in Table 4.
Figure 2c shows that:
  • During the sampling time t_s = 250–499, p̂ and b̂ remain steady, which is consistent with Table 4: no events occur.
  • At t_s = 500, p̂ starts to decline and then stays around 1 till t_s = 549. For the split-windows t ∈ [251, 500] to t ∈ [300, 549], there exists a single event (a short-circuit fault at Bus 64 at t_s = 500) and 1 factor, i.e., 1 event, p̂ ≈ 1.
  • At t_s = 550, p̂ starts to rise and then stays around 2 till t_s = 599. For the split-windows t ∈ [301, 550] to t ∈ [350, 599], there exist two constituent components (a short-circuit fault at Bus 64 at t_s = 500 and a disconnection fault at the line connecting Bus 23 and Bus 24 at t_s = 550) and 2 factors, i.e., 2 constituent components, p̂ ≈ 2.
  • At t_s = 600, p̂ rises again and then stays around 3. For the split-windows t ∈ [351, 600] to t ∈ [451, 700], there exist three constituent components (a short-circuit fault at Bus 64 at t_s = 500, a disconnection fault at the line connecting Bus 23 and Bus 24 at t_s = 550, and a generator tripping event at Bus 107 at t_s = 600) and 3 factors, i.e., 3 constituent components, p̂ ≈ 3.

4.1.4. More Discussions of p̂

The results of Section 4.1.1, Section 4.1.2 and Section 4.1.3 are summarized in Table 5. Based on the three cases, we can deduce that the estimated number of factors (p̂) is equal to the number of constituent components in a multiple event (n_event).
To validate the above conclusion, we explore the empirical spectral distribution (ESD) of the covariance matrix of X̃ in Case 3. The ESD typically exhibits two features: a bulk and several spikes (i.e., the deviating eigenvalues). The bulk arises from noise or normal fluctuations, while the spikes are caused by faults.
Figure 3 shows that:
  • At t_s = 400, no clear spikes (factors) are observed. It is noted that p̂ remains at almost 5 in Figure 2c when no events occur. The most likely explanation is that normal fluctuations of the active load create several weak factors; our approach is sensitive to weak factors under normal operating conditions. However, this phenomenon does not interfere with the judgment of the number of constituent components in a multiple event, because the number of factors is stable under normal conditions, whereas it changes in a regular pattern when a multiple event occurs.
  • At t_s = 520, one spike (factor) is observed. It is caused by the single event in the window, which is consistent with the results in Table 5.
  • At t_s = 570, two spikes (factors) are observed. They are caused by the two constituent components of the multiple event in the window.
  • At t_s = 620, three spikes (factors) are observed. They are caused by the three constituent components of the multiple event.
Note that p̂ is a spatial indicator that maps the events occurring in the system.

4.1.5. More Discussions of b̂

In Figure 3, we can see that the estimations of p̂ and b̂ are simultaneous and interdependent. The deviating eigenvalues are factors, whose number is estimated by the indicator p̂, whereas the region of the bulk is closely related to b̂. In other words, p̂ derives from the signal subspace, denoting the number of signals or events, whereas b̂ derives from the noise subspace, denoting the autocorrelation rate of the residuals. The number of factors can be determined only if the bulk is well fitted by the theoretical autoregressive model with parameter b̂.
For the PMU data structure, p̂ reveals the correlation in space (cross-correlation), while b̂ tracks that in time (autocorrelation). Although the number of constituent components within a multiple event is contained in the spatial information, the above results show that the parameter b̂ can also reflect the status of the system. Every time a constituent component of the multiple event occurs, b̂ changes together with p̂ and can act as a temporal indicator to assist the judgment.

4.2. Case Study with Real Data

In this section, we evaluate the proposed algorithm with real 34-PMU data. In the following experiment, the real power flow data of a multiple event that occurred in a Chinese power grid in 2013 is utilized as the raw data for analysis. The PMU count, the sampling rate and the total sampling time are 34, 50 Hz and 284 s, respectively. The multiple event lasted from 65.4 s to 73.3 s. Figure 4a,b depict the three-dimensional power flow data under normal conditions and during the multiple event. It can be seen that the power flow is relatively steady in the normal state, whereas it changes irregularly when the multiple event occurs.
In this experiment, we use a window of size 34 × 400 (i.e., X̂_t ∈ C^{34×400}) and move it at each sampling point. In each split-window, we estimate the number of factors p̂ and the autoregressive rate of residuals b̂ of the power flow data, just as for the synthetic voltage magnitude data in Section 4.1. Near the occurrence of the multiple event, p̂ and b̂ are estimated, and the indicators are shown in Figure 5. From the p̂–t and b̂–t curves, we can realize multiple event detection and analysis as follows:
  • From 60.00 s to 65.38 s, the values of p̂ and b̂ remain steady, indicating that no events occur during this period. It is noted that p̂ stays around 7 rather than 5 as in the simulated data. One explanation is that, even though noise with an SNR of 22 dB is added to the simulated data, it is still difficult to accurately mimic the fluctuations of real PMU data. The normal fluctuations of real power flow data cause some weak factors, which are detected by our algorithm.
  • At 65.40 s, p̂ decreases to 1 and remains at 1 until 69.68 s. One factor is detected during this period, indicating that the first constituent component is observed in the split-window.
  • From 69.70 s to 72.98 s, two factors are observed. We can speculate that the second constituent component of the multiple event occurs at 69.70 s.
  • At 73.30 s, the value of p̂ increases by 1, indicating that the third constituent component of the multiple event occurs at this moment.
As the window continues to move, the number of constituent components in the window decreases, corresponding to the decrease of p̂ from 73.38 s to 80.98 s. In addition, as a temporal indicator describing the autocorrelation of the residuals, b̂ decreases every time a new constituent component occurs. p̂, together with b̂, reveals the occurrence of the multiple event.

5. Comparisons and Discussions

In this section, we compare our algorithm with two frameworks for feature selection, i.e., deep learning and PCA. We first give a review of the frameworks, and then make further analysis and comparisons.

5.1. Comparison with Deep Learning

In recent years, deep artificial neural networks have undergone rapid development in pattern recognition and machine learning. Deep learning algorithms can automatically select features from massive data sets. The most successful form of deep learning is supervised learning. In a typical supervised deep-learning system, hundreds of millions of labelled examples are needed to train the deep neural networks [23]. However, the scarcity of real multiple event data, especially labelled data, has greatly hindered the practical application of supervised learning. In addition, the interpretability and robustness of deep learning are still being questioned [24,25,26,27]. The undesirable behaviour of deep learning has raised concerns about AI safety [28], making it difficult to apply to power grids with high security requirements.
In contrast, the proposed high-dimensional factor model is not only “label-free” but also has no stringent requirements on data quantity. Moreover, our method is deeply rooted in random matrix theory and has the advantage of transparency, in that our results are provable. With the aid of Equation (7), our approach is capable of dealing with spatial-temporal correlation, which is a difficulty for many common algorithms. Using the powerful machinery of free probability, Equation (7) can be analyzed in closed form, so iterative procedures can be developed to search for the two fundamental parameters, i.e., p̂ and b̂, as defined in Equation (4).

5.2. Comparison with Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a classical method for feature selection and dimensionality reduction. It transforms highly correlated variables in the datasets into a set of simplified, uncorrelated variables that best summarize the information in the raw data. These simplified, uncorrelated variables are called Principal Components (PCs). They are ordered and preserve most of the variation that was present in the original data [29]. For more details of the PCA algorithm, readers can refer to [29].
We compare our method with PCA using moving split-windows. That is, we first choose a threshold of cumulative variance contribution rate, and then the number of PCs is determined according to the empirically determined threshold defined as follows.
η_k = (Σ_{j=1}^{k} λ_j) / (Σ_{j=1}^{m} λ_j) > α
where λ_j is the j-th eigenvalue of the raw matrix X̂_t after subtracting the mean value within each dimension. The eigenvalues are sorted in descending order and m is their total number. η_k is the cumulative variance contribution rate of the first k PCs and α is the pre-defined threshold, α ∈ [0%, 100%]. The idea behind this is simple: we keep adding eigenvalues from largest to smallest until inequality (15) holds for the first time, and that k is the number of PCs we need.
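For reference, the threshold rule of Equation (15) can be sketched as follows (our own illustration; the function name is not from the paper):

```python
import numpy as np

def num_pcs(X, alpha=0.95):
    """Smallest k whose cumulative variance contribution rate eta_k first
    exceeds alpha (Equation 15); X is an N x T split-window."""
    Xc = X - X.mean(axis=1, keepdims=True)          # remove mean in each dimension
    eigvals = np.linalg.eigvalsh(Xc @ Xc.T)[::-1]   # eigenvalues, descending
    eta = np.cumsum(eigvals) / eigvals.sum()
    return int(np.argmax(eta > alpha)) + 1          # first index where eta_k > alpha
```

Note that the returned k depends directly on the pre-set threshold α, which is the drawback the comparison in this section highlights.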
Apparently, the number of PCs is closely related to the predetermined cumulative variance contribution rate α . In this experiment, we choose different cumulative variance contribution rates as shown in Figure 6. Real 34-PMU data in Section 4.2 is utilized for analysis. The size of the split-window is set the same as in Section 4.2, i.e., 34 × 400 , so that the multiple events are the same in each split-window of the two comparative experiments, and the two methods can be compared directly.
Figure 7 is a top view of Figure 6, and the number of principal components is shown by the color bar on the right side. The two figures indicate that the cumulative variance contribution rate has a significant influence on the number of principal components. The threshold, however, is set empirically in advance. Furthermore, Figure 7 shows that the number of PCs determined by PCA cannot effectively reflect the number of constituent components in the multiple event no matter what cumulative variance contribution rate we choose.
In contrast, our method determines the number of factors without any pre-defined conditions and thus outperforms moving split-window PCA. Both spatial and temporal information are taken into account, and the respective parameters $\hat{p}$ and $\hat{b}$ are selected automatically to detect and analyze multiple events.

6. Conclusions

Based on high-dimensional factor models, this paper proposes a data-driven method for multiple event detection and analysis. By jointly considering spatial and temporal correlation, the approach is able to reveal, and even decompose, a multiple event: its constituent components and their respective details, e.g., occurrence times. This joint treatment is made possible by recent advances in random matrix theory-based statistical methods. In particular, following [17], we model the joint correlation using Equation (7). Although this model is very difficult to solve analytically in general, free random variable techniques make an analytical solution possible, and our paper exploits this mathematical device.
The proposed method is purely data-driven, requiring no prior knowledge of the network topology or parameters, which enables it to adapt to topology changes. Both simulated data and real PMU data are used to verify the effectiveness of the algorithm. In future work, several extensions are possible; for example, more general noise models can be employed. If we consider an autoregressive moving average ARMA(1,1) process, the resulting polynomial equations are of order up to six [20].

Author Contributions

Conceptualization, R.C.Q.; methodology, Z.L.; software, F.Y.; validation, H.Y.; formal analysis, F.Y.; resources, R.C.Q.; writing—original draft preparation, Y.F.; writing—review and editing, X.H.; visualization, F.Y.; supervision, R.C.Q.; project administration, R.C.Q.; funding acquisition, R.C.Q.

Funding

This work is supported by the National Key R&D Program of China under Grant 2018YFF0214705 and the National Natural Science Foundation of China under Grant 61571296.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Key Concepts and Derivation Details

We summarize the key concepts and derivation details of the free random variable techniques that we employ to derive $\rho_{\text{model}}(b)$ in Section 3.2. First, consider the relationship between the Green's function $G_H(z)$ and the spectral density $\rho_H(\lambda)$.
Definition A1.
The Green’s Function (or Stieltjes Transform).
$$G_H(z) = \int \frac{\rho_H(\lambda)}{z - \lambda}\, d\lambda$$
where $z \in \mathbb{C}$ with $\mathrm{Im}(z) > 0$, and $\rho_H(\lambda)$ is the spectral density of the random Hermitian matrix $H \in \mathbb{R}^{N \times N}$, defined as $\rho_H(\lambda) = \frac{1}{N} \sum_{k=1}^{N} \delta(\lambda - \lambda_k(H))$, where $\lambda_k(H)$ ($k = 1, \ldots, N$) denote the $N$ eigenvalues of $H$.
$\rho_H(\lambda)$ can be reconstructed from the imaginary part of the Green's function $G_H(z)$:
$$\rho_H(\lambda) = -\frac{1}{\pi} \lim_{\varepsilon \to 0^{+}} \mathrm{Im}\, G_H(\lambda + i\varepsilon)$$
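As a numerical illustration of this inversion formula (ours, not part of the original derivation), one can approximate the spectral density of a Wigner matrix from its Green's function evaluated slightly above the real axis and compare it with the semicircle law. The matrix size and the smoothing parameter $\varepsilon$ below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2000
A = rng.standard_normal((N, N))
H = (A + A.T) / np.sqrt(2 * N)          # Wigner matrix; spectrum ~ semicircle on [-2, 2]
eigs = np.linalg.eigvalsh(H)

def green(z):
    # Empirical Green's function: G_H(z) = (1/N) * sum_k 1/(z - lambda_k)
    return np.mean(1.0 / (z - eigs))

eps = 0.05
lam = 0.0
rho_hat = -green(lam + 1j * eps).imag / np.pi        # inversion at small but finite eps
rho_semicircle = np.sqrt(4 - lam**2) / (2 * np.pi)   # semicircle density, = 1/pi at 0
print(round(rho_hat, 3), round(rho_semicircle, 3))
```

With $N = 2000$ and $\varepsilon = 0.05$, the recovered density at $\lambda = 0$ is close to the semicircle value $1/\pi \approx 0.318$.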
Then, we explore the relationship between the Green’s Function and the Moment Generating Function.
Definition A2.
Moment.
$$m_n = \int \rho_H(\lambda)\, \lambda^{n}\, d\lambda$$
where m n is the n-th moment of ρ H ( λ ) .
Definition A3.
Moment Generating Function.
$$G_H(z) = \sum_{n \ge 0} \frac{m_n}{z^{n+1}}, \qquad M_H(z) = \sum_{n \ge 1} \frac{m_n}{z^{n}}$$
Therefore, the relationship between G H ( z ) and M H ( z ) is
$$M_H(z) = z\, G_H(z) - 1$$
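The relation between $G_H(z)$ and $M_H(z)$ can be checked directly on a toy spectrum, evaluating the moment series by brute force. The eigenvalues and evaluation point below are arbitrary; this is a sketch, not part of the derivation.

```python
import numpy as np

eigs = np.array([1.0, 2.0, 3.0])   # toy spectrum
z = 5.0                             # evaluation point outside the spectrum

G = np.mean(1.0 / (z - eigs))       # empirical Green's function
M_from_G = z * G - 1.0              # via M(z) = z*G(z) - 1

# Truncated moment series: M(z) = sum_{n>=1} m_n / z^n with m_n = mean(lambda^n)
M_series = sum(np.mean(eigs ** n) / z ** n for n in range(1, 60))

print(M_from_G, M_series)  # both ≈ 29/36 ≈ 0.8056
```

The series converges geometrically here since all eigenvalues are smaller than $z$, so 60 terms are ample.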
The N-transform is the functional inverse of the moment generating function.
Definition A4.
N-transform.
$$M_H(N_H(z)) = N_H(M_H(z)) = z$$
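A minimal check of this inverse relation (our own illustration) uses the identity matrix, whose transforms are known in closed form: all eigenvalues equal 1, so $M_I(z) = 1/(z-1)$ and $N_I(z) = 1 + 1/z$, the pair also used later in this appendix.

```python
# Moment generating function and N-transform of the identity matrix.
M_I = lambda z: 1.0 / (z - 1.0)   # all eigenvalues equal 1
N_I = lambda z: 1.0 + 1.0 / z     # its functional inverse

# Both compositions should return the argument unchanged.
print(M_I(N_I(2.0)), N_I(M_I(2.0)))  # → 2.0 2.0
```

The check holds for any $z \notin \{0, 1\}$, since $M_I(N_I(z)) = 1/((1 + 1/z) - 1) = z$.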
Now, recall the residual $U$ defined in Equation (7). Its covariance matrix $\Sigma_{\text{model}}$ can be expressed as follows.
$$\Sigma_{\text{model}} = \frac{1}{T} U U^{T} = \frac{1}{T} A_N^{1/2} G B_T G^{T} A_N^{1/2}$$
The N-transform of $\Sigma_{\text{model}}$ can be derived as
$$\begin{aligned}
N_{\Sigma_{\text{model}}}(z) &= N_{\frac{1}{T} A_N^{1/2} G B_T G^{T} A_N^{1/2}}(z) = N_{\frac{1}{T} G B_T G^{T} A_N}(z) \\
&= \frac{z}{1+z}\, N_{\frac{1}{T} G B_T G^{T}}(z)\, N_{A_N}(z) = \frac{z}{1+z}\, N_{\frac{1}{T} G^{T} G B_T}(r z)\, N_{A_N}(z) \\
&= \frac{z}{1+z} \cdot \frac{r z}{1+r z}\, N_{\frac{1}{T} G^{T} G}(r z)\, N_{B_T}(r z)\, N_{A_N}(z) \\
&= r z\, N_{B_T}(r z)\, N_{A_N}(z)
\end{aligned}$$
where the free random variable (FRV) multiplication law and the cyclic property of the trace are used. Note that $N_{\frac{1}{T} G^{T} G}(z) = \frac{(1+z)(r+z)}{z}$.
Writing $M = M_{\Sigma_{\text{model}}}(z)$ and using the inverse relation between the moment generating function and the N-transform, we obtain
$$z = r M\, N_{B_T}(r M)\, N_{A_N}(M)$$
For the AR(1) process assumed in Section 3.2, the cross-correlation matrix is $A_N = I_{N \times N}$, so $N_{A_N}(z) = N_I(z) = 1 + 1/z$. Therefore, Equation (A10) can be written as
$$z = r M\, N_{B_T}(r M)\left(1 + \frac{1}{M}\right) = r (1 + M)\, N_{B_T}(r M)
\;\Longrightarrow\;
\frac{z}{r(1+M)} = N_{B_T}(r M)
\;\Longrightarrow\;
r M = M_{B_T}\!\left(\frac{z}{r(1+M)}\right)$$
Then we need the moment generating function of $B_T$. Recall from Section 3.2 that the auto-covariance matrix of the AR(1) process is $(B_T)_{i,j} = b^{|i-j|}$. Using its Fourier transform, we can derive $M_{B_T}$ as
$$M_{B_T}(z) = \frac{1}{\sqrt{z^{2} - \dfrac{2(1+b^{2})}{1-b^{2}}\, z + 1}}$$
Therefore, we can obtain Equation (8).
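As a numerical sanity check (our own, not part of the paper), the closed form for $M_{B_T}$ above can be compared with the empirical moment generating function of a finite AR(1) auto-covariance matrix $(B_T)_{ij} = b^{|i-j|}$; the values of $b$, $T$, and the evaluation point $z$ are arbitrary, and the agreement improves as $T$ grows.

```python
import numpy as np

b, T, z = 0.5, 500, 10.0                         # illustrative parameters
idx = np.arange(T)
B = b ** np.abs(idx[:, None] - idx[None, :])     # Toeplitz AR(1) auto-covariance matrix

eigs = np.linalg.eigvalsh(B)                     # spectrum lies in [(1-b)/(1+b), (1+b)/(1-b)]
M_emp = np.mean(eigs / (z - eigs))               # empirical M(z) = z*G(z) - 1

# Closed form derived via the Fourier transform of b^|tau|
M_closed = 1.0 / np.sqrt(z**2 - 2 * z * (1 + b**2) / (1 - b**2) + 1)
print(round(M_emp, 4), round(M_closed, 4))
```

The evaluation point $z = 10$ lies safely outside the spectrum (here $[1/3, 3]$), so both quantities are well defined and differ only by finite-size boundary effects of order $1/T$.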

References

  1. Dagle, J.E. Data management issues associated with the August 14, 2003 blackout investigation. In Proceedings of the Power Engineering Society General Meeting, Denver, CO, USA, 6–10 June 2004; Volume 2, pp. 1680–1684.
  2. Lai, L.L.; Zhang, H.T.; Lai, C.S.; Xu, F.Y.; Mishra, S. Investigation on July 2012 Indian blackout. In Proceedings of the International Conference on Machine Learning and Cybernetics, Lanzhou, China, 13–16 July 2014; pp. 92–97.
  3. Roman, R.C.; Precup, R.E.; David, R.C. Second order intelligent proportional-integral fuzzy control of twin rotor aerodynamic systems. Proc. Comput. Sci. 2018, 139, 372–380.
  4. Han, J.; Wang, H.; Jiao, G.; Cui, L.; Wang, Y. Research on active disturbance rejection control technology of electromechanical actuators. Electronics 2018, 7, 174.
  5. Rafferty, M.; Liu, X.; Laverty, D.M.; Mcloone, S. Real-time multiple event detection and classification using moving window PCA. IEEE Trans. Smart Grid 2016, 7, 2537–2548.
  6. Bykhovsky, A.; Chow, J.H. Power system disturbance identification from recorded dynamic data at the Northfield substation. Int. J. Electr. Power Energy Syst. 2003, 25, 787–795.
  7. Wang, W.; He, L.; Markham, P.; Qi, H. Detection, recognition, and localization of multiple attacks through event unmixing. In Proceedings of the IEEE International Conference on Smart Grid Communications, Vancouver, BC, Canada, 21–24 October 2013; pp. 73–78.
  8. Song, Y.; Wang, W.; Zhang, Z.; Qi, H.; Liu, Y. Multiple event detection and recognition for large-scale power systems through cluster-based sparse coding. IEEE Trans. Power Syst. 2017, 32, 4199–4210.
  9. Huang, Z.; Wang, C.; Ruj, S.; Stojmenovic, M.; Nayak, A. Modeling cascading failures in smart power grid using interdependent complex networks and percolation theory. In Proceedings of the Industrial Electronics and Applications, Melbourne, VIC, Australia, 19–21 June 2013; pp. 1023–1028.
  10. Soltan, S.; Mazauric, D.; Zussman, G. Cascading failures in power grids: Analysis and algorithms. In Proceedings of the 5th International Conference on Future Energy Systems, Cambridge, UK, 11–13 June 2014; pp. 195–206.
  11. He, X.; Ai, Q.; Qiu, R.C.; Huang, W.; Piao, L.; Liu, H. A big data architecture design for smart grids based on random matrix theory. IEEE Trans. Smart Grid 2017, 8, 674–686.
  12. Chu, L.; Qiu, R.C.; He, X.; Ling, Z.; Liu, Y. Massive streaming PMU data modeling and analytics in smart grid state evaluation based on multiple high-dimensional covariance tests. IEEE Trans. Big Data 2016, 4, 55–64.
  13. He, X.; Qiu, R.C.; Ai, Q.; Chu, L.; Xu, X.; Ling, Z. Designing for situation awareness of future power grids: An indicator system based on linear eigenvalue statistics of large random matrices. IEEE Access 2016, 4, 3557–3568.
  14. Qiu, R.C.; Antonik, P. Smart Grid and Big Data: Theory and Practice; Wiley: Hoboken, NJ, USA, 2017.
  15. Chakhchoukh, Y.; Vittal, V.; Heydt, G.T. PMU based state estimation by integrating correlation. IEEE Trans. Power Syst. 2014, 29, 617–626.
  16. Hassanzadeh, M.; Evrenosoğlu, C.Y.; Mili, L. A short-term nodal voltage phasor forecasting method using temporal and spatial correlation. IEEE Trans. Power Syst. 2016, 31, 3881–3890.
  17. Yeo, J.; Papanicolaou, G. Random matrix approach to estimation of high-dimensional factor models. arXiv 2016, arXiv:1611.05571.
  18. Forni, M.; Giovannelli, A.; Lippi, M.; Soccorsi, S. Dynamic factor model with infinite-dimensional factor space: Forecasting. J. Appl. Econ. 2018, 33, 625–642.
  19. Sun, Z.; Liu, X.; Wang, L. A hybrid segmentation method for multivariate time series based on the dynamic factor model. Stoch. Environ. Res. Risk Assess. 2017, 31, 1291–1304.
  20. Burda, Z.; Jarosz, A.; Nowak, M.A.; Snarska, M. A random matrix approach to VARMA processes. New J. Phys. 2010, 12, 075036.
  21. Lixin, Z. Spectral Analysis of Large Dimensional Random Matrices. Ph.D. Thesis, National University of Singapore, Singapore, 2007.
  22. Zimmerman, R.D.; Murillo-Sánchez, C.E.; Thomas, R.J. MATPOWER: Steady-state operations, planning, and analysis tools for power systems research and education. IEEE Trans. Power Syst. 2011, 26, 12–19.
  23. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
  24. Kindermans, P.-J.; Schütt, K.T.; Alber, M.; Müller, K.-R.; Erhan, D.; Kim, B.; Dähne, S. Learning how to explain neural networks: PatternNet and PatternAttribution. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
  25. Ribeiro, M.T.; Singh, S.; Guestrin, C. "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
  26. Lee, K.; Lee, H.; Lee, K.; Shin, J. Training confidence-calibrated classifiers for detecting out-of-distribution samples. arXiv 2017, arXiv:1711.09325.
  27. Liang, S.; Li, Y.; Srikant, R. Enhancing the reliability of out-of-distribution image detection in neural networks. arXiv 2017, arXiv:1706.02690.
  28. Amodei, D.; Olah, C.; Steinhardt, J.; Christiano, P.; Schulman, J.; Mané, D. Concrete problems in AI safety. arXiv 2016, arXiv:1606.06565.
  29. Jolliffe, I. Principal Component Analysis; Springer: Berlin, Germany, 2011.
Figure 1. Topology of the standard IEEE 118-bus system.
Figure 2. Case study with simulated data. The estimated number of constituent components in a multiple event, $\hat{p}$, equals the actual number of constituent components, $n_{\text{event}}$, assumed in the multiple event.
Figure 3. The ESD of the covariance matrix of $\tilde{X}$ at different sampling points in Case 3. The bulk is fitted well by our model with $\hat{b}$. The number of spikes changes as different numbers of constituent components occur within the multiple event, which is consistent with the results in Table 5.
Figure 4. The real 34-PMU power flow data. (a) The real 34-PMU power flow data under normal conditions; (b) The real 34-PMU power flow data when the multiple event occurs.
Figure 5. Indicators of the real 34-PMU power flow data around the occurrence of the multiple event.
Figure 5. Indicators of the real 34-PMU power flow data around the occurrence of the multiple event.
Energies 12 01360 g005
Figure 6. The number of principal components (PCs) of the real 34-PMU power flow data when different cumulative variance contribution rates are selected under the framework of PCA.
Figure 7. A top view of Figure 6. The number of principal components is shown by the color bar on the right side of the figure.
Table 1. Steps of calculating ρ real ( p ) .
(1) Calculate $F^{(p)}$, whose $j$-th row ($j = 1, \ldots, p$) is the $j$-th principal component of $\tilde{X}_t^{T} \tilde{X}_t$; denote $F^{(p)} = (f_1, f_2, \ldots, f_p)^{T}$.
(2) Calculate the least-squares regression of $\tilde{X}_t$ on $F^{(p)}$: $L^{(p)} = \tilde{X}_t F^{(p)T}$.
(3) Calculate the $p$-level residual: $U^{(p)} = \tilde{X}_t - L^{(p)} F^{(p)}$.
(4) Calculate the covariance matrix of the $p$-level residual: $\Sigma_{\mathrm{real}}^{(p)} = \frac{1}{T} U^{(p)} U^{(p)T}$.
(5) Calculate the empirical spectral distribution of $\Sigma_{\mathrm{real}}^{(p)}$.
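The five steps of Table 1 can be sketched as follows. The synthetic data matrix, sizes, and variable names are illustrative only (they mimic the 34 × 400 split-window shape, not the real PMU data).

```python
import numpy as np

def residual_spectrum(X, p):
    """Steps (1)-(5): extract p principal-component factors, regress the
    data on them, and return the eigenvalues of the residual covariance."""
    N, T = X.shape
    # (1) F: rows are the top-p eigenvectors (principal components) of X^T X
    w, V = np.linalg.eigh(X.T @ X)
    F = V[:, np.argsort(w)[::-1][:p]].T           # p x T, orthonormal rows
    # (2) least-squares loadings; F F^T = I, so L = X F^T
    L = X @ F.T                                    # N x p
    # (3) p-level residual
    U = X - L @ F
    # (4)-(5) spectrum of the residual covariance
    return np.sort(np.linalg.eigvalsh(U @ U.T / T))[::-1]

rng = np.random.default_rng(2)
N, T, p = 34, 400, 1
# One strong rank-1 "event" factor plus white noise.
X = 5 * np.outer(rng.standard_normal(N), np.ones(T)) + rng.standard_normal((N, T))
spec = residual_spectrum(X, p)
print(round(spec[0], 2))
```

Before factor removal the sample covariance of `X` has one large spike; after regressing out the single factor, the largest residual eigenvalue falls back near the Marchenko–Pastur bulk edge.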
Table 2. A single event assumed in Case 1.
Bus | Sampling Time | Assumed Events
64 | $t_s = 500$ | A short-circuit fault
Table 3. A multiple event with two constituent components assumed in Case 2.
Bus | Sampling Time | Assumed Events
64 | $t_s = 500$ | A short-circuit fault
23, 24 | $t_s = 600$ | A disconnection fault on the line connecting Bus 23 and Bus 24
Table 4. A multiple event with three constituent components assumed in Case 3.
Bus | Sampling Time | Assumed Events
64 | $t_s = 500$ | A short-circuit fault
23, 24 | $t_s = 550$ | A disconnection fault on the line connecting Bus 23 and Bus 24
107 | $t_s = 600$ | A generator tripping event
Table 5. Relationship between the number of constituent components in a multiple event ( n event ) and the number of factors ( p ^ ).
Case | Split-Window | $n_{\text{event}}$ | $\hat{p}$
1 | [251, 500]–[451, 700] | 1 | 1
2 | [251, 500]–[350, 599] | 1 | 1
2 | [351, 600]–[451, 700] | 2 | 2
3 | [251, 500]–[300, 549] | 1 | 1
3 | [301, 550]–[350, 599] | 2 | 2
3 | [351, 600]–[451, 700] | 3 | 3

Yang, F.; Qiu, R.C.; Ling, Z.; He, X.; Yang, H. Detection and Analysis of Multiple Events Based on High-Dimensional Factor Models in Power Grid. Energies 2019, 12, 1360. https://doi.org/10.3390/en12071360

