1. Introduction
The power system is one of the most complicated physical networks in the world, and almost all electricity demand is served through it. It is therefore of utmost importance to study the fundamentals of power systems, including power flow analysis. The power flow model is involved in a great number of problems and is incorporated in various application models. For instance, it is incorporated into the formulations of state estimation [1,2], security-constrained economic dispatch [3,4,5], security-constrained unit commitment [6,7], transmission maintenance scheduling [8], and transmission expansion planning [9,10,11,12]. Hence, power flow analysis is remarkably important for power system planning as well as power system operations. Currently, the two most popular power flow models are the full AC model and the linearized DC model.
The full AC power flow model, following Kirchhoff’s circuit laws and Ohm’s law, can accurately represent the physical power system. This model captures all electric variables of interest, including active power, reactive power, phase angle, and voltage magnitude. However, it is not uncommon to observe divergence of AC power flow problems even with commercial software such as PSS/E and DSATools. In addition, incorporating the AC power flow model into an optimization model can make the resulting problem intractable.
Though a number of algorithms have been proposed in the literature to improve the convergence performance of the AC power flow model [13,14], convergence remains an unresolved issue. The computational complexity due to its non-linearity and non-convexity makes the AC power flow model impractical for a variety of optimization problems. For instance, though the classical formulation for AC model based economic dispatch was first created by Carpentier in 1962 [15], no robust and reliable algorithm has been developed since then to solve the problem in a timely manner due to its non-linearity, non-convexity, and large-scale features. As a result, the industry today still uses a linearized DC power flow model that ignores reactive power and voltage magnitude.
To relieve the computational burden, the simplified traditional DC power flow model is adopted when only active power and phase angle are of concern. As the DC model is simple, efficient, and reliable, it is widely used in the power industry and in many power system applications [16,17,18,19,20]. For instance, the DC model, rather than the AC model, is employed in day-ahead and real-time energy markets.
The DC model is a good approximation of the AC model in terms of the active power solution for high-voltage transmission networks, whose X/R ratios are typically very high. However, DC power flow may fail to perform properly in some scenarios. For instance, the work in [21] shows that DC model based cascading failure simulators fail to capture power system behavior in several circumstances. Though the average error for active power is limited to 5%, significant errors are still observed on several individual lines [22]. Three DC power flow models are investigated in [23], which concludes that the α-matching model gives the most accurate results. However, the α-matching method is a hot-start method, so its use is restricted to near-real-time applications in which the initial system status is known. Furthermore, the DC model cannot be used in cases where voltage magnitude and reactive power are of interest.
Though the full AC power flow model is accurate, its computational complexity and convergence issues restrict its use. The DC power flow model can reduce the computational burden significantly, but it may suffer from inaccuracy and reports no information regarding reactive power and voltage magnitude. Therefore, for situations in which reactive power and voltage information are needed while the solution time is limited, a fast linearized AC (LAC) power flow model that can capture reactive power and voltage magnitude is desired.
Three linear-programming AC power flow models, derived with polyhedral relaxation and Taylor series, are proposed in [24]. Numerical simulation demonstrates the effectiveness of those models. Another linear approximation of the AC model is presented in [25]. The error of this approximation is less than 6% for voltage magnitude. It is worth noting that the approximation error of the model proposed in this paper is only about 1%. A linear relaxation of the AC power flow model using polynomial optimization is proposed in [26]. The linearized model is applied to the transmission planning problem, and the case studies demonstrate its capability of obtaining approximate solutions in a reasonable time. However, the solution is not checked against the accurate full AC model. An iterative linear power flow method proposed in [27] appears to be accurate and fast. However, this iterative method may fail to converge. All the above work [24,25,26,27] is demonstrated on small-scale standard test cases only, and further efforts are needed to investigate model accuracy on large-scale practical power systems.
To address the scalability and accuracy issues related to AC power flow linearization, a data-driven linearized AC (DLAC) power flow model is proposed in this paper. This model captures all system state variables, including active power, phase angle, reactive power, and voltage magnitude. First, a regular linearization of the full AC power flow model is conducted by ignoring higher-order terms. Then, coefficients are assigned to the terms that remain in the model. Regression analysis is performed to determine those coefficients, which reflect the system’s typical or recent status. The rationale is that the system condition does not change significantly within a short timeframe such as a day. For instance, generator voltage set points typically do not change, or change only within very narrow ranges, which indicates that the voltage magnitudes of neighboring buses would also not change significantly. Another notable fact is that the voltage magnitudes in high-voltage transmission networks are typically higher than one per unit.
The proposed enhanced DLAC power flow model can (i) substantially reduce the error associated with active power flows as compared with the traditional DC power flow model and (ii) obtain a much more accurate voltage profile and significantly improved reactive power flow solutions as compared with the traditional linearized AC model. In addition, the proposed DLAC model can be easily incorporated in various power system applications such as day-ahead unit commitment or real-time economic dispatch, which can substantially improve their performance. The effectiveness of the proposed DLAC model is demonstrated with a practical U.S. power system.
The rest of this paper is organized as follows. Section 2 presents an overview of the power flow model. Section 3 introduces the regression analysis technique. Section 4 discusses the regular LAC model and the proposed enhanced DLAC model. Case studies are presented in Section 5. Finally, Section 6 concludes the paper.
2. AC Power Flow Model
Power flow studies are the basis of power system analysis [28]. The per unit system and single-line diagram are usually used for simplification. In power flow studies, the following assumptions are typically made:
The system is three-phase balanced and thus, only the positive sequence network is of concern.
The Pi-equivalent circuit model can accurately represent the transmission network.
Individual generation and load are known except for the generation at the slack bus.
Given these assumptions, the following state variables can be obtained by solving an AC power flow problem through computer programs:
voltage magnitude and phase angle at each bus,
active power and reactive power generations at each bus,
active power flow and reactive power flow in both directions on each branch,
loss on each branch.
Figure 1 shows the single-line diagram of a two-terminal circuit. A power system network consists of a number of such two-terminal circuits. Note that $P$ denotes active power while $Q$ stands for reactive power. Normally, the power flowing out of one end bus does not equal the power flowing into the other bus because of (i) the reactive power produced by transmission lines and (ii) the losses on the branch connecting the two end buses, which means that $P_{ij} \neq -P_{ji}$ and $Q_{ij} \neq -Q_{ji}$.
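The power flow equations for a branch are given below.

$$P_k = V_i^2 g_k - V_i V_j \left( g_k \cos\theta_k + b_k \sin\theta_k \right) \tag{1}$$

$$Q_k = -V_i^2 \left( b_k + b_{k0} \right) - V_i V_j \left( g_k \sin\theta_k - b_k \cos\theta_k \right) \tag{2}$$

where $P_k$ and $Q_k$ denote the active power and reactive power on branch k flowing from bus i to bus j, respectively; $\theta_k$ denotes the phase angle difference across this branch; and $V_i$ and $V_j$ are the voltage magnitudes of bus i and bus j, respectively. $y_k$ and $b_{k0}$ denote the series admittance and the parallel susceptance of the Pi-equivalent circuit, respectively. $g_k$ and $b_k$ are the real part and imaginary part of $y_k$, respectively, and they can be calculated with the following equation,

$$y_k = g_k + j b_k = \frac{1}{z_k} = \frac{1}{r_k + j x_k} \tag{3}$$

where $z_k$, $r_k$, and $x_k$ denote the line impedance, resistance, and reactance of the Pi-equivalent circuit, respectively.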
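As a quick numerical check of the branch relations, the sketch below evaluates the flow leaving bus i on a Pi-equivalent branch in per unit, assuming the standard form P = V_i² g − V_i V_j (g cos θ + b sin θ) and Q = −V_i² (b + b_0) − V_i V_j (g sin θ − b cos θ). The function names, argument names, and sample values are illustrative assumptions, and the shunt susceptance `b_shunt` is assumed to be the value lumped at the sending end.

```python
import cmath
import math


def branch_flow(v_i, v_j, theta_ij, r, x, b_shunt=0.0):
    """Return (P, Q) leaving bus i on a Pi-equivalent branch, in per unit.

    theta_ij is the angle difference in radians; b_shunt is the shunt
    susceptance assumed to be lumped at the sending end (sketch assumption).
    """
    # Series admittance y = g + jb = 1 / (r + jx)
    denom = r * r + x * x
    g = r / denom
    b = -x / denom
    p = v_i**2 * g - v_i * v_j * (g * math.cos(theta_ij) + b * math.sin(theta_ij))
    q = -(v_i**2) * (b + b_shunt) - v_i * v_j * (
        g * math.sin(theta_ij) - b * math.cos(theta_ij)
    )
    return p, q


def branch_flow_phasor(v_i, v_j, theta_ij, r, x, b_shunt=0.0):
    """Same quantity from S = V_i * conj(I_ij), used only as a cross-check."""
    vi = cmath.rect(v_i, theta_ij)  # bus j is taken as the angle reference
    vj = cmath.rect(v_j, 0.0)
    current = (vi - vj) / complex(r, x) + 1j * b_shunt * vi
    s = vi * current.conjugate()
    return s.real, s.imag


# Hypothetical branch data, not taken from the TVA system
p, q = branch_flow(1.04, 1.02, math.radians(3.0), r=0.01, x=0.10, b_shunt=0.02)
p_check, q_check = branch_flow_phasor(
    1.04, 1.02, math.radians(3.0), r=0.01, x=0.10, b_shunt=0.02
)
```

The two functions should agree to numerical precision, which is a convenient way to confirm the signs of the trigonometric terms.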
Other important equations for the power flow studies are the nodal power balance constraints as shown in Equations (4) and (5).

$$P_i^{G} - P_i^{D} = \sum_{h \in \delta^{+}(i)} P_{ih} + \sum_{h \in \delta^{-}(i)} P_{ih} \tag{4}$$

$$Q_i^{G} - Q_i^{D} = \sum_{h \in \delta^{+}(i)} Q_{ih} + \sum_{h \in \delta^{-}(i)} Q_{ih} \tag{5}$$

where $\delta^{+}(i)$ denotes the set of buses that are directly connected to bus i when bus i is designated as the sending end, and $\delta^{-}(i)$ denotes the set of buses that are directly connected to bus i when bus i is designated as the receiving end. $P_i^{G}$ and $Q_i^{G}$ represent the total active power and total reactive power produced by the generators at bus i, respectively; $P_i^{D}$ and $Q_i^{D}$ are the total active power load and total reactive power load at bus i, respectively. $P_{ih}$ and $Q_{ih}$ denote the branch active power and reactive power flowing from bus i to bus h, respectively.
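The nodal balance can be checked numerically by summing the flows that leave each bus and comparing the total with generation minus load. The following is a minimal sketch under that reading of the balance constraints; the function name, the dictionary layout, and the 3-bus data are hypothetical.

```python
def nodal_mismatch(bus, generation, load, flows):
    """Return generation - load - (sum of flows leaving `bus`).

    flows maps (i, h) to the power leaving bus i toward bus h, so each branch
    appears twice, once per direction. Parallel branches between the same bus
    pair would need an extra key component; they are left out of this sketch.
    """
    leaving = sum(f for (i, _h), f in flows.items() if i == bus)
    return generation.get(bus, 0.0) - load.get(bus, 0.0) - leaving


# Hypothetical 3-bus active power data in per unit (losses make P_ih != -P_hi)
generation = {1: 1.5}
load = {2: 0.7, 3: 0.769}
flows = {
    (1, 2): 0.90, (2, 1): -0.88,
    (1, 3): 0.60, (3, 1): -0.59,
    (2, 3): 0.18, (3, 2): -0.179,
}
mismatches = {bus: nodal_mismatch(bus, generation, load, flows) for bus in (1, 2, 3)}
```

A solved power flow case should give mismatches that are zero to within the solver tolerance at every bus; the same function applies to reactive power when reactive quantities are passed in.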
3. Regression Analysis
Regression analysis is a widely used statistical technique for estimating the relationships among variables and determining the model for those variables [29,30]. It typically involves a dependent variable, also referred to as the response variable, and one or more independent variables, often called regressors or predictors. The most popular estimation method is least squares. A general multiple linear regression model with k regressors is defined as follows:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon \tag{6}$$

where $x_j$ is the jth regressor, $\beta_j$ denotes the coefficient, and $\varepsilon$ denotes the error.
Provided a sample set of n observations, all parameters $\beta_j$ can be determined as $\hat{\beta}_j$ through regression analysis. The estimated value $\hat{y}_i$ and residual $e_i$ for the ith observation in the sample space are defined as Equations (7) and (8) respectively.

$$\hat{y}_i = \hat{\beta}_0 + \sum_{j=1}^{k} \hat{\beta}_j x_{ij} \tag{7}$$

$$e_i = y_i - \hat{y}_i \tag{8}$$

where $y_i$ is the observed value of the ith observation and $x_{ij}$ is the value of the jth regressor in the ith observation. The residual $e_i$ denotes the difference between the observed value and the estimated or fitted value, while the error $\varepsilon_i$ represents the discrepancy between the observed value and the true value.
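To make the least-squares step concrete, the sketch below fits a multiple linear regression with an intercept by solving the normal equations and then computes fitted values and residuals. It uses only the Python standard library; the helper names and the sample data are illustrative assumptions, and a production study would normally rely on a statistics package instead.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [mr - factor * mc for mr, mc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(x_rows, y):
    """Least-squares estimates [b0, b1, ..., bk] for y = b0 + sum(bj * xj)."""
    design = [[1.0] + list(row) for row in x_rows]
    p = len(design[0])
    xtx = [[sum(r[a] * r[c] for r in design) for c in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(design, y)) for a in range(p)]
    return solve_linear(xtx, xty)


def fitted_and_residuals(beta, x_rows, y):
    """Return fitted values and residuals (observed minus fitted)."""
    fitted = [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in x_rows]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, residuals


# Hypothetical sample generated from y = 1 + 2*x1 - 3*x2, so residuals vanish
x_rows = [(0.0, 1.0), (1.0, 0.0), (2.0, 1.0), (3.0, 3.0), (4.0, 2.0)]
y = [1 + 2 * a - 3 * b for a, b in x_rows]
beta_hat = fit_ols(x_rows, y)
y_hat, e = fitted_and_residuals(beta_hat, x_rows, y)
```

Because the sample is noise-free, the estimated coefficients recover the generating values and every residual is zero up to rounding; with measured data the residuals carry the information used in the adequacy checks.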
After creating a regression model, it is important to (i) conduct model adequacy checking to ensure the model fits the data and (ii) perform model validation to demonstrate the effectiveness of the regression model.
3.1. Model Adequacy Checking
One statistical metric to evaluate the overall model adequacy is the coefficient of determination, a percentage denoted as $R^2$. $R^2$ quantifies how good the regression model is and how much variation can be explained by the regression model. In other words, $R^2$ is the proportion of variation in the response variable explained by the regressors; a value of 100% indicates that the regression model explains all the variability around the mean. $R^2$ is defined in the equation below,

$$R^2 = 1 - \frac{SS_{Res}}{SS_T} \tag{9}$$

where $SS_{Res}$ denotes the residual sum of squares and $SS_T$ denotes the total sum of squares.
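The coefficient of determination, one minus the ratio of the residual sum of squares to the total sum of squares, can be computed directly from observed and fitted values. The snippet below is a small sketch with made-up numbers; the function name `r_squared` is an assumption for illustration.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_Res / SS_T."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_t = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_t


# Hypothetical observed and fitted values
y_obs = [1.0, 2.0, 3.0, 4.0]
y_fit = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(y_obs, y_fit)  # SS_Res = 0.10, SS_T = 5.0
```

For these values the residual sum of squares is 0.10 and the total sum of squares is 5.0, so the model explains 98% of the variation.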
Residual analysis can effectively discover several types of model inadequacies and measure how well the regression model fits the data [30]. Scaled residuals such as standardized residuals may be better suited to finding outliers and analyzing the regression model. The standardized residual, with zero mean and approximately unit variance, is defined in the equation below,

$$d_i = \frac{e_i}{\sqrt{MS_{Res}}} \tag{10}$$

where $MS_{Res}$ is the residual mean square that estimates the average variance of the residuals.
Another commonly used scaled residual is the R-student residual, defined in Equation (11) shown below. R-student residuals are often used since their variance is constant.

$$t_i = \frac{e_i}{\sqrt{S_{(i)}^2 \left( 1 - h_{ii} \right)}} \tag{11}$$

where $h_{ii}$ is the ith diagonal element of the hat matrix and $S_{(i)}^2$ denotes the estimate of variance with the ith observation removed from the dataset.
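The two scaled residuals can be illustrated for the simplest case of one regressor plus an intercept, where the hat-matrix diagonal has the closed form h_ii = 1/n + (x_i − x̄)²/S_xx. The sketch below assumes that special case, uses the standard leave-one-out identity for the variance estimate, and runs on hypothetical data whose last point is a deliberate outlier.

```python
import math


def scaled_residuals(x, y):
    """Standardized and R-student residuals for y = b0 + b1*x (one regressor)."""
    n = len(x)
    p = 2  # number of parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    ms_res = sum(ei**2 for ei in e) / (n - p)
    # Hat-matrix diagonal for simple linear regression
    h = [1.0 / n + (xi - x_bar) ** 2 / s_xx for xi in x]
    standardized = [ei / math.sqrt(ms_res) for ei in e]
    r_student = []
    for ei, hii in zip(e, h):
        # Variance estimate with observation i removed from the dataset
        s2_i = ((n - p) * ms_res - ei**2 / (1.0 - hii)) / (n - p - 1)
        r_student.append(ei / math.sqrt(s2_i * (1.0 - hii)))
    return standardized, r_student


# Hypothetical data; the last observation is a deliberate outlier
x_data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_data = [1.1, 1.9, 3.2, 3.9, 5.1, 9.0]
d, t = scaled_residuals(x_data, y_data)
worst = max(range(len(t)), key=lambda i: abs(t[i]))
```

Because the R-student residual removes the suspect observation before estimating the variance, the outlier stands out far more clearly in `t` than in `d`.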
A plot of the residuals against the fitted values can help detect several types of model inadequacy. If the residuals in the plot are contained in a horizontal band, there is no indication of model deficiency. If the residuals plotted against the fitted values form a pattern such as a funnel, a double bow, or a nonlinear curve, defects may exist in the regression model and further investigation is required.
3.2. Model Validation
In regression analysis, the model that best fits the sample dataset may not accurately describe the relationship between variables. One key concern is the danger of extrapolation. Though a regression model is often used for extrapolation, this concern does not apply to the proposed power flow application since the power system status does not change significantly, especially in a short time frame. The case studies section of this paper demonstrates that the proposed regression model for linearized power flow equations has very similar performance in different system scenarios.
Multicollinearity may occur when regressors are highly linearly dependent, and it has serious negative effects on the regression model. The regression coefficients may be poorly estimated when multicollinearity exists, which may result in inaccuracy of the regression model. One popular technique to detect multicollinearity is the variance inflation factor (VIF), which measures how much the variance of an estimated regression coefficient is inflated. Each regressor in the regression model corresponds to one VIF value, which is useful for determining whether that regressor is involved in multicollinearity. A high VIF indicates that the associated regressor may have a poorly estimated coefficient. VIFs below 3 suggest that multicollinearity does not exist [31]. A VIF of one means there is no correlation between the associated regressor and the other regressors.
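For reference, the VIF of regressor j is commonly computed as 1/(1 − R_j²), where R_j² is the coefficient of determination obtained by regressing that regressor on all the others. With exactly two regressors this reduces to 1/(1 − r²), with r the sample correlation between them. The sketch below covers only that two-regressor case; the function name and the data are hypothetical.

```python
def vif_two_regressors(x1, x2):
    """VIF shared by both regressors in a two-regressor model: 1 / (1 - r^2)."""
    n = len(x1)
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    r_squared = s12 * s12 / (s11 * s22)
    return 1.0 / (1.0 - r_squared)


# Hypothetical regressors: the first pair is uncorrelated, the second nearly collinear
vif_independent = vif_two_regressors([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0])
vif_collinear = vif_two_regressors([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9])
```

The uncorrelated pair yields a VIF of one, while the nearly collinear pair yields a value far above the threshold of 3 quoted above.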
5. Case Studies
The proposed data-driven DC (DDC) model and data-driven linearized AC (DLAC) model are tested on the practical Tennessee Valley Authority (TVA) system and compared with the traditional full AC power flow model and the simplified DC power flow model. TVA is a U.S. federally owned corporation that serves a population of about 10 million people [35].
This system has 1779 buses and 2301 branches. There are 72 consecutive hourly cases that cover three days of scenarios [36]; they are used as the test cases in this work and are referred to as hour 1 case, hour 2 case, …, and hour 72 case in this paper. AC power flow simulation is conducted on hour 1 case to obtain the system status that is used as the training data for regression analysis. The data used for determining the coefficient of the data-driven DC model are line reactance, line active power flow, and phase angle difference across the line. The data used for determining the coefficients of the data-driven linearized AC model include line reactance and susceptance, line active and reactive power flow, phase angle difference across the line, and voltage magnitudes at the two end buses. The other cases are used to validate the proposed models. In this work, R is used as the statistical tool to perform regression analysis [37].
5.1. Data-Driven DC Model
With the power flow results obtained from the full AC power flow simulation on hour 1 case, regression analysis determines the coefficient in Equation (18) to be 1.12, which is greater than one. This is consistent with the fact that the average voltage magnitude for the same hour 1 case is 1.04 per unit, which is also above one.
The coefficient of determination for this regression model is 0.9964. This indicates that the regression model can explain 99.64% of the variation of the response variable. VIF does not apply to a regression model with a single regressor. Thus, the DDC model will not have any multicollinearity or overfitting issues, as it has only a single regressor.
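A single-coefficient fit of this kind can be sketched as a least-squares regression through the origin. The example assumes, as a labeled guess, that the DDC model has the form P = β·θ/x with no intercept; the function name and the training values are hypothetical and are not TVA data.

```python
def fit_through_origin(z, p):
    """Least-squares slope for p = beta * z with no intercept."""
    return sum(zi * pi for zi, pi in zip(z, p)) / sum(zi * zi for zi in z)


# Hypothetical training data: z stands for theta / x on four branches (per unit)
z = [1.0, -2.0, 3.0, 4.0]
noise = [0.02, 0.01, 0.0, 0.0]  # chosen so that sum(z * noise) is zero
p_ac = [1.12 * zi + ni for zi, ni in zip(z, noise)]
beta = fit_through_origin(z, p_ac)
```

With measured data the regressor would be built from the AC solution of the training hour, and the estimate would be refreshed by rerunning the same one-line formula on newer data.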
Figure 2 shows the plots of different types of residuals against the fitted values of branch active power flows for the hour 1 case. The residuals in Figure 2a–c correspond to the original residuals, standardized residuals, and R-student residuals, respectively. Note that the residual scale for Figure 2b,c is about 10 times larger than that of Figure 2a since Figure 2b,c show scaled residuals while Figure 2a does not. In Figure 2a, the majority of the original residuals lie within the range of [−0.2, 0.2] and very few of them go beyond the boundary of [−0.5, 0.5]. This indicates that the errors of the regression model are within an acceptable range. Similarly, Figure 2b,c show that the residuals are mostly located within 3 units of the standard deviation. In conclusion, Figure 2 illustrates the effectiveness of the DDC power flow model.
Note that
in Equation (18) is mathematically equivalent to
in Equation (12); in other words, the proposed DDC model will achieve the same
branch active power $P$ while it can improve the solution for the phase angle $\theta$. The power flow solutions obtained from the DC model and the DDC model for the TVA system condition represented by hour 1 case are presented in Table 1. The branch active power $P$ calculated from both models is the same, while the improvement in $\theta$ is significant. The improvement for branches with flows of over 50 MW is more than 50% on average. As shown in Table 2, very similar results are observed for hour 2 case, which represents a different scenario from the one used to train the DDC model. This further demonstrates that the proposed DDC model can substantially improve the phase angle solution as compared to the traditional DC model.
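The observation that both models return the same branch flows can be reproduced on a toy network. The sketch assumes the DDC model multiplies every branch term by one common coefficient β, so that P = β·θ/x; under that assumption the reduced susceptance matrix scales by β, the angles scale by 1/β, and the flows are unchanged. The 3-bus data and the function name are hypothetical.

```python
def dc_power_flow(branches, injections, beta=1.0):
    """Solve a 3-bus DC power flow with bus 0 as the slack bus.

    branches: list of (from_bus, to_bus, reactance); injections: {bus: P} for
    buses 1 and 2. beta scales every branch term, P = beta * theta / x.
    """
    # Reduced susceptance matrix for buses 1 and 2
    b = [[0.0, 0.0], [0.0, 0.0]]
    for i, j, x in branches:
        w = beta / x
        for bus in (i, j):
            if bus != 0:
                b[bus - 1][bus - 1] += w
        if i != 0 and j != 0:
            b[i - 1][j - 1] -= w
            b[j - 1][i - 1] -= w
    p1, p2 = injections[1], injections[2]
    # Cramer's rule for the 2x2 system b * theta = p
    det = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    theta = [
        0.0,
        (p1 * b[1][1] - b[0][1] * p2) / det,
        (b[0][0] * p2 - b[1][0] * p1) / det,
    ]
    flows = [beta * (theta[i] - theta[j]) / x for i, j, x in branches]
    return theta, flows


# Hypothetical network: loads at buses 1 and 2, generation at the slack bus
branches = [(0, 1, 0.10), (0, 2, 0.20), (1, 2, 0.25)]
injections = {1: -0.5, 2: -0.3}
theta_dc, flows_dc = dc_power_flow(branches, injections, beta=1.0)
theta_ddc, flows_ddc = dc_power_flow(branches, injections, beta=1.12)
```

In this sketch the only quantity that changes between the two runs is the angle vector, which is why any accuracy gain from a single common coefficient shows up in the phase angles rather than in the active power flows.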
5.2. Data-Driven Linearized AC Model
In the DLAC model, two separate regression models are built for branch active power and reactive power, respectively. The resulting coefficients are presented in Table 3. All five coefficients deviate from the value of one that is used for the regular LAC model. The 95% confidence interval for each regression coefficient is very narrow, which indicates that the coefficients are very stable and accurate for the given dataset or system conditions.
As presented in Table 4, the coefficients of determination $R^2$ are 99.74% and 98.71% for the two regression models, respectively. This shows that both regression models can explain almost all of the variation of the branch flows around their means. Table 4 also shows that the VIFs for the regressors of both regression models are all very small, well below 3, which means that multicollinearity has little effect on the regression models and there is no overfitting issue. Table 5 shows that the residual mean of regression model Q is about five times higher than that of regression model P. This indicates that the proposed DLAC model is more accurate for branch active power than for branch reactive power.
Table 6 shows the ANOVA results for regression model P, which are used for testing the significance of regression. As the last column indicates, the probability of the two regressors being insignificant to the response is negligible. In other words, both regressors are necessary in regression model P. This implies that DC power flow models, which contain only a single term, have obvious room for accuracy improvement. This is consistent with the comparison between Table 1 and Table 7. The active power solutions obtained from the LAC and DLAC models are very similar, but they are 20% more accurate than the solutions obtained with the DC and DDC models.
Figure 3 shows the scatter plot for hour 1 case, which indicates that the fitted branch active power flows of the proposed DLAC model are closely in line with the solutions obtained from the full AC model.
Table 7, Table 8 and Table 9 show the statistical results for branch active power, reactive power, and bus voltage, respectively, obtained with the DLAC and LAC models on hour 1 case. It is observed that the proposed DLAC model can (i) improve branch reactive power accuracy by 34.5% on branches with flows exceeding 10 MVA and (ii) improve the voltage solution by 35.0% relative to the regular LAC model. As shown in Table 10, Table 11 and Table 12, very similar observations are made when both models are applied to hour 2 case. Thus, it is concluded that the proposed DLAC model can significantly improve reactive power accuracy relative to the regular LAC model, and that the active power derived from the LAC/DLAC model is more accurate than that from the DC/DDC model.
It is important to analyze how much improvement the proposed DLAC model can achieve over the conventional LAC model in terms of branch complex power flow, which is used to monitor line loading level against the line thermal capacity limit. To avoid the effects of branches with small flows, the following analysis focuses on branches that have flows of at least 10 MVA. With this filter, the statistics for hour 1 case and hour 2 case are presented in Table 13. For hour 1 case, the average branch flow error is 5.62% for LAC while it is only 4.65% for DLAC, which corresponds to a 17.2% improvement of DLAC over LAC. For hour 2 case, the average line flow error is 4.77% for LAC while it is 3.85% for DLAC, which corresponds to a 19.1% improvement. The sum of absolute deviation in branch complex power flow (SADCP) is also compared for the different models. For the traditional LAC model on hour 1 case, the SADCP is 8781 MVA over 1876 branches, an average flow error of 4.68 MVA per branch. With DLAC, the SADCP drops by 16.0% to 7379 MVA, which corresponds to 3.93 MVA per branch on average. For hour 2 case, the SADCP is 7238 MVA over 1864 branches, an average flow error of 3.88 MVA per branch using LAC; with DLAC, the SADCP drops by 16.4% to 6047 MVA, which corresponds to 3.24 MVA per branch. Thus, it is concluded that the proposed DLAC model can substantially reduce the branch complex power flow error.
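The per-branch averages and relative reductions quoted above follow from simple ratios of the reported SADCP totals. The short sketch below recomputes them from the numbers in the text; the function name and dictionary keys are illustrative, and the results match the reported figures to within rounding.

```python
def summarize_sadcp(sadcp_lac, sadcp_dlac, n_branches):
    """Average deviation per branch (MVA) and relative SADCP reduction (%)."""
    return {
        "lac_avg": sadcp_lac / n_branches,
        "dlac_avg": sadcp_dlac / n_branches,
        "reduction_pct": 100.0 * (sadcp_lac - sadcp_dlac) / sadcp_lac,
    }


# SADCP totals (MVA) and branch counts as reported for the two cases
hour1 = summarize_sadcp(8781.0, 7379.0, 1876)
hour2 = summarize_sadcp(7238.0, 6047.0, 1864)
```

The same ratio, applied to the average percentage errors, gives the relative improvement of DLAC over LAC for each case.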
Two power flow simulations based on the regular LAC model and the proposed DLAC model are conducted on hour 1 case. Taking the full AC model as the reference, the branch complex flow errors in percent for both LAC and DLAC are calculated and presented in Figure 4, where branches with less than 10 MVA of flow and branches with flow errors of less than 5% for the LAC model are removed in order to clearly show the performance difference between the regular LAC model and the proposed DLAC model. There are 683 branches in Figure 4, and they are ordered by the flow errors of the proposed DLAC model. It is clearly observed from Figure 4 that the complex power flow errors of DLAC are considerably lower than those of LAC for most branches. To verify the proposed DLAC model under a different system operating condition, similar simulations are performed on hour 2 case, which represents the system scenario of the following hour, and the results are shown in Figure 5. Very similar observations can be made from Figure 5, which validates the proposed DLAC model. It is concluded that, with regression analysis, the proposed DLAC model can substantially improve on the regular LAC model.
The above analysis demonstrates that the proposed data-driven linearized AC power flow model can largely enhance the system reactive power profile and voltage profile, both for the training case on which regression analysis is performed and for the case of the following hour. Moreover, the proposed DLAC model and the traditional LAC model are tested on 70 more consecutive cases. The results are shown in Table 14 and Table 15.
The average voltage improvement with the proposed DLAC model over the regular LAC model across all 72 cases has a mean of 48.7% and a standard deviation of 14.2%; the average branch reactive power improvement with DLAC over LAC is 39.8% with a standard deviation of 4.7%. This demonstrates that the proposed DLAC power flow model significantly improves on the regular LAC model. The average error of the voltage solutions obtained from DLAC is only 0.67% across the 72 system scenarios. The standard deviation of the voltage error is 0.33%, which indicates that the proposed DLAC model is stable and robust over different system conditions. For branch reactive power, the average error is 19.4% with a standard deviation of 6%. Although the reactive power solution is not as accurate as the voltage solution, the proposed DLAC model can provide reactive power information within an acceptable range, which shows its superiority over the traditional DC power flow model.
It is worth noting that determining regressors’ coefficients with linear regression is fast and the regression model can be easily retrained with the latest system operating data. However, frequent model retraining may not be necessary unless the system voltage profile and operating points have significantly deviated from the previous training data. This feature can enable the proposed DLAC model to be integrated in several power system applications, such as economic dispatch, unit commitment, and grid expansion planning.