**Fei Mei <sup>1,\*</sup>, Qingliang Wu <sup>1</sup>, Tian Shi <sup>2</sup>, Jixiang Lu <sup>3</sup>, Yi Pan <sup>2</sup> and Jianyong Zheng <sup>2</sup>**


Received: 28 February 2019; Accepted: 2 April 2019; Published: 9 April 2019

**Abstract:** Recently, a large number of distributed photovoltaic (PV) generation systems have been connected to the power grid, which has increased the fluctuation of the net load and made load forecasting more difficult. Considering the characteristics of the net load, an ultrashort-term forecasting model based on phase space reconstruction and a deep neural network (DNN) is proposed, which can be divided into two steps. First, the phase space reconstruction of the net load time series is performed using the C-C method. Second, the reconstructed data are fitted by the DNN to obtain the predicted value of the net load. The performance of this model is verified using real data; its accuracy in forecasting the net load under a high PV penetration rate and different weather conditions is high.

**Keywords:** net load forecasting; phase space reconstruction; deep neural network

### **1. Introduction**

In recent years, an increasing number of photovoltaic (PV) generation systems have been connected to the distribution network. The use of new energy brings huge benefits to human beings; however, while improving the environment, it also has negative impacts on the power grid. PV power is strongly influenced by weather conditions and fluctuates with the change in irradiance. To ensure the quality of the power supply and the safe operation of the power system, it is necessary to keep power generation and power consumption equal. Power load forecasting is an indispensable means of maintaining this dynamic balance. It is also of great significance for the planning and scheduling of power systems and the planning of power maintenance. However, when many PV systems are connected, the fluctuations in PV power cause fluctuations in the load, so load forecasting becomes more difficult.

Generally, load forecasting can be divided into long-term forecasting, medium-term forecasting, short-term forecasting, and ultrashort-term forecasting [1]. Recently, many systematic and fruitful studies on traditional load forecasting have been conducted. The load forecasting methods mainly include the similar-day prediction method [2,3], the time series prediction method [4,5], expert systems [6,7], and the regression analysis method [8,9]. Artificial intelligence and machine learning algorithms are prediction methods that have developed rapidly in recent years. Support vector regression (SVR) [10], relevance vector regression (RVR) [11], artificial neural networks (ANN) [12], deep neural networks (DNN) [13], and their improved hybrid algorithms [14] have been applied in the field of load forecasting. In Moon et al. [15], a random forest and a multilayer perceptron are combined to predict the daily electrical load. In Nazar et al. [16], wavelet and Kalman machines, a Kohonen self-organizing map (SOM), a multi-layer perceptron artificial neural network (MLP-ANN), and an adaptive neuro-fuzzy inference system (ANFIS) are used to establish a hybrid three-stage forecasting model. In Zhang et al. [17], a short-term power load forecasting method with a wavelet neural network (WNN) and an adaptive mutation bat optimization algorithm (AMBA) is proposed. Liang et al. [18] propose a hybrid model that combines empirical mode decomposition (EMD), minimal redundancy maximal relevance (mRMR), a general regression neural network (GRNN), and the fruit fly optimization algorithm (FOA). In Dai et al. [19], a load forecasting model is proposed based on the complete ensemble empirical mode decomposition (EEMD) with adaptive noise and a support vector machine (SVM), which is optimized by the modified grey wolf optimization (MGWO) algorithm.

Since traditional load data typically use hours as the smallest unit, short-term and ultrashort-term forecasts are often not strictly distinguished. With the large-scale access of distributed PV, short-term load forecasting at an hourly resolution cannot satisfy the requirements of real-time safety analysis. For the real-time safety analysis of power systems and the reliable operation of economic dispatch, a more detailed ultrashort-term prediction is required. Considering the volatility of distributed PV generation and the real-time requirements of ultrashort-term prediction, the PV power should also be treated as a load and merged with the traditional load to form a net load [20–22]. Because the net load is a nonlinear time series with large volatility, in this paper, the phase space reconstruction of the net load data is first performed to project the data onto a moving-point trajectory with a certain regularity in the phase space. Then, the excellent nonlinear fitting ability of the deep neural network is used to fit the moving-point trajectory and obtain the final prediction value. Finally, measured load data are used to verify the prediction effectiveness of the model under different weather conditions. The phase space is a tool for characterizing a dynamic system, reconstructed from a univariate or multivariate time series [23], and is widely used in forecasting models. The DNN is a development of the traditional ANN and is suitable for net load forecasting because of its strengthened nonlinear fitting ability [24]. The high accuracy in forecasting the net load under a high PV penetration rate and different weather conditions is verified using real data. The contribution of this paper can be summarized as follows:


#### **2. Fundamental of Phase Space Reconstruction and Deep Neural Network**

#### *2.1. Phase Space Reconstruction*

Phase space reconstruction is a method proposed by Takens for analyzing time series. Its basic idea is to regard the time series as a component produced by a certain nonlinear dynamic system; the equivalent high-dimensional phase space of the system can then be reconstructed from the variation law of this component. The key to the reconstruction is to determine the optimal embedding dimension *mopt* and the optimal delay *topt*.

In this paper, the optimal embedding dimension *mopt* and delay *topt* are obtained simultaneously by the C-C method. If a time series is $x = \{x_1, x_2, \cdots, x_N\}$, the embedding dimension is *m*, and the time delay is *t*, then the set of points in the reconstructed phase space can be expressed by Formula (1), where *M* = *N* − (*m* − 1)*t* [25].

$$
\begin{bmatrix}
\mathbf{X}_1 \\
\mathbf{X}_2 \\
\vdots \\
\mathbf{X}_M
\end{bmatrix} = \begin{bmatrix}
x_1 & x_{1+t} & \cdots & x_{1+(m-1)t} \\
x_2 & x_{2+t} & \cdots & x_{2+(m-1)t} \\
\vdots & \vdots & \ddots & \vdots \\
x_M & x_{M+t} & \cdots & x_N
\end{bmatrix}.
\tag{1}
$$
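As a concrete illustration, the reconstruction in Formula (1) can be sketched in a few lines of Python (the function name and the toy series below are our own, not from the paper):

```python
import numpy as np

def phase_space_reconstruct(x, m, t):
    """Build the M x m delay-embedding matrix of Formula (1).

    x : 1-D time series of length N
    m : embedding dimension
    t : time delay (in samples)
    """
    x = np.asarray(x)
    N = len(x)
    M = N - (m - 1) * t              # number of reconstructed points
    if M <= 0:
        raise ValueError("series too short for this (m, t)")
    # Row i holds X_{i+1} = [x_{i+1}, x_{i+1+t}, ..., x_{i+1+(m-1)t}]
    return np.array([x[i : i + (m - 1) * t + 1 : t] for i in range(M)])

# Toy example: N = 8, m = 3, t = 2 gives M = 8 - 2*2 = 4 points
X = phase_space_reconstruct(np.arange(1, 9), m=3, t=2)
print(X.shape)   # (4, 3)
print(X[0])      # [1 3 5]
```

Note that the last row ends exactly at $x_N$, matching the bottom-right entry of the matrix in Formula (1).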

At this time, the correlation integral is given by Formula (2), where

$$\theta(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0 \end{cases}$$

According to the statistical conclusion of the BDS test, the ranges of *m* and *rk* can be obtained when *N* > 3000: *m* ∈ {2, 3, 4, 5} and *rk* = *k* × 0.5*σ*, a real number representing a given distance range, where *σ* is the standard deviation of the time series and *k* ∈ {1, 2, 3, 4}. The correlation integral indicates the probability that the distance between any two points in the phase space is less than *rk*.

$$C(m, N, r_k, t) = \frac{2}{M(M-1)} \sum_{1 \le i < j \le M} \theta\left(r_k - \|\mathbf{X}_i - \mathbf{X}_j\|\right). \tag{2}$$
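A direct pairwise evaluation of Formula (2) can be sketched as follows; the helper name is illustrative, and the brute-force O(M²) distance matrix is only meant for clarity:

```python
import numpy as np

def correlation_integral(X, r):
    """C(m, N, r, t) of Formula (2): the fraction of point pairs in the
    reconstructed phase space whose distance does not exceed r."""
    M = len(X)
    diffs = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diffs ** 2).sum(-1))    # M x M Euclidean distance matrix
    iu = np.triu_indices(M, k=1)         # the M(M-1)/2 pairs with i < j
    return np.mean(d[iu] <= r)           # theta counts distances <= r
```

Averaging over the upper-triangular pairs reproduces the 2/(M(M−1)) normalization in Formula (2).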

We define the test statistics *S* and Δ*S* and use the block averaging strategy, as shown in Formula (3).

$$\begin{cases} S(m, N, r_k, t) = \frac{1}{t} \sum_{i=1}^{t} \left[ C_i\left(m, \frac{N}{t}, r_k, t\right) - C_i^m\left(1, \frac{N}{t}, r_k, t\right) \right] \\ \Delta S(m, N, t) = \max\left[ S(m, N, r_k, t) \right] - \min\left[ S(m, N, r_k, t) \right] \end{cases} \tag{3}$$

Formula (4) gives the averages of *S* and Δ*S*. The *t* value of the first zero of the averaged *S*, or of the first local minimum of the averaged Δ*S*, rounded to an integer, is the optimal delay *topt*.

$$\begin{cases} \overline{S}(t) = \frac{1}{16} \sum_{m=2}^{5} \sum_{k=1}^{4} S(m, N, r_k, t) \\ \overline{\Delta S}(t) = \frac{1}{4} \sum_{m=2}^{5} \Delta S(m, N, t) \end{cases} \tag{4}$$

Formula (5) defines the test statistic. The global minimum of *Scor*(*t*) gives the optimal embedding window *tω*.

$$S\_{cor}(t) := \overline{\Delta S}(t) + |\overline{S}(t)|. \tag{5}$$

Then:

$$t\_{\omega} = (m\_{opt} - 1)t\_{opt}.\tag{6}$$

Therefore, the optimal delay *topt* determined by Formula (4) and the optimal embedding window *tω* determined by Formula (5) can be substituted into Formula (6), and the result rounded, to obtain the optimal embedding dimension *mopt*.
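The selection logic of Formulas (4)–(6) can be sketched as follows, assuming the averaged statistics have already been computed for each candidate delay. The function name and the toy statistic arrays are illustrative, not from the paper:

```python
import numpy as np

def cc_parameters(t_vals, dS_bar, S_cor):
    """Pick t_opt and m_opt from the C-C statistics of Formulas (4)-(6).

    t_vals : candidate delays (e.g. 1..t_max)
    dS_bar : averaged Delta-S for each candidate delay (Formula (4))
    S_cor  : S_cor(t) for each candidate delay (Formula (5))
    """
    # t_opt: first local minimum of the averaged Delta-S curve
    i = next(k for k in range(1, len(dS_bar) - 1)
             if dS_bar[k] < dS_bar[k - 1] and dS_bar[k] <= dS_bar[k + 1])
    t_opt = int(t_vals[i])
    # t_w: global minimum of S_cor gives the embedding window
    t_w = int(t_vals[int(np.argmin(S_cor))])
    # Formula (6): t_w = (m_opt - 1) * t_opt, rounded to an integer dimension
    m_opt = int(round(t_w / t_opt)) + 1
    return t_opt, m_opt

# Toy curves: first local minimum of dS_bar at t = 3, S_cor minimal at t = 9,
# so m_opt = round(9 / 3) + 1 = 4
t_vals = np.arange(1, 11)
dS_bar = [5, 4, 2, 3, 3.5, 3, 2.5, 2, 1, 0.5]
S_cor = [9, 8, 7, 6, 5, 4, 3, 2, 1, 2]
print(cc_parameters(t_vals, dS_bar, S_cor))  # (3, 4)
```

In practice, `dS_bar` and `S_cor` would come from evaluating Formulas (3)–(5) on the net load series for each candidate delay.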

#### *2.2. Deep Neural Network*

A neural network generally consists of three types of layers: an input layer, hidden layers, and an output layer. Figure 1 shows a simple neural network with three inputs, two outputs, and four neurons in a single hidden layer. The neurons between layers are connected by weights *ω*. The DNN contains multiple hidden layers, which significantly improves its nonlinear fitting ability compared with a single hidden layer; however, too many hidden layers are likely to cause over-fitting. The learning process of the network is the process of adjusting and determining the connection weights *ω* of each neuron from the training samples.

**Figure 1.** Single-hidden-layer neural network.

In this paper, the Levenberg-Marquardt back propagation (LMBP) algorithm is used to train the DNN. Compared with the traditional back propagation (BP) algorithm, the LMBP algorithm has a faster convergence speed and higher convergence reliability; it is therefore more suitable for training the DNN and can also satisfy the real-time requirements of ultrashort-term prediction. Unlike the traditional BP algorithm, which uses gradient descent, the LMBP algorithm is based on the Gauss-Newton method for least squares and takes the sum of squares of the error vector *v* as the objective function.

$$E(\omega) = v^{\mathrm{T}}(\omega)v(\omega). \tag{7}$$

Taking the second-order Taylor expansion of the objective function in Formula (7) and differentiating, the change of the weight *ω* can be obtained as follows:

$$
\Delta\omega = -\left[\nabla^2 E(\omega)\right]^{-1} \nabla E(\omega). \tag{8}
$$

where:

$$\begin{cases} \nabla E(\omega) = 2J^{\mathrm{T}}(\omega)v(\omega) \\ \nabla^{2}E(\omega) = 2J^{\mathrm{T}}(\omega)J(\omega) + 2\nabla^{2}v^{\mathrm{T}}(\omega)v(\omega) \end{cases} \tag{9}$$

*J*(*ω*) is the Jacobian matrix of *v*(*ω*). If *v*(*ω*) consists of *a* elements, *J*(*ω*) can be written as follows:

$$J(\omega) = \begin{bmatrix} \frac{\partial v_1}{\partial \omega_1} & \frac{\partial v_1}{\partial \omega_2} & \cdots & \frac{\partial v_1}{\partial \omega_n} \\ \frac{\partial v_2}{\partial \omega_1} & \frac{\partial v_2}{\partial \omega_2} & \cdots & \frac{\partial v_2}{\partial \omega_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial v_a}{\partial \omega_1} & \frac{\partial v_a}{\partial \omega_2} & \cdots & \frac{\partial v_a}{\partial \omega_n} \end{bmatrix}. \tag{10}$$
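When analytic derivatives are unavailable, the Jacobian in Formula (10) can be approximated by forward finite differences. This is a generic numerical sketch, not the paper's implementation; the function name and step size are our own:

```python
import numpy as np

def numerical_jacobian(v, w, eps=1e-6):
    """Forward-difference approximation of the Jacobian in Formula (10).

    v : function mapping the weight vector w (length n) to the
        error vector v(w) (length a)
    Returns the a x n matrix J with J[i, j] = d v_i / d w_j.
    """
    w = np.asarray(w, dtype=float)
    v0 = np.asarray(v(w))
    J = np.empty((v0.size, w.size))
    for j in range(w.size):
        wp = w.copy()
        wp[j] += eps                 # perturb one weight at a time
        J[:, j] = (np.asarray(v(wp)) - v0) / eps
    return J
```

For example, with $v(\omega) = [\omega_1^2,\ \omega_1\omega_2]$ at $\omega = (2, 3)$, the exact Jacobian is $[[4, 0], [3, 2]]$, and the approximation matches it to within the step-size error.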

Since the term $2\nabla^{2}v^{\mathrm{T}}(\omega)v(\omega)$ is usually negligible, Formula (8) can be rewritten as follows:

$$
\Delta\omega = -\left[J^{\mathrm{T}}(\omega)J(\omega)\right]^{-1}J^{\mathrm{T}}(\omega)v(\omega). \tag{11}
$$

Considering that $J^{\mathrm{T}}(\omega)J(\omega)$ may be irreversible, Formula (11) is modified by adding a correction coefficient *μ*, where *I* is the identity matrix.

$$
\Delta\omega = -\left[J^{\mathrm{T}}(\omega)J(\omega) + \mu I\right]^{-1}J^{\mathrm{T}}(\omega)v(\omega). \tag{12}
$$

Similar to the BP algorithm, the update of the weight *ω*(*k*+1) in the *k*th iteration is shown in Formula (13), where Δ*ω*(*k*) is obtained by Formula (12). When $E(\omega^{(k+1)}) < \varepsilon$, the algorithm has converged, where *ε* is the given error limit.

$$
\omega^{(k+1)} = \omega^{(k)} + \Delta\omega^{(k)}.\tag{13}
$$

The initial value of *μ* is generally a small positive number such as 0.001. If the objective function $E(\omega^{(k)})$ decreases in the *k*th iteration, *μ*(*k*) is divided by a factor *θ* to obtain *μ*(*k*+1) for the next iteration. If the objective function increases, the iteration is restarted with *μ*(*k*) multiplied by the factor *θ*. *θ* is generally a number greater than 1, such as 4.
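The update rule of Formulas (12) and (13), together with the *μ* schedule just described, can be sketched on a toy least-squares problem. The function names, stopping tolerance, and the example problem are our own illustrative choices, not the paper's implementation:

```python
import numpy as np

def lm_step(J, v, mu):
    """One Levenberg-Marquardt weight change, Formula (12)."""
    n = J.shape[1]
    return -np.linalg.solve(J.T @ J + mu * np.eye(n), J.T @ v)

def lm_fit(v_fun, J_fun, w, mu=1e-3, theta=4.0, eps=1e-10, max_iter=100):
    """Minimise E(w) = v(w)^T v(w), Formula (7), with the mu schedule:
    divide mu by theta after a successful step, multiply by theta and
    retry after a failed one."""
    E = lambda w: float(v_fun(w) @ v_fun(w))
    for _ in range(max_iter):
        dw = lm_step(J_fun(w), v_fun(w), mu)
        if E(w + dw) < E(w):
            w, mu = w + dw, mu / theta   # success: accept, relax damping
        else:
            mu *= theta                  # failure: restart with more damping
        if E(w) < eps:
            break                        # converged, Formula (13) criterion
    return w

# Toy problem: fit y = a * x to data generated with a = 2
x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x
v_fun = lambda w: w[0] * x - y       # error vector v(w)
J_fun = lambda w: x[:, None]         # Jacobian of v(w), here constant
w = lm_fit(v_fun, J_fun, np.array([0.0]))
```

For this linear toy problem a single Gauss-Newton step nearly solves it; the *μ* schedule matters for the nonlinear, multi-layer case the paper actually trains.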
