**Short-Term Load Forecasting 2019**

Editors

**Antonio Gabaldón, María Carmen Ruiz-Abellón, Luis Alfredo Fernández-Jiménez**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors* Antonio Gabaldón, Universidad Politécnica de Cartagena, Spain

María Carmen Ruiz-Abellón, Universidad Politécnica de Cartagena, Spain

Luis Alfredo Fernández-Jiménez, Universidad de La Rioja, Spain

*Editorial Office* MDPI, St. Alban-Anlage 66, 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Energies* (ISSN 1996-1073) (available at: https://www.mdpi.com/journal/energies/special_issues/STLF2019).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Article Number*, Page Range.

**ISBN 978-3-03943-442-8 (Hbk) ISBN 978-3-03943-443-5 (PDF)**

© 2021 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

### **Contents**



### **About the Editors**

**Antonio Gabaldón** (Professor), Industrial Engineer, received his M.Sc. (1988) and Ph.D. (1991) from the Universidad Politécnica de Valencia (Spain). In 1989, he visited the Université de Montréal with a predoctoral grant. He has been a professor both at the University of Murcia (1990–1999) and the Spanish Air Force Academy of San Javier (1993–2008). Since 2003, he has been Professor in the Electrical Engineering Department at the Universidad Politécnica de Cartagena (UPCT). His lines of research are focused on the analysis of electrical distribution systems, electrical markets, demand response, energy efficiency, railway traction, and non-invasive monitoring techniques. He has participated as a researcher in several R&D projects funded by public calls, both national and international (European Commission, NATO Grants). He has been Main Investigator (IP) in four R&D projects funded by the Spanish Government (2007–2019), projects focused on demand response and energy efficiency. He is also a member of the research network REDYD-2050 (main topic: distributed energy resources in the energy horizon 2050). Moreover, he has participated in R&D contracts of special relevance with companies and utilities, serving as responsible researcher in some of them. Some of the results of his research activity are published in 30 research articles in international scientific journals, 20 of them in well-known indexed journals (IEEE, IET, Elsevier, Springer, MDPI, or Compel). These works have been referenced more than 880 times in 750 scientific articles (source: Scopus 2020). He was coordinator of doctorate programs with a "quality label" (Spanish Government). He is also a reviewer for Spanish National Research Agencies and scientific journals listed in the JCR-ISI. He has also held academic positions as Dean of the ETSII of Cartagena (1999–2008) and Vice-Rector for Academic Affairs and Doctorate (2008–2010).

**María Carmen Ruiz-Abellón** (Associate Professor) received her Ph.D. and M.Sc. degrees in Mathematics from the University of Murcia (Spain) in 2002 and 1998, respectively. Since October 1999, she has been with the Department of Applied Mathematics and Statistics, Universidad Politécnica de Cartagena, Spain. Her research interests mainly include forecasting methods, machine learning, time series analysis, clustering, statistical inference, and information theory. She has participated in 10 projects and 7 private R&D contracts, including the management of 5 of them. She has co-authored 25 research articles published in indexed journals. These works have been referenced 275 times in 243 scientific articles (source: Scopus 2020).

**Luis Alfredo Fernández-Jiménez** (Associate Professor), Industrial Engineer, received his M.Sc. (1992) from the University of Zaragoza (Spain) and Ph.D. (2007) from the University of La Rioja (Spain). Since 1992, he has been University Professor in the Electrical Engineering Department of the University of La Rioja. His lines of research are focused on the planning, operation, and control of power systems. He has been Main Investigator (IP) in three R&D projects funded by the Spanish Government (2010–2019), projects focused on the development of forecasting models for applications in the electric power sector. He has authored or co-authored two university texts related to electrical engineering and 22 research articles published in indexed journals. These works have been referenced more than 680 times in 640 scientific articles (source: Scopus 2020). Since 2012, he has been the head of the Electrical Engineering Department of the University of La Rioja.

### **Preface to "Short-Term Load Forecasting 2019"**

The future of power systems and markets is exciting but presents a number of risks for customers, utilities, network operators, and society as a whole. The integration of renewable generation sources by 2030–2050 [1] and the empowerment of "active customers" [2] will lead to quite different planning and operation tasks in this new scenario [3]. Traditional tools will no longer perform as they do at present. For instance, ensuring that the future generation mix can meet load demands at all times becomes a more complex task, with interesting opportunities for new actors on both the demand and supply sides of a power system. Concerns about uncertainty and randomness in electricity demand appear in the literature: how can we manage the availability of energy output from renewable generation resources and the flexibility of customers? New and complex forecasting methods [4] can provide a partial solution to this challenge. In our case, short-term load forecasting (STLF) methods are used to evaluate demand, and perhaps supply.

While STLF has usually been applied to non-responsive customers, this scenario is expected to change with the arrival of "active customers". This concept refers to a new, dynamic actor that can participate in electricity markets (energy, capacity, or balance), alone or through demand aggregators, and that changes its demand for economic and technical reasons. This participation requires a much more complex short-term estimation of demand to avoid penalties for non-compliance at lower aggregation levels (from hundreds of kW to a few MW), where STLF methodologies should be revisited and modified to improve their performance.

This book compiles thirteen papers published in the Special Issue titled "Short-Term Load Forecasting 2019", which represent research advances within a wide range of specific topics described below, all of them of great importance in the field of STLF. The usefulness and relevance of hybrid or combined models, together with multistep methodologies, is indisputable, and there are many recent papers in this context. All of them try to overcome the drawbacks of other existing methods while seeking to gain robustness and improve predictions. To some extent, all the articles of this Special Issue employ combined and/or multistep methods to provide predictions: in some cases, they employ them as an intermediate tool, and the novelty of the research is focused on other aspects; in other cases, the hybrid method proposed by the authors represents the main contribution. The latter group of studies includes the following papers: [5], where a novel model combining a data pre-processing technique, forecasting algorithms, and an advanced optimization algorithm is developed; [6], which proposes a very short-term bus load prediction model based on phase space reconstruction and a deep belief network; [7], which proposes a hybrid load-forecasting method that combines classical time series formulations with cubic splines to model electricity load; and [8], where the electricity demand time series is divided into two major components (deterministic and stochastic) and both components are estimated using different regression and time series methods with parametric and nonparametric estimation techniques. These last two papers remind us that we must not forget the usefulness of classical methods.

Despite the great number of papers on this topic, there is an issue that remains open: how to guide researchers to employ proper hybrid technology for different datasets [4]. Two review papers of this book represent an advance on this topic: [9], which discusses four categories of state-of-the-art STLF methodologies (similar pattern, variable selection, hierarchical forecasting, and weather station selection), where each of these methods proposes a specific solution for load prediction, and [10], where the authors highlight the necessity of developing additional and case-specific performance criteria for electricity load forecasting (better accuracy does not imply lower costs caused by forecasting errors).

Another aspect of interest related to the mix of methodologies can be found in the context of demand response programs in hybrid energy systems. In [11], a methodology is proposed that could help power systems or aggregators to make up for the lack of accuracy of the current forecasting methods when predicting renewable energy generation, whereas [12] utilizes both long and short data sequences to propose a model that supports the demand response program in hybrid energy systems, especially systems using renewable and fossil sources.

There are many features we must consider when developing a good STLF model, such as climatic factors, seasonality, and calendar effects. The authors of [13] highlight the importance of distinguishing different types of special days (those on which working or social habits differ from the ordinary) to reduce the largest forecasting errors and propose several ways to classify those special days.

Current forecasting methods have shown high efficiency and accuracy, mainly at the power system and large-consumer levels. However, much remains to be done at the residential level due to the high volatility and uncertainty of the electric demand of a single household. This topic is dealt with in [14] and [15]: the former presents a scalable system for day-ahead household electrical energy consumption forecasting, based on a deep residual neural network, that extracts features from the historical load of the particular household and of all households present in the dataset; the latter proposes a forecasting method based on convolutional neural networks combined with a data-augmentation technique that can artificially enlarge the training data.

The issue caused by a lack of historical data or limited data is also addressed in [16] and [17]: in the first case, the authors propose a novel STLF model to predict energy consumption for buildings with limited data sets by using multivariate random forest to construct a transfer learning-based model; in the second case, the author introduces the problem of load "nowcasting" to the energy forecasting literature, where one predicts the recent past using limited available metering data from the supply side of the system.

#### **References**


### **Antonio Gabaldón, María Carmen Ruiz-Abellón, Luis Alfredo Fernández-Jiménez** *Editors*

### *Article* **Research and Application of a Novel Combined Model Based on Multiobjective Optimization for Multistep-Ahead Electric Load Forecasting**

#### **Yechi Zhang <sup>1</sup>, Jianzhou Wang <sup>1,\*</sup> and Haiyan Lu <sup>2</sup>**


Received: 10 April 2019; Accepted: 16 May 2019; Published: 20 May 2019

**Abstract:** Accurate forecasting of electric loads has a great impact on actual power generation, power distribution, and tariff pricing. Therefore, in recent years, scholars all over the world have been proposing more forecasting models aimed at improving forecasting performance; however, many of them are conventional forecasting models which do not take the limitations of individual predicting models or data preprocessing into account, leading to poor forecasting accuracy. In this study, to overcome these drawbacks, a novel model combining a data preprocessing technique, forecasting algorithms and an advanced optimization algorithm is developed. Thirty-minute electrical load data from power stations in New South Wales and Queensland, Australia, are used as the testing data to estimate our proposed model's effectiveness. From experimental results, our proposed combined model shows absolute superiority in both forecasting accuracy and forecasting stability compared with other conventional forecasting models.

**Keywords:** electric load forecasting; data preprocessing technique; multiobjective optimization algorithm; combined model

#### **1. Introduction**

It is known that the electric power industry plays a vital role in many aspects of people's lives [1]. Effective forecasting enables power generation to be adjusted according to market demand and management and operational costs to be reduced [2]. On this basis, accurate power load forecasting is necessary in the daily operation of power systems [3]. However, due to various uncertainties, including climate change, economic fluctuations, industrial structure, national policy, and other complexities of the social environment, it is difficult to meet expectations in terms of the accuracy of power load forecasting [4]. Inaccurate forecasting often results in considerable losses for power systems: overestimated forecasts waste energy, while underestimated forecasts result in economic loss [5]. With the development of society, the expansion of urbanization, and the continuous improvement of industry, the demand for electricity is continuously increasing, which poses a challenge to electric load prediction systems [6]. Accurate power load forecasting is indispensable to the whole society: it not only reflects the economic rationality of power dispatching but is also reflected in power construction planning and power supply reliability. Therefore, developing a novel and robust model to improve forecasting performance is essential for power load forecasting [7]. In the past few years, a great deal of research has been carried out to achieve accurate short-term forecasting of power load time series. There are mainly four types of related algorithms: (i) physical arithmetic, (ii) spatial correlation arithmetic, (iii) conventional statistical arithmetic, and (iv) artificial intelligence arithmetic [8].

#### **2. Literature Review**

"Physical algorithm" is a general term referring to models that primarily use physical data such as temperature, velocity, density, and terrain information based on a numerical weather prediction (NWP) model to predict wind speeds in subsequent periods [9]. The NWP model is a computer program designed to solve atmospheric equations. Based on the NWP wind resource assessment method, Cheng et al. [10] evaluated wind speed distribution by comparing three deterministic probabilities. From their experiment results, they found that NWP could not only achieve reliable probability assessment but also supply precise forecasting estimates. However, physical methods cannot handle time series for short-term horizons [11]. Moreover, when using an NWP model, much calculation time and many computing resources are required [12]. Spatial correlation models, which are applied to solve time series forecasting to make up for the shortcomings of physical algorithms, take the relationships of time series from different locations into consideration [13]. A classic case is a novel model proposed by Tascikaraoglu et al. [14] utilizing a spatiotemporal method and a wavelet transform, successfully improving the performance of forecasting compared to other benchmark models. However, spatial correlation arithmetic is always difficult to use in practice because of its requirements of strict measurements and a large amount of meticulous measuring in many spatially related sites [15].

Traditional prediction methods also include random time series models such as exponential smoothing, autoregressive (AR) methods, filtering methods, autoregressive moving average (ARMA) methods, and the well-known autoregressive integrated moving average (ARIMA) and seasonal ARIMA models, mainly focusing on regression analysis [16,17]. The regression model aims to establish a relationship between historical data, treated as dependent variables, and influencing factors, treated as independent variables [18]. For example, Lee and Ko [19] adopted an ARIMA-based model to forecast and simulate hourly electric load data of the Taipower system. Wang et al. [20] improved the accuracy of seasonal ARIMA applied to electricity demand forecasting through the use of residual modification models. They applied a seasonal ARIMA approach, an optimal Fourier model, and a combined model including seasonal ARIMA and the PSO optimal Fourier method, and used these three models to predict electric load time series data in northwestern China. After comparing the results, they found that the combined model was the most accurate. Brożyna et al. [21] used the TBATS model to overcome seasonality in data, which may cause difficulties in time series forecasting with models such as ARIMA.

Modern forecasting methods include artificial neural networks (ANNs), support vector machines (SVMs), fuzzy systems, expert system forecasting methods, chaotic time series methods, gray models, adaptive models, optimization algorithms, etc. [22]. These modern methods are becoming more popular among researchers dealing with time series forecasting [23]. These artificial intelligence models can achieve good forecasting performance because of their unique characteristics, such as memory, self-learning, and self-adaptability, since neural networks are products of biological simulation that follow the behavior of the human brain [24]. Park [25] showed the good performance of this type of model after first applying ANNs to power load forecasting in 1991, concluding that ANNs were highly effective in electrical load forecasting. After that, many time series forecasting studies were performed using various artificial neural networks [26]. Lou and Dong [27] showed that electric load forecasting with RFNN exhibited much higher variability with hourly data in Macau. Okumus and Dinler [28] integrated ANNs and the adaptive neuro-fuzzy inference system to predict wind power, and the forecasting results proved that their proposed hybrid model was better than classical methods in forecasting accuracy. Hong [29] selected better parameters for SVR by using the CPSO algorithm, while Che and Wang [30] established a hybrid model combining ARIMA and SVM, called SVRARIMA. Liu et al. [31] built a model integrating EMD, extended extreme learning machine (ELM), a Kalman filter, and the PSO algorithm. Although such hybrid models seemed better than individual classical models, the limitations of each model due to the nature of its structure seemed inevitable [32]. In order to solve this problem, a combined forecasting model is proposed. Combined forecasting theory has been developed through the joint efforts of three generations of scientists: it was initiated by Bates and Granger [33], developed by Diebold and Pauly [34], and further extended by Pesaran and Timmermann [35] as a combination of several individual models. Many kinds of ANNs have been combined into short-term forecasting models in order to fully utilize the advantages of individual models while overcoming their shortcomings. There are some typical studies: Zhang et al. [36] successfully obtained promising wind speed forecasting results by developing a combined model that consisted of CEEMDAN, five neural networks, CLSFPA, and no negative constraint theory (NNCT). In addition, Che et al. [37] developed a kernel-based SVR combination model in a study on electric load prediction.

It is obvious from the review of forecasting methods that there are shortcomings in both traditional and modern techniques. The shortcomings of these models are summarized as follows:

For physical algorithms, the main problem is that physical methods cannot deal with short-term horizons. Physical methods perform well when dealing with long-term forecasting problems [38]. Moreover, it costs a lot of computing time and resources when using NWP models because of their complex calculation process and high cost. Spatial correlation arithmetic requires detailed measurements from multiple spatially correlated sites, which increases the difficulty in searching for electric load data. Moreover, because of the strict measuring requirements and time delays, the model is always hard to implement [39].

For conventional statistical arithmetic, mainly linear models, there are insurmountable shortcomings. First and foremost, these models cannot deal with the nonlinear features of electric load time series [40]. Moreover, the regression method also fails to achieve the expected forecasting accuracy: linear regression relies too heavily on historical data to cope with nonlinear forecasting problems, and as time goes by, the forecasting effect of regression analysis models becomes weaker and weaker [41]. In addition, when faced with complex objective data, it is hard to choose the appropriate influencing factors. The exponential smoothing model also has shortcomings, in that it cannot recognize turning points in the data and does not perform well in long-term forecasting [42]. As for the autoregressive moving average model, it obtains results only from historical and current data, ignoring potential influencing factors; in addition, strong random factors in the data may lead to instability of the model, which affects forecasting accuracy [43]. All in all, none of these models meet the accuracy required by an electric load forecasting system.

For artificial intelligence arithmetic, although artificial intelligence neural networks outperform traditional forecasting techniques, ANNs are not impeccable; the defects and shortcomings of their structure cannot be ignored. There are three major problems. First, it is hard to choose the parameters of ANN models, as a slight change in parameters may cause huge differences in the outcomes [44]. Second, ANNs are inclined to fall into local minima, owing to their relatively slow self-learning convergence rate [45]. Lastly, the number of layers and neurons in a neural network structure affects both the forecasting result and the computing time [46]. As for other models, SVM has a high requirement for storage space and expert systems rely strongly on knowledge databases, while gray forecasting models can produce decent results only under exponential growth trends [47]. To solve these problems, evolutionary algorithms are applied: when optimization algorithms are combined with forecasting models, more reasonable parameters are selected and more accurate results are obtained.

To overcome the abovementioned drawbacks, our proposed model combines a data preprocessing method, no negative constraint theory (NNCT) [48], a multiobjective optimization algorithm, a linear forecasting method, the autoregressive integrated moving average (ARIMA) [49], and three artificial intelligence forecasting algorithms: the wavelet neural network (WNN) [50], the extreme learning machine (ELM) [51], and the back propagation neural network (BPNN) [52]. The proposed model improves forecasting performance by exploiting the complementary strengths of its linear and nonlinear individual models. It is worth mentioning that, to improve the forecasting effect of our model, a mechanism based on decomposition and reconstruction is employed to ensure that the main features of the original data are identified and extracted by removing high-frequency noise signals. Then, four individual models are applied to the electrical load forecasting. Lastly, a new weight decision technique based on the multiobjective grasshopper optimization algorithm and the stay-one strategy is used to integrate the four models. The experimental results show that our combined model has high forecasting accuracy and strong stability.
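As a rough illustration of the combination step (not the paper's actual MOGOA-based weighting), the sketch below combines four synthetic model forecasts with unconstrained weights in the spirit of NNCT; ordinary least squares stands in for the multiobjective weight search, and the data are fabricated for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
truth = np.sin(np.linspace(0, 6, 100))
# Stand-ins for the validation forecasts of four individual models
preds = np.column_stack([truth + rng.normal(0, s, 100) for s in (0.1, 0.2, 0.3, 0.4)])

# NNCT allows unconstrained combination weights (they may be negative and need
# not sum to one); least squares stands in for the MOGOA weight search here
w, *_ = np.linalg.lstsq(preds, truth, rcond=None)
combined = preds @ w

mse = lambda p: np.mean((p - truth) ** 2)
```

In-sample, the least-squares combination can be no worse than any single model, since each individual forecast corresponds to one particular weight vector.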

The main contributions and novelties of our proposed model are summarized as follows:


The rest of the paper is organized as follows. In Section 2, we introduce the methodology applied in the proposed model, including the data preprocessing technique, ARIMA, WNN, ELM, BPNN, the theory of combined models, and multiobjective grasshopper optimization. Section 3 describes the selected electric load time series and three experiments aimed at verifying the effectiveness of our forecasting model. In Section 4, we provide an in-depth discussion of the proposed model, including a test of the performance of the proposed optimization algorithm, two tests of the effectiveness of the model, a test showing the improvement achieved by the model, and a comparative experiment on the combination method.

#### **3. Methods**

In this section, we discuss the methods of the proposed combined model in detail, including the singular spectrum analysis (SSA) technique, the individual models used in the combined model, and the multiobjective grasshopper optimization algorithm (MOGOA). After that, a combined model that can significantly improve the accuracy of electric load forecasting is presented.

#### *3.1. SSA Technique*

SSA is a nonparametric spectral estimation method usually used for filtering in the preprocessing stage of time series forecasting. The advantage of SSA is that it always works well in both linear and nonlinear time series. Moreover, it performs well whether the time series is stationary or not. In short, the way SSA works is to identify the trend and noise parts of a time series, after which it reconstructs a new series.

#### *3.2. Wavelet Neural Network*

Wavelet neural network (WNN) is a modern artificial intelligence model. It is essentially a feed-forward neural network based on the wavelet transform [53]. Its basic working principle is to use wavelet space as the feature space for pattern recognition: the features of a signal are extracted by weighting the inner product of the wavelet basis and the signal vector, combining the time-frequency localization of the wavelet transform with the self-learning ability of the neural network. It can effectively learn the input/output characteristics of a system without the need for a priori information such as data structures and characteristics. In addition, compared with traditional neural networks, wavelet neural networks can often achieve better prediction accuracy, faster convergence, and better fault tolerance when forecasting complex nonlinear, uncertain, and unknown systems. Therefore, we applied WNN as an individual nonlinear model in our proposed model.
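A minimal WNN-style sketch, assuming the common Morlet mother wavelet as the hidden activation and fitting only the output weights by least squares (a full WNN would also train the dilation and translation parameters; the toy signal is illustrative):

```python
import numpy as np

def morlet(t):
    # Morlet mother wavelet, a common choice of WNN activation
    return np.cos(1.75 * t) * np.exp(-t ** 2 / 2)

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 200).reshape(-1, 1)
y = np.sin(3 * x).ravel()              # toy signal to approximate

n_hidden = 20
a = rng.uniform(0.5, 2.0, n_hidden)    # dilation (scale) of each wavelet unit
b = rng.uniform(-1.0, 1.0, n_hidden)   # translation (shift) of each unit
H = morlet((x - b) / a)                # hidden-layer wavelet responses
w, *_ = np.linalg.lstsq(H, y, rcond=None)   # output weights by least squares
y_hat = H @ w
```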

#### *3.3. Extreme Learning Machine*

Extreme learning machine (ELM) is a machine learning algorithm based on a feed-forward neural network [54]. Its main feature is that the hidden layer node parameters can be set randomly or manually and do not need to be adjusted; the learning process only needs to compute the output weights. ELM has high learning efficiency and strong generalization ability and is widely used in time series forecasting. As a result, we applied ELM as an individual nonlinear model in our proposed model.
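The ELM idea is short enough to sketch directly: the hidden weights are drawn once at random and never trained, and only the output weights are solved for via the pseudoinverse. The toy regression target below is an illustrative stand-in for a load series:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy regression task standing in for a load series
X = rng.uniform(-1, 1, size=(300, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2

n_hidden = 50
# Hidden-layer parameters are drawn once at random and never adjusted
W = rng.normal(size=(2, n_hidden))
b = rng.normal(size=n_hidden)
H = np.tanh(X @ W + b)                 # random nonlinear feature map

# Only the output weights are learned, via the Moore-Penrose pseudoinverse
beta = np.linalg.pinv(H) @ y
y_hat = H @ beta
```

Because training reduces to one linear solve, ELM avoids the iterative gradient descent of a conventional network entirely.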

#### *3.4. Back Propagation Neural Network*

The back propagation neural network (BPNN), composed of an input layer, a hidden layer, and an output layer, is a concept proposed by scientists led by Rumelhart and McClelland in 1986 [55]. It is a multilayer feed-forward neural network trained according to the error back propagation algorithm, and its operation consists of a learning stage and a working stage. It is the most widely used neural network. It has arbitrarily complex pattern classification ability and excellent multidimensional function mapping ability, which solves the exclusive OR (XOR) and other problems that cannot be solved by a simple perceptron. In essence, the BP algorithm uses the network's squared error as the objective function and the gradient descent method to minimize it. Moreover, because of its flexible structure and strong nonlinear mapping capability, BPNN is widely applied in the engineering field, so we applied it as an individual nonlinear model in our proposed model.
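A minimal BPNN trained by back propagation on the XOR problem mentioned above (the layer sizes, seed, and learning rate are illustrative choices):

```python
import numpy as np

# XOR: the classic problem a simple perceptron cannot solve
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward():
    h = sigmoid(X @ W1 + b1)
    return h, sigmoid(h @ W2 + b2)

_, out = forward()
initial_mse = np.mean((out - y) ** 2)   # squared error is the objective

lr = 0.5
for _ in range(5000):
    h, out = forward()
    d_out = (out - y) * out * (1 - out)        # backprop through output layer
    d_h = (d_out @ W2.T) * h * (1 - h)         # backprop through hidden layer
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

_, out = forward()
final_mse = np.mean((out - y) ** 2)
```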

#### *3.5. Autoregressive Integrated Moving Average Model*

The ARIMA (autoregressive integrated moving average) model is used for time series forecasting with relatively high prediction accuracy. It builds on three related forms: the moving average (MA) model, the autoregressive (AR) model, and the mixed autoregressive moving average (ARMA) model. Before using this model, it is necessary to first analyze whether the time series is stationary. If the sequence is nonstationary, it must first be differenced, and the differenced series must be stationary before the model is fitted; otherwise the model cannot be used.

The difference between the ARIMA and ARMA models is that the ARMA model is built for stationary time series, whereas the ARIMA model is used for nonstationary time series. In other words, to apply an ARMA model to a nonstationary time series, one first needs to transform it into a stationary series (by differencing) and then build the ARMA model. We applied ARIMA as an individual linear model in our proposed model.
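The difference-then-model workflow can be sketched without any library, here as a bare ARIMA(2,1,0) fitted by least squares on a synthetic random walk with drift (a simplified stand-in, not the paper's estimation procedure):

```python
import numpy as np

rng = np.random.default_rng(0)
# Random walk with drift: nonstationary, so ARMA cannot be fitted directly
y = np.cumsum(0.5 + rng.normal(size=300))

d = np.diff(y)        # first difference is (approximately) stationary
p = 2                 # AR order on the differenced series, i.e., ARIMA(2,1,0)

# Fit the AR coefficients by ordinary least squares on lagged differences
target = d[p:]
lags = np.column_stack([d[p - k: len(d) - k] for k in range(1, p + 1)])
A = np.column_stack([np.ones(len(target)), lags])
coef, *_ = np.linalg.lstsq(A, target, rcond=None)

# One-step forecast of the difference, then integrate back to the level
d_next = coef[0] + coef[1] * d[-1] + coef[2] * d[-2]
y_next = y[-1] + d_next
```

The intercept of the fitted AR model absorbs the drift of roughly 0.5 per step, so the one-step difference forecast lands near the drift.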

#### *3.6. Basic Concepts of Multiobjective Optimization Problems*

Conventional relational operators such as >, <, and =, which suffice in single-objective optimization problems, cannot be applied in multiobjective optimization. To address this problem, the concept of dominance was proposed by Edgeworth in 1881 and later extended by Pareto in 1964. Details of Pareto dominance are as follows:

#### **Definition 1** (*Pareto dominance*):

The definition of Pareto dominance is as follows: vector *y* = (*y*1, *y*2, ..., *yz*) is dominated by vector *x* = (*x*1, *x*2, ..., *xz*) (i.e., *x* ≻ *y*) when

$$\left[\forall t \in [1, z]: f(x_t) \ge f(y_t)\right] \land \left[\exists t \in [1, z]: f(x_t) > f(y_t)\right] \tag{1}$$

where *z* represents the length of vectors.
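Equation (1) translates directly into code, using the maximization convention of the formula above:

```python
def dominates(x, y):
    """Pareto dominance (maximization convention of Equation (1)):
    x dominates y if it is no worse in every objective and strictly
    better in at least one."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))

print(dominates((3, 5), (3, 4)))   # True: equal in one objective, better in the other
print(dominates((3, 5), (4, 1)))   # False: a trade-off, neither dominates
print(dominates((3, 5), (3, 5)))   # False: equal vectors do not dominate each other
```

Solutions that no other solution dominates form the Pareto front, which is what a multiobjective optimizer such as MOGOA approximates.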

#### *3.7. Multiobjective Grasshopper Optimization Algorithm*

MOGOA is a recent nature-inspired method proposed by Mirjalili [56]. Essentially, MOGOA is a multiobjective version of the grasshopper optimization algorithm (GOA), a nature-inspired algorithm that simulates the swarming behavior of grasshoppers. The main principle of GOA is that the position of a grasshopper in the swarm represents a possible solution of a given single-objective optimization problem. The details of MOGOA, and the main steps for building it, are as follows:

In order to replicate the real living conditions of grasshoppers in nature, MOGOA takes three factors into the model: gravity force, social interaction, and wind advection. *Xi* denotes the position of the *i*th grasshopper and is given by:

$$X_i = S_i + G_i + A_i \tag{2}$$

where *Si*, *Gi*, and *Ai* mean social interaction, gravity force, and wind advection, respectively.

Social interaction is the most important factor, calculated by the following equation:

$$S_i = \sum_{\substack{j=1 \\ j \neq i}}^{N} s(d_{ij})\,\hat{d}_{ij} \tag{3}$$

$$d_{ij} = \left|x_j - x_i\right| \tag{4}$$

$$\hat{d}_{ij} = \frac{x_j - x_i}{d_{ij}} \tag{5}$$

$$s(r) = f e^{-r/l} - e^{-r} \tag{6}$$

where *dij* is the distance between the *i*th and *j*th grasshoppers, and $\hat{d}_{ij}$ is the unit vector pointing from the *i*th to the *j*th grasshopper. The function *s* defines the social forces; changing the parameters *f* (intensity of attraction) and *l* (attractive length scale) changes the social forces accordingly. The distance between grasshoppers is mapped into the interval [1, 4] because, when two grasshoppers are far apart, they no longer exert a strong social influence on each other. Gravity force is defined as:
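The behavior of the social force in Equation (6) can be seen with a few lines of code. The parameter values f = 0.5 and l = 1.5 are the ones commonly used in the GOA literature, not values stated in this section:

```python
import math

def s(r, f=0.5, l=1.5):
    """Social force between two grasshoppers at distance r (Equation (6)).
    f sets the attraction intensity and l the attractive length scale."""
    return f * math.exp(-r / l) - math.exp(-r)

# The sign of s(r) separates repulsion from attraction:
near = s(0.5)   # negative: close grasshoppers repel each other
far = s(3.0)    # positive: distant grasshoppers attract each other
```

With these defaults the "comfort distance" where s(r) = 0 lies near r ≈ 2.08, which is why distances are normalized into [1, 4] before applying s.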

$$G_i = -g\,\hat{e}_g \tag{7}$$

where *g* is the gravitational constant and $\hat{e}_g$ represents the unit vector toward the center of the earth. Wind advection is defined as:

$$A_i = u\,\hat{e}_w \tag{8}$$

where *u* is a constant drift and $\hat{e}_w$ represents the unit vector in the wind direction. Substituting these three components into Equation (2) gives:

$$X_i = \sum_{\substack{j=1 \\ j \neq i}}^{N} s\left(\left| x_j - x_i \right|\right) \frac{x_j - x_i}{d_{ij}} - g\,\hat{e}_g + u\,\hat{e}_w \tag{9}$$

Considering that the influence of gravity on grasshoppers is very weak, and assuming that the wind direction is always toward the best solution $\hat{T}_d$, some parameters are added to the mathematical model to enhance its ability to explore and exploit, so that the optimization problem can be solved more effectively. The model then becomes:

$$X_i^d = c\left(\sum_{\substack{j=1 \\ j \neq i}}^{N} c\, \frac{ub_d - lb_d}{2}\, s\left(\left| x_j^d - x_i^d \right|\right) \frac{x_j - x_i}{d_{ij}}\right) + \hat{T}_d \tag{10}$$

where $ub_d$ and $lb_d$ are the upper and lower bounds of the *d*th dimension, respectively, and $\hat{T}_d$ is the *d*th dimension value of the best solution found so far. To reduce exploration and increase exploitation as the search proceeds, the parameter *c* is updated with the following equation:

$$c = c_{\max} - l\, \frac{c_{\max} - c_{\min}}{L} \tag{11}$$

where *l* is the current iteration and *L* is the maximum number of iterations.
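Equation (11) is a simple linear decay. A minimal sketch, with the default bounds c_max = 1 and c_min = 0.00001 taken from common GOA settings rather than from the paper:

```python
def c_coefficient(l, L, c_max=1.0, c_min=1e-5):
    """Decreasing coefficient of Equation (11): linearly shrinks from c_max
    toward c_min as iteration l approaches the maximum L, shifting the swarm
    from exploration to exploitation."""
    return c_max - l * (c_max - c_min) / L

first = c_coefficient(l=1, L=100)    # close to c_max early on
last = c_coefficient(l=100, L=100)   # equals c_min at the final iteration
```

Because c appears twice in Equation (10), its decay shrinks both the comfort zone and the step size simultaneously.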

In a single-objective search, the best solution found so far can be used directly as the target. In MOGOA, by contrast, the target must be chosen from an archive that stores all the Pareto optimal solutions found so far, and the main problem becomes selecting a target that improves the distribution of solutions. The probability of choosing the *i*th archive member as the target is calculated by:

$$P_i = \frac{1}{N_i} \tag{12}$$

where *Ni* is the number of solutions in the neighborhood of the *i*th solution. Using this probability with the roulette-wheel method when selecting a target from the archive has two advantages: first, it improves the coverage of less densely populated regions of the search space, and second, if premature convergence occurs, a solution with a crowded neighborhood can still be selected as a target, which helps resolve the problem.
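The roulette-wheel step described above can be sketched as follows; the function name and the neighbor counts are illustrative, not from the paper:

```python
import random

def select_target(neighbor_counts, rng):
    """Roulette-wheel selection of a target index from the archive.
    Each member i is weighted by P_i = 1 / N_i (Equation (12)), so solutions
    in sparsely populated regions of objective space are picked more often."""
    weights = [1.0 / n for n in neighbor_counts]
    pick = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if pick <= acc:
            return i
    return len(weights) - 1

# A member with a single neighbor should be chosen far more often than
# members sitting in a crowd of ten.
counts = [1, 10, 10, 10]
rng = random.Random(42)
picks = [select_target(counts, rng) for _ in range(2000)]
```

Here index 0 carries weight 1.0 against 0.1 for each crowded member, so it is selected roughly 77% of the time.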

When the archive is updated in MOGOA, two criteria are applied: (1) an external solution is discarded if it is dominated by at least one archive solution; and (2) an external solution is added to the archive if it is non-dominated with respect to all solutions inside the archive. Moreover, if an external solution dominates solutions inside the archive, those inside solutions are replaced by the external one. All in all, MOGOA not only finds Pareto optimal solutions but also stores them in an archive.

The pseudocode of MOGOA is as follows:

#### **Algorithm 1:** *MOGOA*

*Objective functions:*

$$\min O_1 = \mathrm{Bias}(\hat{y}), \qquad \min O_2 = \mathrm{std}(y - \hat{y})$$

#### *Input:*

$\hat{y}_B = (\hat{y}_B(1), \hat{y}_B(2), \cdots, \hat{y}_B(q))$ - BPNN

$\hat{y}_E = (\hat{y}_E(1), \hat{y}_E(2), \cdots, \hat{y}_E(q))$ - ELM

$\hat{y}_W = (\hat{y}_W(1), \hat{y}_W(2), \cdots, \hat{y}_W(q))$ - WNN

$\hat{y}_A = (\hat{y}_A(1), \hat{y}_A(2), \cdots, \hat{y}_A(q))$ - ARIMA

#### *Output:*

$\hat{y}_f = (\hat{y}_f(1), \hat{y}_f(2), \cdots, \hat{y}_f(l))$ - forecasting results

#### *Parameters:*

*L*—the maximum number of iterations

*n*—the number of grasshoppers

*lbi*,*ubi*—boundaries of the *i-th* variable

*xi*—*i-th* grasshopper's position

*l*—current iteration number

*d*—number of dimensions

*c*max—*c*'s maximum value

*c*min— *c*'s minimum value

$\hat{T}_d$—best solution's *d*-th dimension value so far

*dij*—the distance between the *i-th* and the *j-th* grasshopper

*s*—social forces function

$$c = c_{\max} - l\, \frac{c_{\max} - c_{\min}}{L}$$

$$X_i^d = c\left(\sum_{\substack{j=1 \\ j \neq i}}^{n} c\, \frac{ub_d - lb_d}{2}\, s\left(\left| x_j^d - x_i^d \right|\right) \frac{x_j - x_i}{d_{ij}}\right) + \hat{T}_d$$

#### *3.8. SSA-MOGOA Combined Model*

In our study, a new combined model applying a data preprocessing technique, a new parameter determination method, and several individual prediction algorithms, including both linear and nonlinear models, is successfully developed. The main steps are listed below. The flowcharts of the proposed model are depicted in Figure 1.

#### 3.8.1. Stage 1: Data Preprocessing

SSA is a nonparametric spectral estimation method commonly used for filtering in the preprocessing stage of time series forecasting [57]. An advantage of SSA is that it works well for both linear and nonlinear time series. The processed data are then used in the subsequent forecasting. The main steps of SSA are depicted in Figure 1.
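A minimal numpy sketch of the standard SSA pipeline (embedding, SVD, grouping, diagonal averaging); the window length and the number of retained components below are illustrative choices, not the paper's settings:

```python
import numpy as np

def ssa_filter(series, window=12, n_components=2):
    """Minimal singular spectrum analysis: keep the leading n_components
    singular triples of the trajectory matrix as the smoothed series."""
    x = np.asarray(series, dtype=float)
    N, L = len(x), window
    K = N - L + 1
    # 1. Embedding: build the L x K trajectory (Hankel) matrix.
    X = np.column_stack([x[i:i + L] for i in range(K)])
    # 2. SVD of the trajectory matrix.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # 3. Grouping: keep the leading components (low-frequency structure).
    Xr = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
    # 4. Diagonal averaging (Hankelization) back to a 1-D series.
    out = np.zeros(N)
    cnt = np.zeros(N)
    for j in range(K):
        out[j:j + L] += Xr[:, j]
        cnt[j:j + L] += 1
    return out / cnt

t = np.arange(96)
noisy = np.sin(2 * np.pi * t / 48) + 0.2 * np.random.default_rng(0).standard_normal(96)
smooth = ssa_filter(noisy, window=24, n_components=2)
```

On this toy half-hourly-like signal, the two leading components recover the daily sinusoid while the high-frequency noise is discarded, which is exactly the filtering role SSA plays in Stage 1.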

#### 3.8.2. Stage 2: Individual Models used for Forecasting

Three nonlinear models, BPNN, ELM, and WNN, and a linear model, ARIMA, are chosen as the individual models that together form the combined model. It is worth mentioning that all 4 models can achieve good prediction results in our electric load forecasting.

#### 3.8.3. Stage 3: Optimization of Weight Parameters of Combined Model

Determining the weight coefficients of the individual models is crucial for constructing the combined model. Past combined models often used a simple average coefficient allocation strategy. In our research, we adopted a multiobjective optimization algorithm, MOGOA, to decide these parameters, enabling the combined model to achieve good prediction results in electric load forecasting.
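The combination step itself is just a weighted sum of the individual forecasts. In the paper the weights come out of MOGOA; the sketch below shows only the combination, with purely illustrative weights and forecast values:

```python
def combine_forecasts(forecasts, weights):
    """Weighted combination of individual model forecasts:
    y_hat_f(i) = sum_k w_k * y_hat_k(i)."""
    assert len(forecasts) == len(weights)
    n = len(forecasts[0])
    return [sum(w * f[i] for w, f in zip(weights, forecasts)) for i in range(n)]

# Hypothetical one-step outputs of BPNN, ELM, WNN, and ARIMA:
y_hat = combine_forecasts(
    forecasts=[[100.0], [104.0], [98.0], [102.0]],
    weights=[0.3, 0.2, 0.2, 0.3],
)
```

MOGOA's two objectives (bias and error standard deviation, per Algorithm 1) are evaluated on exactly this combined output when searching for the weights.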

**Figure 1.** Structure of proposed singular spectrum analysis–multiobjective grasshopper optimization algorithm (SSA-MOGOA) combined model.

#### **4. Experiments and Analysis**

In this section, we introduce the electric load time series we selected and the performance metric and testing methods. We also present three experiments aimed at verifying the effectiveness of our forecasting model. The main steps and flowchart of the developed model are described in Figure 1, which includes a data preprocessing technique, application of several individual models, optimization of the combined model's weight coefficients, and forecasting results.

#### *4.1. Datasets*

In this paper, original electric load time series data were collected from two areas in Australia, New South Wales (NSW) and Queensland (QLD), on a half-hourly basis (48 data points per day). The first two datasets were sampled in New South Wales and Queensland from 13 to 31 July 2011, 19 days in all. The third dataset was sampled in Queensland from 13 to 31 July 2010. Figure 2 presents a simple map of the study area, some descriptive statistical indicators of the datasets, and the general trends of the three testing samples. In each dataset, the training set included 768 data points and the testing set consisted of 144 data points. Since there are 48 data points per day, we selected the period T = 48 for the combined model. Statistical indicators including minimum, maximum, mean, and standard deviation are listed in Table 1. From one-step to three-step prediction, all forecasts are based only on historical data; predicted values are not fed back as inputs for subsequent forecasts. In the artificial intelligence models, based on extensive experimental results, five historical data points are chosen as inputs so as to obtain the best forecasting performance in the multistep forecasting mechanism. The detailed data structure is presented in Figure 2.
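The five-lag, direct multistep input construction described above can be sketched as a sliding window; the helper name is ours, and the integer series is a stand-in for the half-hourly load data:

```python
def make_windows(series, n_lags=5, horizon=1):
    """Build (input, target) pairs: n_lags consecutive historical points
    predict the value `horizon` steps ahead (direct multistep strategy,
    no feedback of predicted values)."""
    X, y = [], []
    for i in range(len(series) - n_lags - horizon + 1):
        X.append(series[i:i + n_lags])
        y.append(series[i + n_lags + horizon - 1])
    return X, y

series = list(range(20))             # stand-in for half-hourly load values
X1, y1 = make_windows(series, 5, 1)  # one-step-ahead samples
X3, y3 = make_windows(series, 5, 3)  # three-step-ahead samples
```

Applied to the real data, the same construction over 768 training points yields the training samples for each of the one-, two-, and three-step models.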


**Table 1.** Statistical indicators of experimental samples for three sites.

#### *4.2. Performance Metrics*

In our study, to evaluate the predictive power of the proposed model, we needed performance metrics for our time series forecasting experiments. Because there is no single general standard for evaluating a time series forecasting model, we decided to apply three performance metrics: mean absolute error (MAE), root mean square error (RMSE), and mean absolute percent error (MAPE), as presented in Table 2 [58]. Next, we introduce these three performance metrics in detail.

From the definitions of MAE and RMSE, it is clear that these two metrics avoid the cancellation of positive and negative forecasting errors thanks to the absolute value and squaring operations, and they measure the average magnitude of the deviation between the forecasted and actual series. MAPE, regarded as the most widely used performance metric in time series forecasting, is the average of the absolute percentage errors; being scale-independent, it reflects the reliability and validity of the forecasting method. For all three metrics, the smaller the value, the more accurate the prediction. Table 2 shows the formulas of the three error metrics, where Ai denotes the actual values of the time series, Fi the predicted values, and N the sample size.
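The three metrics of Table 2 are straightforward to implement; the short toy arrays below are ours, used only to exercise the formulas:

```python
import math

def mae(actual, forecast):
    """Mean absolute error: average magnitude of the errors."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean square error: penalizes large errors more than MAE."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

A = [100.0, 200.0, 400.0]  # toy actual load values
F = [110.0, 190.0, 400.0]  # toy forecasts
```

Note that RMSE is always at least as large as MAE, with equality only when all errors have the same magnitude.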

**Figure 2.** Location of electric load and data structure.


#### *4.3. Testing Method*

In this section, we introduce the Diebold–Mariano (DM) test and the forecasting effectiveness that were applied to statistically test the accuracy of our proposed model in time series forecasting.

#### 4.3.1. Diebold–Mariano Test

Diebold and Mariano [59] developed a test to compare a model's prediction efficiency with that of other models. The main steps of the DM test are as follows:

Since the DM test is essentially a hypothesis test, the first things to introduce are the null hypothesis *H*<sup>0</sup> and alternative hypothesis *H*1:

$$H_0: E\left[F(e_t^1)\right] = E\left[F(e_t^2)\right] \tag{13}$$

$$H_1: E\left[F(e_t^1)\right] \neq E\left[F(e_t^2)\right] \tag{14}$$

where $e_t^1$ and $e_t^2$ are the forecasting errors of the two models, obtained by subtracting each model's predicted values from the actual time series data, and *F* is the loss function applied to $e_t^1$ and $e_t^2$.

$$\overline{d} = \frac{1}{L} \sum_{t=1}^{L} \left[ F(e_t^1) - F(e_t^2) \right] \tag{15}$$

$\overline{d}$ is the average of the differences between the two models' losses, and *L* is the length of the predicted series.

$$DM = \frac{\overline{d}}{\sqrt{2\pi f_d(0)/L}} \to N(0,1) \tag{16}$$

As shown in the above formula, the test statistic *DM* converges to the standard normal distribution *N*(0, 1). The null hypothesis is rejected if $|DM|$ is larger than $z_{\alpha/2}$, where $z_{\alpha/2}$ stands for the critical z-value of the standard normal distribution and $\alpha$ denotes the significance level.
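Equations (15) and (16) can be sketched for the one-step case, where the spectral density term $2\pi f_d(0)$ reduces to the variance of the loss differential; longer horizons would need an autocovariance correction. The error arrays are illustrative:

```python
import math

def dm_statistic(e1, e2, loss=lambda e: e * e):
    """Diebold-Mariano statistic (Equations (15)-(16)) for one-step forecasts,
    estimating 2*pi*f_d(0) by the plain variance of the loss differential."""
    d = [loss(a) - loss(b) for a, b in zip(e1, e2)]
    L = len(d)
    d_bar = sum(d) / L
    var = sum((x - d_bar) ** 2 for x in d) / L
    return d_bar / math.sqrt(var / L)

# Model 1's errors are uniformly larger, so DM should be clearly positive
# and exceed the 5% critical value of 1.96.
err1 = [2.0, -2.1, 1.9, -2.0, 2.2, -1.8]
err2 = [0.5, -0.4, 0.6, -0.5, 0.4, -0.6]
dm = dm_statistic(err1, err2)
```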

#### 4.3.2. Forecasting Effectiveness

Forecasting effectiveness evaluates a model through the mean and mean squared deviation of its forecasting accuracy, which the DM test cannot do [60]. Forecasting effectiveness is also employed in our study. The principal ideas of forecasting effectiveness are as follows:

The *k*th order forecasting effectiveness unit is calculated as

$$m_k = \sum_{i=1}^{n} Q_i A_i^k$$

where $A_i$ is the forecasting accuracy at time *i*, and $Q_i$ is a discrete probability distribution subject to $\sum_{i=1}^{n} Q_i = 1$, $Q_i > 0$. When there is no prior information about $Q_i$, it is defined as $Q_i = 1/n$, $i = 1, 2, \ldots, n$. The *k*th order forecasting effectiveness is then $H(m_1, m_2, \cdots, m_k)$, where *H* is a continuous function of *k* arguments. When $H(x) = x$, the first-order forecasting effectiveness is $H(m_1) = m_1$. When $H(x, y) = x\left(1 - \sqrt{y - x^2}\right)$, the second-order forecasting effectiveness is $H(m_1, m_2) = m_1\left(1 - \sqrt{m_2 - (m_1)^2}\right)$.
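A short sketch of the first- and second-order forecasting effectiveness with the uniform prior Q_i = 1/n, assuming the standard second-order form H(m1, m2) = m1(1 − √(m2 − m1²)); the accuracy values are illustrative:

```python
import math

def forecasting_effectiveness(accuracies):
    """First- and second-order forecasting effectiveness with Q_i = 1/n:
    m_k = (1/n) * sum A_i^k, H1 = m1, H2 = m1 * (1 - sqrt(m2 - m1^2))."""
    n = len(accuracies)
    m1 = sum(accuracies) / n
    m2 = sum(a * a for a in accuracies) / n
    return m1, m1 * (1 - math.sqrt(m2 - m1 * m1))

acc = [0.98, 0.97, 0.99, 0.96]   # per-period forecasting accuracies
h1, h2 = forecasting_effectiveness(acc)
```

H2 discounts the mean accuracy by the standard deviation of the accuracies, so a model must be both accurate and stable to score highly.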

#### *4.4. Experiments and Analysis*

In this part, to examine our proposed model's performance in electric load time series forecasting, we conducted three experiments using data from the corresponding power station sites.

#### 4.4.1. Experiment I: Compare with Other Models Based on SSA

In order to determine the necessity of combining the models, we conducted this experiment comparing the electric load time series forecasting results of our new model with those of the four SSA-based individual models. The experimental results are shown in Table 3. Detailed descriptions are as follows:


Our proposed model outperformed the four SSA-based individual models in one-step to three-step forecasting. This is strong evidence that our model is indeed superior. At the same time, this experiment confirms the necessity of combining models, since combination really does improve forecast accuracy.

N.B. By comparing the forecasting results of our proposed combined model with other SSA-based models, there were many useful findings from Experiment I. Our model's overall performance in predicting accuracy demonstrates the need to combine models. Moreover, our proposed model greatly improves electric load forecasting accuracy with an average MAPE of 0.52% in all experiments.


**Table 3.** Comparison of proposed model with other SSA-based models.

#### 4.4.2. Experiment II: Comparing Models using Other Data Preprocessing Methods

In order to verify whether singular spectrum analysis (SSA) is the best choice for data processing, we conducted an experiment comparing the electric load time series forecasting results of our proposed model with other data processing method–based models. The experimental results are shown in Table 4. Detailed descriptions are as follows:


N.B. In Experiment II, by comparing the forecasting results of our proposed combined model with models using different data processing methods, we obtained many useful findings. Our model's overall lead in prediction accuracy demonstrates that SSA is the best choice of data preprocessing method among those tested.


**Table 4.** Comparison of forecasting performance of combined model and models using different data preprocessing methods.

#### 4.4.3. Experiment III: Comparing with Classic Models

In Experiment III, we compared the forecasting results of our proposed model with those of the artificial intelligence models BPNN, WNN, and ELM and the linear ARIMA model. In order to make the experiment more complete and persuasive, we also compared it with some classic conventional models. The experimental results are shown in Table 5. Figure 5 shows a comparison of the one- to three-step forecasting performance of Experiment III. Detailed descriptions are as follows:



**Table 5.** Comparison of forecasting performance of combined model and some classic individual models.

N.B. In Experiment III, by examining the results of our proposed method, the individual models used within it, and traditional models we did not use, we found that our model had an overall lead in prediction accuracy. This suggests that our proposed combined model could yield substantial savings for power systems and is worth applying in actual electric load forecasting practice.

#### **5. Discussion**

This section provides an in-depth discussion of the proposed model, including a performance test of the proposed optimization algorithm, two effectiveness tests of the proposed model, an analysis of its improvements, and a comparative experiment on the combination strategy.

#### *5.1. Multiobjective Grasshopper Algorithm Experiments*

Four typical test functions (shown in Table 6) were applied to evaluate the performance of the proposed algorithm. We chose multiobjective ant lion optimization (MOALO) and the multiobjective dragonfly algorithm (MODA) to compare against MOGOA. To control variables, we set the maximum number of iterations and the number of search agents to 100 and the archive size to 150. We applied inverted generational distance (IGD), a metric that evaluates multiobjective optimization algorithms. Table 7 shows the IGD results, for which we ran 60 experiments for every test function [61]. Moreover, Figure 6 shows the Pareto optimal solutions obtained by these three algorithms.


**Table 6.** Test functions of algorithms.

From the experimental results we can see the following:


N.B. The optimization ability of MOGOA is proven to be good through the experiment and discussion above. Therefore, MOGOA can be widely applied to deal with multiobjective optimization problems.


**Table 7.** Results of multiobjective algorithms using inverted generational distance (IGD) on four test functions.

**Figure 6.** Pareto optimal solutions obtained by optimization algorithm for test functions.

#### *5.2. Proposed Model's Effectiveness*

The Diebold–Mariano test was used to verify the validity of the developed model; that is, every model mentioned above was compared with the SSA-MOGOA combined model. The DM test is a hypothesis test whose null hypothesis is that there is no significant difference in the models' forecasting performance, and whose alternative hypothesis is that there is a significant difference. Table 8 shows the average DM test values of all experiments for one- to three-step forecasting.


**Table 8.** DM test of different models.

\* 1% significance level; \*\* 5% significance level.

Table 8 shows that, except for the one-step SSA-ELM experiment, the DM values in all experiments are large enough to reject the null hypothesis at the 1% significance level, while for one-step SSA-ELM the null hypothesis is rejected at the 5% level. Moreover, for the DM test against the individual models, the value is 3.3792, which shows that the accuracy of the proposed model is fairly high.

To further evaluate our model, as introduced in Section 4.3.2, we also applied the forecasting effectiveness method in our testing experiments. Forecasting effectiveness reflects the accuracy of the forecasting performance of various models, making it easy to compare their pros and cons. In Table 9, we record the detailed forecasting effectiveness values of all models in one- to three-step forecasting.


**Table 9.** Forecasting effectiveness of different models.

The forecasting effectiveness results in Table 9 show the following results: First, the most obvious is that our proposed combined model has the best prediction performance with the highest forecasting effectiveness values in all forecasting. Second, the prediction performance of other individual models is significantly worse than that of our model and there is still a big gap between them and our proposed model, which is sufficient to reflect the excellence of our model.

#### *5.3. Proposed Combined Model's Improvements*

In order to make the traditional MAPE criterion clearer when comparing the pros and cons of the models, in this paper we propose a new form of MAPE, defined as:

$$P_{MAPE} = \left| \frac{MAPE_1 - MAPE_2}{MAPE_1} \right| \tag{17}$$

This new MAPE criterion is used to compare the proposed model with the other models in the above experiments, including three data denoising algorithms, seven classic models, and four individual models with singular spectrum analysis. Table 10 shows the experimental results, and some interesting conclusions can be summarized as follows:
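Equation (17) is a relative-improvement ratio; a one-line sketch, with the two MAPE values below chosen purely for illustration:

```python
def mape_improvement(mape_benchmark, mape_proposed):
    """Percentage improvement of Equation (17): relative MAPE reduction of
    the proposed model against a benchmark (MAPE_1 = benchmark)."""
    return abs((mape_benchmark - mape_proposed) / mape_benchmark)

# e.g., a benchmark MAPE of 1.04% reduced to 0.52% is a 50% improvement.
improvement = mape_improvement(1.04, 0.52)
```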



**Table 10.** Percentage improvement of the proposed model.

#### *5.4. Combined Strategy*

We selected and applied a simple averaging strategy to calculate the prediction results of all individual models to compare with the results of MOGOA optimization to test the effectiveness of the proposed combination strategy. The results of the comparison between the two methods are shown in Table 11.

From Table 11, we can easily see that the proposed combined model using MOGOA always outperformed the model applying a simple average strategy, regardless of site and forecasting step, in all three error metrics. For instance, in the three-step forecasting of NSW (2011), the MAPE of the proposed model is 0.8648%, while that of the simple average strategy is 1.8336%, which shows the excellence of the model's combination strategy.


**Table 11.** Comparison between proposed model and simple average strategy.

#### **6. Conclusions**

As an indispensable part of the economic operation of power systems, electric load prediction has advanced considerably in the past few years, and many studies have contributed to improving forecasting accuracy. Establishing a model with excellent forecasting performance and strong stability can provide large economic and social benefits; at the same time, it can help managers develop blueprints for future power system construction to ensure the reliability and efficiency of the power supply. As a result, developing a new, robust model with high forecasting accuracy is of broad significance. However, classic and individual models do not always produce satisfactory results. A combined model using data preprocessing technology, four individual models whose combination is optimized by the multiobjective grasshopper optimization algorithm, and a multistep forecasting strategy was used for electric load forecasting in our study. Specifically, the technique of singular spectrum analysis, based on decomposition and reconstruction, was employed to extract the basic features of the time series by removing high-frequency signals. Moreover, the weight coefficients of the individual models in the combined model were optimized by a recent advanced optimization algorithm to obtain both high precision and strong stability. With regard to the individual models in the combined model, the ARIMA model was selected to capture the linearity of the sequence and the artificial intelligence models were selected to capture its nonlinearity. Furthermore, the combined model was employed in multistep forecasting to validate its forecasting performance. The experimental results show that the new combined model performed significantly better than the other benchmark models on the basis of multiple comparisons and analysis.
Additionally, by comparing the outcomes of DM and forecasting effectiveness tests, we found that our model performed best among all the models applied in the experiments. The proposed combined model, with its brilliant prediction performance, can yield tremendous economic benefits and lead to a dramatic reduction in the consumption of environmental resources. Apart from that, it is certain that wide application of this model will contribute to the management of power systems, rational electric dispatching, and electric power scheduling. In conclusion, our proposed combined model can improve the performance of electric load time series forecasting and provide a new feasible choice for smart power distribution planning.

**Author Contributions:** Conceptualization, Y.Z. and J.W.; Methodology, J.W.; Software, Y.Z.; Validation, J.W., H.L.; Formal Analysis, Y.Z. and J.W.; Investigation, H.L.; Resources, Y.Z.; Data Curation, H.L.; Writing-Original Draft Preparation, Y.Z.; Writing-Review & Editing, Y.Z. and J.W.; Visualization, Y.Z. and J.W.; Supervision, J.W.; Project Administration, Y.Z.; Funding Acquisition, J.W. and H.L.

**Funding:** This work was funded by the National Natural Science Foundation of China (grant number 71671029).

**Conflicts of Interest:** The authors declare that there is no conflict of interest regarding the publication of this paper.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Phase Space Reconstruction Algorithm and Deep Learning-Based Very Short-Term Bus Load Forecasting**

### **Tian Shi 1, Fei Mei 2, Jixiang Lu 3, Jinjun Lu 3, Yi Pan 1, Cheng Zhou 1, Jianzhang Wu <sup>1</sup> and Jianyong Zheng 1,\***


Received: 20 October 2019; Accepted: 13 November 2019; Published: 15 November 2019

**Abstract:** With the refinement and intelligence of power system optimal dispatching, the widespread adoption of advanced grid applications that consider the safety and economy of power systems, and the massive access of distributed energy resources, the requirement for bus load prediction accuracy is continuously increasing. Aiming at the volatility brought about by the large-scale access of new energy sources, the adaptability to different forecasting horizons and the time series characteristics of the load, this paper proposes a phase space reconstruction (PSR) and deep belief network (DBN)-based very short-term bus load prediction model. Cross-validation is also employed to optimize the structure of the DBN. The proposed PSR-DBN very short-term bus load forecasting model is verified by applying the real measured load data of a substation. The results prove that, when compared to other alternative models, the PSR-DBN model has higher prediction accuracy and better adaptability for different forecasting horizons in the case of high distributed power penetration and large fluctuation of bus load.

**Keywords:** load forecasting; VSTLF; bus load forecasting; DBN; PSR; deep learning

#### **1. Introduction**

Electricity cannot be stored in large quantities, and the investment recovery cycle of large-scale energy storage equipment is long. Therefore, in order to ensure the safe operation of power systems and power quality on the user side, the operators must have knowledge of future power loads [1]. Power system load forecasting is an important method to understand the trend of future electric load. In addition, power load forecasting is of great significance for the planning of power systems and scheduling of generation and transmission maintenance. Power system load forecasting is generally divided into long-term forecasting, medium-term forecasting, short-term forecasting, and very short-term forecasting [2]. Among them, short-term load forecasting (STLF) and very short-term load forecasting (VSTLF) are of great significance for economic dispatch, optimal power flow, and electricity market trading. The higher the accuracy of load forecasting is, the more beneficial it is to improve the utilization rate of power generation equipment and the effectiveness of economic dispatch, and reduce the operation cost of smart grid.

In the past decades, experts and scholars have conducted systematic and effective research on traditional deterministic and probabilistic STLF and VSTLF. Deterministic forecasting methods can be divided into two main categories [3]: The first category uses statistical forecasting models, such as linear regression [4], curve extrapolation [5], the Autoregressive Integrated Moving Average (ARIMA) model [6,7], and other time series methods; the second category uses artificial intelligent forecasting models, such as Bayesian estimation [8], Random Forests [9], Support Vector Regression (SVR) [10,11], Artificial Neural Network (ANN) [12,13], Deep Belief Network (DBN) [14,15], and Long Short-Term Memory (LSTM) Network [16,17]. These methods have achieved high forecasting accuracy and good robustness in day-ahead and hour-ahead load forecasting. However, most of the studies are focused on system-level load forecasting, and there are relatively few on bus load forecasting. Generally, bus loads can refer to loads supplied by transmission and distribution system transformers [18]. Bus load forecasting can be conducive to the optimal scheduling of decentralized generation, network congestion studies, and others [19].

With the refinement and intelligence of power grid optimal dispatching and the wide application of advanced smart grid applications, which take into account the security and economy of the power system, the demand for bus load forecasting accuracy is increasing. Since the bus load base is much smaller than that of the whole system, the uncertainty of bus load and its multi-dimensional nonlinearity [20] are more obvious. The traditional method of distributing the predicted system value through the bus load ratio often fails to achieve satisfactory results [21]. In this regard, Ref. [18] modifies the ANN model for the aggregated load of the interconnected system and proposes two novel hybrid forecasting models, which can capture and successfully treat the special characteristics of bus load patterns. In Ref. [19], a bus load forecasting model based on clustering and ANN is proposed; the day-ahead load forecasting and hour-ahead load forecasting are carried out, achieving high prediction accuracy.

With a large number of distributed power sources connected, the uncertainty and nonlinear characteristics of the system [22] and bus loads are further enhanced. As the collection time interval of distributed photovoltaic power plant data is generally several minutes, short-term and day-ahead bus load forecasting on an hourly basis cannot make full use of historical information and has low prediction accuracy. In order to ensure the reliable operation of power system real-time security analysis and economic dispatch, more detailed VSTLF is needed. The authors of [23] proposed a chaotic radial basis function (RBF) photovoltaic power generation prediction model and verified its prediction accuracy under different weather conditions. However, they only validated the prediction accuracy of the model for single-step prediction and did not address the model's forecasting horizon problem [24]. Ref. [20] proposes a novel load forecasting model based on the phase space reconstruction (PSR) algorithm and bi-square kernel (BSK) regression, and achieves high prediction accuracy on different data sets. However, after phase space reconstruction, different BSK models are used to independently predict the various dimensional data, which neglects the time series characteristics of the load.

In view of the shortcomings of the above forecasting models, considering the adaptability to different forecasting horizons, the time series characteristics of load, and the volatility brought by large-scale access of new energy sources, this paper proposes a novel very short-term bus load forecasting model based on phase space reconstruction and a deep belief network (PSR-DBN). Because the amount of historical data in VSTLF is relatively large and closely related to the future load trend, the impact of weather, electricity price, and other factors on VSTLF is not considered in this paper. Firstly, the proposed PSR-DBN model performs phase space reconstruction on bus load history data, projecting the historical data onto the trajectory of a moving point in the phase space. Then the model takes advantage of the excellent nonlinear fitting ability of the deep belief network to fit the moving point trajectory and provide a prediction of the trajectory. Finally, the predicted value of the load is obtained. At the same time, the structure of the DBN is optimized by cross-validation. In order to test the validity and superiority of the proposed PSR-DBN very short-term bus load forecasting model, this paper applies the measured load data of a substation in China to verify the forecasting effectiveness of the model under different forecasting horizons (5 min–1 h). In addition, six other alternative forecasting models are employed for further comparison with the proposed PSR-DBN model.

The major contributions of this paper are as follows:


The rest of this paper is organized as follows. In Section 2, the relevant theory of the PSR-DBN model is introduced. Section 3 provides the principal steps of the PSR-DBN model and covers the tuning method of network hyperparameter based on cross-validation. Section 4 presents the evaluation criteria of forecasting accuracy, case study settings, forecasting results, and a comparison. Section 5 gives the conclusion of the paper and an outlook for future research.

#### **2. Methodology**

#### *2.1. Phase Space Reconstruction (PSR)*

Phase space reconstruction (PSR) is an efficient method for analyzing nonlinear time series. Its basic idea is to regard the time series as one component generated by a nonlinear dynamic system. From the variation of this component, an equivalent high-dimensional phase space of the dynamic system can be reconstructed, and the time series can be projected onto the trajectory of a moving point in that phase space. For a one-dimensional time series $x = \{x_1, x_2, \cdots, x_N\}$ with embedding dimension $m$ and delay time $t$, the set of phase-space-reconstructed vectors can be expressed as:

$$\begin{bmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \\ \vdots \\ \mathbf{X}_M \end{bmatrix} = \begin{bmatrix} x_1 & x_{1+t} & \cdots & x_{1+(m-1)t} \\ x_2 & x_{2+t} & \cdots & x_{2+(m-1)t} \\ \vdots & \vdots & \ddots & \vdots \\ x_M & x_{M+t} & \cdots & x_N \end{bmatrix} \tag{1}$$

where *M* = *N* − (*m* − 1)*t*.
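
The delay embedding of Equation (1) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code; the function name `phase_space_reconstruct` is an assumption:

```python
import numpy as np

def phase_space_reconstruct(x, m, t):
    """Delay-embed a 1-D series into the (M, m) matrix of Equation (1),
    with M = N - (m - 1) * t; row i is (x_i, x_{i+t}, ..., x_{i+(m-1)t})."""
    x = np.asarray(x, dtype=float)
    M = len(x) - (m - 1) * t
    if M <= 0:
        raise ValueError("series too short for the chosen m and t")
    return np.column_stack([x[j * t : j * t + M] for j in range(m)])

# toy check: N = 10, m = 3, t = 2  ->  M = 6 rows
X = phase_space_reconstruct(np.arange(1.0, 11.0), m=3, t=2)
# first row is (x_1, x_3, x_5) = (1, 3, 5); last row is (x_6, x_8, x_10) = (6, 8, 10)
```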

The key to PSR is to determine the optimal embedding dimension *mopt* and optimal delay *topt*. In this paper, the C-C method [25] is employed to determine the optimal embedding dimension *mopt* and delay *topt* at the same time.

Based on Equation (1), the correlation integral is defined as:

$$C(m, N, r_k, t) = \frac{2}{M(M-1)} \sum_{1 \le i < j \le M} \theta\left(r_k - \|\mathbf{X}_i - \mathbf{X}_j\|\right) \tag{2}$$

where $\theta(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0 \end{cases}$ is the Heaviside step function.

According to BDS (Brock-Dechert-Scheinkman) statistical conclusions [26,27], when *N* > 3000, the range of values of *m* and *rk* can be obtained, *m* ∈ {2, 3, 4, 5}, *rk* = *k* × 0.5σ, where σ is the standard deviation of the time series and *k* ∈ {1, 2, 3, 4}.

Based on a matrix partitioning average strategy, the test statistic *S* is defined as:

$$S(m, N, r_k, t) = \frac{1}{t} \sum_{i=1}^{t} \left[ C_i\left(m, \frac{N}{t}, r_k, t\right) - C_i^m\left(m, \frac{N}{t}, r_k, t\right) \right] \tag{3}$$

For *N* → ∞, Equation (3) can be deformed to:

$$S(m, r_k, t) = \frac{1}{t} \sum_{i=1}^{t} \left[ C_i(m, r_k, t) - C_i^m(m, r_k, t) \right] \tag{4}$$

For fixed *m* and *t*, *S*(*m*, *rk*, *t*) will equal 0 for all *rk* if the data are i.i.d. and *N* → ∞. However, a real data set is finite and its samples may be correlated. Thus, the optimal delay time can be taken either as the zero crossing of *S*(*m*, *rk*, *t*) or as the time at which *S*(*m*, *rk*, *t*) shows the least variation with *rk* [25].

To represent the variation of *S*(*m*, *rk*, *t*) with *rk*, the test statistic Δ*S* is defined as:

$$\Delta S(m, t) = \max \left[ S(m, r_{k_1}, t) \right] - \min \left[ S(m, r_{k_2}, t) \right] \tag{5}$$

where *k*<sup>1</sup> ∈ {1, 2, 3, 4}, *k*<sup>2</sup> ∈ {1, 2, 3, 4}.

The means of *S* and Δ*S* over *m* and *k* are denoted $\overline{S}$ and $\overline{\Delta S}$, given by:

$$\begin{cases} \overline{S}(t) = \frac{1}{4 \times 4} \sum\_{m=2}^{5} \sum\_{k=1}^{4} S(m, r\_k, t) \\ \overline{\Delta S}(t) = \frac{1}{4} \sum\_{m=2}^{5} \Delta S(m, t) \end{cases} \tag{6}$$

Values of $\overline{S}(t)$ and $\overline{\Delta S}(t)$ can be computed for every *t*. The *t* value corresponding to the first zero crossing of $\overline{S}(t)$ or the first local minimum of $\overline{\Delta S}(t)$, rounded to an integer, is taken as the optimal delay *topt*.

The test statistic *Scor* is defined as:

$$S\_{\rm cor}(t) = \overline{\Delta S}(t) + \left| \overline{S}(t) \right| \tag{7}$$

where the *t* value corresponding to the global minimum of *Scor*(*t*) is the optimal embedding window *t*ω.

Once the optimal delay *topt* has been determined from Equation (6) and the optimal embedding window *t*ω from Equation (7), the optimal embedding dimension *mopt* can be determined by rounding the value solved from Equation (8).

$$t\_{\omega} = (m\_{\text{opt}} - 1)t\_{\text{opt}} \tag{8}$$
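
The C-C statistics above can be sketched as follows. This is an illustrative NumPy implementation, not the authors' code: the helper names are assumptions, the second term of Equation (3) is implemented as $C_i(1,\cdot)^m$ following the original C-C method of [25], and the strictness of the inequality in the Heaviside step is immaterial for continuous data.

```python
import numpy as np

def corr_integral(y, m, r):
    """Correlation integral of sub-series y, embedded with dimension m and
    internal delay 1, using the max-norm distance (cf. Equation (2))."""
    M = len(y) - (m - 1)
    if M < 2:
        return 0.0
    emb = np.column_stack([y[j:j + M] for j in range(m)])
    d = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
    iu = np.triu_indices(M, k=1)
    return float(np.mean(d[iu] <= r))

def cc_statistics(x, t, ms=(2, 3, 4, 5), ks=(1, 2, 3, 4)):
    """Return S_bar(t), dS_bar(t) of Equation (6) and S_cor(t) of
    Equation (7), splitting x into t disjoint sub-series (Equation (3))."""
    sigma = float(np.std(x))
    S_all, dS_all = [], []
    for m in ms:
        S_of_r = []
        for k in ks:
            r = 0.5 * k * sigma                      # r_k = k * 0.5 * sigma
            S_of_r.append(np.mean([corr_integral(x[i::t], m, r)
                                   - corr_integral(x[i::t], 1, r) ** m
                                   for i in range(t)]))
        S_all.extend(S_of_r)
        dS_all.append(max(S_of_r) - min(S_of_r))
    S_bar, dS_bar = np.mean(S_all), np.mean(dS_all)
    return S_bar, dS_bar, dS_bar + abs(S_bar)

# scan candidate delays on a toy noisy series; the global minimum of
# S_cor(t) marks the embedding window
rng = np.random.default_rng(0)
x = np.sin(0.1 * np.arange(600)) + 0.1 * rng.standard_normal(600)
scores = {t: cc_statistics(x, t)[2] for t in range(1, 6)}
t_w = min(scores, key=scores.get)
```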

#### *2.2. Deep Belief Network (DBN)*

The Deep Belief Network (DBN) is a deep learning model proposed by Geoffrey Hinton [28] in 2006; it is a stack of multiple Restricted Boltzmann Machines (RBMs). Compared with a conventional Artificial Neural Network (ANN), a DBN combines layer-wise pre-training with the Backpropagation (BP) algorithm to determine the network parameters. As a result, a DBN is less likely to fall into a local optimum and achieves higher convergence accuracy. Furthermore, even when the number of layers and of neurons per layer is large, a DBN converges quickly, which makes it well suited to fitting complex nonlinear time series [29].

#### 2.2.1. Restricted Boltzmann Machine (RBM)

The RBM consists of a visible layer **V** and a hidden layer **H**. As shown in Figure 1, the visible layer consists of *n* neurons and the hidden layer of *m* neurons, each of which takes the value 0 or 1 and obeys a Bernoulli distribution, i.e., *vi* ∈ {0, 1} (*i* = 1, 2 ··· *n*), *hj* ∈ {0, 1} (*j* = 1, 2 ··· *m*). There are no connections between neurons within a layer, and neurons in different layers are connected by the weights ω.

**Figure 1.** Structure of Restricted Boltzmann Machine (RBM).

RBM is a kind of probabilistic unsupervised learning. Its network parameters are composed of visible layer bias **b**, weight matrix ω, and hidden layer bias **c**, and the optimal value of network parameters is determined by the minimum energy function. The energy function is defined as:

$$E(\mathbf{v}, \mathbf{h} | \boldsymbol{\theta}) = -\sum_{i=1}^{n} b_i v_i - \sum_{j=1}^{m} c_j h_j - \sum_{i=1}^{n} \sum_{j=1}^{m} v_i \omega_{ij} h_j \tag{9}$$

where ω*ij* is the connection weight of the *i*-th visible layer neuron and the *j*-th hidden layer neuron, and θ = {**b**, ω, **c**}.

Based on Equation (9), the joint probability distribution of the visible neuron state and the hidden neuron state is shown as:

$$P(\mathbf{v}, \mathbf{h} | \boldsymbol{\theta}) = \frac{e^{-E(\mathbf{v}, \mathbf{h} | \boldsymbol{\theta})}}{Z} \tag{10}$$

where the normalization factor $Z = \sum_{\mathbf{v},\mathbf{h}} e^{-E(\mathbf{v},\mathbf{h}|\boldsymbol{\theta})}$, i.e., the sum of the negative exponential of the energy function over all possible values of the visible state vector **v** and the hidden state vector **h**.

The probability distribution *P*(**v**) of **v** can be derived from Equation (10) as:

$$P(\mathbf{v}|\boldsymbol{\Theta}) = \sum\_{\mathbf{h}} \frac{e^{-E(\mathbf{v}, \mathbf{h}|\boldsymbol{\Theta})}}{Z} \tag{11}$$

Thus, the objective function of RBM training can be expressed as a likelihood function of the probability distribution of visible layer state variable **v** on the training set, and the likelihood function can be derived from the Equation (11) as:

$$L(\boldsymbol{\Theta}) = \sum\_{\mathbf{v} \in \mathcal{T}} \log P(\mathbf{v} | \boldsymbol{\Theta}) \tag{12}$$

where *T* represents the set of sample inputs on the training set, and when the objective function takes the maximum value, the energy function is the minimum.

According to the network structure in Figure 1, the activation probability of *vi* in a given hidden layer neuron state **h** and the activation probability of *hj* in a given visible layer neuron state **v** can be derived as:

$$P(v_i = 1 | \mathbf{h}) = \sigma \left( b_i + \sum_{j=1}^{m} \omega_{ij} h_j \right) \tag{13}$$

$$P(h_j = 1 | \mathbf{v}) = \sigma \left( c_j + \sum_{i=1}^{n} \omega_{ij} v_i \right) \tag{14}$$

where σ is the sigmoid function, $\sigma(x) = \frac{1}{1+e^{-x}}$.

Since the gradient cannot be obtained directly when using a stochastic gradient ascent algorithm to maximize Equation (12), the training of an RBM usually applies the Contrastive Divergence (CD) algorithm to approximate the gradient of the likelihood function [30]. The specific steps of RBM training are as follows:

Step 1: Substitute the training-set input as **v**1 into Equation (14) to obtain *P*(**h**1 = 1|**v**1), then draw a random sample to obtain the reconstructed value of **h**1.

Step 2: Substitute **h**1 into Equation (13) to obtain *P*(**v**2 = 1|**h**1), then draw a random sample to obtain the reconstructed value of **v**2.

Step 3: Substitute **v**2 into Equation (14) to obtain *P*(**h**2 = 1|**v**2).

Step 4: Update network parameters. The iteration algorithm of network parameters is as follows:

$$\begin{cases} \boldsymbol{\omega}^{(k+1)} = \boldsymbol{\omega}^{(k)} + \varepsilon \left( \mathbf{h}_1 \mathbf{v}_1^T - P(\mathbf{h}_2 = 1 | \mathbf{v}_2) \mathbf{v}_2^T \right) \\ \mathbf{b}^{(k+1)} = \mathbf{b}^{(k)} + \varepsilon (\mathbf{v}_1 - \mathbf{v}_2) \\ \mathbf{c}^{(k+1)} = \mathbf{c}^{(k)} + \varepsilon \left( \mathbf{h}_1 - P(\mathbf{h}_2 = 1 | \mathbf{v}_2) \right) \end{cases} \tag{15}$$

where ε is the learning rate, which takes the value of 0.8 in this paper, and the superscript *k* represents the *k-*th iteration.
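
Steps 1–4 can be sketched as a single CD-1 update in NumPy. This is an illustrative sketch, not the authors' implementation; `cd1_step` and the toy layer sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def cd1_step(v1, W, b, c, eps=0.8):
    """One CD-1 update (Steps 1-4, Equation (15)).
    W is (m, n): hidden x visible; b is the visible bias (n,); c the hidden bias (m,)."""
    # Step 1: sample h1 from P(h = 1 | v1), Equation (14)
    p_h1 = sigmoid(c + W @ v1)
    h1 = (rng.random(p_h1.shape) < p_h1).astype(float)
    # Step 2: sample the reconstruction v2 from P(v = 1 | h1), Equation (13)
    p_v2 = sigmoid(b + W.T @ h1)
    v2 = (rng.random(p_v2.shape) < p_v2).astype(float)
    # Step 3: hidden activation probabilities for the reconstruction
    p_h2 = sigmoid(c + W @ v2)
    # Step 4: parameter update, Equation (15), learning rate eps = 0.8 as in the paper
    W = W + eps * (np.outer(h1, v1) - np.outer(p_h2, v2))
    b = b + eps * (v1 - v2)
    c = c + eps * (h1 - p_h2)
    return W, b, c

n, m = 6, 4                                    # toy visible / hidden sizes
W = 0.01 * rng.standard_normal((m, n))
b, c = np.zeros(n), np.zeros(m)
v = rng.integers(0, 2, size=n).astype(float)
for _ in range(100):
    W, b, c = cd1_step(v, W, b, c)
```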

#### 2.2.2. DBN based on Levenberg-Marquardt backpropagation (LMBP) Algorithm

A traditional DBN is formed by stacking multiple RBMs, in which the hidden layer of one RBM serves as the visible layer of the next. During pre-training, the CD algorithm determines the network parameters layer by layer, which is unsupervised learning. The pre-trained parameters are then assigned to the neural network as initial values and fine-tuned using the sample labels in the training set combined with the BP algorithm, which is supervised learning. The structure of a traditional DBN is shown in Figure 2.

**Figure 2.** Structure of the traditional Deep Belief Network (DBN).

In this paper, the LM (Levenberg-Marquardt) BP algorithm [31] is used to replace the traditional BP algorithm to fine-tune the DBN. Compared with the traditional BP algorithm, the LMBP algorithm has faster convergence speed and higher convergence reliability, and is more suitable for training neural networks with many hidden layers and neurons.

Unlike the traditional BP algorithm, the LMBP algorithm is based on the Gauss-Newton method for least-squares problems. The squared error vector **v** defines the objective function, whose second-order Taylor expansion is derived; after approximating the Hessian of the objective function (ignoring the higher-order term), the correction of the weights ω is:

$$
\Delta\boldsymbol{\omega} = -\left[\mathbf{J}^T(\boldsymbol{\omega})\mathbf{J}(\boldsymbol{\omega}) + \boldsymbol{\mu}\mathbf{I}\right]^{-1}\mathbf{J}^T(\boldsymbol{\omega})\mathbf{v}(\boldsymbol{\omega})\tag{16}
$$

where μ is the correction coefficient, which is set to prevent **J***T*(ω)**J**(ω) from being irreversible; **I** is the identity matrix; **J**(ω) is the Jacobian matrix of **v**(ω), which can be written as:

$$\mathbf{J}(\boldsymbol{\omega}) = \begin{bmatrix} \frac{\partial v_1}{\partial \omega_1} & \frac{\partial v_1}{\partial \omega_2} & \cdots & \frac{\partial v_1}{\partial \omega_n} \\ \frac{\partial v_2}{\partial \omega_1} & \frac{\partial v_2}{\partial \omega_2} & \cdots & \frac{\partial v_2}{\partial \omega_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial v_l}{\partial \omega_1} & \frac{\partial v_l}{\partial \omega_2} & \cdots & \frac{\partial v_l}{\partial \omega_n} \end{bmatrix} \tag{17}$$

where $\frac{\partial v_i}{\partial \omega_j}$ denotes the partial derivative of *vi* with respect to ω*j*.

As in the BP algorithm, the update expression for the weights ω(*k*+1) in the *k*-th iteration is:

$$
\boldsymbol{\omega}^{(k+1)} = \boldsymbol{\omega}^{(k)} + \Delta\boldsymbol{\omega}^{(k)} \tag{18}
$$

μ needs to be adjusted at each iteration to obtain good convergence. When μ is small, the algorithm approaches the standard Gauss-Newton method, which has higher convergence accuracy; however, if the objective function differs too much from its quadratic approximation during an iteration, convergence will be poor. When μ is large, the algorithm approaches the traditional gradient-descent BP algorithm, which can serve as an auxiliary solution whenever the Gauss-Newton step performs poorly.
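
Equations (16) and (18) amount to the following damped Gauss-Newton step, sketched here on a toy linear least-squares problem with μ held fixed (in practice μ is adapted at each iteration, as described above). The names `lm_step`, `residual`, and `jacobian` are illustrative, not from the paper:

```python
import numpy as np

def lm_step(residual, jacobian, w, mu):
    """One Levenberg-Marquardt update: dw = -(J^T J + mu*I)^{-1} J^T v
    (Equation (16)), then w <- w + dw (Equation (18))."""
    v = residual(w)
    J = jacobian(w)
    A = J.T @ J + mu * np.eye(len(w))
    return w - np.linalg.solve(A, J.T @ v)

# toy fit of y = a*x + b (exact answer a = 2, b = 1); error v_i = model_i - y_i
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
residual = lambda w: w[0] * x + w[1] - y
jacobian = lambda w: np.column_stack([x, np.ones_like(x)])   # dv_i / dw_j
w = np.zeros(2)
for _ in range(20):                       # mu kept fixed for brevity
    w = lm_step(residual, jacobian, w, mu=1e-3)
```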

#### **3. PSR-DBN Forecasting Model**

#### *3.1. The Procedure of the PSR-DBN Model*

For a bus load historical data time series $p = \{p_1, p_2, \cdots, p_N\}$, the prediction process of the PSR-DBN forecasting model is as follows:

Step 1 normalization: The load time series is normalized in preparation for training the deep belief network, and the maximum and minimum values of the data are saved so that the predicted load values can later be denormalized to restore their real values.

Step 2 PSR: The C-C method is adopted to process the load time series to find the optimal embedding dimension *m* and the optimal delay *t* of time series. Then the load time series is reconstructed according to the obtained embedding dimension *m* and delay *t*. The reconstructed load time series is as follows:

$$\begin{bmatrix} \mathbf{P}_1 \\ \mathbf{P}_2 \\ \vdots \\ \mathbf{P}_M \end{bmatrix} = \begin{bmatrix} p_1 & p_{1+t} & \cdots & p_{1+(m-1)t} \\ p_2 & p_{2+t} & \cdots & p_{2+(m-1)t} \\ \vdots & \vdots & \ddots & \vdots \\ p_M & p_{M+t} & \cdots & p_N \end{bmatrix} \tag{19}$$

where *M* = *N* − (*m* − 1)*t*.

Step 3 DBN: The DBN is constructed by using the phase space matrix in step 2 as the training set and employing cross-validation to optimize the hyperparameters of the network. The details of hyperparameters tuning are introduced in Section 3.2. Finally, the trained DBN is adopted to predict the load value of the future moment.

Step 4 denormalization: The load prediction value returned by the deep belief network in step 3 is denormalized by applying the maximum and minimum values saved in step 1, then the actual load forecasting value is obtained.

The flow chart corresponding to the above steps is shown in Figure 3:

**Figure 3.** Flowchart of the PSR-DBN forecasting model.
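
Steps 1 and 4 above (min-max normalization and denormalization) can be sketched as follows; the function names are illustrative assumptions:

```python
import numpy as np

def normalize(p):
    """Step 1: min-max scale the load series to [0, 1]; keep the extremes for Step 4."""
    p_min, p_max = float(np.min(p)), float(np.max(p))
    return (np.asarray(p, dtype=float) - p_min) / (p_max - p_min), p_min, p_max

def denormalize(p_scaled, p_min, p_max):
    """Step 4: map network outputs back to real load values."""
    return np.asarray(p_scaled, dtype=float) * (p_max - p_min) + p_min

load = np.array([42.0, 50.0, 46.0, 58.0])      # toy MW values
scaled, lo, hi = normalize(load)
restored = denormalize(scaled, lo, hi)
```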

#### *3.2. Determination of DBN Network Structure*

Similar to the neural network, DBN also has many hyperparameters to be set. The rationality of hyperparameters adjustment determines the prediction accuracy of the prediction model. Unreasonable hyperparameters will lead to a significant increase in prediction error. The determination of the network structure is an important part of DBN hyperparameters adjustment.

For the input layer of the DBN, since the original load data are input to the DBN after passing through PSR, the number of input-layer neurons does not need to be tuned and can be set directly to the embedding dimension *m*; that is, one row of Equation (19) is input at a time. For the output layer, when the input is a row of Equation (19), this is equivalent to inputting the position vector of the moving point in phase space at a certain moment, and the network must then output the predicted position vector at the next moment.

However, if the input of the model is a row $\mathbf{p}_i$ ($1 \le i \le M$) of the phase space matrix in Equation (19), only the last component $p_{i+1+(m-1)t}$ of the next position vector $\mathbf{p}_{i+1}$ is unknown, so the output layer only needs to output the predicted load value $\hat{p}_{i+1+(m-1)t}$ at time $i+1+(m-1)t$. If $i+1$ is greater than $M$, the phase space matrix of Equation (19) is extended downwards: $\mathbf{p}_{i+1}$ is appended as a new row and taken as the model input to obtain the predicted value of $p_{i+2+(m-1)t}$; the matrix is then augmented again to obtain the predicted value of $p_{i+3+(m-1)t}$, and so on until the end of the forecasting. The expression for $\mathbf{p}_{i+1}$ in the augmented matrix is:

$$\mathbf{p}\_{i+1} = \begin{bmatrix} p\_{i+1} & p\_{i+1+t} & \cdots & p\_{i+1+(m-1)t} \end{bmatrix} \tag{20}$$

where the value of $p_{i+1+(m-1)t}$ is determined by the forecasting horizon. For one-step forecasting, $p_{i+1+(m-1)t}$ takes the real load value measured at time $i+1+(m-1)t$. For multi-step forecasting with $k$ steps, $p_{i+1+(m-1)t} = \hat{p}_{i+1+(m-1)t}$ is used until the $k$-th forecasting step is completed, after which the predicted values in the augmented matrix are replaced by the real measured load values.
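
The recursive augmentation described above can be sketched as follows for the multi-step case, where `predict` stands in for the trained DBN (a hypothetical callable; the echo-style toy "model" is only for illustration):

```python
import numpy as np

def recursive_forecast(p, m, t, predict, k):
    """Roll the phase-space matrix of Equation (19) forward k steps.

    p: historical series; predict: maps a length-m phase vector p_{i+1}
    (Equation (20)) to the next load value. Each predicted value is
    appended so it can feed later phase vectors.
    """
    series = list(np.asarray(p, dtype=float))
    preds = []
    for _ in range(k):
        i = len(series) - 1 - (m - 1) * t            # start index of the newest row
        row = [series[i + j * t] for j in range(m)]  # p_{i+1} of Equation (20)
        p_next = float(predict(np.array(row)))
        preds.append(p_next)
        series.append(p_next)                        # augment the matrix downwards
    return preds

# toy check with a stand-in 'model' that adds 1 to the last component
preds = recursive_forecast(np.arange(20.0), m=3, t=2,
                           predict=lambda r: r[-1] + 1, k=4)
```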

For the hidden layers of the DBN, the number of hidden layers and the number of neurons in each hidden layer have a significant influence on the prediction result, and optimizing the number of hidden layers is usually found to have the larger effect. Too few hidden layers cause under-fitting, which degrades forecasting performance, while too many lead to over-fitting and make performance worse still. Therefore, in order to improve the prediction accuracy of the PSR-DBN model, cross-validation is used to optimize the number of hidden layers and the number of neurons in each hidden layer. The flow is shown within the blue dotted line in Figure 3. The specific steps are as follows:


In summary, the structure, input, and output of the proposed DBN are shown in Figure 4, in which the number of the hidden layers and the number of neurons in each hidden layer are obtained by cross-validation.

**Figure 4.** The input, output, and structure of the network.

#### **4. Case Study**

#### *4.1. Bus Load Data*

In order to test the validity of the model, load data of a 220-kV substation bus in a Chinese city from 1 May to 18 May 2017 are used in this paper. The bus is connected to a distributed photovoltaic power station with an installed capacity of about 50 MW, and the sampling interval of the load data is 5 min. During the period, the substation had no overhaul or fault shutdown, so the reliability of the historical data is high, and the 3σ criterion confirms that there are no abnormal data.
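
The 3σ screening mentioned above can be sketched as follows; the synthetic series merely stands in for the (unavailable) substation data:

```python
import numpy as np

def three_sigma_outliers(p):
    """Flag samples lying outside mean +/- 3*std (the 3-sigma criterion)."""
    p = np.asarray(p, dtype=float)
    mu, sigma = p.mean(), p.std()
    return np.abs(p - mu) > 3.0 * sigma

# synthetic stand-in: ~14 days of 5-min samples (288 per day)
rng = np.random.default_rng(2)
load = 50.0 + 2.0 * rng.standard_normal(4032)
mask = three_sigma_outliers(load)          # boolean flags, one per sample
```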

In this paper, the data of 1–14 May are selected as the training set of the PSR-DBN forecasting model, and cross-validation is used to adjust the model hyperparameters. The data of 15–18 May are selected as the test set. If only one-step forecasting of the load in the next 5 min is performed, as in Ref. [22], the prediction horizon is too short to meet the real-time security analysis and economic dispatch requirements of the power system; on the other hand, short-term load forecasting on an hourly scale combined with interpolation has poor accuracy. Therefore, the forecasting horizons of very short-term load forecasting in this paper range from 5 min to 1 h, and the proposed model is validated in the MATLAB (R2018a, MathWorks Inc., Massachusetts, USA) environment.

#### *4.2. Forecasting Evaluation Index*

In order to evaluate the forecasting performance and prediction accuracy of the model more intuitively, this paper adopts the Mean Absolute Percentage Error (MAPE), Mean Absolute Scaled Error (MASE), Symmetric Mean Absolute Percentage Error (sMAPE), Geometric Mean Absolute Error (GMAE), and Root Mean Square Error (RMSE) [33] as evaluation indicators.

$$\mathbf{MAPE} = \frac{1}{n} \sum\_{i=1}^{n} \left| \frac{p\_i - \hat{p}\_i}{p\_i} \right| \times 100\% \tag{21}$$

$$\text{MASE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|p_i - \hat{p}_i|}{\frac{1}{n-1} \sum_{j=2}^{n} |p_j - p_{j-1}|} \tag{22}$$

$$\text{sMAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \frac{|p_i - \hat{p}_i|}{0.5(|p_i| + |\hat{p}_i|)} \tag{23}$$

$$\text{GMAE} = \sqrt[n]{\prod_{i=1}^{n} |p_i - \hat{p}_i|} \tag{24}$$

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(p_i - \hat{p}_i\right)^2} \tag{25}$$

where *n* represents the number of predicted samples, *pi* represents the real value of the load at time *i*, and *p*ˆ*<sup>i</sup>* represents the predicted value of the load at time *i*.
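
Equations (21)–(25) translate directly into code. The sketch below is illustrative; note that GMAE is computed in log form, which sidesteps overflow of the raw product (a single zero error still drives GMAE to zero):

```python
import numpy as np

def forecast_metrics(p, p_hat):
    """MAPE, MASE, sMAPE, GMAE, and RMSE of Equations (21)-(25)."""
    p, p_hat = np.asarray(p, dtype=float), np.asarray(p_hat, dtype=float)
    e = np.abs(p - p_hat)
    mape = 100.0 * np.mean(e / np.abs(p))
    mase = np.mean(e) / np.mean(np.abs(np.diff(p)))    # naive one-step scaling
    smape = 100.0 * np.mean(e / (0.5 * (np.abs(p) + np.abs(p_hat))))
    gmae = float(np.exp(np.mean(np.log(e))))           # log form avoids overflow
    rmse = float(np.sqrt(np.mean((p - p_hat) ** 2)))
    return mape, mase, smape, gmae, rmse

mape, mase, smape, gmae, rmse = forecast_metrics(
    [100.0, 110.0, 105.0, 120.0], [102.0, 108.0, 106.0, 117.0])
```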

The smaller the value of each metric, the higher the prediction accuracy of the model. However, several of these indicators are scale-dependent and are only meaningful when compared on data of the same scale.

#### *4.3. PSR Reconstruction Results*

Based on the theoretical analysis of PSR in Section 2.1, the C-C method is employed to reconstruct the phase space of the bus load data from 1 May to 14 May. The corresponding statistics Δ*S*(*t*) and *Scor*(*t*) are shown in Figure 5. It can be seen that the first minimum of Δ*S*(*t*) occurs at *t* = 18, while *Scor*(*t*) has no obvious minimum, so the optimal embedding window *t*ω cannot be obtained from it. However, from the BDS statistics, when *N* > 3000, *m* ∈ {2, 3, 4, 5}, so the maximum value of *m* can only be 5. According to Equation (8), the final optimal embedding dimension is *mopt* = 5 and the optimal delay is *topt* = 18.

**Figure 5.** Curves of Δ*S*(*t*) and *Scor*(*t*).

#### *4.4. DBN Hyperparameter Setting*

In Section 4.3, the optimal embedding dimension obtained by PSR is *mopt* = 5. The structure of DBN is determined by the method described in Section 3.2. The number of neurons in the input layer and output layer of DBN is 5 and 1, respectively.

To determine the hidden-layer hyperparameters, the data of 13–14 May are removed from the training set and used as a validation set; the rest is retained for training. The number of neurons per hidden layer is fixed at 10, the number of layers is increased one by one, and the MAPE of the prediction on the validation set is used as the criterion. The forecasting horizon is 1 h, and the MAPE of models with different numbers of hidden layers is shown in Table 1.

**Table 1.** Mean Absolute Percentage Error (MAPE) of models with various number of hidden layers.


In Table 1, when the number of hidden layers reaches 8, the MAPE increases significantly, from which it can be inferred that over-fitting occurs, so no further layers are added. The MAPE is smallest (0.9358) with 4 hidden layers; therefore, the optimal number of hidden layers is 4.

With four hidden layers fixed, a coarse search for the optimal number of neurons per layer was carried out with a fixed step size of 5 neurons, again using the validation-set MAPE as the criterion with a forecasting horizon of 1 h. The search space thus contains $5^4 = 625$ combinations. The MAPE is smallest when the numbers of neurons in the hidden layers are [25, 15, 20, 15], giving a validation-set MAPE of 0.8447, clearly better than the 0.9358 obtained with 10 neurons in each of the four layers. Therefore, the number of neurons in each hidden layer is set to [25, 15, 20, 15].
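
The coarse grid search can be sketched with `itertools.product`; here `train_and_score` is a hypothetical stand-in for training a DBN with the given layer sizes and returning its validation MAPE:

```python
import itertools
import numpy as np

def grid_search_neurons(train_and_score, n_layers=4,
                        candidates=(10, 15, 20, 25, 30)):
    """Coarse search over neurons per hidden layer: 5 candidates per layer
    (step size 5) over four layers gives 5**4 = 625 trials."""
    best_sizes, best_mape = None, np.inf
    for sizes in itertools.product(candidates, repeat=n_layers):
        mape = train_and_score(sizes)
        if mape < best_mape:
            best_sizes, best_mape = sizes, mape
    return best_sizes, best_mape

# toy scorer whose minimum sits at the reported [25, 15, 20, 15]
target = np.array([25, 15, 20, 15])
toy_score = lambda sizes: float(np.sum((np.array(sizes) - target) ** 2))
best, best_mape = grid_search_neurons(toy_score)
```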

In summary, the hyperparameter setting of DBN in this paper is shown in Table 2.


**Table 2.** Hyperparameter setting of DBN.

#### *4.5. Forecasting Result*

#### 4.5.1. One Hour Ahead Load Forecasting

In order to verify the forecasting performance of the proposed method, the ARIMA model, NN model, PSR-NN model, LSTM model, PSR-DBN model (without tuning), and DBN model are adopted to predict the load of 15–18 May after training on 1–14 May as historical data. The forecasting horizon is 1 h, and the corresponding evaluation indicators of each model are calculated. The ARIMA model employs the Akaike Information Criterion (AIC) to determine the optimal autoregressive (AR) order *p* and moving average (MA) order *q*. The hyperparameters of the NN, LSTM, and DBN models are also optimized by cross-validation. The PSR-DBN model (without tuning) has four hidden layers and 10 neurons per layer. The curves of the predicted and measured load values of the different models on 15–18 May are shown in Figure 6.

**Figure 6.** Load forecasting curve one hour ahead of 15–18 May.

The solid black line in Figure 6 is the measured value. It can be seen that the active output of the photovoltaic power station fluctuates strongly, so the bus load curve is severely distorted from the typical saddle shape and shows irregular fluctuations. Applying short-term load forecasting in this situation would cause large errors and waste a great deal of information. The remaining curves are the forecast values of the ARIMA, NN, PSR-NN, LSTM, PSR-DBN (no tuning), and DBN models. The predicted curves of the ARIMA model (golden dashed line) and the NN model (blue dashed line) clearly deviate from the black measured curve, while the forecasting performance of the other models cannot be judged directly from the curves. To assess the prediction accuracy of each model more precisely, the forecasting evaluation indicators and training times of each model are given in Table 3.


**Table 3.** Forecasting performance of each model.

Table 3 shows that the training times of all models meet the requirements of VSTLF. Among the seven models, the proposed PSR-DBN model has the smallest value of every indicator except RMSE, for which it is second smallest, only 0.0006 above the minimum RMSE achieved by the PSR-NN model. The 'Inf' entries for GMAE indicate that the product of the local errors exceeds the upper limit of the double-precision type. It can also be seen that the DBN pre-trained by the CD method is more accurate than the ordinary NN, and that the models that add the PSR step to reconstruct the original data (PSR-DBN and PSR-NN) have significantly smaller evaluation indicators than the DBN and NN models. The tuned PSR-DBN model likewise outperforms the un-tuned PSR-DBN model. It can therefore be inferred that the proposed method achieves higher prediction accuracy and better forecasting performance for one-hour very short-term prediction of bus loads with high distributed-energy penetration and large fluctuations.

Figure 7 shows bar graphs of the relative errors of the predicted values of each model on 17 May. It can be seen that the prediction errors of the ARIMA and NN models are larger than those of the other five models, and that the times with large prediction errors are concentrated around noon. At that time, the power output of the photovoltaic station is large and vulnerable to clouds and other weather factors, resulting in strong output fluctuations and difficult prediction. Although the PSR algorithm does not improve the prediction accuracy of the noon period itself, reconstructing the data reduces the influence of the noontime fluctuations of the historical load on the forecasting of other periods, thus effectively reducing the relative error in those periods and improving the overall prediction accuracy of the model.

**Figure 7.** Relative error of load forecasting on 17 May. (**a**) PSR-DBN; (**b**) ARIMA; (**c**) PSR-NN; (**d**) NN; (**e**) DBN; (**f**) LSTM; (**g**) PSR-DBN (no tuning).

#### 4.5.2. Prediction of Different Forecasting Horizons (5 min to 1 h ahead)

In order to verify the adaptability of the proposed PSR-DBN model to different forecasting horizons, the DBN and NN models are again employed, and the load of 15–18 May is forecasted using the historical data of 1–14 May. The forecasting horizons range from 5 min to 1 h, and the corresponding MAPE is calculated for each. The MAPE curves of the different models as a function of the forecasting horizon are shown in Figure 8.

**Figure 8.** The curve of MAPE varies with forecasting horizons.

As can be seen in Figure 8, the MAPE generally increases with the forecasting horizon. The proposed model has higher prediction accuracy over most of the forecasting horizons, while the performance of the NN model is not ideal. For horizons of 5 min to half an hour, the DBN and PSR-DBN models differ little in accuracy. However, when the horizon increases from half an hour to one hour, the advantage of reconstructing the data with PSR begins to appear: the MAPE of the PSR-DBN model is clearly lower than that of the pure DBN model. Therefore, the proposed model maintains a small prediction error across forecasting horizons from 5 min to 1 h and adapts well to different horizons.


#### **5. Conclusions**

In this paper, aiming at adaptability across forecasting horizons, the time series characteristics of the load, and the fluctuations caused by large-scale distributed generation in bus load forecasting, a very short-term bus load forecasting model based on phase space reconstruction and a deep belief network is proposed. The time series is projected by phase space reconstruction onto the trajectory of a moving point in phase space, and the excellent nonlinear fitting ability of the DBN is then applied to fit this trajectory and thereby realize bus load forecasting. A practical cross-validation method is also employed to optimize the DBN network structure, and real bus load data are used to verify that:


In this paper, the hyperparameters of the DBN, such as its network structure, are optimized only by a coarse tuning method, so it is difficult to find their true optimal values. In practice, the load pattern will also change greatly with changes in the bus operation mode, and temperature likewise affects load forecasting. These aspects of the proposed very short-term bus load forecasting model need to be considered and improved upon in future work.

**Author Contributions:** Data curation, F.M.; Formal analysis, C.Z.; Funding acquisition, J.L. (Jixiang Lu) and J.L. (Jinjun Lu); Investigation, F.M.; Project administration, J.Z.; Software, T.S.; Supervision, J.Z.; Validation, Y.P.; Writing—original draft, T.S.; Writing—review & editing, F.M. and J.W.

**Funding:** This research was funded by the "State Key Laboratory of Smart Grid Protection and Control, SKL of SGPC (No. 20195021212)" and "National Key Research and Development Project (No. 2018YFB0905000)".

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **A Multi-Step Approach to Modeling the 24-Hour Daily Profiles of Electricity Load Using Daily Splines**

#### **Abdelmonaem Jornaz and V. A. Samaranayake**


Received: 8 October 2019; Accepted: 30 October 2019; Published: 1 November 2019

**Abstract:** Forecasting of real-time electricity load has been an important research topic over many years. Electricity load is driven by many factors, including economic conditions and weather. Furthermore, the demand for electricity varies with time, with different hours of the day and different days of the week having an effect on the load. This paper proposes a hybrid load-forecasting method that combines classical time series formulations with cubic splines to model electricity load. It is shown that this approach produces a model capable of making short-term forecasts with reasonable accuracy. In contrast to forecasting models that utilize a multitude of regressor variables observed at multiple time points within a day, only the hourly temperature is used in the proposed model and predictive power gains are achieved through the modeling of the 24-hour load profiles across weekends and weekdays while also taking into consideration seasonal variations of such profiles. Long-term trends are accounted for by using population and economic variables. The proposed approach can be used as a stand-alone predictive platform or be used as a scaffolding to build a more complex model involving additional inputs. The data cover the period from 1 January 1993 through 31 December 2013 from the Atlantic City Electric zone.

**Keywords:** forecasting; time series; cubic splines; real-time electricity load; seasonal patterns

#### **1. Introduction**

There is a long history of research on the modeling of hourly real-time electricity load. Approaches range from standard regression and time series formulations to machine learning algorithms, such as artificial neural networks (ANNs), which require training by experts familiar with the algorithms being utilized. In contrast to naive regression approaches or the use of more sophisticated machine learning algorithms, a hybrid method that amalgamates regression splines with time series methods is proposed. One advantage of the proposed method is that it can be implemented with standard off-the-shelf software that does not require specialized training to use effectively. Another is that it utilizes temperature as the only weather-related variable. Moreover, the proposed time-varying spline approach allows one to model the daily electricity load profile for weekdays as well as weekends for winter, summer, and shoulder months, providing valuable information about daily electricity use patterns and how they evolve across days and seasons. In addition, the proposed method can be used as a platform for building more sophisticated models with additional variables. Finally, the model is readily interpretable, as opposed to forecasting models that utilize "black box" type algorithms.

The literature on load forecasting is extensive, and therefore, a complete discussion of the literature is not possible in this paper. However, a sample of the approaches to load forecasting is presented herein to demonstrate the variety of available methods. For early classical work, the reader is referred to Bunn and Farmer [1], which summarized approaches developed for short-term forecasting of electricity load. An important reference that classifies different methods of load forecasting is Alfares and Nazeeruddin [2]. They categorized the various approaches into nine classes: (1) multiple regression, (2) exponential smoothing, (3) iterative reweighted least-squares, (4) adaptive load forecasting, (5) stochastic time series, (6) Autoregressive Moving Average models with exogenous inputs (ARMAX models) and those with optimal model selection using the genetic algorithm, (7) fuzzy logic, (8) neural networks, and (9) expert systems. Alfares and Nazeeruddin also commented that while the pure time series approach is widely used, hybrid approaches, which combine several techniques, have become more common.

As mentioned by Alfares and Nazeeruddin, there are many instances of the use of hybrid methods. For example, El-Keib et al. [3] presented a hybrid approach in which exponential smoothing was augmented with power spectrum analysis and adaptive autoregressive modeling. Dash et al. [4] utilized an expert-system-based fuzzy neural network and a hybrid neural network to forecast electricity load. Other publications that employed hybrid approaches include Kim et al. [5], Chow et al. [6], and Choueiki et al. [7]. A more recent two-stage approach to forecasting the hourly electricity load 24 hours ahead was developed by Gajowniczek and Zabkowski [8]. In this approach, the peak load values were determined in the first stage by using the generic quantile function, followed by the building of three classification models, corresponding to the 99th, 95th, and 90th percentiles of the distribution, to identify the load level. They used two machine learning algorithms, namely support vector machines (SVM) and artificial neural networks (ANN), to forecast the 24-hour electricity load. Another recently introduced approach, based on an extreme learning technique, was proposed by Das et al. [9]; it considered the relative difference in percentage of load at different intervals in its modeling approach. More recently, Annamareddi et al. [10] proposed a hybrid model based on a wavelet transform technique and double exponential smoothing to forecast the electricity load. Another hybrid method, predicting the electricity load using Support Vector Regression (SVR) and the Krill Herd (KH) algorithm, was proposed by Baziar and Kavousi-Fard [11]. In the first step, the KH algorithm was used on training data to optimize the SVR parameters; in the second step, the optimized SVR was used to forecast the electricity load.

A research paper that influenced our approach to electricity load modeling is the publication by Nowicka-Zagrajek and Weron [12], which proposed a two-step procedure based on first removing the trend and seasonal effects and then fitting an autoregressive moving average (ARMA) model to the de-seasonalized data to obtain day-ahead predictions for the real-time load. In addition to removing trend and seasonal effects, our approach uses spline regression to model the daily load profiles. In contrast, Liu et al. [13] utilized a semi-parametric model for nonlinear time series data, consisting of two components: one nonparametric, the other a parametric Autoregressive Integrated Moving Average (ARIMA) component. Another approach that accounts for periodicity is a generalization of the logistic Smooth Transition Autoregressive (STAR) model for short-term forecasting, developed by Amaral et al. [14]; it combines periodic models with a smooth transition between the regimes. Another publication that deals with cyclical behavior is that by Dordonnat et al. [15], which presented a periodic state space model, with different equations and different parameters for each hour, for forecasting the hourly electricity load. The multi-equation linear model with autoregressive order two, AR(2), developed by Chapagain and Kittipiyakul [16] uses 48 separate equations to forecast the half-hourly electricity load one day ahead. Two different techniques, namely ordinary least squares (OLS) and a Bayesian approach, were used to estimate the model parameters separately for each type of day: weekdays, weekends, and holidays.

The daily electricity use profile over a 24-hour period has prompted researchers to use functional approaches to modeling electricity load. Kosiorowski [17] compared load forecasting methods that utilize such approaches and concluded that the moving functional median is the appropriate approach for functional time series that contain outliers and for nonstationary functional time series. In comparison, the other three approaches, functional autoregressive, fully functional regression, and the method proposed by Hyndman and Shang [18], work for stationary functional time series, and the prediction accuracy of the Hyndman and Shang method was the best overall. Our proposed methodology treats the 24-hour load profile as a function that changes according to the type of day and season; however, we model these changing profiles using the well-understood cubic spline approach.

More recently, Papadopoulos and Karakatsanis [19] compared four different approaches, namely the seasonal autoregressive integrated moving average (SARIMA), the seasonal autoregressive integrated moving average with exogenous variables (SARIMAX), random forests (RF), and gradient boosting regression trees (GBRT). Among the methods compared, GBRT produced the most accurate results based on the mean absolute percentage error (MAPE) and root mean square error (RMSE). Alkhathami [20] also discussed the merits of various forecasting methodologies for load forecasting, noting that the more complex methods give more accurate results. Yang et al. [21] developed a hybrid method to forecast the half-hourly electricity load; they applied the autocorrelation function (ACF) to select the important input features for least squares support vector machines (LSSVM), followed by the grey wolf optimization algorithm (GWO) and cross-validation (CV) to optimize the LSSVM parameters. This method was more accurate than nine other approaches over three electricity load data sets from Australia.

In addition to the literature discussed in the previous paragraphs, there is a plethora of publications on the topic of load forecasting. Nevertheless, for the sake of brevity, we presented only a limited sample to illustrate the diversity of approaches taken by researchers in this area.

It is worth noting that many of the RTOs (Regional Transmission Organizations) and ISOs (Independent System Operators), as well as utility companies, have tended to use multiple regression models with a multitude of weather-related inputs for short-term prediction, in spite of recent research that tends to favor more sophisticated approaches. Possible reasons for this are discussed later in this section.

The Pennsylvania-New Jersey-Maryland (PJM) RTO uses multiple regression models with many regressors, such as 26 calendar variables covering the days of the week (6), months of the year (11), holidays (15), and daylight-saving-time impacts (see PJM Manual 19 [22]). Different variables for the heating, cooling, and shoulder seasons are included in the PJM model. Moreover, several formulae are used to calculate some of the weather, economic, and end-use variables. Another variable, labeled load adjustment, has also been used in the PJM model.

It is evident that while many sophisticated models have been proposed, at least some practitioners, such as the modelers at PJM, seem to prefer models based on classical statistical approaches. One reason for this may be that the less sophisticated multiple regression models work reasonably well and are interpretable as well as easy to modify and re-train compared to those based on ANNs (artificial neural networks) or RFs (random forests). Keeping this perspective in mind, an approach for short-term forecasting of electricity load using classical techniques that are relatively easy to implement is proposed. One of the goals is to avoid "black box" approaches that result in non-interpretable formulations, and instead to utilize methodologies that yield easy-to-understand models. The proposed method, while somewhat more complex than straightforward regression approaches, is nevertheless based on regression and basic time series modeling that can be executed using widely available software. In addition, it uses a minimal number of weather variables and derives its forecasting power by capturing the effect of such variables implicitly, through the lagged values of the load series, as well as by exploiting the cyclical patterns inherent in the data. While relatively simple compared to the more sophisticated models described earlier, the proposed approach nevertheless provides the flexibility to model non-linear and non-stationary components that exhibit seasonal variability. In addition, it provides a platform on which more complex models, involving regressors such as additional weather variables, can be built.

The rest of the paper is organized as follows. In Section 2, the main factors that affect electricity load are discussed and their impact on the load is illustrated graphically using empirical data. Section 3 describes the sources of the electricity load data employed in the analysis, as well as the weather and macroeconomic data utilized in the proposed model. The proposed modeling approach is detailed in Section 4, and concluding remarks are made in Section 5.

#### **2. The Factors Affecting Electricity Load**

There are many factors that affect the electricity load, some in the short term and others in the long term. Fahad and Arbab [23] described the impact of various factors on the short-term load and grouped those factors into four categories, namely time, weather, economy, and random disturbances. Several economic and macroeconomic factors influence the electricity load over the long term, and researchers have utilized these to obtain long-term forecasts. Examples of such variables are gross domestic product (GDP), GDP per capita, and population size. At the initial stage of the proposed modeling approach, the long-term trend is removed using a regression model that accounts for several macroeconomic variables and population size.

The time-related changes in electricity load are not only due to the long-term trend. Daily variations in human activity across working, leisure, and sleeping periods (see Figure 1) introduce a cyclical pattern with a 24-hour period. Other time-related factors, including the day of the week, holidays, and seasonal changes in consumer behavior, can affect this 24-hour cycle, as seen in Figure 1.

**Figure 1.** The average 24-hour load curve on weekdays (blue solid) and weekends (red dashed), 1993–2012 (**left**: January; **right**: July).

Weather variables, such as temperature, humidity, precipitation, and wind speed, play a significant role in electricity load forecasting, as in the models used by the PJM RTO. Among these factors, temperature plays the major role (Figure 2). The effect of the seasons on the electricity load, seen in Figure 3, can be mainly attributed to the seasonal fluctuation of temperature, even though seasonal changes in human behavior can also play a role. The proposed approach to modeling electricity load strives to capture these effects of seasonality and weekday/weekend differences, as well as the effect of the intra-day fluctuation of temperature on the 24-hour load curve. Details of how this is accomplished are given in Section 4.

**Figure 2.** The relationship between the hourly load and hourly temperature—South New Jersey 1993–2012.

**Figure 3.** The average 24-hour load curves (blue solid) and the temperature (red dotted) (**left**: January 2013; **right**: July 2013).

#### **3. Data Sources**

The historical load dataset used in this study was obtained from the Pennsylvania-New Jersey-Maryland RTO website (PJM) [24]. The data cover a sub-region of PJM (see Figure 4), namely the Atlantic City Electric company (AE), which serves approximately 556,000 customers in eight counties (Atlantic, Burlington, Camden, Cape May, Cumberland, Gloucester, Ocean, and Salem) in southern New Jersey. This dataset includes hourly observations measured in megawatts (MW) over 20 years, from 1 January 1993 through 31 December 2012 (see Figure 5), which were used for modeling purposes (i.e., as training data), and data from 1 January 2013 through 31 December 2013, shown in Figure 6, which were used as test data for computing forecasting error. The weather data were obtained from the National Oceanic and Atmospheric Administration (NOAA), based on four weather stations in different locations of the study area in southern New Jersey. These stations are located in Atlantic City, Millville, Mount Holly, and Wildwood.

The economic data were obtained from the Federal Reserve Bank of St. Louis. The specific economic variables used in this study are: the industrial production index in the US (IPI), an economic indicator that measures the output of the manufacturing, mining, and electric and gas industries; government employment in New Jersey (NJGOVTN), defined as the total body of employees in all government agencies apart from the military; and the home vacancy rate in New Jersey (NJHVAC), defined as the percentage of all available units in a rental property that are vacant or unoccupied at a particular time.

**Figure 4.** Map of Pennsylvania-New Jersey-Maryland (PJM) [25].

**Figure 5.** The hourly observed load over 20 years.

**Figure 6.** The hourly load of Atlantic City Electric Company (AE) in 2013.

#### **4. The Proposed Approach to Modeling Electricity Load**

The approach used in the following assumes a traditional structural time series model with trend, seasonal, and cyclical components, but utilizes a variation of cubic splines to estimate the 24-hour load profile for weekdays and weekends in a given season, with the hourly temperature playing an explanatory role. Specifically, model parameters for the mean load curve are estimated separately for weekdays and weekend days within each season after performing a de-trending operation. Because it requires good estimates of the hourly temperature, this approach is suitable for short-term prediction.

The model assumes that $Y\_{t,i}$, the real-time load at hour $i$ on day $t$, is a composite of structural components consisting of a long-term trend $\tau\_t$, a seasonal component $S\_t$, a weekly cycle $w\_t$, a set of functions $f\_{s,d}(x)$ representing the hourly load profile at time $x$ for season $s$ and day of the week $d$ (taking one value for weekdays and a different value for weekends), and an irregular stochastic component $u\_{t,i}$. Thus, $Y\_{t,i}$ can be expressed as:

$$Y\_{t,i} = \tau\_t + S\_t + w\_t + f\_{s,d}(i) + u\_{t,i}$$

where $t = 1, 2, \ldots, N$ and $i = 1, 2, \ldots, 24$. Note that $N$ denotes the number of days in the training data set and $i$ denotes the hour of the day.
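The additive structure above can be illustrated with a minimal synthetic sketch; every component value below is invented for illustration and is not an estimate from the paper:

```python
import numpy as np

# Synthetic sketch of the additive structural model
# Y[t,i] = tau_t + S_t + w_t + f_{s,d}(i) + u_{t,i}.
rng = np.random.default_rng(0)

N, H = 14, 24                                            # days, hours per day
days = np.arange(N)
tau = np.full(N, 1000.0)                                 # long-term trend (flat here)
S = 50.0 * np.sin(2.0 * np.pi * days / 365.0)            # seasonal component
w = np.where(days % 7 < 5, 20.0, -30.0)                  # weekday/weekend cycle
hours = np.arange(1, H + 1)
f = 100.0 * np.sin(np.pi * hours / 24.0)                 # 24-hour load profile
u = rng.normal(scale=5.0, size=(N, H))                   # irregular component

# Broadcast the day-level terms across hours and add the hourly profile.
Y = tau[:, None] + S[:, None] + w[:, None] + f[None, :] + u
print(Y.shape)  # -> (14, 24)
```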

The long-term trend was modeled using classical regression with selected economic variables as regressors. The weekly seasonal component was modeled using a vector autoregressive moving average with exogenous terms (ARMAX) formulation, with the average weekly temperature and its square as exogenous variables. The 24-hour load profiles were modeled using a separate set of cubic splines for each season and weekday/weekend combination. Different spline models were used for each season because the 24-hour load profile has almost the same pattern within a season but differs across seasons. The weekdays were modeled separately within each season because their load profiles differ considerably from the weekend patterns. The assumption of a single functional form for the weekday load profile can be relaxed by adding unique functions for each day of the week, or for Monday, Friday, and the rest of the weekdays. Similarly, one can assume separate functional forms for Saturdays and Sundays. Since such an approach can reduce the accuracy of the estimates due to reduced sample sizes, the number of different functions was kept to a minimum.

Details of the modeling process are described below, beginning with the detrending process followed by the estimation of the seasonal components and concluding with the spline modeling of the 24-hour load profile.

#### *4.1. Predicting Long-Term Trend*

The first step included modeling the hourly average electricity load per year, $\tau^\*\_l = \frac{1}{24 N\_l} \sum\_{t=1}^{N\_l} \sum\_{i=1}^{24} Y\_{t,i}$, using classical regression analysis. Note that in the above expression for the average load, $l$ denotes the year, with $l = 1, 2, \ldots, 20$, and $N\_l$ denotes the total number of days in that year. A stepwise selection method was used to determine the independent variables to be included in the model. Out of more than 20 economic variables plus population size and the average monthly temperature, the following variables were selected: government employment in New Jersey (NJGOVTN), industrial production index in the US (IPI), home vacancy rate in New Jersey (NJHVAC), and the average temperature of September (Temp\_Sep). Table 1 provides the results of the multiple linear regression analysis.


**Table 1.** The results for the regression model for annual load.

Note: The above results are for the training data set only.
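The annual averaging step that produces the regression's dependent variable can be sketched as follows; the hourly series here is synthetic, for illustration only:

```python
import numpy as np
import pandas as pd

# tau*_l is the mean of all 24*N_l hourly loads in year l.
# Synthetic, slowly increasing hourly load over two years (1993-1994).
idx = pd.date_range("1993-01-01", "1994-12-31 23:00", freq="h")
load = pd.Series(1000.0 + 0.01 * np.arange(len(idx)), index=idx)

annual_mean = load.groupby(load.index.year).mean()  # one tau*_l per year
print(annual_mean.round(1))
```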

The estimated regression model for the annual data is:

$$\hat{\tau}\_l^\* = -422.01 - 36.78\, NJHVAC + 1.61\, NJGOVTN + 2.77\, IPI + 7.25\, Temp\\_Sep.$$

The selected independent variables explain 98.5% of the variation in the average annual load, and the root mean square error (RMSE) is 9.7, which is relatively small. Moreover, no serious multicollinearity among the independent variables was detected. The residual analysis is shown in Figure A1 in Appendix A; although two outliers (with high Cook's distance) appear, they raise no major concern. In addition, the model does a good job of predicting the annual load, as seen in Figure 7.
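The fitted trend equation can be applied directly as a linear function of the regressors; the input values below are hypothetical, chosen only to show the arithmetic:

```python
def predict_annual_load(njhvac, njgovtn, ipi, temp_sep):
    """Fitted annual-trend regression, with the coefficients as reported above."""
    return -422.01 - 36.78 * njhvac + 1.61 * njgovtn + 2.77 * ipi + 7.25 * temp_sep

# Hypothetical regressor values (not actual New Jersey data).
print(round(predict_annual_load(njhvac=2.0, njgovtn=600.0, ipi=95.0, temp_sep=70.0), 2))
# -> 1241.08
```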

Figure 7 displays the average annual real-time load per hour for the 20 years of training data and one year of test data, together with the average annual load predicted using the estimated regression model. The figure shows the predicted trend using actual macroeconomic data for the test year, but the macroeconomic data for the test year can be predicted very accurately using an ARMA model, and the results do not change by much. The display shows very good in-sample agreement between the observed and predicted load and a reasonable agreement between the two for the test year. One word of caution: we observed that the electricity load decreased in the last three years, while one of the two most important variables in the model, government employment in New Jersey (NJGOVTN), decreased only slightly and the industrial production index in the US (IPI) increased. These two variables explain 92% of the variation in the electricity load. Thus, some delinking of these variables from the electricity load may be occurring, and developing an annual model with more recent data may be pragmatic.

**Figure 7.** The annual average of the hourly load (blue solid) and the predicted load (red dotted), 1993–2013.

#### *4.2. Estimating Seasonal Variation in Data*

The trend estimates for each year were transposed onto a weekly series, and a 52-week moving average was applied to this series to smooth the predictions from a step function into a smooth one. The smoothed trend, $\tilde{\tau}^\*\_w$, for week $w$ was subtracted from the daily average load $\overline{Y}\_{t,\bullet} = (24)^{-1} \sum\_{i=1}^{24} Y\_{t,i}$ for each day within the corresponding week, with the process repeated for all weeks, yielding the detrended daily averages $Y^\*\_{t,\bullet}$. The resulting data can be represented as a vector series $S^\*\_w$, where each vector contains the seven detrended daily averages $Y^\*\_{t,\bullet}$ corresponding to that week. Note that $w = 1, 2, \ldots, W$, where $W$ is the total number of weeks in the training data set. The de-trended weekly time series $S^\*\_w$ was then used to fit the subset ARMAX model (see Baillie [26] for details) given below:

$$\begin{aligned} S\_w^\* &= 1023 + 1.13\, S\_{w-1}^\* - 0.23\, S\_{w-2}^\* + 0.76\, S\_{w-52}^\* - 0.67\, S\_{w-53}^\* + 0.75\, Z\_{w-1} \\ &\quad - 0.06\, Z\_{w-50} + 0.71\, Z\_{w-52} - 0.44\, Z\_{w-53} - 0.11\, Z\_{w-54} - 50.36\, T + 0.53\, T^2 \end{aligned}$$

where $S^\*\_{w-lag}$ denotes the autoregressive lag terms, $Z\_{w-lag}$ denotes the moving average lag terms, and $T$ denotes the weekly average temperature. The residuals of the fit do not show any major autocorrelations, and the test for white noise (bottom right-hand corner of Figure A2 in Appendix A) shows no evidence that the residuals are anything other than white noise. The check for normality of the residuals, given in Figure A3 in Appendix A, shows some deviation from normality, but this is not much of a concern because the model did an adequate job of extracting the seasonal component, as indicated by the white noise residuals.

The forecasted weekly vector values for the test data set (year 2013) were then obtained using the estimated ARMAX model. These estimated vectors $\hat{S}^\*\_w$ contain forecasted daily values for each week, which were averaged to obtain a forecast weekly average. The trend model was then employed to forecast the yearly trend for the test data (year 2013). Note that to predict the trend, macroeconomic data for the test year are needed, and in practice these have to be predicted; we found that applying an ARMA model to the past macroeconomic data yields accurate forecasts for the test year. This produced a constant forecast across all the weeks of the test year, which was then smoothed using a 52-week moving average that utilized the previous year's data. The smoothed weekly trend data were then added to the forecast weekly averages obtained from the ARMAX vector forecasts. The resulting weekly averages were then compared with the observed weekly average load for the test year (Figure 8). These out-of-sample checks show that the seasonal (weekly) model provides a satisfactory estimate of the seasonal component.

Note that the smoothing of the yearly trend data allows for a smooth transition from one year to the next. It also reduces any bias, due to poor estimation of the trend value for the test year, when computing trend values for the early part of that year. The trend estimates can be updated later in the test year by reforecasting the trend value using updated macroeconomic variables that become available after the first quarter, and again after the second and third quarters.
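The step-function-to-smooth-trend operation described above can be sketched as follows; the yearly trend values are synthetic, for illustration only:

```python
import numpy as np
import pandas as pd

# A yearly step-function trend is spread onto a weekly grid and smoothed
# with a 52-week moving average, giving a gradual year-to-year transition.
yearly_trend = pd.Series([1000.0, 1040.0, 1080.0], index=[1993, 1994, 1995])

weekly_steps = pd.Series(np.repeat(yearly_trend.values, 52))  # 52 weeks per year
smoothed = weekly_steps.rolling(window=52, min_periods=1).mean()

print(smoothed.iloc[51])   # -> 1000.0  (window fully inside the first year)
print(smoothed.iloc[77])   # -> 1020.0  (halfway through the year-1/year-2 transition)
```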

**Figure 8.** The weekly average of the hourly load (blue solid) and the predicted load (red dashed) in 2013.

#### *4.3. Modeling the Hourly Load*

At this point, the weekly smoothed trend $\tilde{\tau}^\*\_w$ and the estimated seasonal component $\hat{S}^\*\_w$ were removed from the hourly data $Y\_{t,i}$ for both the training and test data years, and a new de-trended and de-seasonalized time series, $Y^\*\_{t,i}$, was obtained. The series $Y^\*\_{t,i}$ was modeled on the training data set by fitting cubic splines to the 24-hour daily profile. Different spline estimates were obtained for each season and weekday/weekend combination. Temperature and its interaction with time were also fitted as regressors.

Two scenarios were applied here. The first one modeled each season and each day type (weekday or weekend) separately. We denoted the resulting model as Model 1. The second scenario modeled each season separately and ignored the day type but added a dummy variable to identify the type of day. This approach provided us with Model 2.

#### 4.3.1. The First Scenario

The general spline Model 1 is:

$$\begin{split} Y\_{t,i}^\* = b\_0 &+ b\_1 i + b\_2 i^2 + b\_3 i^3 + b\_4 (i - \kappa\_1)^2 + b\_5 (i - \kappa\_2)^2 + b\_6 (i - \kappa\_3)^2 + b\_7 (i - \kappa\_1)^3 + b\_8 (i - \kappa\_2)^3 \\ &+ b\_9 (i - \kappa\_3)^3 + b\_{10} T\_{t,i} + b\_{11} T\_{t,i}\, i + b\_{12} T\_{t,i}\, i^2 + b\_{13} T\_{t,i}\, i^3 + b\_{14} T\_{t,i} (i - \kappa\_1)^2 + b\_{15} T\_{t,i} (i - \kappa\_2)^2 \\ &+ b\_{16} T\_{t,i} (i - \kappa\_3)^2 + b\_{17} T\_{t,i} (i - \kappa\_1)^3 + b\_{18} T\_{t,i} (i - \kappa\_2)^3 + b\_{19} T\_{t,i} (i - \kappa\_3)^3, \end{split}$$

where $i$ is the hour, the $\kappa\_j$ are knots that change according to season and day type, and $T\_{t,i}$ is the temperature at hour $i$ on day $t$. The knot positions were chosen by inspection for each season and are given, together with the parameter estimates, in Tables A2–A5 in Appendix A. Note that non-significant terms were dropped from the model, and what is given above is the reduced model.
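A design matrix for this spline model can be assembled from powers of the hour, knot terms, and temperature interactions. This sketch assumes the $(i - \kappa\_j)$ terms are truncated at zero, the usual regression-spline convention, and uses illustrative knot positions (not the paper's):

```python
import numpy as np

def spline_design(hours, temps, knots):
    """Design matrix for the cubic-spline load-profile model (a sketch).

    Assumes truncated-power knot terms; knot values are illustrative.
    """
    i = np.asarray(hours, dtype=float)
    T = np.asarray(temps, dtype=float)
    cols = [np.ones_like(i), i, i**2, i**3]                 # b0..b3
    cols += [np.clip(i - k, 0.0, None) ** p                 # b4..b9: knot terms
             for p in (2, 3) for k in knots]
    cols += [T * c for c in list(cols)]                     # b10..b19: T interactions
    return np.column_stack(cols)

# 24 hours of one day at a constant hypothetical temperature of 60 F.
X = spline_design(hours=np.arange(1, 25), temps=np.full(24, 60.0), knots=(6, 12, 18))
print(X.shape)  # -> (24, 20), matching coefficients b0..b19
```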

#### 4.3.2. The Second Scenario

The general spline Model 2 is:

$$\begin{split} Y\_{t,i}^\* = b\_0 &+ b\_1 i + b\_2 i^2 + b\_3 i^3 + b\_4 (i - \kappa\_1)^2 + b\_5 (i - \kappa\_2)^2 + b\_6 (i - \kappa\_3)^2 + b\_7 (i - \kappa\_1)^3 + b\_8 (i - \kappa\_2)^3 \\ &+ b\_9 (i - \kappa\_3)^3 + b\_{10} T\_{t,i} + b\_{11} T\_{t,i}\, i + b\_{12} T\_{t,i}\, i^2 + b\_{13} T\_{t,i}\, i^3 + b\_{14} T\_{t,i} (i - \kappa\_1)^2 + b\_{15} T\_{t,i} (i - \kappa\_2)^2 \\ &+ b\_{16} T\_{t,i} (i - \kappa\_3)^2 + b\_{17} T\_{t,i} (i - \kappa\_1)^3 + b\_{18} T\_{t,i} (i - \kappa\_2)^3 + b\_{19} T\_{t,i} (i - \kappa\_3)^3 + b\_{20} w\_t, \end{split}$$

where $i$ is the hour and the $\kappa\_j$ are knots that change according to the season; the knot positions and parameter estimates are given in Tables A6 and A7 in Appendix A. Note that $T\_{t,i}$ is the temperature at hour $i$ on day $t$, and $w\_t$ is a dummy variable denoting a weekend day. Non-significant terms were dropped from the model, and the results reported are for the reduced model.

As mentioned previously, Tables A2 through A7 in Appendix A present the knot positions and the results of the regression model for each season and each type of day. The tables for weekdays and weekends for a given season are paired together for easy comparison.

The model obtained for each season differs from the others, reflecting changes in the daily load profiles across seasons. The comparison between Model 1 and Model 2, based on the Akaike Information Criterion (*AIC*) and Root Mean Square Error (*RMSE*), is presented in Table 2. The results show very little difference between the two models. In addition, Figures 9 and 10 show the comparison between the two models based on the Coefficient of Variation (*CV*) for each month and each hour, respectively.


**Table 2.** The comparison between the two models.

Figure 9 shows very close agreement between the two models when compared using the *CV* by month. There is a slight drop in the *CV* for Model 1, suggesting a slight gain in accuracy when the weekdays and weekends are modeled separately. Figure 10 shows the *CV* for the two models by hour of the day. Again, Model 1 shows a slight advantage, with the *CV* for Model 2 showing higher values for hours before 10 a.m. This may be because the weekend load profile shows a two-hour shift in the morning, and the inclusion of a dummy variable is not sufficient to account for this difference in the shape of the load profile.

**Figure 9.** The comparison between the *CV* of the predicted monthly average load of Model 1 (blue solid) and Model 2 (red dotted).

**Figure 10.** The comparison between the *CV* of the predicted hourly average load of Model 1 (blue solid) and Model 2 (red dotted).

Figures 11–14 provide a comparison between the two models for four different weeks of the test year, chosen from the middle of each season. The forecasts based on each model were very close to one another, which suggests that adding a dummy variable for the day type, instead of building separate models for each type of day, provides a satisfactory forecast overall. However, for all seasons except summer, Model 2 yielded forecasts that fall below the observed load during the weekends (the last two days in each graph), especially in the morning period. In addition, except for the weekends, both models underestimated the afternoon peak in winter and spring. For the summer season (Figure 13), the afternoon peak was overestimated by both models on Fridays.

Table 3 provides an additional contrast between the two models based on the *CV*. It is immediately apparent that any difference between the two models is quite marginal and may not have any practical consequences. The cells highlighted in light blue indicate places where the *CV* for a given model is lower than that of the competing model. If any conclusion can be drawn from examining these results, it is that Model 1 is slightly better than Model 2 across most hours in winter and fall, and that Model 1 appears to perform slightly better before noon during spring. One reason for this may be the somewhat poor performance of Model 2 during weekend mornings. If pressed to select one model over the other, the natural choice would be Model 1, but a strong argument cannot be made that it is markedly superior to Model 2.

**Figure 11.** The comparison between the observed hourly load (blue solid), the predicted hourly load of Model 1 (red dotted), and Model 2 (green dashed). 7 January 2013–13 January 2013.

**Figure 12.** The comparison between the observed hourly load (blue solid), the predicted hourly load of Model 1 (red dotted), and Model 2 (green dashed). 1 April 2013–7 April 2013.

**Figure 13.** The comparison between the observed hourly load (blue solid), the predicted hourly load of Model 1 (red dotted), and Model 2 (green dashed). 8 July 2013–14 July 2013.

**Figure 14.** The comparison between the observed hourly load (blue solid), the predicted hourly load of Model 1 (red dotted), and Model 2 (green dashed). 7 October 2013–13 October 2013.


**Table 3.** The comparison between the *CV* of the two models for each season by hour.

Note: The numbers highlighted in blue indicate the lower of the two *CV* values for each hour of each season.

#### **5. Conclusions**

A multi-step approach to modeling the hourly electricity load using a structural time series model that relies only on standard statistical techniques was introduced. While the proposed method requires multiple steps to model the load data, every step can be implemented using commonly available statistical software packages; it is therefore within the reach of empirical modelers who lack training in sophisticated machine learning algorithms or the time required to master complex analytical techniques. The results of modeling observed real-time load data from the PJM market show that the proposed method performs reasonably well both in modeling the training data and in short-term forecasting of out-of-sample data. In addition, the proposed methodology uses only macroeconomic and temperature data, and the use of additional input variables has the potential to further improve the performance of the models considered in this study. One shortcoming of the proposed approach is the need for macroeconomic data and population figures to predict the long-term trend; in the context of short-term forecasting, however, this may not be a great drawback, because near-term forecasts of these quantities can be quite reliable, or the most recent data can be used without sacrificing much accuracy. Despite this shortcoming, the cubic spline model captured the 24-hour load curve very well, and the proposed methodology can therefore provide a framework for modeling other phenomena that exhibit a daily cycle, especially when long-term trend forecasting is not needed.

**Author Contributions:** The contributions made by the two authors are as follows: conceptualization, A.J. and V.A.S.; methodology, A.J. and V.A.S.; software, A.J.; validation, A.J.; formal analysis, A.J.; investigation, A.J.; resources, V.A.S.; data curation, A.J.; writing—original draft preparation, A.J.; writing—review and editing, V.A.S.; visualization, A.J.; supervision, V.A.S.; project administration, V.A.S.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

In this appendix, additional figures and tables relevant to material presented in this paper are given.

**Figure A1.** Residual Analysis of the Annual Regression Model.

**Figure A2.** Analysis of Residual for the Weekly ARMAX Model.

**Figure A3.** Normality Diagnostics of the Residuals of the Weekly ARMAX Model.




**Table A2.** The Results for the Regression Model for the Winter Season (Model 1).

**Table A3.** The Results for the Regression Model for the Spring Season (Model 1).



**Table A4.** The Results for the Regression Model for the Summer Season (Model 1).

**Table A5.** The Results for the Regression Model for the Fall Season (Model 1).



**Table A6.** The Results for the Regression Model for the Winter and Spring Seasons (Model 2).


**Table A7.** The Results for the Regression Model for the Summer and Fall Seasons (Model 2).

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Short-Term Electricity Demand Forecasting Using Components Estimation Technique**

**Ismail Shah 1,2,\*,†, Hasnain Iftikhar 1,†, Sajid Ali 1,† and Depeng Wang 3,\*,†**


Received: 27 May 2019; Accepted: 28 June 2019; Published: 1 July 2019

**Abstract:** Currently, in most countries, the electricity sector is liberalized, and electricity is traded in deregulated electricity markets. In these markets, electricity demand is determined the day before the physical delivery through (semi-)hourly concurrent auctions. Hence, accurate forecasts are essential for efficient and effective management of power systems. The electricity demand and prices, however, exhibit specific features, including non-constant mean and variance, calendar effects, multiple periodicities, high volatility, jumps, and so on, which complicate the forecasting problem. In this work, we compare different modeling techniques able to capture the specific dynamics of the demand time series. To this end, the electricity demand time series is divided into two major components: deterministic and stochastic. Both components are estimated using different regression and time series methods with parametric and nonparametric estimation techniques. Specifically, we use linear regression-based models (local polynomial regression models based on different types of kernel functions: tri-cubic, Gaussian, and Epanechnikov), spline function-based models (smoothing splines, regression splines), and traditional time series models (autoregressive moving average, nonparametric autoregressive, and vector autoregressive). Within the deterministic part, special attention is paid to the estimation of the yearly cycle, as it was previously ignored by many authors. This work considers electricity demand data from the Nordic electricity market for the period covering 1 January 2013–31 December 2016. To assess the one-day-ahead out-of-sample forecasting accuracy, Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE) are calculated. The results suggest that the proposed component-wise estimation method is extremely effective at forecasting electricity demand. Further, vector autoregressive modeling combined with spline function-based regression gives superior performance compared with the rest.

**Keywords:** Nordic electricity market; electricity demand; component estimation method; univariate and multivariate time series analysis; modeling and forecasting

#### **1. Introduction**

Liberalization of the energy sector, changes in climate policies, and the expansion of renewable energy resources have completely changed the structure of the previously strictly-controlled energy sector. Today, most energy markets have been liberalized and privatized with the purpose of providing consistent and inexpensive services for power trades. Within the energy sector, the liberalization of the electricity market has also introduced new challenges. In particular, electricity demand and price forecasting have become extremely important issues for producers, energy suppliers, system operators, and other market participants. In many electricity markets, electricity demand is fixed a day before the physical delivery by concurrent (semi-)hourly auctions. Further, electricity cannot be stored in an efficient manner, and end-user demand must be satisfied instantaneously; thus, accurate forecasts of electricity demand are crucial for effective power system management [1,2].

The electricity demand forecast can be broadly divided into three time horizons: (a) short-term, (b) medium-term, and (c) long-term load forecasting. Long-Term Load Forecast (LTLF) includes horizons from a few months to several years ahead. LTLF is generally used for planning and investment profitability analysis, determining upcoming sites, or acquiring fuel sources for production plants [3]. Medium-Term Load Forecast (MTLF) normally considers horizons from a few days to a few months ahead and is usually preferred for risk management, balance sheet calculations, and derivatives pricing [4]. Finally, Short-Term Load Forecast (STLF) generally includes horizons from a few minutes up to a few days ahead. In practice, most attention in electricity load forecasting has been paid to STLF, since it is an essential tool for daily market operations [5].

However, electricity demand forecasting is a difficult task due to the features that demand time series exhibit. These features include a non-constant mean and variance, calendar effects, multiple periodicities, high volatility, jumps, etc. For example, the yearly, weekly, and daily periodicities can be seen in Figure 1. The weekly cycle shows comparatively low variation in the data. The load curves differ across the days of the week, and the demand varies throughout the day. Demand is higher on weekdays than on weekends. Moreover, electricity demand is also affected by calendar effects (bank/bridging holidays) and by the seasons. In general, demand is considerably lower during bank holidays and bridging holidays (a working day falling between two non-working days). From the figure, high volatility in electricity demand can also be observed in almost all load periods. In addition, various environmental, geographical, and meteorological factors have a direct effect on electricity demand. Further, electricity is a secondary source of energy, retrieved by converting primary energy sources such as fossil fuels, natural gas, solar, and wind power [6], and the cost associated with each source is different. Thus, a consistent electricity supply mechanism is necessary for different levels of demand, with short periods of high and rather longer periods of moderate demand.

**Figure 1.** Yearly seasonality for the period 01-01-2012–31-12-2015 (**top left**), weekly periodicity for the period 01-01-2013–14-01-2013 (**top right**), box plot of hourly electricity load for the period 01-01-2013–31-12-2016 (**bottom right**), daily load curves for the period 01-01-2013–31-01-2013, weekdays (solid lines), Saturdays (dashed lines), Sundays (dotted lines), and bank holidays at the bottom (solid) representing 1 January (**bottom left**).

To account for the different features of the demand series, in the last two decades, researchers have suggested different methods and models to forecast electricity demand [7–11]. For example, the work in [12] proposed a semi-parametric component-based model consisting of a non-parametric (smoothing spline) and a parametric (autoregressive moving average model) component. Exponential smoothing techniques are also widely used in forecasting electricity demand [13,14]. Multiple equations time series models, e.g., the Double Seasonal Autoregressive Integrated Moving Average (DSARIMA) model, the Double Holt–Winters (D-HW) model, and Multiple Equations Time Series (MET) approaches, are also used for short-term load forecasting [15,16]. Regression methods are easy to implement and have been widely used for electricity demand forecasting in the past. For example, the work in [17] used parametric regression models to forecast electricity demand for the Turkish electricity market. Some authors included exogenous variables in the time series models to improve the forecasting performance [18–20]. Several researchers compared classical time series models and computational intelligence models [21–23]. For example, the work in [24] compared the Seasonal Autoregressive Integrated Moving Average (SARIMA) and Adaptive Network-based Fuzzy Inference System (ANFIS) models. For short-term load forecasting, the work in [25] introduced a new hybrid model that combines SARIMA and the Back Propagation Neural Network (BPNN) model. Some authors suggested the use of functional data analysis to predict electricity demand [26–28]. The main idea behind this approach is to consider the daily demand profile as a single functional object; thus, functional approaches can be applied to electricity load series. Other approaches used for demand forecasting can be seen in [29–33].
Apart from the forecasting models, Distributed Energy Resources (DERs), which are directly connected to a local distribution system and can be used for electricity production or as controllable loads, are also discussed in the literature [34,35]. DERs include solar panels, combined heat and power plants, electricity storage, small natural gas-fueled generators, and electric water heaters.

The main objective of this work is to compare different modeling techniques for electricity demand forecasting. Particular attention is paid to the yearly cycle, which in many cases is ignored. Some authors suggest estimating the effects of the long-term trend and the yearly cycle jointly as one component [36,37]. In practice, however, the yearly component shows regular cycles, while the long-term component captures the trend structure of the data; thus, these two components should be modeled separately [26]. Further, in our case, some pilot analyses suggested that modeling these two components separately can significantly improve the forecasting results. Thus, the main contribution of this paper is a thorough investigation of the impact of yearly component estimation on one-day-ahead out-of-sample electricity demand forecasting. Within the framework of the components estimation method, we compare models in terms of forecasting ability, considering both univariate and multivariate, as well as parametric and non-parametric, models. Moreover, for the considered models, a significance analysis of the differences in prediction accuracy is also conducted.

The rest of the article is organized as follows: Section 2 contains a description of the proposed modeling framework and of the considered models. Section 3 provides an application of the proposed modeling framework. Section 4 contains a summary and conclusions.

#### **2. Component-Wise Estimation: General Modeling Framework**

The main objective of this study is to forecast one-day-ahead electricity demand using different forecasting models and methods. To this end, let log(*Dt*,*j*) be the series of the log demand for the *t* th day and the *j* th hour. Following [28,33], the dynamics of the log demand, log(*Dt*,*j*), can be modeled as:

$$\log\left(D\_{t,j}\right) = F\_{t,j} + R\_{t,j} \tag{1}$$

That is, the log(*Dt*,*j*) is divided into two major components: a deterministic component *Ft*,*<sup>j</sup>* and a stochastic component *Rt*,*j*. The deterministic component, *Ft*,*j*, is comprised of the long-run trend, annual, seasonal, and weekly cycles, and calendar effects and is modeled as:

$$F\_{t,j} = l\_{t,j} + a\_{t,j} + s\_{t,j} + w\_{t,j} + b\_{t,j} \tag{2}$$

where *l*<sub>t,j</sub> represents the long-run (trend) component, *a*<sub>t,j</sub> the annual cycles, *s*<sub>t,j</sub> the seasonal cycles, *w*<sub>t,j</sub> the weekly cycles, and *b*<sub>t,j</sub> the bank holidays. On the other hand, *R*<sub>t,j</sub> is a (residual) stochastic component that describes the short-run dependence of the demand series. Concerning the estimation of the deterministic component, apart from the yearly component *a*<sub>t,j</sub>, the remaining components are estimated using parametric regression. For the estimation of *a*<sub>t,j</sub>, six different methods are used: a sinusoidal function-based regression technique, three local polynomial regression models, and two spline function-based models. All the components in Equation (2) are estimated using the backfitting algorithm. In the case of the stochastic component *R*<sub>t,j</sub>, four different methods, namely the Autoregressive model (AR), the Non-Parametric Autoregressive model (NPAR), the Autoregressive Moving Average model (ARMA), and the Vector Autoregressive model (VAR), are used. Combining the models for the deterministic and stochastic components leads to comparing twenty-four (6 × 4 = 24) different combinations. Note that in the case of the univariate models, each load period is modeled separately to account for the intra-daily periodicity [38].

#### *2.1. Modeling the Deterministic Component*

This section explains the estimation of the deterministic component. The long-run (trend) component *l*<sub>t,j</sub>, which is a function of time *t*, is estimated using Ordinary Least Squares (OLS). Dummy variables are used for the seasonal periodicities, the weekly periodicities, and the bank holidays, i.e., $s_t = \sum_{i=1}^{4} \alpha_i I_{i,t}$, with $I_{i,t} = 1$ if *t* falls in the *i*th season of the year and zero otherwise; $w_t = \sum_{i=1}^{7} \beta_i I_{i,t}$, with $I_{i,t} = 1$ if *t* falls on the *i*th day of the week and zero otherwise; and $b_t = \sum_{i=1}^{2} \gamma_i I_{i,t}$, with $I_{i,t} = 1$ if *t* is a bank holiday and zero otherwise. The coefficients *α*, *β*, and *γ* are estimated by OLS. On the other hand, the annual component *a*<sub>t,j</sub>, which is a function of the series (1, 2, 3, ... , 365, 1, 2, 3, ... , 365, ...), is estimated by six different methods: Sinusoidal function-based Regression (SR); local polynomial regression with three different kernels, namely tri-cubic (L1), Gaussian (L2), and Epanechnikov (L3); Regression Splines (RS); and Smoothing Splines (SS).
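The dummy-variable terms above reduce to OLS on indicator columns. A minimal sketch of the weekly term *w<sub>t</sub>*: one indicator column per day of the week, with the coefficients recovered by least squares. The weekday levels below are hypothetical toy values, not estimates from the paper:

```python
import numpy as np

# Toy series: 28 days with a purely day-of-week pattern.
n = 28
weekday = np.arange(n) % 7                    # day-of-week index 0..6
means = np.array([1.0, 1.1, 1.2, 1.1, 1.0, 0.8, 0.7])  # hypothetical levels
y = means[weekday]                            # noiseless toy demand series

# n x 7 dummy matrix: W[t, i] = 1 iff day t is the i-th day of the week.
W = np.equal.outer(weekday, np.arange(7)).astype(float)
beta, *_ = np.linalg.lstsq(W, y, rcond=None)  # OLS recovers the 7 levels
```

Because each row of `W` has exactly one nonzero entry, the OLS coefficients are simply the per-weekday means; the seasonal and holiday dummies work the same way with 4 and 2 columns, respectively.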

#### 2.1.1. Sinusoidal Function Regression Model

Sinusoidal function Regression (SR) is widely used in the literature to capture the periodicity of a periodic component [39–44]. In this method, we consider that the annual cycle can be estimated using *q* sine and cosine functions given as:

$$a_t = \sum_{i=1}^{q} \left( \alpha_{1,i} \sin(iwt) + \alpha_{2,i} \cos(iwt) \right), \tag{3}$$

where *w* = 2*π*/365.25. The unknown parameters *α*<sub>1,i</sub> and *α*<sub>2,i</sub> (*i* = 1, . . . , *q*) are estimated by OLS.
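The sinusoidal regression of Equation (3) is linear in the parameters, so it reduces to OLS on a design matrix of sine and cosine columns. A minimal sketch, assuming harmonics at multiples of *w* = 2π/365.25 and a toy annual cycle:

```python
import numpy as np

def fourier_design(day, q):
    """Columns 1, sin(i*w*t), cos(i*w*t) for i = 1..q, w = 2*pi/365.25."""
    w = 2 * np.pi / 365.25
    t = np.asarray(day, dtype=float)
    cols = [np.ones_like(t)]
    for i in range(1, q + 1):
        cols.append(np.sin(i * w * t))
        cols.append(np.cos(i * w * t))
    return np.column_stack(cols)

days = np.arange(1, 2 * 365 + 1)              # two years of daily observations
a = 0.3 * np.cos(2 * np.pi * days / 365.25)   # toy annual cycle (in the span)
X = fourier_design(days, q=2)
coef, *_ = np.linalg.lstsq(X, a, rcond=None)  # OLS estimates of alpha_{1,i}, alpha_{2,i}
a_hat = X @ coef
```

Since the toy cycle lies exactly in the span of the first harmonic, the OLS fit recovers it essentially exactly; on real demand data more harmonics (*q* > 1) sharpen the shape of the annual profile.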

#### 2.1.2. Local Polynomial Regression

Local polynomial regression is a flexible non-parametric technique that approximates *a<sub>t</sub>* at a point *a*<sub>0</sub> by a low-order polynomial (say, of order *q*), fitted using only points in some neighborhood of *a*<sub>0</sub>:

$$\hat{a}_t = \sum_{j=0}^{q} \alpha_j (a_t - a_0)^j. \tag{4}$$

Parameters *α*ˆ*<sup>j</sup>* are estimated by Weighted Least Squares (WLS) by minimizing:

$$\sum_{t=1}^{N} (a_t - \hat{a}_t)^2 \, K_{\delta}(a_t - a_0), \tag{5}$$

where *K<sub>δ</sub>*(*a<sub>t</sub>* − *a*<sub>0</sub>) is a weighting kernel function that depends on the smoothing parameter *δ*, also known as the bandwidth. It controls the size of the neighborhood around *a*<sub>0</sub> [45] and, thus, the locality of the approximation. In this work, the value of the bandwidth is selected using the cross-validation technique. Three different weighting kernel functions, namely the tri-cubic kernel (L1), the Epanechnikov (L2), and Gaussian kernels (L3), are used. It is worth mentioning that these types of local kernel-based regression techniques have been extensively used in the literature [31,39,40,44,46].
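A minimal local-linear (order *q* = 1) regression sketch with a Gaussian kernel, one of the three kernels considered. In practice the bandwidth *δ* would be chosen by cross-validation; here it is fixed for illustration, and the data are a toy noiseless curve:

```python
import numpy as np

def local_linear(x, y, x0, delta):
    """Weighted least squares fit of an intercept + slope around x0;
    the local estimate of the curve at x0 is the fitted intercept."""
    u = (x - x0) / delta
    k = np.exp(-0.5 * u**2)                   # Gaussian kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])
    W = np.diag(k)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[0]

x = np.linspace(0, 10, 200)
y = np.sin(x)                                 # toy target curve
est = np.array([local_linear(x, y, x0, delta=0.2) for x0 in (2.0, 5.0)])
```

Larger bandwidths average over wider neighborhoods (smoother but more biased fits); the cross-validation step mentioned above picks *δ* to balance the two.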

#### 2.1.3. Regression Spline Models

Spline Regression (RS) is a popular non-parametric regression technique, which approximates *at* by means of piecewise polynomials of order p, estimated in the subintervals delimited by a sequence of *m* points called knots. Any spline function *Z*(*a*) of order p can be described as a linear combination of functions *Zj*(*a*) called basis functions and is expressed in the following way:

$$Z(a) = \sum\_{j=1}^{m+p+1} \alpha\_j Z\_j(a).$$

The unknown parameters *α<sub>j</sub>* are estimated by OLS. The most important choices are the number of knots and their locations, because they define the smoothness of the approximation; again, these are chosen by cross-validation. In the literature, many authors have considered this approach for long-run component prediction [11,12,26]. The annual cycle component for regression splines is estimated as,

$$\hat{a}_t = \hat{Z}(a_t)$$

#### 2.1.4. Smoothing Splines

To overcome the requirement for fixing the number of knots, spline functions can alternatively be estimated by using the penalized least squares condition to minimize the sum of squares. Hence, the expression to minimize becomes:

$$\sum_{t=1}^{N} (a_t - Z(a_t))^2 + \lambda \int \left(Z''(a)\right)^2 da \tag{6}$$

where *Z''*(*a*) is the second derivative of *Z*(*a*). The first term accounts for the degree of fitting, while the second penalizes the roughness of the function through the smoothing parameter *λ*. The selection of the smoothing parameter is an important task, which in this work is done using the cross-validation approach. Smoothing Splines (SS) have been previously used by some authors to estimate the long-run dynamics of the series, e.g., [11,47,48].

To see the performance of all six models defined above for the estimation of the annual component *at*,*j*, the observed log demand and the estimated annual components are depicted in Figure 2. From the figure, we can see that all six models for *at*,*<sup>j</sup>* were capable of capturing the annual seasonality, as the annual cycles can be seen clearly from the figure.

Finally, it is worth mentioning that one-day-ahead forecast for the deterministic component is straightforward as the elements of *Ft*,*<sup>j</sup>* are deterministic functions of time or calendar conditions, which are known at any time. Once all these components are estimated, the residual (stochastic) component *Rt*,*<sup>j</sup>* is obtained as:

$$R_{t,j} = \log(D_{t,j}) - \left(\hat{l}_{t,j} + \hat{a}_{t,j} + \hat{s}_{t,j} + \hat{w}_{t,j} + \hat{b}_{t,j}\right) \tag{7}$$

**Figure 2.** Observed *log*(*Dt*,21) with superimposed estimated *at*,*<sup>j</sup>* using: (**first row**) Sinusoidal Regression (SR) (**left**), Local regression (L1) (**middle**), L2 (**right**), and (**second row**) L3 (**left**), Regression Splines (RS) (**middle**), and Smoothing Splines (SS) (**right**).

#### *2.2. Modeling the Stochastic Component*

Once the stochastic (residual) component is obtained, different types of parametric and non-parametric time series models can be considered. In our case, from the univariate class, we consider parametric AutoRegressive (AR), Non-Parametric AutoRegressive (NPAR), and Autoregressive Moving Average (ARMA). On the other hand, the Vector AutoRegressive (VAR) model is used to compare the performance of the multivariate model with the univariate models.

#### 2.2.1. Autoregressive Model

A linear parametric Autoregressive (AR) model defines the short-run dynamics of *Rt*,*<sup>j</sup>* taking into account a linear combination of the past r observations of *Rt*,*<sup>j</sup>* and is given by:

$$R_{t,j} = c + \beta_1 R_{t-1,j} + \beta_2 R_{t-2,j} + \dots + \beta_r R_{t-r,j} + \varepsilon_t \tag{8}$$

where *c* is the intercept, *β<sub>i</sub>* (*i* = 1, 2, ... , *r*) are the parameters of the AR(r) model, and *ε<sub>t</sub>* is a white noise process. In our case, the parameters are estimated using the maximum likelihood estimation method. After some pilot analysis on different load periods, we concluded that lags 1, 2, and 7 were significant in most cases; hence, these lags were used to estimate the model.
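A sketch of the restricted AR model of Equation (8) with lags 1, 2, and 7, fitted here by OLS on a simulated residual series (the paper uses maximum likelihood; for a Gaussian AR the two estimates essentially coincide). The simulated coefficients are arbitrary toy values:

```python
import numpy as np

def fit_ar(r, lags=(1, 2, 7)):
    """OLS fit of R_t = c + sum_l beta_l * R_{t-l} + e_t for the given lags."""
    p = max(lags)
    X = np.column_stack([np.ones(len(r) - p)] +
                        [r[p - l:len(r) - l] for l in lags])
    y = r[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef                                # (c, beta_1, beta_2, beta_7)

# Simulate R_t = 0.5 R_{t-1} + 0.3 R_{t-7} + e_t (stationary toy process).
rng = np.random.default_rng(1)
r = np.zeros(3000)
for t in range(7, 3000):
    r[t] = 0.5 * r[t - 1] + 0.3 * r[t - 7] + rng.standard_normal()
coef = fit_ar(r)
```

The recovered coefficients for lags 1 and 7 are close to the simulated 0.5 and 0.3, while the lag-2 coefficient is near zero, mirroring how the pilot analysis would flag which lags are significant.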

#### 2.2.2. Non-Parametric Autoregressive Model

The additive non-parametric counterpart of AR is an additive model (NPAR), in which the relation between *R*<sub>t,j</sub> and its lagged values does not have a particular parametric form, potentially allowing for any type of non-linearity:

$$R_{t,j} = g_1(R_{t-1,j}) + g_2(R_{t-2,j}) + \dots + g_r(R_{t-r,j}) + \varepsilon_{t,j} \tag{9}$$

where the *g<sub>i</sub>* are smoothing functions describing the relation between each past value and *R*<sub>t,j</sub>. In our case, the functions *g<sub>i</sub>* are represented by cubic regression splines. As in the parametric case, we used lags 1, 2, and 7 to estimate NPAR. To avoid the so-called "curse of dimensionality", we used the backfitting algorithm to estimate the model [49].

#### 2.2.3. Autoregressive Moving Average Model

The Autoregressive Moving Average (ARMA) model not only includes the lagged values of the series, but also considers the past error terms in the model. In our case, the stochastic component *Rt*,*<sup>j</sup>* is modeled as a linear combination of the past *r* observations, as well as the lagged error terms. Mathematically,

$$R_{t,j} = c + \beta_1 R_{t-1,j} + \beta_2 R_{t-2,j} + \dots + \beta_r R_{t-r,j} + \varepsilon_{t,j} + \phi_1 \varepsilon_{t-1,j} + \phi_2 \varepsilon_{t-2,j} + \dots + \phi_s \varepsilon_{t-s,j} \tag{10}$$

where *c* is the intercept, *β<sub>i</sub>* (*i* = 1, 2, ... , *r*) and *φ<sub>j</sub>* (*j* = 1, 2, ... , *s*) are the parameters of the AR and MA components, respectively, and *ε<sub>t</sub>* ∼ N(0, *σ*²). In this case, some pilot analyses suggested that lags 1, 2, and 7 are significant for the AR part, while only lag 1 is significant for the MA part; thus, a constrained ARMA(7,1) with *β*<sub>3</sub> = ··· = *β*<sub>6</sub> = 0 is fitted to *R*<sub>t,j</sub> using the maximum likelihood estimation method.

#### 2.2.4. Vector Autoregressive Model

In the Vector Autoregressive (VAR) model, both the response and the predictors are vectors, and hence, they contain information on the whole daily load profile. This allows one to account for possible interdependence among demand levels at different load periods. In our context, the daily stochastic component *Rt* is modeled as a linear combination of the past r observations of *Rt*, i.e.,

$$\mathbf{R}_t = \mathbf{G}_1 \mathbf{R}_{t-1} + \mathbf{G}_2 \mathbf{R}_{t-2} + \dots + \mathbf{G}_r \mathbf{R}_{t-r} + \boldsymbol{\varepsilon}_t \tag{11}$$

where **R**<sub>t</sub> = (*R*<sub>t,1</sub>, ... , *R*<sub>t,24</sub>), **G**<sub>j</sub> (*j* = 1, 2, ··· , *r*) are coefficient matrices, and *ε<sub>t</sub>* = (*ε*<sub>t,1</sub>, ... , *ε*<sub>t,24</sub>) is a vector of disturbance terms such that *ε<sub>t</sub>* ∼ N(**0**, **Σ**<sub>ε</sub>). Estimation of the parameters is done using the maximum likelihood estimation method.
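The VAR fit of Equation (11) amounts to multivariate least squares on stacked daily vectors. The sketch below illustrates this for a VAR(1) in three dimensions rather than 24, with a hypothetical coefficient matrix (Gaussian ML and least squares coincide for this model):

```python
import numpy as np

# Hypothetical stable 3x3 coefficient matrix for the simulation.
G_true = np.array([[0.5, 0.1, 0.0],
                   [0.0, 0.4, 0.1],
                   [0.1, 0.0, 0.3]])
rng = np.random.default_rng(2)
n = 5000
R = np.zeros((n, 3))
for t in range(1, n):                          # simulate R_t = G R_{t-1} + eps_t
    R[t] = G_true @ R[t - 1] + 0.1 * rng.standard_normal(3)

# Stack rows: R[t] ~ R[t-1] @ G', so lstsq returns G transposed.
Y, X = R[1:], R[:-1]
G_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T
one_step = G_hat @ R[-1]                       # one-day-ahead vector forecast
```

Because the response is the whole daily vector, off-diagonal entries of the estimated matrix capture the interdependence between load periods that the univariate models cannot use.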

Finally, once both the deterministic and the stochastic components have been estimated, the final day-ahead electricity demand forecast is obtained as:

$$\begin{aligned} \hat{D}_{t+1,j} &= \exp\left(\hat{l}_{t+1,j} + \hat{a}_{t+1,j} + \hat{s}_{t+1,j} + \hat{w}_{t+1,j} + \hat{b}_{t+1,j} + \hat{R}_{t+1,j}\right) \\ &= \exp\left(\hat{F}_{t+1,j} + \hat{R}_{t+1,j}\right) \end{aligned} \tag{12}$$

For the stochastic component *R*<sub>t,j</sub> and the final model error *ε*<sub>t,j</sub>, examples of the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) are plotted in Figures 3 and 4. Note that in the case of *ε*<sub>t,j</sub>, both the ACF and PACF refer to models where VAR is used as the stochastic model. The reason for plotting the residuals obtained after applying the VAR model to *R*<sub>t,j</sub> is the superior forecasting performance of the multivariate model (see Table 1). Overall, the residuals *ε*<sub>t,j</sub> of each model have been whitened. In some cases, the residuals still show some significant correlation, but with an absolute value so small that it is useless for prediction.

**Figure 3.** ACF and Partial Autocorrelation Function (PACF) plots for *R*<sub>t,21</sub> (**first row**), and ACF and PACF plots for *ε*<sub>t,21</sub> obtained with L1-VAR (**second row**), L2-VAR (**third row**), and L3-VAR (**fourth row**).

**Figure 4.** ACF and PACF plots for *ε*<sub>t,21</sub> obtained with SR-VAR (**first row**), RS-VAR (**second row**), and SS-VAR (**third row**).

**Table 1.** Descriptive statistics for one-day-ahead out-of-sample forecasting: the columns represent the estimation of the yearly component through Sinusoidal Regression (SR), local regression (L1, L2, L3), Regression Splines (RS), and Smoothing Splines (SS). The rows represent the estimation of the stochastic component through the Autoregressive (AR), Non-Parametric Autoregressive (NPAR), Autoregressive Moving Average (ARMA), and Vector Autoregressive (VAR) models.


#### **3. Out-of-Sample Forecasting**

This work considers the electricity demand data for the Nord Pool electricity market. The data cover the period from 1 January 2013–31 December 2016 (35,064 hourly demand levels for 1461 days). A few missing observations in the load series were replaced by averages of the neighboring observations. The whole dataset was divided into two parts: 1 January 2013-31 December 2015 (26,280 data points, covering 1095 days) for model estimation and 1 January 2016–31 December 2016 (8784 data points, covering 366 days) for one-day-ahead out-of-sample forecasting.

In the first step, the deterministic component was estimated separately for each load period, as described in Section 2.1. An example of the estimated deterministic components, as well as of *R*<sub>t,j</sub>, is plotted in Figure 5. Along with the log demand (top left), the long-run trend, yearly, seasonal, and weekly components are plotted in the top right, middle left, middle right, and bottom left panels, respectively. Note that the elements of the deterministic component capture different dynamics of the log demand. An example of the series *R*<sub>t,21</sub> is plotted at the bottom right of Figure 5. In the second step, the previously defined models for the stochastic component were applied to the residual series *R*<sub>t,j</sub>. In both steps, the models were estimated and one-day-ahead forecasts were obtained for 366 days using the rolling window technique. Final demand forecasts were obtained using Equation (12).

**Figure 5.** $\log(D_{t,21})$ (**top left**), $\hat{l}_{t,21}$ (**top right**), $\hat{a}_{t,21}$ (**middle left**), $\hat{s}_{t,21}$ (**middle right**), $\hat{w}_{t,21}$ (**bottom left**), and $R_{t,21}$ (**bottom right**).
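The two-step, rolling-window scheme described above can be sketched as follows. The deterministic and stochastic estimators here are deliberately crude placeholders (a per-hour window mean and a persistence forecast of the residuals) standing in for the models of Section 2; only the additive log-scale decomposition and the back-transform to demand are taken from the text.

```python
import numpy as np

def forecast_one_day(log_window):
    """One-day-ahead forecast from a rolling window of hourly log demand."""
    by_day = log_window.reshape(-1, 24)      # rows = days, columns = load periods
    det = by_day.mean(axis=0)                # placeholder deterministic component
    resid = by_day - det                     # stochastic component R_{t,j}
    resid_fc = resid[-1]                     # naive persistence forecast of R
    return np.exp(det + resid_fc)            # recombine and back-transform

# Three-year estimation window of synthetic hourly log demand.
rng = np.random.default_rng(1)
log_demand = np.log(rng.uniform(2e4, 6e4, 1095 * 24))
day_ahead = forecast_one_day(log_demand)     # 24 hourly demand forecasts
```

In a full rolling-window run, the window would be advanced by one day and the procedure repeated for each of the 366 forecast days.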

To evaluate the forecasting performance of the final models obtained from different combinations of deterministic and stochastic components, three accuracy measures, namely Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE) were computed as:

$$\text{MAPE} = \text{mean}\left(\frac{|D_{t,j} - \hat{D}_{t,j}|}{D_{t,j}}\right) \times 100$$

$$\text{MAE} = \text{mean}\left(|D_{t,j} - \hat{D}_{t,j}|\right)$$

$$\text{RMSE} = \sqrt{\text{mean}\left((D_{t,j} - \hat{D}_{t,j})^2\right)}$$

where $D_{t,j}$ and $\hat{D}_{t,j}$ are the observed and the forecasted demand for the $t$th day ($t = 1, 2, \ldots, 366$) and the $j$th load period ($j = 1, 2, \ldots, 24$).
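The three accuracy measures translate directly into code; the small worked example below uses made-up demand values purely for illustration.

```python
import numpy as np

def mape(D, D_hat):
    """Mean Absolute Percentage Error, in percent."""
    return np.mean(np.abs(D - D_hat) / D) * 100

def mae(D, D_hat):
    """Mean Absolute Error, in the units of demand."""
    return np.mean(np.abs(D - D_hat))

def rmse(D, D_hat):
    """Root Mean Squared Error: the square root of the mean squared error."""
    return np.sqrt(np.mean((D - D_hat) ** 2))

D = np.array([100.0, 200.0, 400.0])      # observed demand (illustrative)
D_hat = np.array([110.0, 190.0, 400.0])  # forecasted demand (illustrative)
# mape(D, D_hat) ~ 5.0, mae(D, D_hat) ~ 6.667, rmse(D, D_hat) ~ 8.165
```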

Within the deterministic component, this work used six different estimation methods for $a_{t,j}$, whereas the estimation of the other elements was the same; hence, six different variants were obtained. On the other hand, four different models were used for the stochastic component. The estimation of both the deterministic and the stochastic components therefore led us to compare twenty-four different models. For these twenty-four models, one-day-ahead out-of-sample forecast results are listed in Table 1. From the table, it is evident that the multivariate VAR model combined with any estimation technique used for $a_{t,j}$ led to better forecasts than the univariate models. The best forecasting model was obtained by combining VAR and RS, which produced 1.994, 856.082, and 1145.979 for MAPE, MAE, and RMSE, respectively. VAR combined with SS or with L3 produced the second-best results. Within the univariate models, NPAR combined with the spline-based regression models performed better than its two parametric counterparts. Finally, any stochastic model combined with SR or with L1 led to the worst forecast in its respective class (univariate or multivariate). Considering only MAPE, a graphical representation of the results for the twenty-four combinations is given in Figure 6. From the figure, we can easily see that the multivariate models performed better than the univariate models.

To assess the significance of the differences among the accuracy measures listed in Table 1 for the different combinations, we performed the Diebold and Mariano (DM) [50] test of equal forecast accuracy. The DM test is a widely used statistical test for comparing forecasts obtained from different models. To describe it, consider two forecasts, $\hat{y}_{1t}$ and $\hat{y}_{2t}$, available for the time series $y_t$ for $t = 1, \ldots, T$. The associated forecast errors are $\epsilon_{1t} = y_t - \hat{y}_{1t}$ and $\epsilon_{2t} = y_t - \hat{y}_{2t}$. Let the loss associated with forecast error $\epsilon_{it}$, $i = 1, 2$, be $L(\epsilon_{it})$; for example, the time-$t$ absolute loss is $L(\epsilon_{it}) = |\epsilon_{it}|$. The loss differential between Forecasts 1 and 2 at time $t$ is then $\eta_t = L(\epsilon_{1t}) - L(\epsilon_{2t})$, and the null hypothesis of equal forecast accuracy for the two forecasts is $E[\eta_t] = 0$. The DM test requires that the loss differential be covariance stationary, i.e.,

$$\begin{aligned} E[\eta_t] &= \mu, \quad \forall t \\ \operatorname{cov}(\eta_t, \eta_{t-\tau}) &= \gamma(\tau), \quad \forall t \\ \operatorname{var}(\eta_t) &= \sigma_\eta^2, \quad 0 < \sigma_\eta^2 < \infty \end{aligned}$$

Under these assumptions, the DM test of equal forecast accuracy is:

$$\text{DM} = \frac{\bar{\eta}}{\hat{\sigma}_{\bar{\eta}}} \xrightarrow{d} N(0, 1)$$

where $\bar{\eta} = \frac{1}{T}\sum_{t=1}^{T}\eta_t$ is the sample mean loss differential and $\hat{\sigma}_{\bar{\eta}}$ is a consistent estimate of its standard error.
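The DM statistic above can be computed in a few lines. The sketch below is a minimal version with absolute loss that assumes no serial correlation in the loss differential; practical implementations (including, presumably, the one used in this work) would use a HAC-type variance estimate instead of the plain standard error.

```python
import math
import numpy as np

def dm_test(y, f1, f2):
    """Minimal DM test with absolute loss; returns (DM statistic, one-sided p).

    H0: equal accuracy; H1: forecast f2 is more accurate than f1.
    """
    eta = np.abs(y - f1) - np.abs(y - f2)        # loss differential eta_t
    eta_bar = eta.mean()                         # sample mean loss differential
    se = eta.std(ddof=1) / math.sqrt(len(eta))   # simple SE of eta_bar (no HAC)
    dm = eta_bar / se
    p = 0.5 * math.erfc(dm / math.sqrt(2))       # one-sided N(0, 1) p-value
    return dm, p

# Synthetic check: f2 is deliberately more accurate than f1.
rng = np.random.default_rng(0)
y = rng.normal(size=500)
f1 = y + rng.normal(scale=2.0, size=500)   # noisier forecast
f2 = y + rng.normal(scale=0.5, size=500)   # more accurate forecast
dm, p = dm_test(y, f1, f2)                 # large positive DM, p near zero
```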

The results of the DM test are listed in Tables 2 and 3. The entries of these tables are *p*-values of the Diebold and Mariano test, where the null hypothesis assumes no difference in the accuracy of the predictors in the column and the row, against the alternative hypothesis that the predictor in the column is more accurate than the predictor in the row. From Table 2, it is clear that the multivariate VAR models outperform their univariate counterparts. Looking at the results of VAR with the different estimation methods for $a_{t,j}$ in Table 3, it can be seen that, except for SR-VAR and L1-VAR, the remaining four combinations had the same predictive ability, and each of these four performed statistically better than SR-VAR and L1-VAR.

**Figure 6.** One-day-ahead out-of-sample MAPE for electricity demand using SR, L1, L2, L3, RS, SS, AR, NPAR, ARMA, and VAR.

**Table 2.** *p*-values for the Diebold and Mariano test. *H*0: the forecasting accuracy for the model in the row and the model in the column is the same; *H*1: the forecasting accuracy of the model in the column is greater than that of the model in the row.



**Table 3.** *p*-values for the Diebold and Mariano test. *H*0: the forecasting accuracy for the model in the row and the model in the column is the same; *H*1: the forecasting accuracy of the model in the column is greater than that of the model in the row.

The day-specific MAPE, MAE, and RMSE are tabulated in Table 4. From this table, we can see that the day-specific MAPE was relatively higher on Monday and Sunday and smaller on the other weekdays. As the VAR model performed better overall, its day-specific MAPE values were considerably lower than those of the univariate models, except on Wednesday, Thursday, and Friday, where both the univariate and multivariate models produced comparably low errors. The same findings hold for the day-specific MAE and RMSE. The day-specific MAPE values are also depicted in Figure 7. The figure clearly indicates that the MAPE was lower in the middle of the week and higher on Monday and Sunday.

**Figure 7.** Day-specific MAPEs for all stochastic component models: AR, NPAR, ARMA, and VAR.
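The day-specific aggregation behind Table 4 and Figure 7 amounts to grouping daily errors by weekday. The sketch below uses synthetic daily absolute percentage errors; the uniform error range is an assumption made purely for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily absolute percentage errors (in %) for the 2016 test year.
rng = np.random.default_rng(0)
days = pd.date_range("2016-01-01", "2016-12-31", freq="D")
ape = pd.Series(rng.uniform(1.0, 4.0, len(days)), index=days)

# Day-specific MAPE: average the daily errors over each weekday.
day_mape = ape.groupby(ape.index.day_name()).mean()
# day_mape["Monday"] is then the average MAPE over all Mondays of 2016.
```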


**Table 4.** Electricity demand: hourly day-specific MAPE, MAE, and RMSE.

To conclude this section, the hourly RMSE and the forecasted demand for the best four combinations, one for each stochastic model, are plotted in Figure 8. From the figure (left), note that the hourly RMSEs are considerably lower at low-load periods and higher at peak-load periods. Further, note the superior forecasting performance of the RS-VAR model compared to the competing stochastic models. For these models, the observed and forecasted demand are also plotted in Figure 8 (right). The forecasted demand follows the actual demand very well, especially when VAR is used as the stochastic model. Thus, we can conclude that the multivariate VAR model outperformed its univariate counterparts.

**Figure 8.** (**left**) Hourly RMSE for: RS-AR (solid), RS-NPAR (dashed), RS-ARMA (dotted), and RS-VAR (dotted-dashed). (**Right**) Observed demand (solid) and forecasted demand for: RS-AR (dashed), RS-NPAR (dotted), RS-ARMA (dotted-dashed), and RS-VAR (long dash).

#### **4. Conclusions**

The main aim of this work was to model and forecast electricity demand using the component estimation method. For this purpose, the log demand was divided into two components: deterministic and stochastic. The elements of the deterministic component consisted of a long trend, multiple periodicities due to annual, seasonal, and weekly regular cycles, and bank holidays. Special attention was paid to the estimation of the yearly seasonality, as it has previously been ignored by many authors. The estimation of the yearly component was based on six different methods, whereas the other elements of the deterministic component were estimated using ordinary least squares. In particular, for the estimation of the annual periodicity, this work used a sinusoidal function-based model (SR); local polynomial regression with three different kernels, namely tri-cubic (L1), Gaussian (L2), and Epanechnikov (L3); Regression Splines (RS); and Smoothing Splines (SS). For the stochastic component, we used four univariate and multivariate models, namely the Autoregressive model (AR), the Non-Parametric Autoregressive model (NPAR), the Autoregressive Moving Average model (ARMA), and the Vector Autoregressive model (VAR). The estimation of both the deterministic and the stochastic components led us to compare twenty-four different combinations of these models. To assess the predictive performance of the different models, demand data from the Nord Pool electricity market were used, and one-day-ahead out-of-sample forecasts were obtained for a complete year. The forecasting accuracy of the models was assessed through the MAPE, MAE, and RMSE. To assess the significance of the differences in the predictive performance of the models, the Diebold and Mariano test was performed. The results suggest that the component-wise estimation method is highly effective for modeling and forecasting electricity demand.
The best results were produced by combining RS and the VAR model, which led to the lowest error values. Further, all combinations involving the multivariate VAR model clearly outperformed their univariate counterparts, suggesting the superiority of multivariate models. Among the VAR combinations, however, the results were not statistically different for most of the models.

**Author Contributions:** Conceptualization and Methodology, I.S.; Software, S.A.; Validation, I.S. and S.A.; Formal Analysis, H.I. and I.S.; Investigation, H.I.; Resources, D.W.; Data Curation, S.A.; Writing−Original Draft Preparation, I.S.; Writing−Review & Editing, I.S., S.A., and D.W.; Visualization, H.I. and S.A.; Supervision, I.S.; Project Administration, I.S.; Funding Acquisition, S.A. and D.W.

**Funding:** The work of Ismail Shah was partially funded by Quaid-i-Azam University, Islamabad, Pakistan, through the university research fund.

**Acknowledgments:** The authors would like to thank Francesco Lisi, Department of Statistical Sciences, University of Padua, for his valuable suggestions and comments. We are also grateful to the anonymous referees for their constructive comments, which greatly improved the manuscript.

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **References**


c 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

#### *Review*
