**Computational Intelligence on Short-Term Load Forecasting: A Methodological Overview**

### **Seyedeh Narjes Fallah <sup>1</sup>, Mehdi Ganjkhani <sup>2</sup>, Shahaboddin Shamshirband <sup>3,4,</sup>\* and Kwok-wing Chau <sup>5</sup>**


Received: 17 December 2018; Accepted: 25 January 2019; Published: 27 January 2019

**Abstract:** Electricity demand forecasting remains a real challenge for power system scheduling at different levels of the energy sector. Various computational intelligence techniques and methodologies have been employed in the electricity market for short-term load forecasting, although scant evidence is available about the feasibility of these methods considering the type of data and other potential factors. This work reviews the scientific and technical rationales behind short-term load forecasting methodologies, based on the works of previous researchers in the energy field. The fundamental benefits and drawbacks of these methods are discussed to show the efficiency of each approach under various circumstances. Finally, a hybrid strategy is proposed.

**Keywords:** short-term load forecasting; demand-side management; pattern similarity; hierarchical short-term load forecasting; feature selection; weather station selection

#### **1. Introduction**

Short-term load forecasting (STLF) is an integral part of the energy planning sector. Designing a time-ahead power market requires demand scheduling for the various energy divisions, namely, generation, transmission, and distribution. STLF supports power system operators in various decision-making tasks, including supply planning, generation reserve, system security, dispatch scheduling, demand-side management, financial planning, and so forth. Since STLF is particularly essential for time-ahead power market operation, inaccurate demand forecasting imposes a tremendous financial burden on the utility [1].

Traditionally, engineering approaches were employed to predict the future demand manually with the help of charts and tables. These traditional methods mainly considered weather impacts as well as calendar effects. These features are still used today when developing load models with novel methods [2].

With the advent of statistical software packages and artificial intelligence techniques, several outstanding pieces of research have been devoted to statistical [3] and computational intelligence (CI) [4] approaches to modeling the future load. Examples of statistical regression-based STLF approaches in the literature include the auto-regressive moving average (ARMA) [5,6], auto-regressive integrated moving average (ARIMA) [7], and seasonal ARIMA (SARIMA) [8]. Artificial neural networks (ANN) [4], support vector machines (SVM) [9], fuzzy logic [10], etc., are considered the prevailing CI-based forecasting techniques.

CI-based load models, regardless of their underlying computational algorithms, can be further categorized into several methodological outlines. Correspondingly, different forecasting techniques should not be conflated with different methodological approaches. A method is a structured procedural solution designed for a specific class of forecasting practice, while a technique refers to a particular model that can be grouped with similar models in one technical category, such as regression or neural network techniques. For example, Fan and Hyndman [11] and Mandal et al. [12] both applied an ANN architecture to develop a 24-hour-ahead load model, whereas a different methodological approach was taken in each paper. In [11], a stepwise method, which locates the lowest error in the model, was applied to select the optimal subset of variables, including historical load and meteorological variables. In [12], by contrast, only the daily load profiles similar to the day-ahead load, recognized by a similarity index (similar day type and similar weather), were fed into the engine. The solution therefore does not always come down to the technique the forecasters use; the strategy for implementing those techniques matters as well.

Generally, both methods and techniques are important when it comes to accurate estimation. However, limited literature is available on STLF methodologies. Most surveys in the literature investigate different STLF techniques [13–16]. For example, Moghram et al. [14] surveyed STLF techniques by classifying them into two categories, statistical approaches and CI-based techniques, and Hippert et al. [13] reviewed ANN-based STLF. Although these surveys address the most applicable STLF techniques, it may still be unclear to new researchers what merits lie behind developing any specific load model.

This paper explains the main framework of the state-of-the-art methodologies applied to CI-based STLF via examples from several case studies. A comprehensive overview of the technical and computational difficulties of STLF is presented, together with the strategies proposed by various researchers to unravel them. These strategies are categorized into four main groups based on their common topologies, and the robustness of each method in dealing with different types of load data is identified.

The rest of the paper is organized as follows. Section 2 presents a general overview of four principal methodologies, followed by four subsections wherein details of each method are fully described. Section 3 discusses the main advantages and disadvantages of STLF methods. Moreover, in Section 3, advantages of hybrid methods are highlighted, and a hybrid method is proposed. Finally, the concluding remarks are drawn in the last section.

#### **2. STLF Methodologies**

Load forecasting can be conducted by various methodologies. The selection of a forecasting method relies on several factors including the relevance and availability of historical data, the forecast horizon, the level of accuracy for weather data, desired prediction accuracy, and so forth. Accordingly, selecting the proper load forecasting approach primarily depends on the time horizon of the prediction.

Different time horizons are adopted for load forecasting based on their specific applications in power system planning. For instance, distribution and transmission planning involve STLF, while for longer durations, i.e., more than a year up to a few decades, the load prediction provides a decision platform for financial or power supply planning. Likewise, the required levels of accuracy across these time horizons are not equal: long-term decisions are preliminary and may need significant revision in subsequent planning stages due to highly uncertain input information, whereas a short-term forecast feeds the day-ahead market, which requires a high level of accuracy. Moreover, different kinds of predictor variables are considered for each forecast horizon. For example, long-term load forecasting takes into account population variations, economic development, industrial construction, and technology development, whereas a short-term forecast mainly considers calendar variables, weather data, and customers' behaviors [17].

Generally, both time categories of load forecasts are important for power system operation and development, especially with the integration of distributed generators into the grid, which adds further intermittency and vulnerability to power provision. This study exclusively investigates STLF approaches by reviewing original papers in this field. Even though some of the artificial intelligence (AI) techniques used for STLF may also be applicable to long-term load forecasting, on a methodological foundation they are not comparable, for the reasons given above.

Hong and Fan [2] identified four general categories of STLF methodologies, which can be applied with several techniques to solve the STLF problem. The four categories, i.e., similar day, variable selection, hierarchical forecasting, and weather station selection, are specified based on different realizations of the forecast problem. For example, the similar-day method treats the load data as a sequence of similar daily load profiles, while the variable selection method presumes that the load data behave like a series of variables, either correlated with or independent of each other. The hierarchical method, on the other hand, considers the data as an aggregated load, which varies strongly with changes in the load at lower levels of the hierarchy. Finally, weather station selection is a method that determines the weather data best fitted to the load model.

Hong and Fan outlined these general methodological approaches in a review [2], describing two or three examples of each, but the extensive STLF literature has not been assessed accordingly. Further investigation of the STLF methodologies adopted in the literature reveals novel approaches that can be subcategorized under the four root categories. For example, the classic similar-day method, which measured the similarity between daily load profiles by assigning a day-type index, later developed into the similar-pattern method, in which similar load profiles are extracted using either a minimization algorithm or clustering techniques. Other novel approaches, such as pattern-sequence and sequence learning, also fall into the similar-pattern category when their algorithms find or learn similar sequences of patterns within the dataset.

Moreover, the majority of STLF researchers chose the variable selection method, employing different algorithms to select the prominent features among candidate variables. This study distinguishes five state-of-the-art feature selection approaches for STLF. These approaches are particularly important for creating the optimal subseries of data, leading to more accurate results.

Another category of STLF methodologies is assigned to hierarchical short-term load forecasting (HSTLF), which has been addressed only to a limited extent in the literature. HSTLF addresses forecasting at several levels of aggregated data. Although other methodologies, i.e., similar pattern and variable selection, are used for the individual predictions at each aggregation level, the novelty of HSTLF lies in the combination method applied. Four approaches are identified for HSTLF, each proposing a different strategy. The classical top-down and bottom-up approaches are the two common algorithms for hierarchical forecasting, with the former aggregating the data prior to forecasting and the latter aggregating the forecasts. More recent approaches for hierarchical forecasting, i.e., the weighted combination and the ensemble model, try to capture the model at each aggregation level individually and find the correlation between the individual models at different levels of the hierarchy. Despite the limited literature, HSTLF has lately received more attention from distribution and transmission operators for power system control and planning. It benefits from recent advances in communication infrastructure for remote measurement and automated metering, which provide operators with highly granular data at the user end. Thus, the most recent challenges of the HSTLF methodology are highlighted in this study to help new researchers find a competitive research direction in this field.

In turning attention to HSTLF, a question arises: when the data are aggregated, what happens to exogenous variables such as meteorological variables, which cannot be aggregated? This challenge was raised for the first time by Hong and Pinson [18], and a competition was launched to address it. The results of this competition are discussed further in this paper.

Figure 1 shows the tree diagram of these four forecasting methods. As can be seen, each method can be carried out via multiple strategies. For example, there are various approaches to predict a hierarchical structure including bottom-up, top-down, ensemble, and weighted combination. A full description of these four recognized categories of STLF methodologies is presented in the following subsections with examples of several case studies.

**Figure 1.** Tree diagram of the STLF methods.

#### *2.1. Similar-Pattern Method*

Similarity-based methods are generalized forms of the minimum-distance approaches applied in machine learning and pattern recognition. These methods have also been used for STLF by finding similar demand patterns within the dataset and predicting the future load using interpolation or weighting [19]. There are different strategies for finding similar load profiles; in the simplest case, this can be achieved by assigning a similarity index to the type of day in the calendar or to meteorological factors. Similar patterns are then found by searching among the days with similar indexes. The search space is generally a close neighborhood, although sometimes annually lagged data are also included. For example, Dudek et al. [20] developed a similarity-based forecasting model using the similarity between seasonal patterns of a load time series based on calendar-lagged load data. The search space in [20] was limited to the nearest neighbors of the forecast day as well as the nearest neighbors of the same calendar day in the previous year. In fact, assigning the day-of-the-year index in addition to the weekday index is essential to avoid the effects of seasonal variations. A typical search space for the similar-day method is illustrated in Figure 2.
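
As a minimal illustration, the sketch below restricts the search space to a recent neighborhood plus an annual-lag window, keeping only days with the same weekday index. The window sizes and data layout are illustrative assumptions, not taken from [20].

```python
# A minimal sketch of similar-day candidate selection, assuming a pandas
# DataFrame `df` indexed by date with one column per hour of the day.
import pandas as pd

def similar_day_candidates(df: pd.DataFrame, target: pd.Timestamp,
                           recent_days: int = 28,
                           annual_window: int = 14) -> pd.DataFrame:
    """Restrict the search space to (a) the recent neighborhood of the
    forecast day and (b) a window around the same calendar day last year,
    keeping only days with the same weekday index as the forecast day."""
    dates = df.index
    same_weekday = dates.dayofweek == target.dayofweek
    recent = (dates >= target - pd.Timedelta(days=recent_days)) & (dates < target)
    last_year = target - pd.DateOffset(years=1)
    annual = abs((dates - last_year).days) <= annual_window
    return df[same_weekday & (recent | annual)]
```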

**Figure 2.** Limitation of search space for similar days.

Figure 3 illustrates the methodology applied by Dudek et al. [20]. In the first step, days similar to the forecast day, i.e., with the same weekday and day-of-the-year indices, are extracted from the load time series (the first series), and a sequence of the days following these similar days (the second series) is created. In the second step, the days with the most similar patterns within the first series (the similar-day series) are chosen by a selection strategy, together with the days that follow them within the second series (the sequence series). The outcome of the third step is a regression model of the load data extracted from the sequence series. Eventually, the load of the next day in the original time series is forecasted by decoding the final model.

**Figure 3.** Similar day-based prediction algorithm developed by Dudek et al. [20].

Besides the calendar index as the similarity indicator, other characteristics, such as weather similarities, can be considered as well. For instance, Chen et al. [21] proposed a similar-day selection method based on the weather similarity of the forecast day. In their method, designed to forecast the load over a short period (two working days excluding the weekend) at hourly resolution, the search for similar days was limited to days with the same weekday and weather indices as the forecast day. Days with similar weather conditions were selected by a minimization process, with the meteorological condition defined by the wind chill, temperature, humidity, wind speed, and cloud cover variables. In addition, the same index was assigned to some of the weekdays with similar load patterns. It was also shown that relying only on similar days' data, without establishing the initial status of tomorrow's demand, leads to inaccurate forecast results; thus, today's 24-hour load was fed as an additional input to the forecasting engine. Figure 4 illustrates the schematic diagram of the similar-day method developed in [21].

As already mentioned, the selection of similar load profiles among days with similar indexes (weekday, day-of-the-year, and weather indexes) can be made by a distance minimization technique. Some works in the literature applied the Euclidean norm to measure the match level between similar days [12,21,22]. As listed in Table 1, Chen et al. [21] used the Euclidean norm to evaluate the weather similarity between the forecast day and previous days. Senjyu et al. [22] applied a weighted Euclidean norm to investigate the similarity of load patterns using the load deviations between the forecast day and historical days, the weather deviation, and the slope of the load deviations. The assigned weights (*w*) in Equation (2) are determined by a regression model using the trend of load and temperature.
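
Since Table 1 and Equation (2) are not reproduced here, the display below gives a generic reconstruction consistent with the description above; the exact form used in [22] may differ. Let $\mathbf{x}$ and $\mathbf{y}$ denote the hourly profiles of the forecast day and a historical day:

$$
d(\mathbf{x},\mathbf{y})=\sqrt{\sum_{h=1}^{24}\left(x_h-y_h\right)^2},\qquad
d_w(\mathbf{x},\mathbf{y})=\sqrt{\sum_{h=1}^{24} w_h\left(x_h-y_h\right)^2},
$$

where the weights $w_h$ are obtained from a regression on the load and temperature trends, as described above.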

**Figure 4.** Schematic diagram of similar-day method developed by Chen et al. [21].


Dynamic time warping (DTW) is another method for measuring the similarity of time series that take similar values but not exactly at the same time points. Using the DTW method may uncover several similar load-profile patterns within the dataset. Teeraratkul et al. [23] indicated that, by using the DTW method, the number of groups of similar profiles was reduced by 50%.
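
The sketch below shows the classic dynamic-programming form of DTW for two equally sampled daily profiles; it is a generic textbook implementation under stated assumptions, not the exact algorithm of [23].

```python
# A minimal DTW sketch, assuming two daily load profiles as 1-D arrays.
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a)*len(b)) DTW with absolute-difference cost,
    allowing similar values to match even when shifted in time."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```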

More recently, clustering algorithms have been used to find similar sequences of load patterns within the dataset [24,25]. These clustering techniques group the data into a specific number of categories of daily load patterns, an approach termed the pattern-sequence-based STLF method. Under this method, a label indexes the load of each day in the dataset, so that a sequence of labels is created. Alvarez et al. [26] applied the K-means clustering technique to create clusters of load patterns and extracted a sequence of labels from the dataset as a pattern with which to search for and predict the next day's load. A schematic diagram of the pattern-sequence-based forecasting method is depicted in Figure 5. According to Figure 5, all days in the dataset are labeled using a clustering method. To predict the next day's load, a window of the sequence of labels preceding the forecast day is selected. Using this window, similar label sequences are searched for within the dataset. Eventually, the load of the target day is predicted by averaging the next-day loads of the discovered sequences; a minimal sketch is given below.
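
This sketch follows the spirit of the pattern-sequence method of Alvarez et al. [26]; the cluster count, window length, and data layout are illustrative assumptions.

```python
# A minimal pattern-sequence-based forecasting sketch.
import numpy as np
from sklearn.cluster import KMeans

def psf_forecast(days: np.ndarray, window: int = 3, k: int = 5) -> np.ndarray:
    """days: (n_days, 24) matrix of daily load profiles.
    Label each day by K-means, find past occurrences of the label
    sequence preceding the forecast day, and average the days that
    followed those occurrences."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(days)
    query = labels[-window:]               # label sequence just before the target day
    matches = [i + window                  # index of the day after a matched sequence
               for i in range(len(labels) - window)
               if np.array_equal(labels[i:i + window], query)]
    if not matches:
        return days[-1]                    # fall back to persistence if no match
    return days[matches].mean(axis=0)      # average the discovered next-day loads
```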

The prevalence of smart meters in the smart grid provides market planners with fine-grained data at hourly and sub-hourly resolution. Load profiles at the customer end carry detailed information about the types of customers and their consumption behaviors. Quilumba et al. [27] used a clustering technique to group smart meter customers according to similar energy consumption patterns. Temperature information was interpolated between neighboring values to become as granular as the smart meter data.

**Figure 5.** Schematic diagram of pattern sequence-based forecasting method [26].

Clustering methods can distinguish similar sequences within a dataset, as discussed earlier; however, they cannot differentiate among the main features of these patterns. More recently, adding memory to the structure of learning engines, as in recurrent neural networks and deep learning, has overcome this drawback.

Liu et al. [28] considered a sequence learning approach to develop a load model using a recurrent neural network (RNN) structure. Kong et al. [29] noted that the long short-term memory (LSTM) variant of the RNN is a powerful engine for learning look-back sequences, owing to its memory cells, which retain important features, and its forget gates, which reset the cells to discard redundant features. Shi et al. [30] applied a deep RNN to map the sequence of input data into the corresponding output sequence. Zheng et al. [31] proposed a hybrid method that applies a clustering technique to capture similar days within a dataset and then uses the sequence-to-sequence LSTM structure to adjust the lengths of the input and output sequences. The sequence-to-sequence structure was primarily designed to map sequences of different lengths [32,33]. Marino et al. [33] suggested that the main advantage of the sequence-to-sequence structure is its ability to predict an arbitrary number of future time steps given an input sequence of arbitrary length. Satish et al. [34] investigated the optimal learning sequence for the training stage; their results indicated that the number of patterns in a sequence affects the accuracy of the model.
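
For concreteness, the sketch below shows a minimal sequence-to-one LSTM load forecaster in PyTorch. The layer sizes, look-back length, and single-step output are illustrative assumptions, far simpler than the sequence-to-sequence architectures of [31–33].

```python
# A minimal LSTM forecaster sketch: the recurrent memory carries the
# look-back sequence; the head predicts the next step from the last state.
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, n_features: int = 1, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                # x: (batch, seq_len, n_features)
        out, _ = self.lstm(x)            # memory cells process the sequence
        return self.head(out[:, -1, :])  # forecast from the final hidden state

model = LSTMForecaster()
x = torch.randn(8, 168, 1)               # e.g., a week of hourly loads per sample
y_hat = model(x)                          # shape (8, 1): next-hour forecasts
```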

Table 2 lists some highly cited publications in which similar-pattern method was applied for load prediction. These publications are categorized based on three common techniques, namely, "similar-day", "pattern-sequence", and "sequence learning".

In general, the pattern similarity method is an efficient approach for capturing repeated patterns of the load series in the short term. The overall pattern of a system rarely changes in the short term; over longer periods, however, significant deviations may lessen the similarity of the future load to the past load.


**Table 2.** Published articles employing similar-pattern method.

#### *2.2. Variable Selection Method*

Variable selection is the process of selecting the most influential variables or features (predictor variables) within the dataset such that they adequately capture the relationship between the available data and the output. While pure time series forecasting relies only on past data, the variable selection method embeds external variables into the model besides the historical load [42]. Some of these external variables, termed explanatory variables because they explain the causes of load fluctuations, are calendar variables (time of day, day of the week, month of the year, day of the year, etc.), meteorological variables (temperature, humidity, cloud cover, wind chill, solar radiation, etc.), and so forth [43].

Several studies also included lagged load data in their models [44,45]. The lagged variables capture the recency effect by incorporating changes in the demand level throughout the load time series into the model. For example, Ceperic et al. [44] proposed a feature selection algorithm to select the optimal number of lagged loads, embedding the sequential correlation of the load variables into the model. Another example is the work of Fan and Hyndman [11], which considered the following candidate predictors: the lagged demand for each of the preceding 12 hours, lagged values for the same hours of the two previous days, the maximum and minimum load values in the past 24 hours, and the average load in the preceding week. A selection algorithm was then applied to choose among the potential variables and create an optimal subset of predictors (a hedged sketch of such lagged predictors is given below).
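
The following sketch constructs candidate lagged features similar in spirit to those of Fan and Hyndman [11]; the column names and exact variable set are illustrative assumptions and the set in [11] differs in detail.

```python
# A hedged sketch of candidate lagged predictors for an hourly load series.
import pandas as pd

def lag_features(load: pd.Series) -> pd.DataFrame:
    """load: hourly demand indexed by timestamp."""
    X = pd.DataFrame(index=load.index)
    for h in range(1, 13):                          # each of the preceding 12 hours
        X[f"lag_{h}h"] = load.shift(h)
    for d in (1, 2):                                # same hour on the two previous days
        X[f"lag_{d}d"] = load.shift(24 * d)
    X["max_24h"] = load.shift(1).rolling(24).max()  # extremes of the past day
    X["min_24h"] = load.shift(1).rolling(24).min()
    X["mean_7d"] = load.shift(1).rolling(24 * 7).mean()  # average of preceding week
    return X.dropna()
```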

Besides the lagged demand, some studies embedded lagged temperatures as input variables, since electricity demand is markedly affected by recent temperatures as well as the current temperature. That is why the forecasting model developed by Fan and Hyndman [11] included, besides the lagged demand, the current temperature and 12-hour lagged temperatures for the preceding day and the two days before. The main concern about weather variables, however, is their validity, which depends partly on the weather station selection; this is discussed further in Section 2.4.

When multiple input variables are nominated and a large amount of data is available for every variable, the prediction engine might not converge to an accurate predictive model. Therefore, an effective subset of the data with the optimal number of predictor variables helps the forecast accuracy [46]. An efficient predictor variable is highly explanatory and independent of the other variables. The aim is to select an optimal subset of predictor variables, as few as possible, that suitably describes the characteristics of the output variable. An optimal input subset favors model accuracy as well as cost efficiency and model interpretability [47].

In the literature, researchers have employed different methods and techniques to select explanatory variables optimally.

One such method is stepwise refinement, a step-by-step approach to input selection. In this method, the primary model is a full model consisting of all measured variables; then, based on the predictive capability of the individual variables, redundant terms are omitted from the model, and the retained variables yield the best model. One example is the work of Fan and Hyndman [11], who carried out a step-by-step variable selection to extract the best-suited model. The nominated inputs were calendar variables, actual and lagged demand (from the Australian National Electricity Market, NEM), and forecasted temperature data from more than one site in the target area. In selecting the temperature differentials, for instance, the temperature differentials from the same period of the last six days were dropped one at a time in the first step, and the one leading to the lowest error was selected. In the next step, the temperature variable was frozen to the day selected in the previous step, and the temperatures of the last six hours were considered for the trial. This procedure continued until the final group of variables was selected; a minimal sketch of such a backward procedure follows.
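
This sketch illustrates the generic backward stepwise idea described above, repeatedly dropping the variable whose removal yields the lowest validation error; the linear model and error measure are illustrative assumptions, not the specific choices of [11].

```python
# A minimal backward stepwise variable selection sketch.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def stepwise_backward(X: np.ndarray, y: np.ndarray, names: list) -> list:
    """Greedily drop variables while cross-validated MSE keeps improving."""
    selected = list(range(X.shape[1]))
    best = -cross_val_score(LinearRegression(), X, y,
                            scoring="neg_mean_squared_error", cv=5).mean()
    improved = True
    while improved and len(selected) > 1:
        improved = False
        for j in list(selected):
            trial = [k for k in selected if k != j]
            err = -cross_val_score(LinearRegression(), X[:, trial], y,
                                   scoring="neg_mean_squared_error", cv=5).mean()
            if err < best:            # dropping variable j lowers the error
                best, selected, improved = err, trial, True
    return [names[k] for k in selected]
```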

Nedellec et al. [48] followed the same stepwise refinement strategy for variable selection, but in a three-stage procedure in which the variables at each stage were selected according to the scale of the forecast. In a long-term module, monthly load and temperature time series for every region and weather station were selected to extract the long-term trend and low-frequency effects. The residuals of the first stage, free of seasonality and weather effects, were passed to a medium-term estimation, with predictors such as type-of-day, type-of-year, de-trended electrical load, real temperature, and lagged temperature. In the short-term stage, the more localized factors remaining from the previous stages were captured by selecting variables such as year, month, day, hour, time-of-year, and day type, as well as real and smoothed weather variables. This stepwise algorithm is illustrated in Figure 6; as can be seen, the final forecasted load is an additive model of three components.

**Figure 6.** Stepwise algorithm for STLF [48].

Xiao et al. [49] also developed an ensemble load model, applying a group of STLF techniques to capture the trend of the load series; the highly nonlinear characteristics of the residual subseries were then modeled using various data-handling techniques.

Moreover, there are other approaches to identifying the maximum relevance between different variables. Correlation-based methods use a heuristic algorithm to find a subset of variables that are highly correlated with the output but not with each other [50]. Chen et al. [9] used a correlation method to measure the dependency of the peak demand on temperature. Kouhi et al. [51] developed a correlation-based feature selection method to reduce the chaotic structure of the load time series and selected highly relevant variables within this reconstructed space. Amjady et al. [3] used a correlation approach to create a subseries of load data for a hybrid forecast model.

Mutual information (MI) is an information-theoretic approach to measuring the interdependency between two random variables. If the MI is zero, the two variables are independent and contain no mutual information about each other; higher MI values indicate higher relevance, with more information about the target feature [52]. Wang et al. [53] used the MI method to obtain the initial weights of their ANN-based load forecast model. Elattar et al. [54] reconstructed a load time series using an embedding dimension and time delay computed by the MI approach. Wi et al. [55] adopted the MI method to evaluate the mutual information between dominant weather features and loads in different seasons.
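
For two discrete random variables $X$ and $Y$ with joint distribution $p(x,y)$ and marginals $p(x)$ and $p(y)$, MI takes the standard form below; for continuous variables, the sums become integrals over the corresponding densities:

$$
I(X;Y)=\sum_{x\in X}\sum_{y\in Y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}.
$$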

Moreover, filtering methods can be applied to the data to find the correlation among variables independently of any learning machine. Filter-based feature selection algorithms use general characteristics of the training data, i.e., statistical dependencies, to select highly ranked features by applying a threshold on the number of features [56]. Reis et al. [57] applied a wavelet filter to reconstruct a subseries of the data after selecting input variables using the autocorrelation function. Amjady et al. [58] proposed a hybrid load prediction algorithm in which a filter-based technique selected a minimal subset of inputs. Hu et al. [59] proposed a hybrid filter method for the feature selection procedure.

More recently, the development of bio-inspired and evolutionary optimization algorithms has improved CI-based feature selection techniques for STLF. Examples of optimization algorithms developed for feature selection in the literature include ant colony [60], particle swarm [61,62], differential evolution [63], hybrid genetic and ant colony [64], and so forth.

Some of the highly cited publications for STLF, which are categorized based on the applied feature selection techniques, are listed in Table 3.


**Table 3.** List of publications employing different feature selection techniques.

The proper selection of variables is sometimes time-dependent: a variable may have a significant impact on the load in some hours of the day and only a subtle effect in others over a 24-hour period. Thus, a suitable architecture for the forecasting engine can provide a simpler model and decrease the amount of redundant data [70]. The general idea is that, instead of creating one subseries of data, different subsets of variables can be created for each category of time, with the data in each category affected by the same variables. For example, Khotanzad et al. [71] proposed two parallel architectures for load forecasting. The first design, illustrated in Figure 7, was a three-module structure to model the hourly, daily, and weekly trends; for predicting the hourly load of the next day, each of the three modules was trained with 24 ANN engines, each representing one hour of the day. The second architecture divides the 24 hours into four categories, i.e., 1–9, 10–14 together with 19–22, 15–18, and 23–24, and determines different input variables for each group of hourly loads, as depicted in Figure 7.

Other papers in the literature have also applied this so-called parallel architecture to 24-hour-ahead load forecasting [44,72]. The reasons for using this design are the smaller amount of training data and fewer parameters for each hourly module, and a simpler model for each hour of the day compared to a general model for all 24 hours.

Overall, developing an explanatory model via the variable selection method is appropriate when forecasters have fundamental knowledge about the system. To forecast the variable of interest, one needs to identify the different exogenous variables. Generally, there are no fixed rules for the selection of input variables; the forecaster's experience in analyzing the type of data from a specific market, together with preliminary testing, can help in selecting a proper group of variables. Thus, professional judgment is undoubtedly part of the process.

**Figure 7.** Parallel architecture for 24-hour ahead forecasting proposed in [71].

#### *2.3. Hierarchical Forecasting*

The previous methods presume the load data to be a single time series, whereas these time series can be inherently disaggregated by different attributes of interest [42]. Load time series are naturally organized into different hierarchies, such as geographic, temporal, circuit-connection, and revenue hierarchies. Figure 8 depicts a typical hierarchical structure of a time series divided into aggregate and disaggregate levels.

**Figure 8.** Schematic diagram of hierarchical structure of a load time series.

An example of a hierarchical load structure can be found in the study by Zhang et al. [73]. The load data comprised the recorded consumption of three hundred smart meter customers in a subsection of an Australian utility over three years. The customers were clustered into 30 nodes according to their postcodes; these 30 nodes were grouped into three nodes, which were in turn summed at the final level into one aggregated time series. At the distribution level, by contrast, the hierarchical levels were specified as the loads of substations, feeders, transformers, and customers [74].

Recently, HSTLF has received prevailing attention due to market considerations for decision-making at different levels of the power system, including the independent system operator, the distribution operator, and the customer end. Utilities require load forecasting at low voltage levels to perform distribution operations, such as circuit switching and load control, effectively. Accurate load forecasting at a low level can even increase the prediction accuracy at the independent system operator level [75]. In fact, the independent system operator at the upper level of a power system covers a large geographical area with extensive load diversity, so a single model cannot guarantee the prediction accuracy.

The state-of-the-art HSTLF methods addressing the hierarchical load structure are sub-grouped into bottom-up and top-down approaches [27,76]. The bottom-up approach aggregates forecasts from the low level up to the aggregated level, while the top-down method aggregates the historical load prior to forecasting. The former approach does not miss any information due to aggregation, although the high volatility of the bottom level makes prediction challenging [77]. The top-down method, on the other hand, is simpler because aggregation reduces noise; however, some features of the individual series are lost [42]. For instance, Quilumba et al. [27] used the bottom-up approach to forecast the load of customers disaggregated by similar consumption patterns.

Some of the advantages and disadvantages of the bottom-up and top-down approaches were highlighted by Hyndman et al. [78], who referenced early works in the literature. Generally, the bottom-up approach is robust when the data at the bottom level are reliable, with no missing information; otherwise, the forecast at the low level is error-prone and the top-down approach yields a more accurate forecast. Overall, the superiority of one method over the other is not uniform.

HSTLF can also be conducted at all levels of the hierarchy individually, producing what is termed the "base forecast". The challenge here, however, is that the prediction at the aggregated level might not be consistent with the summed base forecasts [79].

Zhang et al. [73] proposed a solution that optimally adjusts the base forecast at each node so as to be consistent across the aggregation structure. This was accomplished by minimizing the discrepancy between the forecast at the aggregated level and the sum of the base forecasts, using quadratic programming in a post-processing scheme. The method was tested on two electricity networks: a bulk system covering a large area with several dispatch zones at the bottom level, and a distribution network covering a small area with hundreds of individual customers. The results indicated that the proposed method was more accurate for more than 85% of the nodes in the bulk network. For the distribution network, with its more volatile load, the improvement was more pronounced, especially at the upper aggregated level, where the error decreased significantly. Nose-Filho et al. [80] also developed a load model for a sub-distribution system in New Zealand by finding participation factors between the local forecasts and the global forecast.

Another example is the study by Fan et al. [81], who proposed a strategy to forecast the loads of sub-regions within a large geographical area independently, finding the optimal region partition in the combination procedure. It was reported in [81] that weather conditions were a dominant factor in load variations; therefore, in a large geographical region, extreme weather conditions across the area caused high load diversity. Another factor that rendered regional load profiles vastly different was identified in [81] as non-coincident load peaks.

Sun et al. [74] proposed a strategy to predict the loads of different nodes in a power distribution system using a top-down approach. First, the loads of the parent nodes were forecasted. Then, by assessing the similarity between a parent node (the aggregated level) and its child nodes (the corresponding disaggregated levels), two classes of regular and irregular nodes were identified. For regular nodes, the load is a fraction of the parent load computed by a distribution factor; for irregular nodes, which do not follow the leading characteristics of the parent node, individual models were forecasted. The similarity between nodes was identified using the distance minimization method on both the weather parameters and the historical load.

More recently, with the prevalence of smart meters, fine-grained data at the sub-levels have revealed more information at the aggregate level. Wang et al. [82] used granular smart meter data to construct a forecast model at the aggregated level. In their proposed model, the data were clustered into groups of loads with similar patterns, and the aggregated forecast was obtained by adding the forecasts of the individual clusters. However, instead of the plain bottom-up strategy, a weight was assigned to each model while the number of clusters was varied; the final forecast was an optimally weighted combination of these individual forecasts. The method was implemented on a dataset of 5237 residential consumers with half-hourly resolution over a 75-week duration. The results for the directly aggregated load were more accurate than those of the clustering strategy, although the proposed methodology outperformed the conventional bottom-up method. The method was also tested on the load data of 155 substations over a 103-week duration; in contrast to the first dataset, the bottom-up model there was more accurate than the individual clustering models. This contrast was attributed to the regularity of substation loads compared to residential load profiles.

Table 4 lists two combination methods applied to sum up the base forecasts while maintaining coherence with the aggregated forecast. Both methods minimize the error between the summed base forecasts and the aggregated forecast, by either linear [82] or quadratic [73] programming. Other combination methods are discussed in [83] with further theoretical explanation. This suggests that new HSTLF methods may be derived by selecting an appropriate combination algorithm; a simple least-squares sketch is given below.
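
As a minimal illustration of reconciling incoherent base forecasts, the sketch below projects them onto the coherent subspace by unconstrained least squares; this is only in the spirit of the combination methods of Table 4 and of [79], whereas the actual algorithms in [73,82] solve constrained linear or quadratic programs.

```python
# A hedged least-squares reconciliation sketch for hierarchical forecasts.
import numpy as np

def reconcile_ols(y_hat: np.ndarray, S: np.ndarray) -> np.ndarray:
    """y_hat: base forecasts for all nodes (aggregate first, bottom after).
    S: summing matrix mapping bottom-level series to all nodes."""
    b_tilde, *_ = np.linalg.lstsq(S, y_hat, rcond=None)  # min ||y_hat - S b||^2
    return S @ b_tilde                                    # coherent forecasts

# Example: one aggregate node over two bottom-level series.
S = np.array([[1.0, 1.0],    # aggregate = bottom1 + bottom2
              [1.0, 0.0],
              [0.0, 1.0]])
y_hat = np.array([102.0, 40.0, 58.0])   # incoherent: 40 + 58 != 102
print(reconcile_ols(y_hat, S))          # bottom forecasts now sum to the top
```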


**Table 4.** Combination methods for base forecasts.

The different levels of a hierarchical structure interact with each other in a complicated fashion, in that a change in one series at one level can propagate to series at the same level as well as at other levels of the hierarchy. Sun et al. [74] accounted for the change that a switching operation can cause in the load trend by adjusting the forecast whenever switching was detected: abnormal changes in the demand were identified by monitoring the mean and standard deviation of the load using statistical process control, and the load participation factor was then recomputed from the new data. Comparably, deviations in the meteorological conditions across a large geographical area cause the base forecasts to vary, changing the aggregated load accordingly. However, meteorological information might not be available at every sub-level, and there are usually several meteorological services providing weather forecast information for a geographical area. Hong and Pinson [18] noted that, in a hierarchical structure with many nodes to be forecasted, the best-related weather information cannot be selected manually for each node. Weather station selection was one of the main objectives of the Global Energy Forecasting Competition 2012 (GEFC) [84]; it is discussed further in the next section.

#### *2.4. Weather Station Selection*

In a large electricity market covering an extended area, a single forecasting model cannot capture the load pattern. The HSTLF method discussed in the previous section ensures a more satisfactory forecast across the different levels of the hierarchy. However, in an HSTLF method that disaggregates the load by geographical divisions or zonal hierarchies, the meteorological hierarchies, which are a dominant factor in load diversity, cannot easily be captured. The challenge is to assign the most relevant weather station information to each zone or area in the hierarchy.

Fan et al. [81] proposed a combination method to select the best-adapted individual weather forecast among multiple forecasts provided by different meteorological services. Several papers in the literature [85,86] used the average of the data from multiple services for its simplicity and effectiveness compared with other weighted averaging methods.

In the competition organized by Hong and Pinson (the GEFC) [18], weather station selection was one of the issues addressed. The data provided in the competition comprised the hourly load history of 20 zones in the U.S., along with weather data gathered from 11 weather stations whose locations were not specified.

Among the winning teams, Charlton et al. [87] built 11 energy models for each zone based on the weather data of the 11 weather stations provided in the competition; the best-fitted weather input for each zone was not a single station but a linear combination of up to five best-fitting weather stations per group. Lloyd [88] developed a forecast model based on data from all the weather stations and used Bayesian model averaging to integrate these models into one final averaged model. In the model proposed by Nedellec et al. [48], one station was selected for each zone, as other combination strategies led to unsatisfactory outcomes. Taieb et al. [89] selected the best-fitted station for each zone by testing the temperature data of the previous week for each zone; the demand was then modeled using the average temperature data of the three best weather sites. Hong and Pinson [18], on the other hand, proposed a weather station selection method in which, instead of assigning the same number of weather stations to all nodes at the same level of the hierarchy (the common strategy in the GEFC competition), different numbers of weather stations were selected for the individual load zones. Yet the result was not always superior to the alternatives.

#### **3. Method Evaluation and Future Work**

A comprehensive explanation of the STLF methodologies was provided in the previous sections. Generally, the logic behind each specific method helps the forecaster choose the best-fitted method for their application. For example, the similar-pattern method relies mainly on historical values, whereas the variable selection method incorporates information about explanatory variables. The forecaster might therefore prefer the similar-pattern method in cases where the understanding of the system is not comprehensive enough, or where it is extremely difficult to extract the main explanatory features that govern the demand behavior; in such situations there are always some variations in the load that cannot be captured by explanatory variables. In the similar-pattern strategy, the focus is on what is going to happen rather than why it happens. Conversely, when there is a clear correlation between exogenous variables and the load data, an explanatory model, i.e., the variable selection method, is the appropriate approach.

Some of the main advantages and disadvantages of these four methods are listed in Table 5. For example, in the variable selection method, despite efforts to find independent variables in the dataset using feature selection algorithms, the selected variables might still be partly correlated with each other; this is listed as one of the drawbacks in Table 5. The similar-pattern method, on the other hand, presumes that the past values of a variable are important in predicting the future, although the algorithms can only look back a few steps over a limited sequence of data.

Despite the unique characteristics of these four categories of STLF methodologies, they are not independent of each other, and there may be some overlap between them. For example, in the similar-pattern method, the similarity of exogenous variables such as temperature or humidity has been used to find similar patterns [21]; consequently, the selection of highly correlated exogenous variables is essential for detecting similar load patterns within a dataset.

Sometimes the selection of exogenous variables in the variable selection method is conducted using a similarity method. For example, Fujimoto et al. [90] applied the minimum-distance technique to find the relationship between exogenous variables and the residential demands of multiple houses.

Another example is the HSTLF method, discussed in Section 2, wherein either the variable selection method or the similar-pattern method is applied to forecast the load at each level of aggregation. Similarly, in weather station selection, the forecaster addresses a subset of exogenous variables, i.e., the meteorological variables.


Hyndman et al. [78] argued that taking advantage of the prominent features of the different methods and combining them in a hybrid scheme is what needs to be done now. Some examples of such combinations are available in the literature. For example, Quilumba et al. [27] applied the similar-pattern method in one step, to group smart meter load profiles into an optimal number of groups, and the feature selection method in the next step, to forecast the aggregated load of each group.

In the load model proposed by Wang et al. [82], a three-stage combined model was applied: the hierarchical structure of the load series was extracted by hierarchical clustering based on the similar consumption behavior of customers; different load models were developed for each subgroup of the data using the variable selection method; and, eventually, the final model assigned a weight factor to the individual models to make them coherent across the aggregate level.

Another example of a hybrid methodology can be found in the work of Zheng et al. [31], in which the feature selection method was used to help find clusters of similar days. Each cluster was shaped based on the feature values of the data, with a weight parameter assigned to each feature.

In this paper, a hybrid method is presented based on some of the main features of the methods reviewed in the previous sections; its schematic diagram is illustrated in Figure 9. As can be seen, the method finds base forecasts at each level of the hierarchical structure by applying the similar-pattern method and then uses a strategy to keep the loads coherent across the different levels. The strategy is performed in seven steps, as shown in Figure 9. In the first step, the patterns similar to today's load profile are extracted from each load series at the disaggregate level. Assuming that *n* similar patterns are obtained for each subseries and that there are *N* subseries at the disaggregate level, *n<sup>N</sup>* aggregated profiles are created. Among these aggregated profiles, the one with the minimum distance from today's profile at the aggregated level is selected. In the next step, the combined profile is matched to the real aggregated profile by finding a weighting factor. Eventually, to forecast the next day's load at the aggregated level, the load profiles of the sequential days (the days after the similar-pattern days) selected in the optimal combination are summed using the weighting factor of step 5. This method finds similar patterns at the disaggregated level but measures the similarity distance again at the aggregated level; a sketch of the combination search is given below.
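
The sketch below illustrates only the combination-search core of the proposed method (roughly steps 3 and 4 of Figure 9): one similar pattern is taken from each subseries, and the combination whose sum is closest to today's aggregated profile is kept. Brute-force enumeration of the *n<sup>N</sup>* combinations is feasible only for small *n* and *N*; all names here are illustrative assumptions.

```python
# A hedged sketch of the combination search over similar patterns.
from itertools import product
import numpy as np

def best_combination(similar: list, today_agg: np.ndarray):
    """similar: for each of the N subseries, an (n, 24) array of the
    similar daily patterns found at the disaggregate level."""
    best_idx, best_dist = None, np.inf
    for idx in product(*(range(len(s)) for s in similar)):
        combined = sum(s[i] for s, i in zip(similar, idx))
        dist = np.linalg.norm(combined - today_agg)  # similarity re-measured at the top
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, best_dist
```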

**Figure 9.** Schematic diagram of the proposed hybrid method.

The novelty of the proposed method over the aforementioned hybrid methods is that it neither aggregates the data from the bottom levels nor aggregates the low-level forecasts. In the hybrid method developed in [80], either the forecast results of the similar-pattern method are aggregated or a weighted average of the similar patterns is aggregated. The proposed method, by contrast, creates multiple subsets of data at the disaggregate level, and the optimal subset is then selected by comparing the combination results with the upper-level data. In this way, assessing the degree of similarity is not limited to one subset of data with averaged results. Whether this method is more accurate than the conventional hybrid methods [82], however, remains to be verified.

The proposed method assumes that finding the optimal subset of data may yield a more accurate forecast than averaging the similar patterns in each low-level subseries. The idea of selecting an optimal subset of data at every disaggregate level for predicting the next level's load is appealing, although the technical difficulties of its implementation need to be investigated in future work.

#### **4. Conclusions**

This paper discusses four categories of state-of-the-art STLF methodologies, i.e., similar-pattern, variable selection, hierarchical forecasting, and weather station selection, each of which proposes a specific solution for load prediction. The similar-pattern method, which is rooted in the minimum-distance approach, presumes that the load trend is unlikely to vary over a short period; hence, by searching within the close vicinity of today's load, similar patterns can be distinguished, and the future load is forecasted from the subsequent behavior of the discovered similar patterns in the load series.

The variable selection method, on the other hand, tries to find prominent and independent features in a dataset, with the lowest correlation with each other and the highest correlation with the output; constructing a subseries of these features helps to improve the forecast accuracy.

Hierarchical forecasting methods address the aggregated loads at the different levels of the hierarchical structure. Predicting the loads at the various zonal levels helps power system operators perform switching operations and load control effectively. In addition, improving the forecasts at the sub-levels enhances the prediction accuracy at the upper levels.

Besides the geographical and zonal hierarchies, the weather hierarchy is another vital factor in STLF, which cannot be captured easily for each geographical zone. Various weather services in a large geographical area provide different weather forecast information. Selecting the best-suited weather information is substantially important for STLF, considering the influence of weather variables on the load trend.

Eventually, by highlighting the main advantages and disadvantages of each approach, it is concluded that the load model can benefit from the robustness of the individual methods in a hybrid scheme. Finally, the general outline of a hybrid strategy is proposed for future evaluation.

**Author Contributions:** S.N.F. came up with the primary idea, which was further developed in collaboration with M.G., S.S., and K.-w.C. The contributions are as follows: methodology, S.N.F.; investigation, S.N.F., M.G., S.S.; supervision, S.S. and K.-w.C.; original draft preparation, S.N.F.; writing and editing, S.N.F., M.G., S.S. and K.-w.C.; visualization, M.G., S.S.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Nomenclature**


#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Review* **Assessing and Comparing Short Term Load Forecasting Performance**

#### **Pekka Koponen <sup>1,</sup>\*, Jussi Ikäheimo <sup>1</sup>, Juha Koskela <sup>2</sup>, Christina Brester <sup>3</sup> and Harri Niska <sup>3</sup>**


Received: 13 March 2020; Accepted: 17 April 2020; Published: 20 April 2020

**Abstract:** When identifying and comparing forecasting models, there is a risk that poorly selected criteria lead to wrong conclusions; it is therefore important to know how sensitive the results are to the selection of criteria. This contribution studies the sensitivity of the identification and comparison results to the choice of criteria. It compares the criteria typically applied for the tuning and performance assessment of load forecasting methods with the estimated costs caused by the forecasting errors. The focus is on short-term forecasting of the loads of energy systems. The estimated costs comprise electricity market costs and network costs. We estimate the electricity market costs by assuming that the forecasting errors cause balancing errors and, consequently, balancing costs for the market actors. The forecasting errors cause network costs by overloading network components, thus increasing losses and reducing component lifetimes, or alternatively by increasing the operational margins needed to avoid those overloads. The lifetime loss of the insulation, and thus of the components, is caused by heating according to the Arrhenius law. We also study consumer costs. The results support the assumption that there is a need to develop and use additional, case-specific performance criteria for electricity load forecasting.

**Keywords:** short term load forecasting; performance criteria; power systems; cost analysis

#### **1. Introduction**

Bessa et al. [1] discussed two different ways of measuring the performance of a forecast. One is to measure the correspondence between forecasts and observations (forecast quality). The other is to measure the incremental benefits (economic or other) obtained when the forecast is employed by users as an input to their decision-making processes (forecast value). Assessing forecast quality is more straightforward, and the standard approach is to use statistical error metrics.


Typically, these metrics apply some kind of loss function to the individual errors and then calculate a summary statistic [2]. For example, Schreck et al. [3] surveyed 681 load forecasts for the residential sector in the literature. Altogether, 15 error metrics were used; the most frequent was the mean absolute percentage error (MAPE) with 392 values, the second was the normalized root mean squared error (NRMSE) with 209 values, and the third was the RMSE. MAPE and NRMSE are relative metrics that aim to be comparable across different experiments. In the literature, the meaning of NRMSE varies significantly, because there are many different ways to normalize the RMSE. Here, we use only the most common definition, which normalizes the RMSE by dividing it by the mean of the measured values; the definitions used here are sketched below. In alternative definitions, the normalization is done by the difference between the maximum and the minimum, by the standard deviation, or by the interquartile range, etc. Some publications, such as [4], even use an NRMSE definition that, similarly to MAPE, normalizes each individual error by the simultaneous measured value before calculating the RMSE. NRMSE defined in that way, like MAPE, is much more sensitive than the NRMSE we use to absolute errors occurring when the actual loads are small or very small. MAPE also puts less weight on large deviations than NRMSE. MAPE is based on the assumptions that (1) accurate forecasting of small loads is important and (2) one large error is no more significant than an equally large sum of small absolute errors. Both assumptions are clearly in conflict with the actual consequences of the short-term load forecasting errors that we discuss next.
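
For measured loads $y_t$, forecasts $\hat{y}_t$, $t=1,\dots,N$, and mean measured load $\bar{y}$, the standard definitions referred to above are:

$$
\mathrm{MAPE}=\frac{100\%}{N}\sum_{t=1}^{N}\left|\frac{y_t-\hat{y}_t}{y_t}\right|,\qquad
\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(y_t-\hat{y}_t\right)^2},\qquad
\mathrm{NRMSE}=\frac{\mathrm{RMSE}}{\bar{y}}.
$$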

Often, model identification is easier and computationally more efficient with quadratic criteria such as sum of squared errors (SSE), RMSE and NRMSE. These criteria also reflect the combination of errors from several independent sources. This is important, as typically several different component forecasts are needed for forecasting the total power or power balance. The assumption of independence may not be completely valid, however. For example, the forecasts of loads and generation are often based on the same or mutually correlated weather forecasts or are connected by the behavior of certain groups of humans. Technologies and energy markets also reduce independence of load behavior. For example, demand side responses for electricity markets and ancillary services have very much mutual correlation. Good short-term load forecasting methods utilize such dependencies efficiently and their forecasting errors tend to be rather independent from each other. Correlated forecasting errors may also stem from using the same forecasting methods or methods that have common weaknesses. Mutual correlation of forecasting errors is usually easy to detect, and it is a sign that improving the forecast is possible.

The concept of forecast value, on the other hand, views the forecast user's business process more extensively and is more difficult to assess. Forecast value includes the economic and noneconomic benefits available to the user by using the forecast. An example is the reduction in imbalance costs for a balance responsible party. A crucial aspect is that the value is user and problem specific [1]. For example, power market participants and the system operator may measure forecast value differently. For the system operator, the most important issue is the expected maximum forecast error, not the mean forecast error. A UCTE position paper [5] discussed this issue for wind power forecasts. We employ a case study below to explore to what extent the consumer and the retailer aggregator have different preferences. Such differences between actors largely stem from the fact that the electricity markets only locally and imperfectly approximate marginal cost changes and do not perfectly reflect the real costs of the power systems.

It is often infeasible to calculate the benefits of the forecast accurately. When it is possible to accurately model the decision-making process which exploits the forecasts and the resulting costs, the error metric can be selected so that the resulting costs are minimized [6].

Forecast value is also related to the error metrics. According to [6], error metrics should be easy to interpret, quick to compute, and reflect the net increase in costs resulting from decisions made based on incorrect forecasts. MAPE penalizes over-forecasts (where the forecast load is greater than the realized load) more than under-forecasts [7,8]. In addition, MAPE penalizes relatively lightly the large absolute forecasting errors that occur during load peaks. However, in short-term load forecasting, under-forecasting high loads tends to be especially costly for all the relevant actors, including the system operator and the energy consumer.
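A small numeric sketch (the values are ours, chosen for illustration) makes the asymmetry concrete: the same 50-unit absolute error produces a 100% percentage error when the load is over-forecast at a low actual value, but only about 33% when it is under-forecast at a high actual value.

```python
# Same absolute error, very different MAPE contributions:
for actual, forecast in [(50.0, 100.0),    # over-forecast at a low actual
                         (150.0, 100.0)]:  # under-forecast at a high actual
    ape = 100.0 * abs(forecast - actual) / actual
    print(f"actual={actual:>6}, forecast={forecast:>6}, APE={ape:5.1f}%")
```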

The most commonly used error metrics also suffer from a double penalty effect for events which are correctly predicted but temporally displaced [7,9]. There are criteria, such as the parameterized earth mover's distance [10], which do not suffer from this effect. For avoiding this double penalty effect, time-shifted error measures such as dynamic time warping and permutated (so-called adjusted) errors, together with their downsides, are considered in [9].
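The double penalty is easy to see with a toy example (ours): a correctly shaped peak forecast one hour late is penalized twice under SSE, while a forecast that ignores the peak entirely is penalized only once.

```python
import numpy as np

actual  = np.array([0., 0., 10., 0., 0.])   # a single peak in hour 2
shifted = np.array([0., 0., 0., 10., 0.])   # same peak, forecast one hour late
flat    = np.array([0., 0., 0., 0., 0.])    # peak missed entirely

sse = lambda y, f: float(np.sum((y - f) ** 2))
print(sse(actual, shifted))  # 200.0: penalized for the miss AND the false alarm
print(sse(actual, flat))     # 100.0: penalized only once
```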

Costs of large, mutually correlated forecasting errors may behave differently from the SSE and RMSE. MAE assumes that the costs due to forecasting errors depend linearly on the size of the errors. It is often used to avoid the problem that SSE and RMSE put too much weight on large forecasting errors as compared to the assumed real costs. Often this assumption is not valid, and the actual costs may grow even faster than the square of the error assumed in SSE and RMSE.

The power systems are changing at an increasing speed. New distributed energy resources, such as active loads and other controllable distributed flexible energy resources, make the power flows in distribution grids more variable than before. The power flows to and from the customers of the grid are correlated due to control actions, solar radiation and wind. The traditional load forecasting methods are becoming obsolete. The forecasting performance criteria also need reconsideration and updating in order to fit this new situation.

Our aim in the present work is to show that there is still a need to develop the understanding, selection and amendment of criteria for forecasting performance. In forecasting applications, assessing or measuring the incremental benefits from the forecasts is both necessary and difficult. With case studies, we assess how the forecasting errors increase the costs of the competitive electricity market actors, electricity network operators and their customers, the consumers and prosumers that use the power system.

#### **2. Needs to Develop Short Term Load Forecasting Criteria**

When studying and developing short term load forecasting methods for active demand, such as [11], we have detected the following challenges:


The costs of forecasting errors to the competitive electricity market actors mainly stem from the balancing errors caused to their balance responsible party. When forecasting only components of the total balance, what matters is the error relative to the error in the total balance; the error relative to the component forecast itself is not relevant at all. In addition, accurate forecasting is needed during the system peak loads and peak prices, and the highest peaks are rare and unpredictable.

Underestimating the load when forecasting critical high load situations can lead to very expensive unwanted measures in managing the grid constraints or peak load reserves at short notice.

A common problem with black box methods, such as machine learning methods, is that although they generally tend to under-forecast load peaks, as shown in [4], for rarely occurring outdoor-temperature-related peak loads they tend to predict higher and shorter load peaks than actually occur. This tends to happen because (1) the identification data do not include enough such extreme situations to model the load saturation effects, or (2) many nonlinear black box methods tend to have large errors when extrapolating outside the situations included in the identification data. There are several methods to deal with this problem. Daizong Ding et al. [12] proposed using (1) a memory network technique to store historical patterns of extreme events for future reference and (2) a new classification loss function called extreme value loss (EVL) for detecting extreme events in the future. Those approaches can improve deep neural network (DNN) performance only for those extreme events that have been included in the learning data. Another approach is to add a separate model for the power range and use it to limit out forecast values that exceed the limit by more than a tolerance [4]. Then, the energy of the peak should be preserved by extending the length of the peak. Physically based model structures describing the existence of a constrained maximum power are useful for such peak limitation. Physically based model structures with power constraints can also be used to model the main phenomenon that contributes to the peak load. We have improved the forecasting of rarely occurring and extreme situations by combining several different models into hybrid models; see [11] for an example. In order to better assess, compare and develop methods, we need forecasting criteria that adequately reflect the concentration of the economic costs of the forecasting errors in high load situations.

#### **3. Cases Studied**

#### *3.1. Electricity Market Costs Due to Forecasting Errors*

The costs relate to the overall forecasting error of the total energy balance of the balance responsible party of the actor in the market. Thus, the load forecast of a consumer group segment is only one component of the total forecast. There are forecasts for different types of load and local generation. The different forecasts are not fully independent, because many of them may use the same weather forecasts and be subject to interactions between consumer group behavior. However, the errors of accurate forecasters tend to be independent and it is easy to check to what extent this assumption holds. Going to such details is complicated and outside the scope of this paper. Thus, for clarity of the analysis, we assume that the errors of different segment forecasts are independent. Then the contribution of the expected individual error component *e*<sup>1</sup> to the total expected forecast error *e* is as follows.

$$\mathrm{E}\left[e^{2}\right] = \mathrm{E}\left[\left(e_{0}+e_{1}\right)^{2}\right] = \mathrm{E}\left[e_{0}^{2}\right] + \mathrm{E}\left[e_{1}^{2}\right] \tag{1}$$

where *e*<sup>0</sup> is the expected total error of all the other forecast components. Figure 1a shows how the total error *e* behaves as a function of *e*<sup>1</sup> when *e*<sup>0</sup> is set to 1. For small *e*<sup>1</sup>, the increase in *e* is quadratic.
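A minimal numeric sketch (ours) reproduces the shape of this dependency, though not the exact data behind Figure 1a:

```python
import numpy as np

# std of the sum of two independent zero-mean errors, with the "other
# components" error std fixed at e0 = 1, following Equation (1):
e1 = np.linspace(0.0, 2.0, 9)
e_total = np.sqrt(1.0 ** 2 + e1 ** 2)
for a, b in zip(e1, e_total):
    print(f"e1={a:.2f}  e_total={b:.3f}  increase={b - 1.0:.3f}")
# For small e1 the increase over 1 behaves like e1**2 / 2, i.e., quadratically.
```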

**Figure 1.** (**a**) Impact of an additional error std. component to the total error std. when assuming normally distributed independent error sequences, (**b**) the behavior of the mean of simulated balancing error costs, (**c**) the range of balancing error cost variation in the simulations.

The monetary cost of the total forecasting error needs to be assessed in the electricity market. For simplicity, we assume that the forecasting errors cause costs via the market balancing errors. The market rules for penalizing the balancing errors of loads vary from country to country. The Nordic power market applies the balancing market price as the cost of the balancing errors. In the Nordic countries, a one-price system is applied to errors in the consumption (load) balance and, in addition, there is a small cost component proportional to the absolute balancing error (a volume fee for imbalance power). Power generation plants below 1 MVA are included in the consumption balance.

The price of consumption imbalance power is the price at which Fingrid both purchases balancing power from a balance responsible party and sells it to one. During a regulating hour, the regulation price is used. If no regulation has been made, the Elspot FIN price is used as the purchase and selling price of consumption imbalance power; for an explanation, see [13]. All the prices and fees are publicly available on the webpages of Fingrid and Nord Pool, such as [14]. In addition, imbalance power in the consumption balance is subject to a volume fee. There is also a fee for actual consumption, but that depends only on the actual consumption and not on the forecasting errors. The volume fee of imbalance power for consumption during the whole study period was 0.5 €/MWh; see [15] for the fees.

From an actual or simulated component load forecasting error, the resulting increase in the forecasting error of the total energy balance can be calculated, and from this increase the resulting balancing cost increase can be calculated based on the price of imbalance power and the volume fee of the power imbalance. In this way, we obtained an estimate of how much electricity market cost the forecasting error causes to the balance responsible actor. We do not know the forecasting errors of the balance responsible party; in the following study, we generated them as a normally distributed sequence with a standard deviation of 3 MW. The actual errors may be bigger but, as we see later, that does not affect the conclusions. The imbalance power price occasionally has very high peaks. Thus, the cost estimate will be very inaccurate, even when a very long simulation period is applied, and it is necessary to check the contribution of the very highest price peaks to the cost in order to have a rough idea of the inaccuracy of the results. We avoided this challenge as follows. We made a short-term load forecast using a residual hybrid of physically based load control response models and a stacked booster network, as explained in [16], for a four-year-long test period. We found that the forecasting errors were rather well normally distributed and bounded. We generated 200 random, normally distributed, bounded error sequences over the four-year period. With each one of these 200 error sequences, we calculated the balancing error costs for the forecast group. Then, the standard deviation of the group errors was varied and the same cost calculation repeated. The variation of the costs between the error sequences was very large. The mean behaved as assumed in Figure 1a, and an actual measured and forecast case was clearly in the area where the quadratic dependency dominates (see Figure 1b). The demand response aggregator considers the actual and forecast active loads trade secrets and does not allow us to make them public. Except for this one point, all data used for the simulations in Figure 1 are publicly available.
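The structure of such a simulation can be sketched as follows. This is a deliberately simplified illustration of how the cost estimate is formed and how large its spread across error sequences is: the hypothetical `price` series and the ±9 MW bound are our assumptions standing in for the published Fingrid/Nord Pool prices and the observed error bounds, and each MWh of absolute imbalance is crudely charged the simultaneous price plus the volume fee (the real one-price settlement is more involved and can also produce gains).

```python
import numpy as np

rng = np.random.default_rng(0)
n_hours = 4 * 8760            # four-year hourly study period
n_seq, bound = 200, 9.0       # 200 error sequences, bounded at +/- 9 MW (assumed)

# Hypothetical heavy-tailed imbalance price series, EUR/MWh:
price = rng.lognormal(mean=3.0, sigma=0.8, size=n_hours)
volume_fee = 0.5              # EUR/MWh volume fee for consumption imbalance

def balancing_cost(err_std):
    """Mean and spread of the simulated balancing cost over the sequences."""
    costs = []
    for _ in range(n_seq):
        e = np.clip(rng.normal(0.0, err_std, n_hours), -bound, bound)  # MWh/h
        costs.append(np.sum(np.abs(e) * (price + volume_fee)))
    return np.mean(costs), np.std(costs)

for s in (0.5, 1.0, 3.0):
    m, sd = balancing_cost(s)
    print(f"error std {s} MW: mean cost {m:.0f} EUR, spread {sd:.0f} EUR")
```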

The balancing error cost was very stochastic, because the impact of the high balancing market price peaks dominated (see Figure 1c). The balancing error cost over the whole four-year-long period mostly depended on the forecasting error during those few price peaks. The red circle represents the forecasting errors when an experimental short-term demand response forecasting algorithm was applied to the forecasting on the morning of the previous day. There, the aggregated controllable power was slightly over 18 MW. The results support the use of the quadratic error criteria, std. and RMSE, rather than linear criteria such as MAPE. In this case, data-driven forecasters that do not model the dynamic load control responses have such poor accuracy that the cost dependency approaches a linear one. Here, we have ignored the fact that especially large forecasting errors can significantly affect the price of imbalance power itself, thus increasing the balancing error cost much more than linearly.

Another observation is that with the well-performing forecasting model, the forecasting errors increased the imbalance costs very little, only 0.53 € per controllable house annually. This suggests that the one-price balancing error model gives only very weak incentives to improve the forecasting accuracy in normal situations. The one-price balance error cost model may not adequately reflect the situation from the power system point of view. A further study is needed to find out to what extent, how much, and with which market mechanisms the power system can in the future benefit from improving short-term forecasting accuracy. A conclusion of the H2020 SmartNet project [17] was that improving the forecasting accuracy is critical for reaping the benefits of the future ancillary service market architectures that enable the provision of ancillary services using distributed flexible energy resources.

Some other countries apply a two-price system for balancing error costs, meaning that the price for balancing errors is defined separately by the balancing market for each direction. Then the costs of load forecasting errors are much higher than in a one-price system. They may even exaggerate the related needs of the power system if the errors of the individual balance forecasts are assumed to be independent of each other. When the share of distributed generation increases, the one-price system may become problematic, because consumption and distributed generation may not have enough incentives to minimize their balancing errors. This increases the need for balancing reserves in the system. The share of distributed generation is expected to grow substantially, especially during the summertime, so also in the Nordic countries there may appear a need to change the market rules regarding the balancing costs. Moving to a two-price system may be one such possibility. Thus, it may be worthwhile to repeat the above study using the two-price system of the generation balance.

#### *3.2. Distribution Network Costs Due to Forecasting Errors*

Overloading of network components causes high losses that increase the temperature of the components. Overheating rapidly reduces the lifetime of the insulator materials in the network components. If the forecast underestimates the peak load, the operational measures to limit the overload are not activated in time, energy losses increase, and the component ages so much that its expected lifetime is reduced.

The losses in the network components are generally proportional to the square of the current. When the voltage is kept roughly constant and the reactive power, voltage unbalance and distortion are assumed to be negligible, the losses are roughly proportional to the square of the transferred power. In real networks, these assumptions are not accurate enough. Strbac et al. [18] calculated losses using a complete model of three power distribution license areas in the UK. The analysis highlighted that 36–47% of the losses are in low voltage (LV) networks, 9–13% are associated with distribution transformer load-related losses, 7–10% are distribution transformer no-load losses, and the remaining part is in higher voltage levels. They [18] (p. 43) showed the marginal transmission losses as a function of system loading. A 1 MWh reduction in load would reduce transmission losses by 0.11 MWh during the peak load condition (100% loading). When the system loading is 60%, reducing the load by 1 MWh reduces transmission losses by 0.024 MWh. This corresponds to the dependency f(P) = P<sup>2.98</sup>. The sample size is small, so the accuracy of this dependency is questionable. Nevertheless, the dependency is clearly different from the f(P) = P<sup>2</sup> that results from assuming constant voltage at a purely active power load and transmission losses proportional to the square of the current. Underestimating the peak loads causes much higher losses and related costs than other load forecasting errors. Thus, the impact of forecasting errors on the energy losses is very nonlinear and depends on the direction of the error and the size of the load.
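The quoted exponent can be reproduced from the two marginal-loss points; as a two-point fit it is, as noted above, of limited accuracy:

```python
import numpy as np

# Marginal transmission losses from [18]: 0.11 MWh/MWh at 100% loading and
# 0.024 MWh/MWh at 60% loading. Fitting f(P) = P**b through the two points:
b = np.log(0.11 / 0.024) / np.log(1.00 / 0.60)
print(round(b, 2))   # -> 2.98, the exponent quoted in the text
```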

Ref. [19] studied how transformer oil lifetime depends on temperature; the Arrhenius law describes the observed aging rather well. Aging mechanisms of cable polymers were discussed in [20]. The Arrhenius method is often used to predict the aging of cable insulation, although it is valid only in a narrow temperature range; for example, it is applicable only below the melting point of the insulator. For simplicity, we model the aging here using the Arrhenius method. According to it, the rate *k* of a chemical reaction (such as insulator aging) is an exponential function of the inverse of the absolute temperature *T*:

$$k = A e^{-E_a/(RT)} \tag{2}$$

where *R* is the gas constant, and the pre-exponential factor *A* and the activation energy *E<sub>a</sub>* are almost independent of the temperature *T*. In the steady state, the difference between the component or insulator temperature and the ambient temperature is linearly proportional to the losses. Components are normally operated much below their nominal or design capacity, and there the impact of forecasting errors on the losses and aging is small. When a component is operated at or above its nominal load during a peak and is subject to a high ambient temperature and poor power quality, high losses, fast component aging, or expensive operational measures result from under-forecasting the load.
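A small sketch of the resulting sensitivity follows; the activation energy used here is an assumed, typical order of magnitude for insulation aging, not a value taken from [19] or [20]:

```python
import numpy as np

R = 8.314      # gas constant, J/(mol K)
Ea = 111e3     # activation energy, J/mol -- an ASSUMED typical value

def aging_rate_ratio(t_hot_c, t_ref_c):
    """Ratio of Arrhenius aging rates, Equation (2), at two temperatures (C)."""
    T_ref, T_hot = t_ref_c + 273.15, t_hot_c + 273.15
    return np.exp(Ea / R * (1.0 / T_ref - 1.0 / T_hot))

# An unforeseen overload that lifts a 90 C hot spot to 100 C speeds up aging
# by a factor of roughly 2.7 under these assumptions:
print(round(aging_rate_ratio(100.0, 90.0), 1))
```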

Thus, the impact of forecasting errors on the costs is very nonlinear and depends on the direction of the error and the size of the load. Most of the time, the network costs from short-term load forecasting errors are small or even insignificant. However, the costs of forecasting errors increase rapidly when the load is at or above the nominal capacity of the network bottlenecks and the actual load is higher than the forecast load.

#### *3.3. A Consumer Cost Case Study: Load Forecasting Based Control of Domestic Energy Storage System*

All the costs of the power supply are paid by the users of the electricity grid. The forecasting errors discussed in the other sections increase electricity prices and grid tariffs by increasing the costs of the electricity retailers, the aggregators and the grid operators. Here, we consider those costs that the consumer can control more directly.

By using energy storage in a domestic building, a customer can get savings in the electricity bill [21]. The amount of the savings depends on many factors. The load profile of the customer and the electricity pricing structure and price levels are the main variables that affect the savings, but the customer has very limited possibilities to change them. The size of the energy storage can be optimized for the customer's load profile, but after that, controlling the energy storage and consumption is the only way to maximize the savings. Energy storage can be used, e.g., to increase the self-consumption of small-scale photovoltaic production, but it can also be used for minimizing costs from different electricity pricing components. If the energy retailer's price is based on the market price of electricity, the customer can get savings by charging the energy storage during low price and discharging during high price as in [21]. If electricity distribution price is based on the maximum peak powers, the customer can get savings by discharging the battery during peak hours, as in [22].

Such electrical energy storage systems are still rare and are typically installed to increase the self-consumption of small-scale photovoltaic power production. Although battery technologies progress all the time, the profitability of such use is still typically poor, especially if there are loads that can easily be shifted in time. The profitability of the battery system can be improved significantly by having several control targets or a more holistic one. One such possibility is minimizing the costs from different electricity pricing components, but that requires short-term load forecasting.

In this case study, it is assumed that every customer in the study group has a market price-based contract with an electricity retailer and that the distribution price is partly based on the maximum peak powers, as in [23]. The energy retailer price is based on the hourly day-ahead market prices of Finland's area in the Nord Pool electricity markets [13]. The electricity retailer adds a margin of 0.25 c/kWh to the hourly prices. Distribution prices are based on the study in [23], where the price components were calculated for the area of the same distribution system operator (DSO) from which the customers' data of this study were measured. The price structure includes two consumption-based components: a volumetric charge (0.58 c/kWh) and a power charge (5.83 €/kW). The power charge is based on the monthly highest hourly average peak power. When value added tax (24%) is added to these charges, the prices which affect the customers' costs are: volumetric charge 0.72 c/kWh and power charge 7.23 €/kW. The same prices were used in [22].

Customers' load data are from one Finnish DSO, whose license area is partly rural and partly small towns. The study group consists of 500 random customers. In the simulations, each customer has a 6 kWh lithium-ion battery with a 0.7 C-rate. The battery type is lithium iron phosphate (LFP), because of its good safety for domestic use. The energy storage system is controlled firstly to decrease the monthly maximum peak power and secondly to decrease the electricity costs with market price-based control, as in [22]. The battery is discharged when the power is forecast to increase over the monthly peak and charged during low prices. The market price-based control algorithm and battery simulation model were presented in [21]. In these previous studies, the control of the energy storage was based on load forecasting. The load forecasts are based on a model which utilizes the customer's historical load data and the temperature dependence of the load together with a temperature forecast [24].

In the present study, the dependence between the accuracy of the load forecasting and the customers' savings achieved by using the energy storage is studied. The simulations are made for every customer with 11 different load forecasts, each having a different forecasting accuracy. The forecasting accuracy is varied by scaling the forecast error time series. The actual load forecast time series is designated as the basic level (100%) and the real load data correspond to the ideal forecast (0%). The studied error scaling ranges linearly from 0% to 200% in 20% steps, applied in every hour. The customers' yearly cost savings and different forecast accuracy criteria (SSE, RMSE, NRMSE, MAE and MAPE) were calculated during the simulations. Additionally, because most of the savings come from the decrease in monthly peak powers, the MAE of the monthly peak powers (MAEmax) was calculated. The monetary value of the cost savings depends on the customer's load profile, so the results are given as percentage values of cost savings. The results of the simulations are shown in Figure 2. From the result points, we calculated a least-squares fitted line (R1) and a least-squares fitted second-order curve (R2).
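The error-scaling experiment can be sketched as follows; the synthetic load and error series below are ours and only stand in for the measured customer data and real forecasts:

```python
import numpy as np

rng = np.random.default_rng(1)
actual = 1.0 + 0.5 * rng.random(8760)        # hypothetical hourly load, kW
base_error = rng.normal(0.0, 0.1, 8760)      # actual forecast error = level 100%

for scale in np.arange(0.0, 2.01, 0.2):      # 0% ... 200% in 20% steps
    forecast = actual + scale * base_error   # scale 0 gives the ideal forecast
    rmse = np.sqrt(np.mean((forecast - actual) ** 2))
    mae = np.mean(np.abs(forecast - actual))
    # ...here the storage-control simulation would be run with `forecast`
    # and the yearly cost savings recorded against each criterion.
    print(f"level {scale:.0%}: RMSE={rmse:.3f}, MAE={mae:.3f}")
```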

**Figure 2.** Percentage yearly savings of customers when using energy storage to decrease monthly maximum peak powers and the costs of market priced energy, shown as a function of different forecast error criteria. Color of points shows the used error level (0–200%).

From the results, we see that with ideal forecasts the savings of the customers vary a lot. This stems from the different load profiles of the customers. Customers whose load profile includes high peaks during several months can get very high savings; if a customer's load profile is very flat, the savings can be low. When the errors in the forecast start to increase, the savings drop very fast at first, but the decline soon slows and remains slow thereafter.

From the results of Figure 2 and the least-squares fittings, we collected the main results and values into Table 1, which helps to compare the different criteria. In Table 1, 'data points' means the number of points (maximum 5500) that could be used, and 'logical order' means that the data points are ordered such that a higher error gives lower savings.

The idea of the comparison is that a good criterion predicts the cost savings of a customer as accurately as possible. The fitted lines and curves predict the cost savings best with MAPE, but MAPE can be calculated for only a small fraction of the customers; for this reason, the values of MAPE look better than they really are. With NRMSE, the order of the points is not logical: the savings do not decrease monotonically as the NRMSE value increases. SSE gives the worst values in the fittings. RMSE and MAE are almost equal, but MAE is in this case marginally the best criterion. The differences between the criteria are not large, and selecting the most suitable criterion in this case requires careful comparison.

Additionally, the MAE of the monthly peak powers is shown in Figure 2. As we can see, MAEmax describes the effect of the forecasting error on the savings almost as well as the traditional forecast error criteria. This is logical, as most of the savings come from the decrease in the monthly peak powers. While the other forecast error criteria take into account all hours of the year (8760 h), MAEmax is calculated from only one hour per month (12 h).


**Table 1.** Comparison of criteria in a consumer cost case study.

#### *3.4. Comparison of Methods across Different Forecasting Cases*

When the same method is used to forecast different aggregated loads, the values of the same accuracy criterion can be very different. Humeau et al. [25] analyzed the consumption of 782 households and showed how the values of the NRMSE decrease as the number of sites per cluster increases. They compared linear regression, support vector regression (SVR) and the multilayer perceptron (MLP) in this respect. Ref. [26] shows a typical dependence of short-term load forecasting accuracy on the prediction time horizon. The weather forecasts and load forecasting methods have improved much since then, so the accuracy now decreases somewhat later, but the shape of the dependency is still similar. In addition to the number of sites, the value of a criterion also depends on the size and type of the sites, the type of loads, the presence of active demand, etc. Figure 3 demonstrates some of these dependencies. It summarizes some results from our past publications between the years 2012 and 2020; all these publications can be found via [11,16]. All the most accurate methods in Figure 3 are hybrids that combine several short-term forecasting methods and include both machine learning and physically based models. In the case with about 59,000 customers, the most accurate method also includes a similar-day method. All of the most accurate methods use more than one hybridization approach, including residual hybrids, ensembles and range constraining.

**Figure 3.** Load forecasting accuracy as a function of the number of aggregated customers, as collected from our own published results. The results support the assumption of independent errors except for the largest group size; the other differences reduce the comparability among the cases much more.

In Figure 3, the four markers on the right end represent different forecasting methods applied to the exact same case. At about 1000 aggregated customers, there are combinations of four rather similar groups of about 1000 customers and two different methods, in all eight combinations. The blue line in the figure shows how the forecasting accuracy, measured as std., would depend on the number of customers if the forecasting errors were completely independent; this is the expected behavior of uncorrelated forecasts. The cases are not fully comparable, and there are too few of them to draw reliable conclusions. Forecasting when active loads are not present is usually more accurate than when dynamic load control is applied. Load control also makes the load behavior strongly correlated, which tends to increase the correlation of the forecasting errors and thus reduce the NRMSE improvement stemming from increasing the number of aggregated customers. In the cases with 59,000 customers on the right side of Figure 3, the forecasting time resolution was 3 min; in the other cases it was 1 h, and in the lowest-NRMSE case at about 3500 customers it was 24 h. A finer time resolution leads to higher criterion values. At about 3500 customers, the range of the values shows the impact of the improvement of the methods while the case remains the same; there, the outdated national load profiles had the highest NRMSE. In the four groups that have about 1000 customers, the controllable loads dominate. For two of these four groups, the forecasting performance is very good compared to the blue line. The two leftmost groups, each having 350 to 380 customers, suffered from communication failures in the first third of the 5-year-long verification period. Selecting only the best year of the verification would have given NRMSE = 20.1% for group 1 and NRMSE = 26.8% for group 2 of that case.
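The blue line follows directly from the reasoning behind Equation (1): if every customer's forecast error were independent with the same relative standard deviation, the relative error of the aggregate would shrink as the inverse square root of the number of customers. A minimal sketch (ours):

```python
import numpy as np

# Relative aggregate error under fully independent per-customer errors,
# normalized to 1.0 for a single customer:
for n in [350, 1000, 3500, 59000]:
    print(f"{n:>6} customers: relative error {1.0 / np.sqrt(n):.4f}")
# Correlated errors (e.g., due to common load control) break this scaling,
# which is one reason the largest group deviates from the line.
```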

One observation is that the results are not always very comparable between different groups in the same case, nor between different years for the same group. Thus, a meaningful comparison of methods across different individual forecasting cases does not seem feasible. The comparability was even worse when using MAPE instead of NRMSE. In addition, the results support the hypothesis that, with the best forecasting methods in the studied cases, the assumption of mutually independent forecasting errors may be justified if the number of houses is not very large. The number of cases is too small for firm conclusions. When the forecasting cases represent the same time and country, the cross-correlation can be calculated from the forecasts; we leave that to further studies. The purpose here is only to show that (1) this kind of analysis may be useful when made with many more cases and (2) the comparison of forecasting methods between different cases is complex and gives only very rough results. Further research with many more cases is needed in order to get reliable quantitative results.

Figure 4 shows how the MAPE and NRMSE depended on each other in 38 short-term forecasts that we have produced in six different forecasting cases. All the forecasts have the same forecasting horizon. The differences in their behavior are rather small, which is the expected result for the errors of accurate forecasts. We expect that the results may differ much more if either low-accuracy forecasts or more exceptional situations are included in the comparison. Further studies are needed regarding that.

All six forecasts that have an NRMSE between 15% and 17% are from the same group in the same case but use different forecasting methods. The group behavior in the verification period was rather different from that in the identification period. The low MAPE of one of the forecasts may indicate that MAPE there failed to adequately detect the rather large peak load errors caused by the behavior change. That may happen because the statistical behavior of the errors was no longer normally distributed, as it is for accurate forecasts.

**Figure 4.** Comparison of normalized root mean squared error (NRMSE) and mean absolute percentage error (MAPE) in 38 forecasts in 6 different short term load forecasting cases comprising together 12 different forecasting methods.

#### *3.5. Load Peak Sensitive Validation*

The most valuable forecasts for peak load can be obtained by modeling the actual costs of forecasting errors in the electricity market, in the distribution grid or in both depending on the purpose of forecasting. For most comparisons, this is not practical, because of the complexity and stochastic nature of the costs as shown before. As the cost of the errors is the fundamental reason for the need for accurate forecasts, it is nevertheless important to have at least a general idea on how the costs form and use criteria that reflect the load peak sensitivity of the costs.

Conventional validation statistics alone cannot guarantee the performance of a model in load peak situations. For instance, a recent study on weather forecast-based short-term fault prediction using a neural network (NN) model [27] showed some inherent limitations of the standard MAPE and mean absolute error (MAE) metrics. MAPE does not work due to the existence of zero values, i.e., the absence of network faults. MAE does not properly reflect the model performance, e.g., in rare peak events. The high fault rate periods are important to predict in order to temporarily increase the preparedness to manage them. On the other hand, the results indicate that the index of agreement (IA) [28] may provide a more robust metric for measuring the model performance, including peak events, and for model evaluation and comparison in general:

$$IA = 1 - \frac{SSE}{\sum_{i=1}^{n} \left( \left| \hat{y}_i - \overline{y} \right| + \left| y_i - \overline{y} \right| \right)^{2}} \tag{3}$$

where $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, $y_i$ is the true number of faults, $\hat{y}_i$ is the predicted number of faults, $\overline{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, $n$ is the sample size, and $IA \in [0, 1]$; higher values of IA indicate better models.
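A direct implementation of Equation (3), with toy data of ours:

```python
import numpy as np

def index_of_agreement(y, y_hat):
    """Index of agreement, Equation (3)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    sse = np.sum((y - y_hat) ** 2)
    denom = np.sum((np.abs(y_hat - y.mean()) + np.abs(y - y.mean())) ** 2)
    return 1.0 - sse / denom

observed  = [0, 1, 0, 5, 2, 0]   # true fault counts
predicted = [0, 1, 1, 4, 2, 0]   # model output
print(round(index_of_agreement(observed, predicted), 3))   # ~0.966
```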

The study [27] also showed that the resampling/boosting of rare faults peaks in the training data can be used to enhance the ability of an NN model to forecast fault events. Figure 5 demonstrates the results of forecasting all types of faults and faults caused by wind, originally presented in [27].

**Figure 5.** Weather-based fault prediction in the electricity network: comparison of NN models in two experiments with and without resampling. (**a**) All faults, (**b**) Faults caused by wind.

NN models without resampling do not predict the peaks but have a better MAE, as they are more accurate for the samples with few faults, which are prevalent in the data. On the contrary, NN models with resampling are less accurate for the samples with few faults, which makes their MAE higher. However, the models with resampling are able to predict large peaks, which is more valuable from the problem perspective. For the prediction of all the considered types of faults and of wind faults, the higher IA values correspond to the models with oversampling, which are better at predicting peaks. Based on the given linear regression, we see that there is still a tendency to underestimate peaks. Nevertheless, the index of agreement may be useful as an additional criterion also in short-term load forecasting, which has similar challenges in assessing the performance of forecasting the load peaks.

An alternative to the aforementioned standard evaluation metrics is categorical statistics, i.e., evaluating the model's performance in critical load peak situations by discriminating the electric load into categories/classes (e.g., low and high) and then applying some conventional index to each class separately or only to the peak loads. The discrimination can also be based on variables that affect the load, such as the outdoor temperature and the electricity market price. Evaluating the accuracy of the daily peak load forecast is sometimes used as a peak-load-sensitive criterion.

Accurate peak load forecasting is so important that short-term peak loads are often forecast separately, as in [29,30], for example. Peak load forecasting may be less sensitive to the choice of criterion than the forecasting of the full profile, but the absolute maximum error (AME) was used in [30] to complement MAPE when comparing forecasting methods. This seems justified, although in that study both criteria ranked the four compared methods in the same order.

Based on the discrimination and the resulting contingency table, a set of different standard metrics can be derived, including the skill score

$$SI = TPR - FPR$$

where TPR is the true positive rate, representing the sensitivity of the model (the fraction of actual events correctly predicted), and FPR is the false positive rate (the fraction of non-events incorrectly predicted as events, i.e., one minus the specificity). SI is limited to the range [−1, 1] and for a perfect model SI = 1 [31–33]. With this approach, it is also necessary to have a category for too high predictions, because otherwise too high peak predictions would not be penalized.

Such categorical statistics, including the probability of detection (POD), the critical success index (CSI) and the false alarm ratio (FAR), are used, e.g., in wind power ramp forecasting as measures of accuracy for forecasts of sudden and large power changes [34,35]. The detection and forecasting of those events is crucial with regard to power production and load balance. However, the ability of forecasting methods to predict those events remains relatively low for short-term operation.
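These contingency-table scores are straightforward to compute once the loads have been discriminated into event/non-event classes; a minimal sketch with hypothetical counts of ours:

```python
def categorical_scores(tp, fp, fn, tn):
    """Standard contingency-table metrics for peak/event forecasts."""
    pod = tp / (tp + fn)          # probability of detection (= TPR)
    far = fp / (tp + fp)          # false alarm ratio
    csi = tp / (tp + fp + fn)     # critical success index
    fpr = fp / (fp + tn)          # false positive rate
    si = pod - fpr                # skill score; perfect model: SI = 1
    return pod, far, csi, si

# Hypothetical yearly counts: 40 peak hours hit, 10 false alarms,
# 20 peak hours missed, 8690 non-peak hours correctly predicted:
print(categorical_scores(40, 10, 20, 8690))
```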


A possible way to classify the load into suitable categories could be based on the gradient of the load duration curve. Proportions of the observed peak load or of time could also be considered; for example, the times when the load is over 80% of the peak load, or the 20% highest measured loads, could be used.

#### **4. Criteria for Other Relevant Aspects than Forecasting Performance**

#### *4.1. Estimation of Confidence Intervals for Short Term Load Forecasting*

Forecasting peak loads too low increases risks and costs, as already discussed. Traditionally, these risks are managed by increasing the operational margins depending on the situation and based on experience. The operational margin calculation can be based on the standard deviation (std.) of the short-term load forecast error, tuned by a situation-dependent coefficient. Forecast error quantiles are also used instead of the std. as the basis for estimating the operational margins, because in rarely occurring exceptional situations, such as extreme peak loads, the error distribution may not be normal. For example, Petiau [36] presented a method to estimate confidence intervals (CI), based on the calculation of quantiles of past forecasting errors. CIs quantify the uncertainty of the forecast.

The estimation of the standard errors can be performed, e.g., using bootstrapping, a nonparametric approach based on re-sampling the data to produce the distribution of re-sampled validation indices [37].
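A minimal bootstrap sketch (ours) for attaching a confidence interval to a validation index such as the forecast error std.:

```python
import numpy as np

def bootstrap_ci(errors, stat=np.std, n_boot=2000, alpha=0.05, seed=0):
    """Nonparametric bootstrap confidence interval for a validation index."""
    rng = np.random.default_rng(seed)
    errors = np.asarray(errors, float)
    stats = [stat(rng.choice(errors, size=errors.size, replace=True))
             for _ in range(n_boot)]
    return tuple(np.quantile(stats, [alpha / 2, 1 - alpha / 2]))

residuals = np.random.default_rng(1).normal(0.0, 2.0, 500)  # example errors
print(bootstrap_ci(residuals))   # ~95% CI for the error std.
```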

Probabilistic load forecasts provide even more information for risk assessment. Hong and Fan [38] reviewed the methods to produce and evaluate probabilistic forecasts.

Estimation of confidence intervals and the assessment of forecasting performance in exceptional situations are closely related. They both need to focus on exceptional situations where the general assumptions well justified for the errors of accurate forecasts may not be valid.

#### *4.2. Computational Time*

In model identification, many new forecasting methods require considerable computational resources. This can be problematic when online learning is applied. For some methods, it is a relevant problem also in off-line model identification, especially because the models need to be updated due to changes in the grid, market aggregation, customer behavior, new priorities, etc. For example, support vector regression (SVR) has such poor computational scalability that it limits the possibilities to exploit it in the 1 or 3 min time resolution short-term forecasting needed in electricity grid operation [11]. Based on a brief discussion and comparison of the computation times of their models in short-term forecasting, Lusis et al. [4] concluded that the model run time in model training may be an important factor when choosing between models.

Overly detailed models with many parameters have poor forecasting performance, as the classical textbooks of system identification, such as [39], have shown. In a different application area, a rather recent comparison of several popular statistical and machine learning methods [40] found that the methods with the best forecasting performance also had the lowest run-time computational complexity. Thus, we conclude that if a forecasting method needs excessive computational resources during operation, it is a sign that it may also have both poor performance and unnecessary computations. That is not always the case, however. Some forecasting methods, such as Gaussian process regression models, are more computationally intensive during operation with no connection to overfitting, and they produce confidence intervals without the need for bootstrapping, which is a big advantage in the considered short-term load forecasting cases.

Different methods also require very different amounts of working time from different types of experts. Physically based forecasting models require good knowledge of the domain and classical model identification. Many forecasting methods, especially many ML methods, require much expertise and time in the tuning of method parameters and model structure to the specific case. Also, the requirements for preprocessing the data vary.

In order to be able to compare the costs and benefits of the forecasting methods, the computational complexity and the need for expertise and working time need to be assessed in terms of monetary costs. The computational complexity has several dimensions, such as the need for computational operations and for different types of memory. How easily and efficiently the method lends itself to parallel processing also affects the cost.

#### **5. Discussion**

Electricity market costs caused by the forecast errors are quite stochastic due to the impact of very few very high price peaks. Thus, using the observed market costs as identification and tuning criterion is not feasible as the results of the related case study demonstrate.

The results suggest that improving the short-term forecasting accuracy of the aggregated active demand loads beyond our best recent methods [11] may not now yield savings in the expected value of the imbalance costs significant enough to justify large method development investments. It seems more important to mitigate the risks caused by high imbalance price peaks. A possible reason is that the one-price consumption imbalance fees of the case study may not adequately reflect the costs of imbalance to the power system. A more important reason is that the forecasting errors of the aggregated distributed flexible load are rather independent and small compared to the forecasting errors of the overall balance of the balance responsible party responsible for the flexible load.

In the electricity markets, forecasting errors that are much smaller than the balancing errors of the actor or its balance responsible party have a very small impact on the total balancing error. Thus, it is much more important to focus on large errors of the component forecasts. With them, the balancing error costs typically depend linearly on the size of the balancing error. The balancing error costs depend on the rules and prices of the particular electricity market, but in general they tend to reflect only marginal linearized costs in the power system, when the balancing errors are assumed not to affect the prices. The actual costs for the system operator grow much faster than linearly, and that may also be the case for the competitive market actors if the impacts on the balancing power price and the risk-hedging costs are not ignored.

In power distribution networks, the costs of forecasting errors concentrate very much on the load peaks of the network. Network losses depend on the square of the current, and the aging of the components is rapidly sped up by the temperature increase caused by the losses. Thus, during low load, the impact of load forecast errors is insignificant, while during high load peaks the load forecasting error requires an additional operational margin that is expensive. It is therefore crucial to focus on the forecasting errors during the network peak loads, which may occur at times different from the peaks of the component load forecast.

Here, our focus is on model verification and validation criteria. In model identification or learning, there are more limitations regarding the type of the performance criteria. Some model identification methods or tools may accept only certain criterion types. Others do not have such strict limitations, but the form and properties of the criteria always affect the convergence and computational efficiency together with the identification model and method.

It was shown that the actual cost of short-term load forecasting errors is a highly nonlinear and asymmetric function of the forecasting error. For example, the penalties for underestimating the load should be higher than for overestimating it. The cost of the errors is also highly load-dependent. Arguably, the right thing to do would be to construct a loss function that is specific to the forecasting problem at hand and have the machine learning optimize that domain-specific loss function directly. Here, we only propose using domain-specific criteria to complement or refine a common loss function such as RMSE rather than replacing it completely. We do not yet experiment with such domain-specific loss functions, but only aim to explain what they should include and why they are needed. Many standard forecasting packages may not yet support domain-specific loss functions, and adding that support could make such loss functions better available to practitioners. The properties of the loss function affect the properties and number of the solutions and the convergence of the model identification; these are difficult to assess if no analytic form of the loss function is available. We showed in Section 3.1 why the prices of the electricity markets and ancillary service markets are not directly applicable in a domain-specific loss function: rarely occurring and very unpredictable price peaks dominate the value of the loss function. Similarly, most of the grid costs tend to concentrate in a very few short overload periods. Thus, domain-specific loss functions for short-term load forecasting model identification may create substantial risks, and they need to be designed and considered carefully. We initially assume that directly using domain-specific loss functions in identification should happen in parallel with other approaches rather than instead of them. Then the strong and weak sides of the approaches can be better detected, and the strengths of the different approaches combined. Experimenting with the use of problem-specific loss functions also in the learning phase is a topic worth considering for future research.

#### **6. Conclusions**

The main findings were evident but also way too often ignored. (1) It is important to consider the actual costs and other consequences of the prediction errors when selecting criteria for short term load forecasting. (2) The behavior of the actual costs can be very nonlinear. Even quadratic cost criteria may underestimate the growth of the costs with the size of the error and the load peak. (3) Often, the costs stem mainly from peaks in the loads and in the market prices. For such cases, criteria that normalize each error to the simultaneous measurement, such as MAPE, can be very misleading. (4) The results do not support using any of the commonly used criteria to compare methods across different short-term forecasting cases. (5) The costs in real application cases may be so stochastic that their direct use in validation may not be appropriate. (6) Model development effort, expertise need, and computational complexity are also often relevant when selecting short term forecasting methods.

The research results by the authors suggest that integrating different short-term forecasting methods together into hybrids can combine their mutual strengths, thus improving performance and robustness without excessive development effort. The main challenge of such model integration is that it requires expertise with many different modelling methods.

Our aim was to show that the selection, development and analysis of the performance criteria deserves attention. Blindly using so-called standard criteria may mislead the development, tuning and selection of the short-term forecasting methods in real applications.

**Author Contributions:** J.I. contributed regarding the literature search, the introduction, and Section 4.1 on confidence intervals, and checked and amended Section 3.1 on electricity market costs. J.K. wrote Section 3.3. on criteria in load forecasting-based control of domestic energy storages. C.B. and H.N. wrote Section 3.5. on peak load sensitive validation. P.K. defined the paper scope and focus, collected most of the data and wrote most of the text. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was part of the project Analytics funded by the Academy of Finland.

**Acknowledgments:** The authors wish to thank the earlier projects, including their funders, partners and supporters for enabling the use of the data and forecasts that were necessary for this study. They also thank Tuukka Salmi for some useful comments.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## **Integration of Demand Response and Short-Term Forecasting for the Management of Prosumers' Demand and Generation**

### **María Carmen Ruiz-Abellón 1, Luis Alfredo Fernández-Jiménez 2, Antonio Guillamón 1, Alberto Falces 2, Ana García-Garre <sup>3</sup> and Antonio Gabaldón 3,\***


Received: 14 November 2019; Accepted: 13 December 2019; Published: 18 December 2019

**Abstract:** The development of short-term forecasting techniques is of great importance for power system scheduling and managing. Therefore, many recent research papers have dealt with the proposal of new forecasting models in search of higher efficiency and accuracy. Several kinds of artificial intelligence (AI) techniques have provided good predictive performance, and their efficiency mainly depends on the characteristics of the time series data under study. Load forecasting has been widely studied in recent decades, and models providing mean absolute percentage errors (MAPEs) below 5% have been proposed. On the other hand, short-term generation forecasting models for photovoltaic plants have been developed more recently, and their MAPEs are in general still far from those achieved by load forecasting models. The aim of this paper is to propose a methodology that could help power systems or aggregators to make up for the lack of accuracy of the current forecasting methods when predicting renewable energy generation. The proposed methodology is carried out in three consecutive steps: (1) short-term forecasting of energy consumption and renewable generation; (2) classification of the daily pattern of the renewable generation data using Dynamic Time Warping; (3) application of Demand Response strategies using Physically Based Load Models. Real data from a small town in Spain were used to illustrate the performance and efficiency of the proposed procedure.

**Keywords:** short-term load forecasting; demand response; distributed energy resources; prosumers

#### **1. Introduction**

There is a large literature related to short-term forecasting in the context of electric energy, and the topic is also of great interest in many other fields. In fact, new forecasting methods are proposed daily because of their applicability to dispatch, unit commitment, or market operations [1]. In this sense, short-term load forecasting models have always been a key instrument for carrying out such operations, although in recent years, with the increasing integration of power plants based on renewable energy with high variability (mainly wind and solar), forecasting models for these kinds of power plants have gained the attention of researchers and utilities. Photovoltaic (PV) systems are the most widespread renewable-based power generation systems (they stand for more than half of the total installed capacity in power plants based on renewable sources), with a large number of small-scale installations on medium- or low-voltage grids, right next to residential consumers.

The search for more accurate and faster forecasting methods, both in load and in PV power generation, has resulted in a set of efficient techniques that can be divided into three categories: time-series approaches, regression-based methods, and artificial intelligence methods (see [2,3]).

In the field of short-term load forecasting, there are some examples of classical time-series approaches with auto-regressive integrated moving average (ARIMA), seasonal ARIMA (SARIMA) and ARIMA with exogenous variables (ARIMAX) models (see [4–6]). Regression-based models (see [4,7]) are also widely used in the field of short-term load forecasting, including non-linear regression [8] and nonparametric regression [9] methods. Examples of artificial intelligence methods include fuzzy logic [10,11], artificial neural networks (ANN) [12–14], ensemble methods based on regression trees [15,16], and support vector machines (SVM) [17,18]. Recently, a new hybrid model that performs well in electricity load forecasting was proposed to improve accuracy [19].

On the other hand, the development of short-term PV power forecasting models has followed a parallel path. Thus, some published works use classical time-series approaches [20], regression methods [21], fuzzy logic models [22], ANN-based models [23], ensemble methods [24,25], and support vector machines [25]. A comparative study of the forecasting performance of different models of the above-mentioned approaches for the same PV plant is presented in [26], in which the best model among those studied changes according to the data available for the training process.

Undoubtedly, improvements in fit and computation speed are desirable, but at the same time, it is essential to take advantage of the current forecasting methods. The main objective of this paper is not to propose a new short-term forecasting method, but to illustrate how some recent ones can be combined to predict electricity consumption and photovoltaic (PV) generation, in order to propose efficient Demand Response (DR) strategies for an aggregated load of consumers. In addition, DR acts as a regulator or damper to correct excursions of the net demand of power system buses with demand and generation (i.e., "prosumers") due to occasional forecast errors both in demand and, especially, in renewable generation, reducing the inherent volatility of this last resource.

Political and regulatory scenarios in several countries support the development of the so-called Distributed Energy Resources (DER), i.e., the integration of demand flexibility, energy storage, and generation (mainly Renewable Energy Sources, RES), which facilitates the decarbonization objectives of power systems by 2030–2050 [27]. For example, the European Commission (EC) is concerned with the necessary increase of demand flexibility involved in the integration and exploitation of DER possibilities. The Directorate-General for Energy (DG ENER) reported, in a public presentation [28], that the theoretical level of Demand Response in 2017 was 100 GW, but only 21 GW were activated (75% of them through incentive-based options, i.e., the so-called explicit DR). The policy scenario for 2030 is 160 GW of theoretical demand potential with 52 GW activated (24% of peak load demand, assuming it will reach 570 GW).

Accomplishing this scenario requires an increase of 300% in DR resources within a decade, which seems a difficult task if forecasts about the evolution of DR fail again [29]. For this reason, it is important to consider and enhance DR. Moreover, the net benefit of the overall deployment of DER and RES could reach €5600 million/year for the EU economy (i.e., generate up to a 1% Gross Domestic Product increase over the next decade). The potential in the Spanish case is 17 GW; around 50% of this potential could be explained by DR and RES in small and medium customer segments. For these reasons, the DR policies in this paper are centered on these segments. This participation also involves the capability of aggregators and system operators to develop more accurate forecasts of both demand and generation, and the necessity of making this information (forecasts) easily accessible to customers in order to increase their participation and engagement in markets (mainly in energy markets, but also in Ancillary Service markets). More accurate forecasts could allow customers to take advantage of the remuneration of energy markets and avoid possible penalties due to imbalances between generation and consumption. Because of the small size of individual demand and generation resources, forecasting is more difficult and can represent a barrier for customers, especially in tasks involving RES forecasting. This fact makes demand flexibility a potential tool to change this scenario.

The proposed methodology can be summarized as follows:


Among the wide variety of machine learning methods, we chose random forest to predict the electricity consumption with a horizon of 24 h due to its proven efficiency in short-term load forecasting [15]: high accuracy, fast computing (even for parameter tuning), and easy interpretation of feature importance results. The prediction results obtained with random forest for this dataset showed high accuracy; therefore, other forecasting methods were not needed.

In the case of forecasting hourly PV generation, several machine learning methods were applied and compared in search of the most accurate one. Specifically, linear regression, neural network, random forest, gradient boosting, and support vector regression models were developed and tuned, choosing the optimal values of their parameters by means of genetic algorithms. Unlike for hourly load forecasting, the goodness-of-fit measures of the predicted PV curves showed lower accuracy. Regarding the clustering method applied to the PV curves, the dynamic time warping distance [30] and average linkage were selected for the classification stage.

The rest of the paper is organized as follows: Section 2 describes in detail all forecasting and clustering methods used, as well as the DR strategies applied in this paper. Section 3 presents the results obtained for load and PV forecasts, explaining the application of clustering and DR to minimize the effects of forecasting errors. Finally, some conclusions and future developments are stated in Section 4.

#### **2. Materials and Methods**

#### *2.1. Methodology Overview*

Day-ahead markets represent the most active markets in terms of the economic value of transactions, but other markets have experienced noticeable growth (for example, Intraday Markets in France and Belgium, with growth rates of 54.5% and 82.9%, respectively, in 2018). Real-Time Markets have facilitated the integration of renewables in USA markets in the last decade, and wider-scale markets with later gate closure would facilitate the integration of renewables in other systems (e.g., Balancing Markets and, specifically, Reserve Replacement). The integration of demand-side resources in markets presents both risks and opportunities for Balance Responsible Parties (BRP, responsible for their imbalances) and Balance Service Providers (BSP, i.e., the provision of bids for balancing) and needs the development of new and more integrated methodologies. The main idea of this research work is to provide aggregators with new tools for a better management of demand and generation in markets, both in the short term (around 24 h) and in the very short term (from several minutes to 1 h), to evaluate the net demand imbalance, while the aggregator or other parties take gate closure times into account. To perform this task, the proposed methodology takes advantage of the different databases that should be available for DR (demand, customer, RES, and weather). From these databases, this work applies different methodologies to obtain well-fitted 24 h forecasts for demand and PV generation, and with these predictions aggregators evaluate bids and offers to be sent to Intra-Day or Real-Time Energy Markets (with the objective of minimizing the energy costs for customers or prosumers). Logically, both models (demand and PV) exhibit errors, and these errors can involve penalties in the markets because other agents (BSP, LSE) would have to change their energy balance and buy or sell energy in the very short term. Considering that PV forecasts usually have a greater error than demand forecasts, and that a fast model is needed to manage the potential flexibility of demand on the aggregator side, a simple very short-term PV forecast model is developed based on historical recordings of PV generation (at the site) and available real-time measurements (through Information and Communication Technology, i.e., ICT devices). This model and the results of the market-clearing processes in short-term energy markets provide a reference signal for flexible resources (only DR resources are considered in this paper). With the help of different tools and scripts from a DR toolbox (e.g., segmentation, classification, disaggregation, and modeling), the aggregator can evaluate the "demand baseline" for different end-uses in the short term and can propose a control signal to change demand according to its requirements. This demand is simulated and evaluated hour by hour with several performance indices (and modified in some cases) to fit the demand packages offered to energy markets (i.e., net demand). In other cases, when the power system calls for flexibility, the aggregator can provide additional flexibility to energy and ancillary service markets, agents, or Transmission Operators.

Figure 1 presents an overview flowchart, which depicts the methodologies and tools to be used through the paper.

**Figure 1.** Methodology flowchart.

A quantitative analysis of demand-side flexibility seems necessary, through the definition and evaluation of some indicators (i.e., the DR indices in Figure 1) that allow scoring the flexibility and performance of the loads being controlled, basically at the aggregated level. These indicators converge with the idea of some voluntary schemes in the EU that intend to express the "readiness of a building" (in this case, the readiness of the load inside buildings). According to these proposals [31], these indicators mean: "readiness to adapt in response to the needs of the occupants, readiness to facilitate maintenance and efficient operation, and readiness to adapt in response to the situation of the energy grid". Taking this last requirement into account, a score is computed through the indicators described in Section 3.4.3.

#### *2.2. Characteristics of the Customers: Demand, Photovoltaic Generation, and End-Uses*

All electricity customers of a small town (4400 inhabitants), located in northern Spain, have been selected for simulation purposes in this work. These customers include residential, commercial, and industrial clients, although most of the power consumption is due to the residential ones. Basically, this group corresponds to average residential customers in the southern countries of the EU. The rated power per customer ranges from 3 to 8 kW. The climate is continental; winter temperatures range from 0 to 13 °C, and summer temperatures from 13 to 29 °C.

Regional investors built a PV plant in the vicinity of the town, with a significant capacity with respect to its power demand. The PV plant is composed of two-axis solar trackers with a rated power of 1.9 MW, and it is connected to the same power substation that links the town to the grid.

Hourly load and photovoltaic generation data from 1 October 2008 to 31 March 2011 (both included) were available. These data were obtained from the electric utility distributor and correspond to hourly average power measurements in the substation. It is worth mentioning that it has been very difficult to obtain real data corresponding to a considerable customer group that can act as prosumers (consumers and producers); thus, we had to work with data that are not as recent as desired.

Figure 2a shows the winter and summer loads for two selected workdays monitored at the distribution level (substation), which supplies power at 13.2 kV to the distribution transformer centers (CT) of the customers (basically residential and commercial supply). Figure 2b shows the average temperature in the area for the same two selected workdays. Figure 2c plots the hourly PV power generation on the peak production days (days with the highest energy generation) of January and June 2010. The PV power generation in the central hours of the day can represent an important fraction of the town's consumption (30–40%). January and June were selected as representative months because June is the month with the highest PV generation, while January has the lowest PV generation and the highest energy consumption. Figure 2 also shows the average profiles of demand, temperature, and PV generation for the period in which data are available (from 1 October 2008 to 31 March 2011). It is remarkable that the average PV generation profile (Figure 2c) is lower than the other ones (June and January). This can be explained because the monthly profiles represent the peak production days, while the average profile includes days with no PV generation due to adverse weather conditions.

**Figure 2.** Load, Temperature and PV Generation profiles: (**a**) example of winter and summer Load Curves in the power substation; (**b**) example of winter and summer temperature behavior; (**c**) example of winter and summer power generation in the PV plant on peak production days.

The use of a DR portfolio for damping both the errors in the short-term evaluation of demand and the intrinsic variability of PV generation sources requires the evaluation of the DR potential and its flexibility in each customer segment. First, this evaluation must be based on the knowledge of the end-uses of an average customer. The first alternative for knowing the demand composition behind the meter is to use the information provided by Smart Meters (SM) and then apply some Non-Intrusive Load Monitoring (NILM) methodology, for example [32]. This approach involves the full development of the capacities of available home automation technologies, considering the increasing deployment of Smart Meters in several countries around the world [33]. Some of these methodologies have been reported by the authors in previous papers [34] to obtain end-use disaggregation/profiles in residential segments and their real flexibility when DR policies are applied (i.e., DR validation).

In some cases, and from a practical point of view, several problems may hinder a practical implementation of DR based on end-uses; for instance: small customers do not yet have any SM, the confidentiality of data is in question, Data Exchange Platforms (DEP) are not fully developed, the availability of data is scarce [35], or the aggregator has access to meter data but without the necessary granularity or quality (it is usual to have data with a granularity ranging from 15 min to 1 h, which makes the identification of loads through NILM methods much more complex). Consequently, an alternative access to demand data should also be considered by aggregators to accomplish the evaluation of the DR potential. This alternative is based on periodic surveys performed by international or national Energy Agencies, for example the EIA (data from 2015, [36]) in the United States or the Joint Research Centre (data from 2016, [37]) in the European Union. In this way, a residential "average" EU or USA customer (and its end-use share) can be estimated from these data. In the case under study, available reports from the Institute for Energy Diversification and Energy Savings (IDAE, Spain) and the Spanish Government [38] have been analyzed. Table 1 depicts the main end-uses for Spain, the EU-28 countries, and the USA. Notice that in European Mediterranean countries the Air Conditioning load represents a higher percentage (66% of households have this appliance, and the increasing trend is quite solid). A similar trend can be reported in the USA, where 87% of homes use air conditioning; it accounts for 12% of annual residential energy expenditures and is a large factor in fluctuations of residential electricity use. Heat Pumps (HP) exhibit similar trends according to EIA estimations [36]: the share of heated homes using HP increased from 8% to 12% in a decade (from 2005 to 2015). At the same time, the share of homes using electricity for water heating (WH) increased by 7% (to 46%). For this reason, both loads (HP and WH) have been considered to evaluate demand flexibility in this work. Moreover, the winter period has been selected for simulation purposes in the following paragraphs, because demand in winter is higher than in summer and the climate in this Spanish area is more restrictive for PV generation possibilities.

**Table 1.** Main end-uses share in the residential sector according to some estimates in the USA [36], EU-28 [37], and Spain [38] adapted and updated from [39].


\*: Appliances and Electronics.

To obtain representative profiles, it is necessary to evaluate the load dynamics and the service the customer obtains from each load. Figure 3a,b shows some real end-use load profiles for a household belonging to the overall customer demand previously shown in Figure 2. In this case, feedback from the everyday activities of the customer [40] is important to refine profiles, improve DR&EE (Demand Response and Energy Efficiency) success, and gain customer interest in energy concerns. Regulations can also help aggregators to establish load profiles. Figure 3a shows an average HP consumption profile in winter, since the DR simulations to balance generation in this study are focused on this period. Figure 3b shows an average water flow use to determine WH requirements, extracted from EN 15316-3-1, Section 5 (EU normative). Figure 3c shows the proposed end-use profiles for an average customer.

**Figure 3.** Daily end-uses and customer service: (**a**) HP profile; (**b**) daily use of residential Water Heater. (**c**) Daily profiles for main end-uses (winter). Acronyms: EH: Heating (Electric heaters and Heat pumps), WH: Water Heating; CO: cooking; LIG: lighting; FR: fridges; WM: washing machines; DW: dishwasher; OV: Oven; PC: computers; DRY: dryers; OT: Others; and CUST: overall customer demand.

The procedure for obtaining the end-use profiles (Figure 3) can be explained as follows. In the first place, the aggregator needs to recover basic information about the customers' daily overall demand (aggregated or not) through Smart Metering (Figure 1, bottom left side). This information, alongside weather databases and public reports on household energy demand and the share of end-uses in terms of energy, allows the aggregator to calculate "household baselines". At this point, the aggregator is able to run and refine Physically Based Load Models (PBLM) both at the elemental and aggregated levels (i.e., include inputs/outputs for these models). With PBLM and average weather inputs, the aggregator obtains "end-use baselines" for each end-use with relevant DR potential (e.g., HVAC, space heaters, WH, or thermal ceramic heaters, Figure 1), and their average daily demand in each season/month. Finally, the "elemental baselines" (kWh and profiles) are modulated through coefficients (considering weather conditions and customer behavior) to fit the "overall baseline" of the customer.

#### *2.3. Short-Term Forecasting Methods*

In this section, the forecasting methods used to predict hourly load and PV generation are described.

#### 2.3.1. Random Forest

Random forest is an ensemble method based on regression trees whose efficiency in load forecasting has been widely illustrated [15]. Being based on regression trees makes random forest a flexible method in the case of complex and non-linear relationships and, as an ensemble method, it can provide low bias and reduce the variance of the predictions.

Some other ensemble methods based on regression trees are bagging, conditional forest or boosting, whose efficiency in load forecasting has been shown in different papers (see for instance [41]).

Random forest is a generalization of bagging (bootstrap aggregating) in which only a random sample of "*mtry*" predictors can be chosen at each split of the regression tree. This approach reduces the variance of the estimations more than bagging, mainly in the case of correlated predictors. In this paper, we decided to use random forest to predict the electricity consumption, instead of conditional forest or boosting, due to its simplicity in parameter calibration and its fast computation.

The efficiency of random forest depends on a suitable selection of the number of trees N and the number of predictors "*mtry*" tested at each split. However, random forests do not overfit when N increases; thus, we can focus on calibrating only the parameter "*mtry*". The calibration of the parameter and the random forest method have been implemented through the R package "caret", see [42].

#### 2.3.2. Stochastic Gradient Boosting (SGB)

Another ensemble method based on regression trees is stochastic gradient boosting. It has been successfully used in short-term load [43] and PV power [44] forecasting applications. The SGB method is based on the sequential construction of additive regression models, usually in the form of regression trees with a maximum tree size. At each iteration, the models are fitted, by least squares, to a random sample of the pseudo-residuals of the previous stage. Thus, the SGB method applies a gradient descent algorithm, reducing the error (the difference between the output value and the expected value) at each iteration.

The development of an SGB forecasting model requires the selection of the values of a set of tuning parameters, which include the number of trees (also called iterations), the interaction depth, the shrinkage or learning rate, the minimum number of observations in the terminal nodes of the trees, and the bagging or sampling fraction. Unlike the random forest method, SGB models with many trees are prone to overfitting; thus, that number must be carefully chosen. The complexity of the SGB model is related to the interaction depth, which corresponds to the maximum size of each tree. The shrinkage parameter manages the influence of each sequential tree on the final value provided by the SGB model. The minimum number of observations in the terminal nodes or leaves of the trees limits the observations used to provide their mean value as the response of that terminal node. Finally, the bagging fraction corresponds to the fraction of the training dataset observations randomly selected to build each tree (small values reduce the possibility of overfitting, but increase the model uncertainty). For a detailed description of the SGB method, see [45].
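To make the role of these parameters concrete, the following R sketch fits an SGB model with the "gbm" package; the data frame *train\_df*, the response *pv\_power*, and the parameter values shown are hypothetical placeholders (the values actually selected in this work are reported in Section 3.2).

```r
# A minimal sketch of fitting an SGB model with the R package "gbm".
# "train_df", "test_df", and the response "pv_power" are hypothetical names.
library(gbm)

set.seed(42)
sgb_fit <- gbm(
  pv_power ~ .,               # response against all explanatory variables
  data = train_df,
  distribution = "gaussian",  # least-squares fitting, as described above
  n.trees = 200,              # number of boosting iterations
  interaction.depth = 6,      # maximum size of each tree
  shrinkage = 0.1,            # learning rate
  n.minobsinnode = 3,         # minimum observations per terminal node
  bag.fraction = 0.5          # stochastic subsampling fraction
)

# Predictions on new data, using the full sequence of trees
pred <- predict(sgb_fit, newdata = test_df, n.trees = 200)
```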

#### *2.4. Time-Series Clustering*

Clustering is an unsupervised technique whose objective is to separate objects (represented by a multivariate dataset) into homogeneous groups (called clusters), such that objects in the same cluster have high similarity among them, but low similarity with objects in different clusters. It is considered a very useful exploratory technique, by itself or as a preliminary step for other kinds of data analysis. Depending on the way the clusters are generated, clustering methods can be divided into two big families: hierarchical methods and partitioning methods. In addition, the resulting clusters are determined by the distance or similarity measure and the linkage method selected.

A special case is time-series clustering, where each object to be grouped corresponds to a sequence of values as a function of time. One of the main advantages of clustering time series is that it allows the discovery of hidden patterns in time-series datasets. Generally, three different objectives can be considered in this context: finding time series that are similar in time, in shape, or in change (structural similarity). The selection of a suitable distance measure is essential and depends on the objective pursued. Interesting surveys in the field can be found in [46,47].

In this paper, we focused on similarity in shape to cluster the daily curves of photovoltaic generation. The Dynamic Time Warping (DTW) distance, described below, was chosen for that purpose. Regarding the nature of the clustering, a hierarchical technique together with average linkage was selected. These clustering methods were implemented by means of the R package TSclust [48].

The DTW distance, introduced by [30], is commonly used for measuring the shape-based similarity between two time series which may vary in timing. Its main advantage over other shape-based distances, such as the Euclidean or Wavelet Transform distances, is its invariance to warping. In our context, the daily curves of PV generation are conditioned by the sunlight hours, which vary along the different seasons. That makes the DTW distance suitable for clustering a set of daily PV curves along different years.

Given two time series (*xi*), *i* = 1, ..., *n*, and (*yj*), *j* = 1, ..., *m*, the computation starts by calculating an *n* × *m* matrix *D* = (*Dij*) with the distance between every possible pair of points *xi* and *yj* in the two time series, *Dij* = *d*(*xi*, *yj*), *i* = 1, ..., *n*, *j* = 1, ..., *m*, where *d*(*xi*, *yj*) can be the Euclidean or the absolute distance. According to [30], a warping path *w* is a contiguous set of matrix elements that defines a mapping between (*xi*) and (*yj*) and satisfies the following conditions: it starts at *D11* and ends at *Dnm* (boundary condition), each step moves only to an adjacent cell (continuity), and its indices never decrease (monotonicity).

The objective of the DTW distance is to find the warping path that minimizes the cumulative distance. Due to its high computational cost, different approaches have been proposed to optimize the calculation (see [49]).
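The clustering stage built on this distance can be sketched in a few lines of R with the TSclust package; the matrix *pv\_days* (one row per daily PV curve, 24 hourly values) and the choice of five clusters anticipate the setting of Section 3.3 and are placeholders here.

```r
# Shape-based clustering of daily PV curves with the DTW distance and
# average linkage, as used in this paper (R package "TSclust").
# "pv_days" is a hypothetical matrix: one row per day, 24 hourly values.
library(TSclust)

d_dtw  <- diss(pv_days, METHOD = "DTWARP")   # pairwise DTW distances
hc     <- hclust(d_dtw, method = "average")  # average-linkage hierarchy
groups <- cutree(hc, k = 5)                  # five clusters, as in Section 3.3

# Centroid (element-wise mean) curve of each cluster
centroids <- t(sapply(split(as.data.frame(pv_days), groups), colMeans))
```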

#### *2.5. Demand Response Strategies*

Demand Response policies have been used by ISOs since the early 1980s. In the first years of DR, the objective was to achieve a more rational planning and operation of resources. In recent years, with the development of energy markets and the increasing share of RES in the generation mix, DR has become more centered on the customer and on the integration of the available RES potential. Demand Response can be divided into explicit and implicit DR. Implicit DR means a change of demand in response to prices, whereas explicit DR involves a change of demand when System or Distribution Operators (i.e., ISO or DSO) forecast and declare an event in the system in the short term.

To respond to these events or prices, the most common policy is to limit demand. This reduction can be performed through the cycling of the power supply (the supply is switched ON and OFF alternately, following a rectangular wave *u*(*t*)). If the natural "cycling" of the end-use being controlled, *m*(*t*) (the operating state of the control device), is greater than *u*(*t*), the DR action is effective (notice that an appliance may or may not describe cycles, for example a fridge or an inverter heat pump, but every load has its operating state *m*(*t*) with respect to its rated power). Considering that, in practice, demand measurements are discrete (every 5, 15, or 60 min), it is necessary to define average values in a time period [*t*, *t* + *k*], as given by Equation (1):

$$m(t) = \frac{P_{cu}(t)}{RP_{cu}}; \quad \overline{m}(t,t+k) = \frac{1}{k\,RP_{cu}} \int_{t}^{t+k} P_{cu}(t)\,dt; \quad \overline{u}(t,t+k) = \frac{t_{ON}}{k} \tag{1}$$

where the overlined *m*(*t*, *t* + *k*) and *u*(*t*, *t* + *k*) are the mean values of *m*(*t*) and *u*(*t*) in the time period *k*, respectively, and *tON* is the time in this period during which a "representative" (average) load remains switched ON and demands power.
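As a toy numerical illustration of Equation (1), assuming hypothetical one-minute power samples over a *k* = 20 min window:

```r
# Illustrative computation of Equation (1) from discrete demand samples.
# "p_cu" holds hypothetical one-minute power measurements (kW) over a
# k = 20 min window; "rp_cu" is the rated power of the load (kW).
m_bar <- function(p_cu, rp_cu) mean(p_cu) / rp_cu  # average operating state
u_bar <- function(t_on, k) t_on / k                # average control signal

# Example: a 2 kW heat pump running at rated power for 12 of the 20 min
p_cu <- c(rep(2, 12), rep(0, 8))
m_bar(p_cu, rp_cu = 2)    # 0.6
u_bar(t_on = 12, k = 20)  # 0.6
```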

The models used (to obtain *m*(*t*) and apply *u*(*t*)) are PBLM, a methodology first proposed to solve problems such as cold load pickup. The main reason for this choice is that these models are "white" [50] or "grey" [51] models, which allow physically explaining the dynamics of the appliance and its environment and, consequently, foreseeing its changes. In this work, the PBLM "grey" models for HVAC (Heating, Ventilation and Air Conditioning) and WH loads previously proposed in [39] have been used. Figure 4 shows an electrical-thermal equivalent of this model for heating loads (a broader explanation of the parameters can be found in several references [52,53]).

**Figure 4.** Example of PBLM for: (**a**) HVAC (Heating, Ventilation and Air Conditioning); (**b**) WH (Water Heater).

The main features of these models are: they consider heat gains and losses, for instance solar radiation (Hsw, Hw) or internal gains due to inhabitants (Hr) or appliances (Ha) (Figure 4a); the model takes into account heat storage from the specific heat of external walls (Cw), indoor masses (Ca, C1 and C2, especially important for WH) or roof/ground (Crg); and it considers the control mechanisms which drive appliances (for instance thermostats *m*(*t*) and DR policies *u(t*)). Moreover, their state variables are temperatures: indoor (Xi), walls (Xw), roof/ground (Xrg) for HVAC loads (Figure 4a), and X1, X2 for the stratification of water in the reservoir ("hot" WH1 and "cold" WH2 sub-tanks, Figure 4b), that is to say, characteristics that allow the evaluation of energy flows and storage capabilities (i.e., the indirect capacity of storage in the envelope of buildings), the direct storage in WH or the loss of customer service due to the application of DR (i.e., internal or hot water temperature).

These models are individual ones and need a further aggregation to reach the minimum demand level (the size of the reduction packages) established by the specific regulations of electricity markets to bid or offer into these markets (e.g., from 100 kW to 1 MW, depending on each specific service or market [54,55]). This task is often performed by energy aggregators.

To achieve these packages, the aggregator needs to raise the ON-time to increase the demand of each specific end-load, whereas a decrease of demand requires a reduction of the ON-time. The second alternative (the traditional one) is easier, because the aggregator only needs to manage the rate of switching-OFF and switching-ON times of the power supply to the load. This is easy to perform through hardware by classical methods (e.g., ripple control of WH in Germany [56,57]) or home automation methods (e.g., plugs controlled through Z-Wave protocols [58] and universal software platforms for control [59]).

An important concern for the practical implementation of modern DR policies is Automated Demand Response (ADR), because customer manual control does not fit the accuracy and reliability requirements of the response. The development of standards for the communications between operators, aggregators, and their customers' automation equipment [60], and for the feedback from them, is imperative for the success of ADR. The Open Automated Demand Response protocol (OpenADR [61]) represents a good example of such a standard. Every day, more devices are certified to use OpenADR 2.0 protocols, especially Smart Thermostats, but this certification is not necessary if some gateway assumes the role of the "last-mile" controller and is able to receive and transmit OpenADR protocols. For example, home automation platforms such as the Universal Devices ISY994 Series [62] allow the communication of residential customers with OpenADR, sending setpoints and commands to home automation technologies working with different protocols (Z-Wave+, Insteon/X10, Zigbee Pro, Amazon Echo, or Google Home). Other platforms from well-known manufacturers, such as ABB SACE's Emax 2 Power Controller, perform similar functions [63], but at the building scale. Examples of the ADR capabilities of grid-integrated buildings and building microgrids, their architecture, and standards can be found in [60].

The rate of change is defined in the PBLM software by PWM waves: the carrier wave used has a frequency of 0.833 mHz (i.e., one cycle every 20 min), and the modulating waveform is the desired decrease of the average value *m*(*t*, *t* + 20 min) according to the deviations between the 24 h PV forecasts (see Section 3.2) and the 1 h PV forecasts used in markets (Figure 1).

The reasons for choosing 20 min are both physical and technical. The first can be explained from the point of view of load service in the case of HVAC: if a harsh control is needed, switch-OFF times greater than 20 min can cause thermal fluctuations in the dwellings that are easily noticeable by consumers (this can produce a lack of customer engagement in DR). The second reason is the so-called "lock-out" or mechanical delay of heating and air conditioning units. This mechanism is used to prevent a rapid recycling of the compressor, avoiding mechanical damage. From the point of view of DR, it can cause an additional delay when applying ON/OFF and thermostat control signals. To evaluate and statistically characterize this process, some residential HP appliances (rated power from 1 kW to 3 kW) were monitored by the authors. Changes in customer demand due to control actions were recorded by an electronic meter and several Z-Wave wall plug switches which send data to PCs through a USB gateway. The results show that ON latency times range from 20 to 60 s and OFF latency times from 10 to 40 s [64]. That is to say, the minimum ON-time should be in the range of one minute to limit the inherent errors due to latency.
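Under these conventions, one cycle of the PWM control signal can be generated as in the following sketch; the function name and the one-minute resolution are illustrative assumptions.

```r
# Sketch of the PWM control signal u(t) described above: a 20-min carrier
# (0.833 mHz) whose ON-time is set by the desired average control level.
pwm_cycle <- function(duty, period_min = 20) {
  t_on <- round(duty * period_min)  # whole minutes, consistent with the
                                    # ~1 min minimum ON-time discussed above
  c(rep(1, t_on), rep(0, period_min - t_on))
}

u <- pwm_cycle(duty = 0.6)  # one cycle with the average value set to 0.6
mean(u)                     # 0.6
```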

The first alternative (i.e., the increase of demand) is a less traditional option for DR [65]. Several reasons explain both the lack of use of this alternative and its real interest. For instance, the increase of demand requires the control of thermostats. This control is more expensive than supply control because smart thermostats are expensive: the cheapest options (e.g., Z-Wave) cost around €150–200, whereas a remote switch costs around €40–60. Fortunately, modern appliances include temperature control through mobile phones or PCs, and these alternatives can be used for control (notice that some of them are compliant with well-known DR standard protocols [61]). In other cases, where the control device is intrusive (this is the case of WH), the cost of the thermostat is similar, but maintenance labor is also needed to include this option in the appliance. Over the last few years, some HPWH manufacturers in the USA have included these options for large units (200–500 L reservoir/storage tanks), for example [66].

The control of the thermostat (up or down, according to the season, and usually used for pre-heating or pre-cooling policies in the dwelling being conditioned) has been proposed as a "virtual storage" resource [67] that allows customers to take advantage of Time of Use (ToU) tariffs or to "prepare" loads to face DR events and maintain customer service (i.e., the internal temperature of houses or dwellings).

Usually, these policies have been applied before the load is controlled, but the proposal in this paper is to use them continuously to adapt demand to changes in the forecasted PV generation. The thermostat setting is evaluated and changed, if necessary, by ±0.5 °C every 20 min. The reason for selecting this value is that 0.5 °C is a usual step for the change of temperature settings in home smart thermostats.

In this way, the proposed control strategy *u*(*t*, *t* + *k*) for heating is given by Equation (2):

$$u(t,t+k) \begin{cases} > \overline{m}(t,t+k) \rightarrow X^{\text{sup}} = \begin{cases} X^{\text{sup}} + \Delta X; & u(t,t+k) > u(t-k,t) + \frac{db}{2} \text{ and } X^{\text{sup}} < X^{\text{lim}} \\ X^{\text{sup}}; & u(t,t+k) > u(t-k,t) + \frac{db}{2} \text{ and } X^{\text{sup}} \ge X^{\text{lim}} \\ X^{\text{sup}}; & u(t-k,t) - \frac{db}{2} < u(t,t+k) < u(t-k,t) + \frac{db}{2} \\ X^{\text{sup}} - \Delta X; & u(t,t+k) < u(t-k,t) - \frac{db}{2} \end{cases} \\ < \overline{m}(t,t+k) \rightarrow u = \begin{cases} \text{PWM}(\Delta PV_{\text{gen}}); & X_{i}(t-k,t) > X^{\text{sup}}_{i} \\ u(t-k,t); & X_{i}(t-k,t) < X^{\text{sup}}_{i} \end{cases} \end{cases} \tag{2}$$

where *Xsup* is the upper temperature setting of the load's thermostat, which is set as a simple hysteresis cycle with dead-band *db* (usually ranging from 0.01 to 0.03 pu), and *Xlim* is the maximum reasonable temperature inside the dwelling (for example, 22–23 °C in the case of HVAC in winter) or the maximum temperature of the water inside the tank (68 °C), which avoids the risk of burns if a mixing valve is not used for the control of the hot water pipeline. *Xi serv* is a minimum service level for the appliance (a minimum comfort temperature inside the dwelling, for example 16 °C, or a minimum temperature of the hot water inside the WH, for example 36 °C).

Basically, Equation (2) means that load control is performed through a double control. In the case of heating (electric heaters or HP), if the load must go up, the thermostat setting rises until it reaches the maximum allowable value (*Xlim*). Otherwise, if demand must fall to balance a decrease in PV generation (with respect to the 24 h forecast), the thermostat or the supply is controlled to reduce demand. Notice that a "baseline" (i.e., the load demand evaluated without control, *m*(*t*, *t* + *k*)) is also needed as a reference for the controlled load. This baseline also comes from the PBLM models (Figure 1).
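A minimal sketch of this double control follows, with illustrative names and default values only; the branch in which demand must fall delegates to the PWM supply control described earlier in this section.

```r
# Sketch of the double control in Equation (2) for a heating load.
# u_now/u_prev: average control level in the current/previous period;
# m_now: uncontrolled baseline; x_sup: thermostat upper setting;
# x_lim: maximum allowed setting; db: dead-band (pu); dX: 0.5 C step.
control_step <- function(u_now, u_prev, m_now, x_sup, x_lim,
                         db = 0.02, dX = 0.5) {
  if (u_now > m_now) {
    # Demand must rise: move the thermostat setting, bounded by x_lim
    if (u_now > u_prev + db / 2 && x_sup < x_lim) x_sup <- x_sup + dX
    else if (u_now < u_prev - db / 2) x_sup <- x_sup - dX
    list(x_sup = x_sup, action = "thermostat")
  } else {
    # Demand must fall: apply the PWM supply control described above
    list(x_sup = x_sup, action = "PWM")
  }
}
```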

#### **3. Results and Discussion**

#### *3.1. Prediction Results for the Electricity Consumption*

In this subsection, we provide 24-h-ahead predictions of the electricity consumption of the Spanish town in order to apply them in the context of Demand Response. To this end, the ensemble method random forest described above was applied, and other aspects, such as predictor importance and parameter selection, were also addressed.

On the one hand, it is well known that the hourly loads of the previous days are the most important factors in short-term load forecasting. On the other hand, temperature is a factor that might affect the electricity consumption (cooling and heating of buildings). Therefore, the prediction of the hourly temperature at the location of the town under study, obtained as explained in Section 3.2, was also used as an input of the load forecasting model. In addition, several calendar variables, such as the hour of the day, the day of the week, the month of the year, and holidays, have been taken into account in the design of the load forecasting model. Table 2 depicts the 49 predictors used for the load forecasting model: 23 dummies for the hour of the day, six dummies for the day of the week, 11 dummies for the month of the year, one dummy for holidays, one predictor for the forecasted temperature, and seven predictors for historic loads (lags 24 h, 48 h, 72 h, 96 h, 120 h, 144 h, and 168 h).



Before any analysis, a preliminary data filtering was performed in order to detect and substitute missing cases or measurement errors. Moreover, in all cases, the training period for model fitting ranges from 1 October 2008 to 30 November 2010, whereas the test period ranges from 1 January 2011 to 31 March 2011.

Three different measurements, given in Equations (3)–(5), were used to obtain the accuracy of the forecasting models: the root mean square error (RMSE), the R-squared (percentage of the variability explained by the forecasting model), and the mean absolute percentage error (MAPE).

The root mean square error is defined by:

$$\text{RMSE} = \sqrt{\sum_{t=1}^{n} \frac{(y_t - \hat{y}_t)^2}{n}} \tag{3}$$

The R-squared is given by:

$$\text{R-squared} = 1 - \frac{\sum_{t=1}^{n} (y_t - \hat{y}_t)^2}{\sum_{t=1}^{n} (y_t - \overline{y})^2} \tag{4}$$

The mean absolute percentage error is defined by:

$$\text{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \tag{5}$$

where *n* is the number of data, *yt* is the actual load at time *t*, *y*ˆ*t* is the forecasted load at time *t*, and *ȳ* is the mean value of the actual load. Slight variants of these measures are the mean absolute error (MAE) and the cumulative absolute error (CAE).
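For reference, a direct R transcription of these measures (vector names are illustrative):

```r
# Goodness-of-fit measures of Equations (3)-(5), plus the MAE and CAE
# variants mentioned above; y holds the actual loads, y_hat the forecasts.
rmse      <- function(y, y_hat) sqrt(mean((y - y_hat)^2))
r_squared <- function(y, y_hat) 1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)
mape      <- function(y, y_hat) 100 * mean(abs((y - y_hat) / y))
mae       <- function(y, y_hat) mean(abs(y - y_hat))
cae       <- function(y, y_hat) sum(abs(y - y_hat))
```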

Taking into account that the accuracy for special days (weekends and holidays) is usually lower than for regular days, the above goodness-of-fit measurements were obtained separately for each group of the test data.

Parameter tuning in random forest mainly refers to selecting an optimal number of trees (*ntree*) and an optimal number of predictors considered at each split (*mtry*). In fact, the selection of the parameter *ntree* is quite easy, because higher values do not lead to overfitting; thus, only a high enough value is needed. The optimal *mtry* = 12 was obtained by means of 10-fold cross-validation with three repeats, using a random grid with nine values (among the 49 possible).
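A sketch of this calibration with the "caret" package follows; the data frame *load\_train* and the response name *LOAD* are hypothetical placeholders.

```r
# Random forest calibration as described above: 10-fold cross-validation,
# three repeats, and a random grid of nine mtry candidates.
library(caret)

set.seed(42)
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                     search = "random")
rf_fit <- train(LOAD ~ ., data = load_train,
                method = "rf",
                ntree = 200,     # passed through to the randomForest engine
                trControl = ctrl,
                tuneLength = 9)  # nine random mtry values among the 49

rf_fit$bestTune                  # optimal mtry (12 in this work)
```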

Table 3 shows the goodness-of-fit measures for the training and test datasets after applying random forest with *ntree* = 200 and *mtry* = 12, for regular and special days separately. Furthermore, the importance of each predictor in the forecasting model has been obtained through the node impurity, showing that the electricity consumption at the same hour of the previous week (predictor *LOAD\_lag\_168*) is the most important predictor and that the following five most important ones are also historical loads. Temperature was in the 11th position of importance.

**Table 3.** Goodness-of-fit measures for regular and special days in the training and test datasets.


Figure 5 represents the evolution of the goodness-of-fit measures for each hour of the day, where the best accuracy is reached early in the morning.

As an example, Figure 6 shows the actual and predicted load for a complete week in the test dataset.

**Figure 5.** Goodness-of-fit measures by hour of the day in the test dataset: (**a**) using RMSE; (**b**) using MAPE.

**Figure 6.** Actual and forecasting load for a week (14–20 February 2011).

#### *3.2. Prediction Results for the Photovoltaic Generation*

In this subsection, we describe the short-term PV power forecasting model, able to offer 24-h-ahead predictions for the PV plant located near the town under study, in the context of Demand Response. As mentioned above, this forecasting model is based on the SGB method, which was selected because it produced the model with the lowest RMSE, obtained through a five-fold cross-validation procedure on the training dataset, among a set of models developed with techniques such as linear multivariate regression, artificial neural networks, random forest, and support vector machines. All the forecasting models used the same training dataset, and their parameters were optimized following a procedure similar to the one described for the SGB in the following paragraphs.

Table 4 shows the explanatory variables used to develop the PV power forecasting model. The dependent or output variable was the hourly power generation of the PV plant for each hour of the day (only daylight hours were considered). The explanatory variables include hourly weather predictions obtained with the Weather Research and Forecasting (WRF) mesoscale model [68], a numerical weather prediction (NWP) model able to produce forecasts for a geographical region with the desired spatiotemporal resolution. In order to produce the weather forecasts, the WRF model is started every day with the initial and boundary conditions provided by the forecasts of the GFS model, an NWP model with global coverage run and maintained by the National Centers for Environmental Prediction (NCEP) of the USA. From the values provided by the GFS model with a 1° × 1° (latitude–longitude) spatial resolution for the 00:00 Universal Time Coordinated (UTC) cycle, the WRF model provided predictions of the weather variables over the region where the town under study is located, with a time resolution of one hour and a spatial resolution (distance between points of the analysis grid) of around 12 km. The forecasts of the desired weather variables for the location of the PV plant, or for the location of the town, were calculated by bilinear interpolation from the forecasts for the four nearest grid points. In a real operation, these weather predictions can be available for a new day before dawn and include all the forecasts for the 24 h ahead.


**Table 4.** Explanatory variables for the PV power forecasting model.

The WRF model provided the hourly values of most of the 17 explanatory variables in Table 4 (variables from *swflx* to *clear*). The variable *swflx* corresponds to the global horizontal irradiance. Wind speed and direction were included because of the effect they can have on the temperature of the PV panels and, therefore, on their efficiency. The *aghi* variable corresponds to the average value of the forecasts of the *swflx* variable for two consecutive hours, that is, the average value of the global horizontal irradiance throughout the last hour. The *aip* variable corresponds to the average value of the irradiance on the PV panel throughout the last hour; it was calculated, considering the characteristics of the PV panel with two-axis trackers and the solar geometry, as the aggregation of the direct normal and total diffuse irradiances on the tilted surface of the PV panels. The direct normal irradiance was obtained with the Erbs model [69] using the values of *aghi*, and the total diffuse irradiance was calculated by means of the King model [70]. The variables *h1* and *h2* were used to code the hour (on a UTC basis).

In order to select the best structure of the forecasting model, an optimization methodology based on a genetic algorithm (GA) with advanced generalization capabilities was used. This methodology is GA-PARSIMONY [71], which allows the selection of parsimonious models. The main difference of this methodology with respect to conventional GAs is a rearrangement of the ranking of the individuals based on their complexities, so that individuals with lower complexity (in this case, models with a less complex structure) are promoted to the best positions of each generation. The promotion of less complex models over the rest of the individuals of the same generation with comparable fitness yields models with improved generalization capability.

The GA-PARSIMONY methodology is implemented in the R package GAparsimony [72], which was the tool used to optimize the PV power forecasting models. In the case of the SGB model, the optimization process could choose the number of trees (in the range 20–250), the maximum interaction depth (range 3 to 8), the shrinkage value (range 0.001 to 0.25), and the minimum number of observations per terminal node (range 2 to 8), as well as select the input variables among those available (Table 4). The bagging fraction was fixed at 0.5. The fitness function corresponded to the negative value of the average RMSE obtained with five-fold cross-validation and three repeats on the training dataset. The complexity of the forecasting models evaluated in the optimization process was ten times the number of input variables used by the model plus the square of the maximum interaction depth. The number of individuals per generation was 50, the maximum number of generations was 100, and the re-rank error value was 0.1 (individuals with lower complexity and a difference in fitness value lower than the re-rank error were promoted to top positions in the ranking of each generation). The final model, achieved after the optimization process, used only 11 input variables (*swflx*, *hr*, *pres*, *prec*, *mod*, *clear*, *cfm*, *temp*, *dir*, *cfl,* and *h2*), had 182 trees, its maximum interaction depth was 6, its shrinkage value was 0.1197, and the minimum number of observations in a terminal node was 3. Table 5 shows the goodness-of-fit measures for the training and test datasets, where the RMSE, R-squared, and MAPE values are calculated using Equations (3)–(5), taking as *yt* the actual PV generation power at time *t* and as *y*ˆ*t* the forecasted value for that hour. The high MAPE values are mainly due to the very low actual PV generation in the early and late daylight hours, when a small forecasting error can correspond to a very high absolute percentage error.



Figure 7 plots the actual and forecasted hourly PV power generation values for a week in the test dataset. Notice that the forecast is carried out each day before dawn and that, at this stage, no correction is applied during the day.

**Figure 7.** Actual and forecasting PV power for a week (14–20 February 2011).

#### *3.3. Classification Results of Photovoltaic Curves*

The classification stage of the proposed method has been carried out in three steps: firstly, the predicted daily curves of PV generation corresponding to the training period (from 1 October 2008 to 31 December 2010) are clustered into homogeneous groups using the DTW distance and average linkage; secondly, the "desired" cluster (the one whose predicted PV curves best fit the real PV curves) is selected and a centroid curve is obtained for each resulting cluster; finally, each predicted daily curve of the test period (from 1 January 2011 to 31 March 2011) is classified into the nearest cluster by computing its DTW distance to each centroid curve. The days of the test dataset whose predicted PV curves are classified into the "desired" cluster are the ones selected for applying DR policies.

The first step described above involves the hierarchical clustering of 822 time series (the number of days in the training dataset) of length 24 (hourly data). The resulting dendrogram suggested five groups, whose centroid curves are given in Figure 8 (observe that the main difference among the curves lies in the magnitude of the generated energy, except for the fifth cluster). For each of the five resulting clusters, we computed the median of the percentage error over all days in the cluster, obtaining 19.36% for Cluster 1, 39.95% for Cluster 2, 77.41% for Cluster 3, 57.09% for Cluster 4, and 48.15% for Cluster 5. Therefore, Cluster 1 provided lower fitting errors than the rest of the clusters and, hence, it was considered the "desired" cluster for our purpose. As the desired cluster provides lower errors, the forecasted PV curves of the test dataset that are classified into the desired cluster are expected to fit the real PV curves better than the ones classified into any of the other clusters.

**Figure 8.** PV centroid curves of each cluster in the training dataset.

The results obtained in the final step of the classification stage were the following: a total of 31 days of the test period were classified into Cluster 1 (the "desired" cluster). Their dates are given in Table 6 (recall that all days refer to the year 2011), and some examples comparing the real PV and the predicted PV (24 h ahead) generation are given in Figure 9.


**Table 6.** Dates of the test period (2011) classified into Cluster 1 (and therefore selected for DR actions).

**Figure 9.** Comparisons of actual PV and 24 h forecast PV generation for some days in Cluster 1: (**a**) day 3; (**b**) day 4; (**c**) day 9; (**d**) day 28.

#### *3.4. Results for Demand Response Strategies*

#### 3.4.1. Very Short-Term PV Adjusted Forecasting

As seen in Sections 3.1 and 3.2, the main accuracy problem when obtaining a 24-h-ahead forecast of the net demand is due to errors in the prediction of the PV generation (compare Figures 6 and 7), because this forecast is carried out each day before dawn and no correction is applied during the day. The problem of balancing the net load with respect to the 24 h predictions should be taken into account by aggregators, because any imbalance can produce an important money flow from aggregators to Balance Service Providers (BSP) or Load Serving Entities (LSE). For this reason, a very short-term correction has been proposed. The idea is similar to the procedure used by ISOs to correct demand when a power event is declared in the system, taking into account the measurements of demand before the correction (i.e., the generation of customer baselines or CBL, for example [73]). In these methods (used for the remuneration of demand response in reliability programs), the adjustment factor is obtained by means of the first two hours of the four-hour period prior to the commencement of the reliability event. In our case, the methodology to correct the forecasted values should balance accuracy and fast computation (it must take less than a minute, to leave enough time for the load control processing). In this stage, we propose the determination of an adjustment factor by means of the first 60 min of actual PV records of the one-and-a-half-hour period prior to the forecast window (it is assumed that the PV generation has an SM able to record every minute and send this information to the aggregator). This adjustment factor is evaluated through Equation (6):

$$af(d,t) = \frac{\sum_{k=30}^{90} PVA(d, t-k)}{\sum_{k=30}^{90} PVF(d, t-k)} \tag{6}$$

where *af*(*d*, *t*) denotes the adjustment factor for the time *t* of the current day *d*, used to fix the generation forecasts, *PVA*(*d*, *t* − *k*) is the actual PV generation *k* min before, and *PVF*(*d*, *t* − *k*) is the 24 h PV generation predicted for day *d* at time *t* − *k*. Then, a first approximation of the forecasted PV baseline (*PVBL\_aux*) is computed at time *t* to obtain the values for [*t*, *t* + 30] (that is, predictions are corrected in a very short-term time window aimed at reducing imbalances; for the participation of customers in other markets, the window can be enlarged according to gate closure times) through Equation (7):

$$\mathrm{PVBL\_aux}(d, t+k) = af(d,t) \times PVF(d, t+k); \quad k = 1, 2, \dots, 30 \text{ min} \tag{7}$$

To improve the goodness of this 1-h-ahead forecast, *PVBL* is compared with the average of the historical values of PV generation over the last two weeks:

$$PV_{upl}(d,t) = \frac{1}{450} \sum_{j=1}^{15} \sum_{k=-15}^{15} PVA(d-j, t+k) \tag{8}$$

where *PVupl* is the upper limit considered acceptable for any correction through the adjustment factor *af*(*d*, *t*). Therefore, Equation (7) is improved by Equation (9):

$$PVBL(d, t+k) = \begin{cases} \mathrm{PVBL\_aux}(d, t+k), & \text{if } \mathrm{PVBL\_aux}(d, t+k) < PV_{upl}(d, t+k) \\ PV_{upl}(d, t+k), & \text{otherwise} \end{cases}; \quad k = 1, \dots, 30 \tag{9}$$
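A compact sketch of this correction follows, under the assumption of one-minute PV records; *pva* and *pvf* are hypothetical vectors of actual and 24-h-ahead forecasted PV power indexed by minute, and *pv\_upl* holds the precomputed upper limit of Equation (8).

```r
# Very short-term PV correction of Equations (6), (7), and (9).
# pva/pvf: actual and 24-h-ahead forecasted PV power, one value per minute;
# pv_upl: upper limit of Equation (8), precomputed from the last 15 days.
adjust_factor <- function(pva, pvf, t) {
  sum(pva[(t - 90):(t - 30)]) / sum(pvf[(t - 90):(t - 30)])  # Equation (6)
}

pv_baseline <- function(pva, pvf, pv_upl, t) {
  af  <- adjust_factor(pva, pvf, t)
  aux <- af * pvf[(t + 1):(t + 30)]    # Equation (7)
  pmin(aux, pv_upl[(t + 1):(t + 30)])  # Equation (9): cap by the upper limit
}
```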

The results showed that the proposed correction of Equation (9) fulfills its objective well (Table 7 depicts the MAPE of the 24-h-ahead forecasts (*PVF*) and the 1-h-ahead forecasts (*PVBL*) for some representative days in Cluster 1). As expected, the 1-h-ahead forecasts outperform the 24-h-ahead forecasts, achieving the largest improvements on the days when the most serious errors took place (days 14, 16, 17, and 23). In some cases, a small increase of the errors is reported (days 5 and 26).

**Table 7.** Improvement in PV forecast attributable to the adjustment given in Equation (9).


#### 3.4.2. Balancing Net Demand through DR

Once the predictions (for the customers' demand and the PV generation) were calculated, and the clustering process had selected the "desired" days for applying DR, the next objective was to adapt the net demand (the difference between the customers' real demand and the real PV generation) to the predicted net demand made for the day ahead (Figure 1).

In order to achieve this aim, DR policies were applied to two flexible loads: WHs and HVACs. The reason for choosing these loads is the ease of implementing control strategies by changing their thermostat temperature, and their ability to act as thermal energy storage systems (by preheating the water in the case of WH, and precooling or preheating the rooms/walls in the case of HVAC); see Section 2.5. DR policies were applied through PBLM and aggregation models [52], with the aim of ensuring that the final net consumption is adapted to a significant extent to the profile of net energy demand predicted the day before, so that it is not necessary to trade additional resources in the wholesale electricity market or to pay BSPs.

The planning of the DR actions to be performed was obtained hour by hour. That is to say, the DR actions for each next hour were planned taking into account the differences between the predictions made 24 h ahead and the predictions made 1 h ahead. As the forecasts of the electricity consumption are more accurate than the PV forecasts, in this study the predictions made 1 h ahead only re-estimated the PV generation; thus, the DR strategies only acted to manage the PV forecasting error. This means that the control was performed in the period in which there was PV generation, that is, approximately from 8 a.m. to 9 p.m.

The PV variations between the 24-h predictions and the 1-h predictions must be compensated with changes in the consumption of WHs and HVACs. As WHs and HVACs represent 17.9% and 42.9% of the total consumption, respectively (see Table 1), 25% of the PV variations was managed by the WH loads, and the rest (75%) was assigned to the HVACs.

To demonstrate the ability of the loads (HVAC and WH) to adapt their consumption, and the capacity to minimize the variability between predictions and real consumption and generation, the 31 days obtained from the "desired" cluster of PV curves have been simulated. Two different examples (days 14 and 23) have been selected to illustrate the application of the DR strategies and their results, explained in detail below. Later, in Tables 8 and 9, the overall results and indicators for a set of representative days of Cluster 1 will be shown.

**Table 8.** Results for the net energy consumption (day 23).



**Table 9.** Results for net energy consumption (day 14).

Figure 10a,b presents the results for day 23 (21 March 2011), in which the PV generation forecasts made 24 h in advance overestimate the real PV energy generated. As can be seen in Figure 10a, the forecasts made 1 h in advance are much more accurate; thus, they can be used as a baseline curve to plan the DR actions (Table 7 presents the overall results). Figure 10b shows the effects on the net consumption forecasts. In order not to trade additional resources, the objective is to reduce the consumption so that the final actual net consumption matches the 24-h-ahead forecasts.

Taking into account that there is PV generation only from 8 a.m. to 9 p.m., control actions will be applied to WH and HVAC only during this period. Figure 11 presents the variations in each predicted load demand that have to be performed to obtain the desired net consumption profile.

Figure 12 depicts the load consumption after DR control strategies and the desired consumption profile from 24-h forecasts. As can be seen, DR actions work properly and the differences between the final and the "desired" load consumption have been significantly reduced.

**Figure 10.** Differences between 24-h-ahead and 1-h-ahead forecasts (day 23): (**a**) PV generation; (**b**) Net consumption.

**Figure 11.** Load demand variations between 24-h and 1-h forecasts (day 23): (**a**) WH; (**b**) HVAC.

**Figure 12.** Load consumption after DR actions and 24-h forecasts (day 23): (**a**) WH; (**b**) HVAC.

Modifying the WH and HVAC load consumption also changes the temperature of the water and of the rooms, respectively. These variations can cause some comfort problems for the users; thus, aggregators should ensure that there are no large temperature variations and that customer comfort is always guaranteed. This effect is considered in the double control strategy defined by Equation (2) in Section 2.5.

Figure 13 shows the temperature profiles of the loads. In the case of the WHs (Figure 13a), the temperature is always above 45 and 50 °C in the cold and hot sub-tanks, respectively [74]. Figure 13b shows the temperature of the rooms, the temperature of the internal walls, the temperature of the external walls, and the external ambient temperature. As can be seen, the internal temperature was always above 16 °C, while the maximum internal and ambient (external) temperatures were 19.5 and 13 °C, respectively; the difference between the internal and external temperature was always above 5.5 °C.

**Figure 13.** Loads' temperature profile after DR actions (day 23): (**a**) WH (state variables *X1* and *X2*, Figure 4b); (**b**) HVAC (state variables *Xi*, *Xw*, *Xrg*, and input *Xext*, Figure 4a).

Finally, Figure 14 compares the 24-h-ahead net consumption forecasts with the final net consumption after the DR actions and with the real net consumption that would have occurred if no DR action had been performed.

**Figure 14.** Net consumption profiles (day 23): 24-h forecasts, after DR actions and without DR.

These results clearly demonstrate that the DR actions significantly reduce the differences with respect to the 24-h-ahead net profile, efficiently compensating the forecasting errors and balancing the final net load consumption. Table 8 presents some numerical results for this example. All variables were calculated over the control period (from 8 a.m. to 9 p.m.).

As can be deduced from Table 8, the total net consumption of the day is reduced from 45.96 MWh, if no DR action is taken, to 40.37 MWh when DR actions are applied. The differences between the 24-h-ahead forecasts and the final net consumption are also shortened: the cumulative absolute error (CAE) decreases from 6.02 MWh without DR to 2.96 MWh with DR actions. Likewise, the percentage of error, understood as the ratio between the CAE and the net consumption of the 24-h-ahead forecasts, decreased from 14.27% (without DR) to 7.02% (with DR), a 51% reduction.
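As a consistency check using only the figures quoted above, the definition of the percentage of error implies a 24-h-ahead net consumption of roughly 42.2 MWh over the control period, and the 51% figure follows directly:

$$\frac{6.02}{0.1427} \approx 42.2\ \text{MWh}, \qquad \frac{2.96}{42.2} \approx 7.0\%, \qquad 1 - \frac{7.02}{14.27} \approx 0.51.$$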

In the next paragraphs, a different example of the DR strategy is presented. In this case, day 14 (14 January 2011) was analyzed, for which the 24-h-ahead forecast predicted less PV energy than was finally generated; thus, the aggregator had bought more energy in the Electricity Markets than necessary. In order not to waste this energy, it was used to increase the temperature of the WHs and HVACs, exploiting their capacity to act as thermal energy storage systems. Figure 15a presents the 24-h-ahead and 1-h-ahead forecasts compared with the real PV generation data. As can be seen, the 24-h-ahead forecasts underestimated the PV generation, while the 1-h-ahead forecasts are much more precise. Figure 15b shows that it is necessary to increase the final net demand to adapt the 1-h-ahead profile to the 24-h-ahead forecasts and, in this way, consume the "excess" of PV generation.

Figure 16 depicts the results of applying the DR policies to the flexible loads (WH and HVAC). In both cases, the consumption after DR actions follows the target, matching the energy consumption of each load with its 24-h-ahead prediction. Thus, it is fair to say that the DR control worked efficiently.

**Figure 15.** Differences between 24-h-ahead and 1-h-ahead forecasts (day 14): (**a**) PV generation; (**b**) Net consumption.

**Figure 16.** Load demand variations among the 24-h forecasts, the 1-h forecasts, and the final load consumption after DR (day 14): (**a**) WH; (**b**) HVAC.

As mentioned before, the increase in the energy consumption of both the WHs and HVACs was used to raise the temperature of the water inside the tank (in the case of the WHs) and of the rooms (HVAC). In the case of the WHs (Figure 17a), the temperature increases from 49 to 59 °C in the cold sub-tank and from 58 to 63 °C in the hot sub-tank (always ensuring, for health and safety reasons, that the temperature does not exceed 68 °C in any WH), while in the HVACs (Figure 17b), the internal temperature of the rooms increased from 17 to 20.5 °C, whereas the maximum ambient (external) temperature was 11.5 °C.

Figure 18 depicts the 24-h-ahead forecasts for the net energy consumption, together with the final net energy consumption profile after the application of the DR strategies and the net energy that would be consumed if no DR action were taken. The graph shows that the DR actions reduced the differences between the 24-h forecasts and the final net consumption, minimizing the need to sell the PV generation surplus back to the Electricity Markets or to pay additional charges (or penalties) for market imbalances.

Table 9 presents some numerical results for this example. All variables were calculated only over the control period (8 a.m.–9 p.m.). The results show that the net energy consumption with the DR strategies increased to 49.43 MWh, compared with the 47.23 MWh consumed without DR, while the 24-h forecast amounted to 50.13 MWh. The variations between the forecasts and the final net consumption were reduced from 4.13 MWh to 1.79 MWh, lowering the percentage of error by 57% (from 8.25% to 3.58%). The maximum power peak difference was also reduced by 44%.

**Figure 17.** Loads' temperature profile after DR actions (day 14): (**a**) WH (state variables *X1* and *X2*, Figure 4b); (**b**) HVAC (state variables *Xi*, *Xw*, *Xrg*, and input *Xext*, Figure 4a).

**Figure 18.** Net consumption profiles (day 14): 24-h forecasts, after DR actions and without DR.

Table 10 shows overall results for the net energy consumption of a representative part of the 31 days in Cluster 1. Day 16 (Figure 19a) obtained the greatest reduction in the error (from 31.63% to 6.7%), whereas day 28 (Figure 19b) was the one in which the percentage of error increased most (from 6.55% to 9.99%). Notice that this rise in the error was not due to the DR actions but to the lack of accuracy in the prediction of the customers' load profile for that day (day 28), since only the deviations of the PV generation were corrected by means of DR (see also Figure 9).

**Table 10.** Overall results for the net energy consumption of representative days in Cluster 1.

**Figure 19.** Net consumption profiles: 24-h forecasts, after DR and without DR: (**a**) day 16; (**b**) day 28.

Finally, Figure 20 presents the cumulative absolute error for the 31 days in Cluster 1 when comparing the target (the 24-h-ahead forecast) with the final net consumption in two different scenarios (without DR and after DR actions). It can be seen that, in general, the errors were reduced after DR, except on some particular days where they increased slightly. Notice that the set of days selected in Tables 10 and 11 (among the 31 days belonging to Cluster 1) represents different scenarios: low, medium, and high error reductions, as well as error increases after the DR policies.

**Figure 20.** Cumulative absolute error (CAE) between the final net consumption (without DR and after DR actions) and the target (the 24-h-ahead forecast of the net consumption).


**Table 11.** DR performance and flexibility indicators.

#### 3.4.3. Analysis of DR Flexibility

As mentioned in Section 2.1, a quantitative analysis of demand-side flexibility has been performed through several indicators defined and calculated at an aggregated level.


The first indicator refers to the dynamics of the signals. It gives an idea of the variation of a signal and is very close to the "mileage" score used for the verification of assets in Ancillary Services. The "signal\_mileage" is the absolute sum of the movements of the analyzed signal in a given time period (in our case, one day), normalized by the accumulated value of the signal:

$$\text{signal\_mileage} = \frac{\sum_{kk=\text{ini}+1}^{\text{end}} \left|\text{signal}(kk) - \text{signal}(kk-1)\right|}{\sum_{kk=\text{ini}+1}^{\text{end}} \text{signal}(kk)} \tag{10}$$

where signal refers either to the target foreseen for balancing the PV generation through the load (variable "balance") or to the load demand, with or without DR (i.e., the variable "baseline" of demand, or the new demand with DR, the variable "demand\_DR"). These indicators give, respectively, an idea of the variability of the demand (hourly, daily, ...) for the segment under study and of the "amount of work" that the "balance" signals involve to match the PV/demand forecast errors, as sketched below.
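The following sketch is a direct transcription of Equation (10) (illustrative names; "signal" may be any of the series defined above):

```python
import numpy as np

def signal_mileage(signal):
    """Equation (10): absolute movement of a daily series, normalized by its
    accumulated value from the second sample onwards."""
    s = np.asarray(signal, dtype=float)
    movement = np.abs(np.diff(s)).sum()  # sum of |signal(kk) - signal(kk-1)|
    return movement / s[1:].sum()        # denominator: signal(kk), kk = ini+1 ... end
```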

A second indicator is the "mileage\_ratio", which measures the ratio between the mileage of the balance signal sent to the flexible demand and the mileage of the demand in the steady state (without DR). This indicator gives the aggregator a first insight into the effort that the demand is forced to make to match the PV generation in the short term:

$$\text{mileage\_ratio} = \frac{\text{balance\_mileage}}{\text{baseline\_mileage}} \tag{11}$$

The third indicator represents the symmetry of the effort required from the demand to follow the energy balance signal (i.e., the overall increase of the flexible demand versus its reduction). As discussed in previous paragraphs, in some cases it is more difficult for a load to increase its demand than to reduce it (for instance, electric heating in winter). For this reason, the positive changes in demand are evaluated with respect to the negative ones. Mathematically:

$$\text{symmetry} = \frac{\sum_{kk=\text{ini}}^{\text{end}} \max\left(\text{demand\_DR}(kk) - \text{baseline}(kk),\ 0\right)}{\sum_{kk=\text{ini}}^{\text{end}} \max\left(\text{baseline}(kk) - \text{demand\_DR}(kk),\ 0\right)} \tag{12}$$
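A sketch of Equation (12) under the same assumptions (the positive and negative parts of the demand change are summed separately; names are illustrative):

```python
import numpy as np

def symmetry(demand_dr, baseline):
    """Equation (12): upward versus downward demand changes under DR."""
    delta = np.asarray(demand_dr, dtype=float) - np.asarray(baseline, dtype=float)
    up = delta[delta > 0].sum()      # overall increase of flexible demand
    down = -delta[delta < 0].sum()   # overall reduction of flexible demand
    return up / down                 # >> 1: mostly upward effort (e.g., day 14)
```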

Finally, the aggregator calculated a daily performance score that reflects the accuracy of the load resource in increasing or decreasing its demand in response to the balance dispatch signal. The performance score evaluates how closely each resource follows the balance signal, that is to say:

$$\text{performance} = \left|1 - \frac{\sum_{kk=\text{ini}}^{\text{end}} \left|\text{demand\_DR}(kk) - \text{baseline}(kk)\right|}{\sum_{kk=\text{ini}}^{\text{end}} \left|\text{balance}(kk) - \text{baseline}(kk)\right|}\right| \tag{13}$$
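Equations (11) and (13) can be sketched by reusing the helpers above (again, the names are illustrative; a perfect load-follower yields performance = 0):

```python
import numpy as np

def mileage_ratio(balance, baseline):
    """Equation (11): effort asked from demand relative to its natural variability."""
    return signal_mileage(balance) / signal_mileage(baseline)

def performance(demand_dr, baseline, balance):
    """Equation (13): accuracy of the loads in following the balance signal."""
    d = np.asarray(demand_dr, dtype=float)
    b = np.asarray(baseline, dtype=float)
    t = np.asarray(balance, dtype=float)
    achieved = np.abs(d - b).sum()   # change the loads actually delivered
    requested = np.abs(t - b).sum()  # change the balance signal asked for
    return abs(1.0 - achieved / requested)
```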

These indicators have been evaluated over the set of simulated days, and Table 11 presents the main results.

A brief explanation of the results can help the reader to better understand the physical meaning of these indicators; it is also useful to keep in mind the results of Section 3.4.2 for days 14 and 23. Table 11 shows that days 14 and 23 require a noticeable effort from the flexible demand (balance\_mileage values of 3.12 and 4.14, above the average effort). The steady-state fluctuations of the demand are similar for both days (5.72 and 5.28, see the demand mileage column). This index (baseline\_mileage) is of interest to reflect unusual changes in the demand pattern.

The fifth column in Table 11 presents the symmetry of the effort. Day 14 requires a net increase of demand (symmetry = 7.06), whereas day 23 basically requires a strong shaving of demand (symmetry = 0.038). The performance score of the flexible demand shows that the flexible loads follow their targets with sufficient accuracy (performance indicators around zero). Moreover, from Table 11, the aggregator can deduce that the flexible demand fails on days 5 and 28 (where the performance indicator takes its greatest values, above 1), but these days do not represent a big problem, because the effort required from the demand (balance\_mileage) is low (1.52 and 2.08). These results, together with those discussed in Section 3.4.2, help both the customer and the aggregator to become familiar with demand response and with the usefulness of short-term predictions to manage their new role as prosumers or energy aggregators.

#### **4. Conclusions**

Energy issues are a main concern for the sustainability of our society. This sustainability relies on the integration of renewable sources and on the development of new energy markets, which should be more customer-centered than in the past. These objectives require the development and validation of additional tools to facilitate this change and to contribute to the effective engagement of customers in these markets, as has happened in telecommunications markets. Technological aspects, such as the forecasting of demand and renewable sources or the management of energy, have appeared as significant barriers to the effective deployment of new markets in small and medium customer segments, i.e., the benefits usually do not compensate the complexity of the new responsibilities and tasks in these scenarios. Moreover, forecasting becomes more complex as the level of asset aggregation decreases, which makes the above-mentioned objectives more difficult. For these reasons, this work developed and validated both demand and renewable generation forecasting methods at low aggregation levels of the power system (distribution, in the order of some MW), but focused the methodological effort on applying these methods to demonstrate the possibility of the participation of "prosumers" in markets, rather than on achieving small improvements in forecast accuracy (MAPE, RMSE, CAE) at the price of exponentially growing complexity.

The interaction of the demand, generation, and management models demonstrated that this feedback or linkage among models can balance errors through a "closed control loop" that drives the net demand of "prosumers". In the analyzed scenarios, the results showed that 50% of the prediction errors can be balanced with naïve correction models (very short term) and a "reduced" portfolio of loads (HVAC and WH) and policies. The paper also demonstrated the ability of these small and medium customers, through demand aggregators, to offer in the market the necessary demand flexibility (up and down) to manage the volatility of renewables and to build the new power systems and markets of the 2030–2050 horizon.

Further developments are necessary to advance this approach, for instance: the consideration of the participation of customers in new markets and more complex services (mixed participation in two or more markets or services); the refinement of very short-term models (both for demand and generation); the introduction and synthesis of new end-use PBLM and their further aggregation; the integration and deployment of ICTs in the models and in their validation; the hybridization of ESS and demand models, both following the PBLM philosophy, to provide more capabilities for flexibility; and the adjustment and improvement of these models for actual customers through pilots. In the medium term, and with these tools, the potential flexibility of these small and medium customer segments could be exploited and used to balance the integration of renewables both in Smart Grids and in conventional Power Systems.

**Author Contributions:** L.A.F.-J., M.C.R.-A., A.G. (Antonio Guillamón) and A.G. (Antonio Gabaldón) conceived and designed the experiments and the structure of the paper. A.G.-G. and A.G. (Antonio Gabaldón) programmed the algorithms and developed and wrote the part concerning HVAC and WH modeling, the technical evaluation of prosumers, and the simulation and evaluation of DR targets. M.C.R.-A. and A.G. (Antonio Guillamón) managed the customer database of demand/end-uses and supervised some of the mathematical algorithms for the short-term load models. L.A.F.-J. and A.F. managed the PV generation database and defined and supervised the mathematical algorithms for the short-term PV generation models. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the AEI/10.13039/501100011033 (Ministerio de Ciencia, Innovación y Universidades, Spanish Government) under research projects ENE-2016-78509-C3-2-P, ENE-2016-78509-C3-3-P, and RED2018-102618-T, and by EU FEDER funds. The authors have also received funds from these grants to cover the costs of publishing in open access.

**Acknowledgments:** This work was supported by the Ministerio de Economía, Industria y Competitividad (Spanish Government) under research projects ENE-2016-78509-C3-2-P, ENE-2016-78509-C3-3-P, and RED2018-102618-T; by the Ministerio de Educación (Spanish Government) under doctorate grant FPU17/02753; and by the special support of EU FEDER funds.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
