**3. Probabilistic Load Forecasting Model Generation**

In this study, we based the proposed framework on the ensemble forecasting method. The proposed forecasting framework, shown in Figure 1, involves demand consumption data and weather information acquisition, data integrity risk reduction, forecast error compensation, point forecasting, and stochastic forecast estimation. The ensemble forecasting method integrates a series of homogeneous parametric models into a random forecast model [24]. These parallel parametric models are trained and produce load forecasts independently. In this study, the forecasting process proceeds in two stages: point-forecast model generation and ensemble processing. The point-forecast generation stage produces deterministic forecasts with point-forecast models and comprises parametric model selection techniques and data integrity risk reduction methods. The ensemble stage defines strategies for optimal forecast model selection, forecast error compensation, and stochastic forecasting. Since parametric model selection is not the primary consideration of this study, we introduce only three typical parametric models. In the last decade, several parametric models have been selected for demand load predictive modeling; notable among them are the artificial neural network (ANN), K-means, and Bayesian [28–39] approaches. These models are chosen for demand load prediction because of their suitability for sequential data prediction. Our proposed framework is adaptable to multiple parametric or meta-parametric models for the efficient prediction of sequential demand load.

**Figure 1.** Demand load forecast flowchart.

#### *3.1. Data Integrity Risk Reduction*

As part of the study, preprocessing analysis was performed on the dataset to increase its integrity. We defined a risk reduction method to detect outliers and to compensate for irregular or missing data. The first stage of the risk reduction method deals with outlier detection. The acquired demand load data contains outliers because of measurement latencies, changes in system behavior, or faults in measuring devices. In this study, boxplot and generalized extreme studentized deviate (ESD) algorithms were adopted to detect irregularities in the demand load data. These methods are common statistical techniques used to identify hidden patterns and multiple outliers in data [40]. The generalized ESD technique depends on two parameters, α and *r*: the probability of false outliers and the upper limit on the expected number of potential outliers, respectively. As mentioned by Carey et al. [41], the maximum number of potential outliers in a dataset is defined by the inequality in Equation (1).
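The boxplot screen mentioned above can be sketched as follows. This is an illustrative implementation, not the paper's code: the standard 1.5 × IQR Tukey fences and the function name are our assumptions.

```python
# Illustrative sketch of boxplot (IQR) outlier detection, one of the two
# screening methods described above; the 1.5*IQR Tukey fences are assumed.
def boxplot_outliers(values):
    """Return indices of observations outside the Tukey fences."""
    xs = sorted(values)
    n = len(xs)

    def quantile(q):
        # Linear interpolation between adjacent order statistics.
        pos = q * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]

load = [20.1, 21.3, 19.8, 20.5, 95.0, 20.9, 21.1, 20.4]  # one spurious spike
print(boxplot_outliers(load))  # -> [4]
```

The boxplot screen flags gross spikes cheaply; the generalized ESD test below handles the harder case of multiple masked outliers.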

$$r < 0.5(n-1) \tag{1}$$

where *n* is the number of observations in the data set. With the upper bound of *r* defined, the generalized ESD test performs *r* separate tests: a test for one outlier, two outliers, and so forth up to *r* outliers.

$$X: \langle x_1, x_2, x_3, \dots, x_n \rangle \tag{2}$$

For each of the *r* separate tests, we compute the test statistic, *Ri*, that maximizes |*xi* − *x*| over all observations, as specified in Equation (3), where *x* and *s* denote the sample mean and the standard deviation, respectively. The observation *xi* that maximizes *Ri* is then removed and *Ri* is recomputed with (*n* − 1) observations. This process is repeated for all *r* tests, yielding the test statistics *R*1, *R*2, ··· , *Rr*.

$$R_i = \frac{\max_i |x_i - \overline{x}|}{s} \tag{3}$$

The critical value, λ*i*, is computed for each of the *r* test statistics using Equation (4), with tail area probability, *p*, defined in Equation (5), where *tp*,*n*−*i*−1 is the 100*p* percentage point from the *t*-distribution with (*n* − *i* − 1) degrees of freedom. The pairs of test statistics and critical values are arranged in descending order. The number of outliers is then determined by finding the largest *i* such that *Ri* > λ*i*.

$$\lambda_i = \frac{(n-i)\, t_{p,n-i-1}}{\sqrt{(n-i-1+t_{p,n-i-1}^2)(n-i+1)}}; \quad \forall\ i = 1, \dots, r \tag{4}$$

$$p = 1 - \left[\frac{\alpha}{2(n - i + 1)}\right] \tag{5}$$
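The iterative test of Equations (1)–(5) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the paper's code: variable names (`alpha`, `r`) mirror the text, the standard generalized ESD critical value is used, and SciPy supplies the *t*-distribution quantile.

```python
# Illustrative sketch of the generalized ESD test (Equations (1)-(5)).
from statistics import mean, stdev
from scipy.stats import t

def generalized_esd(data, alpha=0.05, r=None):
    """Return indices of detected outliers (largest i with R_i > lambda_i)."""
    n = len(data)
    if r is None:
        r = (n - 1) // 2  # upper bound on r from Equation (1)
    xs = list(data)
    idx = list(range(n))
    removed, passed = [], []
    for i in range(1, r + 1):
        xbar, s = mean(xs), stdev(xs)
        # R_i: largest standardized deviation among remaining observations
        k = max(range(len(xs)), key=lambda j: abs(xs[j] - xbar))
        R = abs(xs[k] - xbar) / s
        # Critical value lambda_i, Equations (4)-(5)
        p = 1 - alpha / (2 * (n - i + 1))
        tp = t.ppf(p, n - i - 1)
        lam = (n - i) * tp / (((n - i - 1 + tp**2) * (n - i + 1)) ** 0.5)
        passed.append(R > lam)
        # Remove the maximizing observation and repeat with one fewer point
        removed.append(idx.pop(k))
        xs.pop(k)
    # Number of outliers: the largest i whose test statistic exceeds lambda_i
    num = max((i + 1 for i, ok in enumerate(passed) if ok), default=0)
    return sorted(removed[:num])

print(generalized_esd([1.0, 1.1, 0.9, 1.05, 0.95, 10.0, 1.02, 0.98]))  # -> [5]
```

Unlike a single Grubbs test, all *r* candidate removals are evaluated before the outlier count is decided, which is what protects the test against masking.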

Following the outlier detection is the data imputation method. Missing data resulting from outlier removal or sampling error are compensated for with the data imputation method. Using demand load profiles with missing data can introduce a substantial amount of bias, rendering the analysis of the data more arduous and inefficient. The imputation model defined in this study preserves the integrity of the load profiles by replacing missing data with an estimated value based on other available information. The proposed reduction model comprises three imputation stages: listwise deletion, hot-deck imputation, and a Lagrange regression model.

For training analysis, from a dataset *X* with feature variables *M* and number of observations *N*, the listwise deletion method deletes a set of records with missing data values of any of the feature variables. The listwise deletion method produces a reduced dataset, *X*∗ , as defined in Equation (6).

$$X^* = \left[X_1^*, X_2^*, \cdots, X_j^*\right]; \quad \forall\ j \in N \tag{6}$$

$$X\_j^\* = X\_j \; : \exists \; l\_{i,j} \cap \exists \; l\_{i+1,j} \cap \dots \cap \exists \; l\_{i+m,j} = 1 \tag{7}$$

$$l\_{i,j} = \begin{cases} 1; & \exists \ x\_{i,j} \forall \ i \in M \\ 0; & \nexists x\_{i,j} \forall \ j \in N \end{cases} \tag{8}$$
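The listwise deletion rule of Equations (6)–(8) can be sketched as follows. This is a minimal illustrative version, not the paper's implementation; `None` stands in for a missing value x_{i,j}, so the indicator l_{i,j} of Equation (8) becomes a simple existence check.

```python
# Illustrative sketch of listwise deletion (Equations (6)-(8)): a record X_j
# is kept only when every feature value x_{i,j} exists; None marks missing data.
def listwise_delete(records):
    """Drop every record that has at least one missing (None) feature value."""
    return [rec for rec in records if all(v is not None for v in rec)]

profiles = [
    [20.1, 21.3, 19.8],   # complete record, kept
    [20.5, None, 20.9],   # missing one feature, dropped
    [21.1, 20.4, 20.7],   # complete record, kept
]
print(listwise_delete(profiles))  # -> [[20.1, 21.3, 19.8], [21.1, 20.4, 20.7]]
```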

This process does not introduce bias, but it decreases the power of the analysis by reducing the effective sample size. For runs of consecutive missing data values (i.e., more than three, based on the sampling interval), the listwise method could introduce bias. Thus, for a single missing value, the hot-deck technique is adopted: the hot-deck procedure imputes missing values with previously available data of the same feature variable, as defined in Equation (9). The hot-deck technique precedes the listwise deletion method.

$$X^* = \left[X_1^*\; X_2^* \cdots X_j^*\right]; \quad \forall\ j \in N \tag{9}$$

$$X_j^* = \left[ X_{1,j}^*, X_{2,j}^*, \cdots, X_{m,j}^* \right]; \quad \forall\ m \in M \tag{10}$$

$$x_{i,j}^* = \begin{cases} x_{i,j}; & \exists\, x_{i,j}\ \forall\ i \in M \\ x_{i,j-1}; & \nexists\, x_{i,j}\ \forall\ j \in N \end{cases} \tag{11}$$
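The hot-deck rule of Equation (11) amounts to carrying the last observed value of a feature forward over an isolated gap. The sketch below is illustrative only; the function name and the use of `None` for a missing value are our assumptions.

```python
# Illustrative sketch of the hot-deck rule (Equation (11)): a single missing
# value is imputed with the previously observed value of the same feature.
def hot_deck(series):
    """Carry the last observed value forward over isolated gaps (None)."""
    out = []
    for v in series:
        if v is None and out:
            out.append(out[-1])  # x_{i,j-1} replaces the missing x_{i,j}
        else:
            out.append(v)
    return out

print(hot_deck([20.1, 20.4, None, 20.9, 21.2]))
# -> [20.1, 20.4, 20.4, 20.9, 21.2]
```

Because the donor value comes from the immediately preceding sample, this rule is only appropriate for single missing values, which is why the text reserves the Lagrange model for longer gaps.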

For multiple data imputation, an ensemble method made up of regression and Lagrange imputation was adopted. A demand load dataset exhibits seasonal, cyclic, and trend patterns. With a week-ahead daily load profile, as depicted in Figure 2, the similarity between the patterns of two load profiles of the same season and day type is evident. Thus, the daily load profile of the immediately preceding week can be used to estimate missing data values in the week-ahead daily load profile.

**Figure 2.** Week-ahead daily load profile.

To achieve this, we estimated the missing data values with a regression imputation method based on the observed values. However, a perfectly predictive regression equation is impractical in some situations (e.g., with a limited number of observations). Hence, the standard error associated with the forecast can be reduced using the demand profile of the previous week. Thus, we estimated the deviation and the rate of deviation, *R*, of the previous demand load profile. From this, we defined a Lagrange imputation model with the load deviation rate, as defined in Equation (12). The initial predicted missing values are modified with the Lagrange model, as shown in Figure 3.

$$\begin{aligned} y = {} & \frac{(R-R_2)(R-R_3)\cdots(R-R_n)}{(R_1-R_2)(R_1-R_3)\cdots(R_1-R_n)}\, y_1 + \frac{(R-R_1)(R-R_3)\cdots(R-R_n)}{(R_2-R_1)(R_2-R_3)\cdots(R_2-R_n)}\, y_2 \\ & + \cdots + \frac{(R-R_1)(R-R_2)\cdots(R-R_{n-1})}{(R_n-R_1)(R_n-R_2)\cdots(R_n-R_{n-1})}\, y_n \end{aligned} \tag{12}$$
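Equation (12) is the classical Lagrange interpolation formula, with the deviation rates R_k as the interpolation nodes. A generic sketch of its evaluation is given below; the function name and the sample points are illustrative assumptions, not the paper's data.

```python
# Illustrative sketch of the Lagrange interpolation in Equation (12): known
# pairs (R_k, y_k) reconstruct the value y at a new deviation rate R.
def lagrange_estimate(R, points):
    """Evaluate the Lagrange polynomial through `points` [(R_k, y_k)] at R."""
    total = 0.0
    for k, (Rk, yk) in enumerate(points):
        basis = 1.0
        for j, (Rj, _) in enumerate(points):
            if j != k:
                basis *= (R - Rj) / (Rk - Rj)  # Lagrange basis polynomial L_k
        total += yk * basis
    return total

# Sanity check: interpolating y = R^2 exactly from three sample points.
pts = [(1.0, 1.0), (2.0, 4.0), (3.0, 9.0)]
print(lagrange_estimate(2.5, pts))  # -> 6.25
```

Note that the nodes R_k must be distinct, otherwise a denominator in Equation (12) vanishes; in the imputation setting this holds whenever the prior-week deviation rates differ between samples.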

**Figure 3.** Estimated missing data values.

The estimated error margins for each continuous range of imputation are (MAPE: 0.22, MAE: 7.14) and (MAPE: 0.08, MAE: 8.24) for the first and second ranges, respectively. Given these error margins, it is evident that the reduction method can be used to compensate for data irregularities.
