1. Introduction
Electricity load forecasting has attracted sustained research and industry attention because of its central role in energy management systems: it underpins energy production, distribution, and supply, and is an important component of intelligent power systems. Accurate short-term electricity load forecasting not only promotes energy-saving and emission-reduction projects in the power system, but also helps operate the power system reliably and safely. By time scale, electricity load forecasting is divided into ultra-short-term, short-term, medium-term, and long-term forecasting. Among them, short-term load forecasting predicts the electricity load values for the next few hours or days [1], and the forecast results serve as a basis for scheduling power system units. The volatility of the electricity load affects the optimal dispatch of the power system, and with the large-scale grid integration of distributed energy sources [2], accurate short-term load forecasting has become even more difficult due to the dramatic increase in the volatility and nonlinearity of electricity load data.
In recent years, deep learning methods have achieved great success in short-term electricity load forecasting because of their ability to model complex systems [3,4,5]. However, a large gap remains between traditional offline deep learning methods and industrial practice. Traditional offline deep learning assumes that all historical data are available during training. For smart meters [6], only part of the data is available in the early stage of training, and new data arrive continuously, so a traditional offline model would have to be retrained on the combination of historical and newly arrived electricity load data. This is clearly unrealistic, as it wastes considerable computing resources. Moreover, the distribution of smart meter data may change because of new electrical equipment or changes in residents' consumption patterns; that is, concept drift occurs [7], which degrades or even invalidates the predictive performance of existing offline models. Therefore, in the electricity load forecasting task, we need to learn continuously from newly generated meter data without retraining on the entire dataset, and researchers have turned their attention to online learning [8]. This learning mode, which updates the model with new data only during prediction, greatly reduces the computational burden on the power system and achieves higher prediction accuracy than traditional offline models.
However, most current online learning models use newly arrived electricity load data to update the model at regular intervals or in fixed quantities [9]. If the new data follow the same or a similar pattern as the historical data, such updates are ineffective: they consume computing resources, place higher demands on real-time response, and contribute little to prediction accuracy. It is therefore necessary to design an online prediction model that updates according to the actual changes in the data.
Therefore, this paper proposes an incremental ensemble short-term electricity load prediction model based on sample domain adaptation, which effectively addresses the above problems using two-stage concept drift detection [10,11] and transfer learning based on sample domain adaptation. In summary, the main contributions of this paper are as follows:
We propose to detect concept drift by combining the significance of changes in the distribution of the current electricity load data with changes in model prediction performance.
We design the cumulative-weighted-sampling-based Tradaboost (AWS-Tradaboost) for building the new base model, which solves the problem of inadequate training of the model due to insufficient concept drift samples.
We develop a novel strategy for updating the weights of the ensemble model.
The rest of this paper is organized as follows. Section 2 reviews related work on short-term electricity load prediction. Section 3 describes the proposed AWS-DAIE in detail. Section 4 presents the experiments and corresponding results. Section 5 concludes the paper.
2. Related Work
Researchers worldwide have proposed a large number of short-term load forecasting methods and theories. Existing research falls mainly into two categories: traditional forecasting methods and artificial intelligence forecasting methods.
Traditional short-term electricity load forecasting methods include the grey model method, the fuzzy forecasting method, and time series analysis. Zhao et al. [12] optimized the parameters of the grey model GM(1,1) with the ant colony optimization algorithm and introduced a rolling mechanism to improve the accuracy of electricity load forecasting. Mamlook et al. [13] explored the effect of different parameters, including weather, time, and historical data with random perturbations, on load forecasting using the priority and importance of fuzzy sets, and used fuzzy control logic to reduce forecasting errors. Common time series analysis methods include the autoregressive moving average (ARMA), the autoregressive integrated moving average (ARIMA), and generalized autoregressive conditional heteroskedasticity (GARCH). Reference [14] argued that different levels of noise disturbance affect ARIMA when forecasting short-term electricity loads, requiring re-identification of the model before parameter estimation, and showed that the limit level of noise the model can tolerate before breaking down can be determined.
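As a point of reference for these time series baselines, the core of ARMA/ARIMA-style forecasting is a linear autoregression on lagged load values. The sketch below fits an AR(p) model by ordinary least squares on synthetic hourly-like load data; the data, the lag order p = 24, and the 6-step horizon are illustrative assumptions, not values from the cited work.

```python
import numpy as np

def fit_ar(series, p):
    """Fit AR(p) coefficients (p lags plus an intercept) by ordinary least squares."""
    # Row for target index i holds [y_{i-1}, ..., y_{i-p}, 1]
    cols = [series[p - k - 1:len(series) - k - 1] for k in range(p)]
    X = np.column_stack(cols + [np.ones(len(series) - p)])
    y = series[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # p lag coefficients followed by the intercept

def forecast_ar(series, coef, steps):
    """Iterated multi-step forecast, feeding predictions back as lags."""
    p = len(coef) - 1
    history = list(series[-p:])
    preds = []
    for _ in range(steps):
        lags = history[::-1][:p]                    # most recent value first
        y_hat = float(np.dot(coef[:p], lags) + coef[p])
        preds.append(y_hat)
        history.append(y_hat)
    return np.array(preds)

# Hypothetical hourly load with a daily-like cycle plus noise
rng = np.random.default_rng(0)
t = np.arange(500)
load = 100 + 10 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 1, 500)

coef = fit_ar(load, p=24)
next_6h = forecast_ar(load, coef, steps=6)
```

A library implementation such as statsmodels' ARIMA would additionally handle differencing and moving-average terms; the sketch only illustrates the autoregressive core.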
Artificial intelligence methods have emerged in recent years and have been widely used in short-term electricity load forecasting due to their powerful ability to model complex relationships and their adaptive self-learning capabilities. Typical examples include support vector regression (SVR), long short-term memory networks (LSTM), gated recurrent units (GRU), and temporal convolutional networks (TCNs). Han et al. [15] extracted meteorological features affecting wind and PV power generation using nonlinear effects and trend correlation measures of the copula function and modeled wind and PV power generation with LSTM, enabling medium- and long-term wind/PV power forecasting from limited data samples. Jung et al. [16] used an attention-based gated recurrent unit (Attention-GRU) to model electrical loads in order to exploit more key variables in short-term load forecasting and experimentally demonstrated that prediction performance improves significantly when the inputs are long sequences. Gong et al. [17] determined the order of the model by periodically analyzing the correlation of the electricity load data and the fluctuation characteristics of customer load data, and adjusted a Seq2seq model with a residual mechanism (Residual) and two attention mechanisms (Attention) to achieve better results on actual electricity load data.
Combining models is a new trend in electricity load forecasting; common combinations include model stacking and ensemble learning. Stacking can exploit the advantages of each model to improve the accuracy of load forecasting. Guo et al. [18] constructed historical electricity loads, real-time electricity prices, and weather as model inputs in the form of continuous feature maps, used a CNN to cascade shallow and deep features at four different scales to fully explore the potential relationships between continuous and discontinuous data in the feature maps, fused the feature vectors at different scales as inputs to an LSTM network, and performed short-term electricity load forecasting with the LSTM. Ensemble learning is a paradigm that combines multiple models and can obtain better predictions than any individual model. Wang et al. [19] used clustering to divide historical electricity load data into multiple partitions, trained an LSTM model for each partition, and finally fused the LSTMs with FCC models; the authors trained the FCC models with an improved Levenberg-Marquardt (LM) algorithm to achieve fast and stable convergence. Electricity load forecasting based on decomposition preprocessing has also been a hot topic in recent years. Khairalla et al. [20] proposed a flexible heterogeneous ensemble model that integrates support vector regression (SVR), back-propagation neural network (BPNN), and linear regression (LR) learners in four phases: generation, pruning, integration, and prediction. Li et al. [21] used ICEEMDAN to decompose complex raw electricity load data into simple components and aggregated the final prediction after forecasting each component individually with a multi-kernel extreme learning machine (MKELM) [22] optimized by grey wolf optimization (GWO). Lv et al. [23] proposed a hybrid model with seasonal-factor elimination and error correction based on variational mode decomposition (VMD) and long short-term memory networks, targeting the dramatic changes that various factors induce in short-term electricity loads and that further complicate the data.
Tang et al. [24] used the K-means algorithm to cluster the electricity load, grouping similar data into the same cluster, decomposed the load into several components using ensemble empirical mode decomposition (EEMD), and selected candidate features by calculating Pearson correlation coefficients to construct the prediction input; a deep belief network (DBN) and a bidirectional recurrent neural network (Bi-RNN) served as the prediction models.
The models mentioned above all belong to the category of offline learning. To learn from new data, researchers have begun studying networks and frameworks based on online and incremental learning. Von Krannichfeldt et al. [25] combined batch processing and online learning into a novel online ensemble learning approach for electricity load forecasting, implementing an improved passive-aggressive regression (PAR) model to integrate online learning with the forecasts of the batch model and ensure adaptability in online applications. Álvarez et al. [26] developed online learning techniques for APLF that recursively update hidden Markov model parameters and then used the Markov model to model and quantify the uncertainty in electricity load forecasting. However, models updated at regular intervals or in fixed quantities do not fully account for actual changes in the data and demand substantial computing resources for real-time response; when the electricity load is relatively stationary, a large number of invalid updates may occur, wasting valuable computing resources.
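To make the online-update idea concrete, the following is a minimal sketch of a passive-aggressive regressor of the kind [25] builds on: the PA-I variant with an epsilon-insensitive loss. The hyperparameters and the synthetic stream are illustrative assumptions, and this is not the authors' improved PAR model.

```python
import numpy as np

class PARegressor:
    """Online passive-aggressive regression (PA-I variant)."""
    def __init__(self, n_features, C=1.0, eps=0.1):
        self.w = np.zeros(n_features)
        self.C = C      # aggressiveness cap on the step size
        self.eps = eps  # epsilon-insensitive tube width

    def predict(self, x):
        return float(self.w @ x)

    def partial_fit(self, x, y):
        # epsilon-insensitive loss: no update while inside the tube
        loss = max(0.0, abs(y - self.w @ x) - self.eps)
        if loss > 0.0:
            tau = min(self.C, loss / (x @ x))  # PA-I step size
            self.w += np.sign(y - self.w @ x) * tau * x
        return self

# Hypothetical noiseless stream from a fixed linear target
rng = np.random.default_rng(1)
true_w = np.array([0.5, -0.2, 0.8])
model = PARegressor(n_features=3)
for _ in range(2000):
    x = rng.normal(size=3)
    model.partial_fit(x, float(true_w @ x))
```

Each arriving sample triggers at most one cheap weight update, which is what makes this family attractive for streaming load data.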
In this paper, we perform concept drift detection on data blocks and adapt to concept drift through domain incremental learning and dynamic adjustment of model weights. When different batches of data no longer satisfy the static, identically distributed assumption, domain incremental learning can avoid catastrophic forgetting of historical knowledge while updating the model using only newly arrived data [27]. Concept drift describes unforeseen shifts in the underlying distribution of streaming data over time [28]. Concept drift detection techniques have proliferated in recent years [29,30,31]: some detect drift from the error rate of the model, some from changes in the data distribution, and some from hypothesis testing. To address the problem that the model cannot be adequately trained on only a few concept drift samples, this paper trains the base model by improved transfer learning based on sample domain adaptation [32], which adjusts the data weights of the historical and current data blocks so that the distribution of the drift samples approximates that of the historical data block, and generates the base model from the concept drift samples during the weight adjustment.
3. Research Methodology
In this paper, an incremental learning short-term electricity load forecasting model based on sample domain adaptation is proposed. First, two-stage concept drift detection (DSCD) is performed on the current mini-batch of samples; then, a new base model is trained via sample domain adaptive transfer based on cumulative weighted sampling Tradaboost (AWS-Tradaboost) to solve the problem that the model cannot be fully trained on only a few concept drift samples. Finally, we propose a novel incremental ensemble weight-updating strategy to construct the final short-term electricity load forecasting model. The processing flow of the model is shown in Figure 1. The dataset used in this paper is the PRECON dataset [33]; a detailed description of the dataset is given in Section 4.
3.1. Two-Stage Concept Drift Detection
The AWS-DAIE algorithm proposed in this paper performs incremental updates only for concept drift samples, so we first introduce the two-stage concept drift detection (DSCD) algorithm that we designed. As the name implies, DSCD detects concept drift in two stages: the first stage monitors changes in model performance to determine how well the current model adapts to newly arrived data, while the second stage determines whether concept drift has occurred by checking whether the distribution of the current data block differs significantly from that of the historical data.
In the first stage of concept drift detection, for each current sample $i$, the corresponding sliding window $W_i$ contains the samples from $i-t+1$ to $i$. We calculate and compare the absolute prediction error $e_i$ of $W_i$ and $e_{i-1}$ of $W_{i-1}$ (the current window and the previous full window, respectively). If the error of the window corresponding to sample $i$ exceeds $a$ times the error of the previous window, i.e., $e_i > a \cdot e_{i-1}$, the current state is set to drift warning, and the adaptive window $W_A$ starts collecting the current samples. If, before the data in the adaptive window $W_A$ reach the preset amount, the window error for $m$ consecutive samples is less than $a$ times the prediction error corresponding to the first sample of the adaptive window, the drift warning state is released: the data in the adaptive window are cleared, and the next round of concept drift detection begins. If the amount of data in the adaptive window reaches the preset size $n$, the second stage of concept drift detection starts. Following the literature [11], we determined the parameters by selecting, over a large number of experiments, the set of values yielding the smallest prediction variance: $a = 1.08$, $m = 5$, $n = 40$.
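Under these parameter choices, the first-stage check can be sketched as follows. The rolling-window error comparison is simplified here (a flat list of warning indices instead of the full warning/adaptive-window state machine), and the error stream is hypothetical.

```python
import numpy as np
from collections import deque

def first_stage_drift_warning(abs_errors, t=24, a=1.08):
    """Scan per-sample absolute prediction errors and flag a drift
    warning whenever the current rolling window's mean error exceeds
    a times the previous window's mean error (first DSCD stage,
    simplified)."""
    warnings = []
    window = deque(maxlen=t)   # rolling window of the last t errors
    prev_mean = None
    for i, err in enumerate(abs_errors):
        window.append(err)
        if len(window) == t:
            cur_mean = float(np.mean(window))
            if prev_mean is not None and cur_mean > a * prev_mean:
                warnings.append(i)
            prev_mean = cur_mean
    return warnings

# Hypothetical error stream: stable, then a jump that should trip the warning
errs = [1.0] * 100 + [3.0] * 30
flagged = first_stage_drift_warning(errs)
```

In the full DSCD algorithm, a flagged index would open the adaptive window and begin collecting samples rather than simply being recorded.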
After the first stage of detection, we have collected a certain amount of data under the drift warning state, but the data may have been flagged merely because of noise-induced fluctuations or because the model itself is insufficiently trained. We therefore check whether these data differ significantly in distribution from the historical electricity load data; this is the second stage of concept drift detection. Before introducing it, we need the paired $t$-test: let $S$ and $T$ be two independent time series samples, let $\bar{S}$ and $\bar{T}$ denote their sample means, and let $\sigma_S^2$ and $\sigma_T^2$ denote their sample variances. Given the null hypothesis $H_0: \mu_S = \mu_T$ at confidence level $\alpha$, the statistic
$$ t = \frac{\bar{S} - \bar{T}}{\sqrt{\sigma_S^2 / |S| + \sigma_T^2 / |T|}} $$
follows a $t$-distribution.
The degrees of freedom of this distribution are determined by the sample sizes and variances. The statistic $t$ lies in the rejection region at confidence level $\alpha$ when $|t| > t_{\alpha/2}$, at which point the null hypothesis is rejected and the alternative accepted; the two samples $S$ and $T$ are then judged to be significantly different at confidence level $\alpha$. This criterion drives the second stage of concept drift detection. First, the historical electricity load data are divided into $m$ blocks of equal size, so that the training data are represented as $\{D_1, D_2, \ldots, D_m\}$, and each block is used to train a base model. We then down-sample these blocks to obtain $\{D'_1, D'_2, \ldots, D'_m\}$. Second, the paired $t$-test is performed between the data in the adaptive window $W_A$ and the down-sampled blocks. When the test result at confidence level $\alpha$ falls into the rejection region, the current data block is considered to exhibit concept drift. Otherwise, the warning is treated as pseudo-concept drift caused by noise fluctuations or insufficient training of the model: the data in the adaptive window are used to fine-tune the base model with the worst prediction performance in the ensemble, the adaptive window $W_A$ is cleared, and monitoring of the model's prediction performance continues. Through these two stages, when true concept drift is detected, the concept drift samples are used to train a new base predictor, which is then used in the subsequent incremental learning process. The overall process of the two-stage concept drift detection algorithm is shown in Algorithm 1.
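The second-stage significance check can be sketched as a two-sample t test between the adaptive window and a down-sampled historical block. The hard-coded critical value below approximates the two-sided 5% threshold for large samples and is an assumption; the paper's exact test settings may differ.

```python
import numpy as np

def second_stage_drift_test(window, hist_block, t_crit=1.96):
    """Two-sided two-sample t statistic between the adaptive-window
    data and a down-sampled historical block. t_crit ~= 1.96
    approximates the alpha = 0.05 critical value for large samples
    (an assumption; scipy.stats.t.ppf would give the exact quantile
    for the actual degrees of freedom)."""
    w = np.asarray(window, dtype=float)
    h = np.asarray(hist_block, dtype=float)
    se = np.sqrt(w.var(ddof=1) / len(w) + h.var(ddof=1) / len(h))
    t_stat = (w.mean() - h.mean()) / se
    return abs(t_stat) > t_crit, t_stat  # (drift confirmed?, statistic)

# Hypothetical data: a genuine level shift should be flagged as drift
rng = np.random.default_rng(2)
hist = rng.normal(100, 5, 400)       # down-sampled historical block
shifted = rng.normal(115, 5, 40)     # adaptive window after a level shift
drift, t_stat = second_stage_drift_test(shifted, hist)
```

When the test does not reject, the windowed data would instead be used to fine-tune the worst-performing base model, as described above.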
Algorithm 1 DSCD
Input: $D$: newly arrived data; $t$: detecting window size; $a$: concept drift warning threshold; $m$: threshold number for releasing a concept drift warning; $W_A$: adaptive window; $n$: adaptive window size; $D'_{index}$: down-sampled historical data
Output: concept drift sample $W_A$
1: $W_A$ = NULL
2: state = None; count = 0
3: number = 0
4: for each sample $x_i \in D$ do
5:  $W_i$: sliding window with the last $t$ samples up to $x_i$
6:  $e_i$ = AbsoluteError($W_i$)
7:  $e_{i-1}$ = AbsoluteError($W_{i-1}$)
8:  if state == None and $e_i > a \cdot e_{i-1}$ then
9:   $W_A = W_A \cup \{x_i\}$
10:   state = warning; count = count + 1
11:  end if
12:  if state == warning then
13:   if count < $n$ then
14:    $W_A = W_A \cup \{x_i\}$; count = count + 1
15:    if $e_i < a \cdot e_{first}$ then ($e_{first}$: error at the first sample of $W_A$)
16:     number = number + 1
17:     if number == $m$ then (release the drift warning)
18:      state = None; number = 0; $W_A$ = NULL; count = 0
19:     end if
20:    else
21:     number = 0
22:    end if
23:   else
24:    state = driftwarning
25:   end if
26:  end if
27:  if state == driftwarning then
28:   compute the sample means and variances of $W_A$ and $D'_{index}$
29:   compute the two-sided statistic $t$
30:   if $|t| > t_{\alpha/2}$ then (true concept drift)
31:    return concept drift sample $W_A$
32:   else
33:    fine-tune the base model with the worst prediction performance using $W_A$
34:   end if
35:   state = None; $W_A$ = NULL; count = 0
36:  end if
37: end for
3.2. AWS-Tradaboost
After concept drift is detected as described in the previous section, the concept drift samples are used to train a new base model. However, the base model may not be adequately trained because the concept drift samples are few. This paper uses sample-domain transfer learning to solve this problem. Tradaboost is a classical sample-domain transfer learning algorithm that continuously reduces the impact of unhelpful data by iterating data weights. During iteration, when the prediction error of a source-domain sample is large, its influence on the new base model in the next iteration is weakened by reducing its weight. When the model has a large prediction error for a target-domain sample, the training of that sample is considered insufficient, and its weight is increased so that it is trained better in the next iteration. The model can thus better fit the target-domain data, and the source-domain samples closer to the target-domain distribution obtain higher weights over several iterations. However, this algorithm updates the sample weights according to the current iteration only, ignoring the impact of historical iterations on the construction of the current base model. In this paper, we propose a cumulative weighted sampling (AWS) method that selects the samples with the largest cumulative contribution to base model training during the iteration process for the next iteration. Denoting the current iteration as $c$, we maintain a two-dimensional list $W$ of the weight of each sample in the historical iterations, so that $W = \{w_{j,i}\}$ with $j = 1, \ldots, c$ and $i = 1, \ldots, m$, where $m$ is the number of samples in the source domain. The algorithm is described in Algorithm 2.
Algorithm 2 AWS
Input: weight list $W$; source domain samples $(x_i, y_i)$
Output: the $k$ samples with the largest cumulative contribution
Step 1: Calculate the cumulative weighted contribution of each sample $i$:
$$ P_i = \sum_{j=1}^{c} \theta^{\,c-j} w_{j,i}, $$
where $\theta \in (0, 1)$ is a forgetting factor: the further an iteration is from the current one, the smaller its contribution to the current base model.
Step 2: Select the $k$ samples with the largest cumulative contribution $P_i$, recorded as $T_{AWS}$.
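Since the exact form of the cumulative contribution is not fully legible in our copy, the sketch below assumes an exponentially decaying forgetting factor, which matches the stated intuition that iterations further from the current one contribute less; the weight history is hypothetical.

```python
import numpy as np

def aws_select(W, k, theta=0.9):
    """Select the k samples with the largest cumulative weighted
    contribution. W has shape (c, m): row j holds the sample weights
    of iteration j (row 0 oldest). theta is a forgetting factor
    (assumed exponential decay): older iterations contribute less
    to the current base model."""
    c, m = W.shape
    decay = theta ** np.arange(c - 1, -1, -1)   # oldest iteration -> smallest factor
    contrib = decay @ W                         # cumulative contribution per sample
    top_k = np.argsort(contrib)[::-1][:k]       # indices of the k largest
    return top_k, contrib

# Hypothetical weight history: 3 iterations, 5 source-domain samples
W = np.array([[.30, .10, .20, .25, .15],
              [.05, .40, .20, .20, .15],
              [.05, .45, .25, .15, .10]])
idx, contrib = aws_select(W, k=2)
```

Here sample 1 gains weight across iterations and sample 0 loses it, so the cumulative criterion favors samples that remained useful recently rather than those that only started strong.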
We first calculate the similarity between the current drift sample and each historical data block using dynamic time warping (DTW); the index of the most similar historical data block is recorded as $index$, and the corresponding historical data block $D_{index}$ and base model $M_{index}$ are returned. We take this historical data block as the source domain dataset $T_{source} = D_{index}$, with sample size $S$, and the concept drift sample as the target domain dataset $T_{target}$, with sample size $T$. In Algorithm 3, we first complete a series of initialization operations: merging the source and target data into a new training dataset, setting the maximum number of iterations $N$, and initializing the weights of the merged dataset and of the learner. Then, the samples with the largest cumulative contribution to base predictor training are selected using the AWS algorithm to build the predictor's training set. Next, the prediction errors of the trained model in the source and target domains are calculated, and the weights of both domains are updated in each iteration. If a source-domain sample has a large prediction error, it is less relevant to the target domain, and its weight is reduced to diminish its influence. Conversely, if a target-domain sample has a large prediction error, its weight is increased to strengthen its influence in the next round. The final predictor is returned when the number of iterations reaches the maximum $N$. The specific steps of Tradaboost based on cumulative weighted sampling are shown in Algorithm 3.
Algorithm 3 AWS-Tradaboost
Input: $T_{source}$: source domain samples; $T_{target}$: target domain (concept drift) samples; $index$: the index of $T_{source}$; $N$: the maximum number of iterations
Output: new base model $M_{new}$
1: Initialize: merge the training set $T = T_{source} \cup T_{target}$ and set the initial weight vector $w^1 = (w^1_1, \ldots, w^1_{S+T})$, where $w^1_i = 1/(S+T)$
2: $\beta = 1 / (1 + \sqrt{2 \ln S / N})$
3: for $t$ = 1, 2, ..., $N$ do
4:  normalize the weights: $p^t = w^t / \sum_i w^t_i$
5:  select the training samples with the largest cumulative contribution using AWS, and train the base learner $h_t$
6:  calculate the normalized error on $T_{source}$ and $T_{target}$: $e^t_i = |h_t(x_i) - y_i| / \max_j |h_t(x_j) - y_j|$
7:  calculate the target-domain error rate $\varepsilon_t = \sum_{i=S+1}^{S+T} p^t_i e^t_i$ and set $\beta_t = \varepsilon_t / (1 - \varepsilon_t)$
8:  if $i < S$ then
9:   $w^{t+1}_i = w^t_i \, \beta^{\,e^t_i}$ (source domain: large errors shrink the weight)
10:  else
11:   $w^{t+1}_i = w^t_i \, \beta_t^{\,-e^t_i}$ (target domain: large errors grow the weight)
12:  end if
13:  update the cumulative weight list $W$
14: end for
15: return $M_{new}$
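The weight-update loop of Algorithm 3 can be sketched as follows. The AWS sampling step is omitted for brevity, a weighted linear least-squares model stands in for the base learner, and the update formulas follow standard TrAdaBoost conventions, which is an assumption where our copy of the paper's own formulas is garbled.

```python
import numpy as np

def tradaboost_r2(Xs, ys, Xt, yt, n_iter=10):
    """Sketch of instance-transfer boosting for regression
    (TrAdaBoost-style): source-domain samples with large error are
    down-weighted, target-domain samples with large error up-weighted."""
    S, T = len(ys), len(yt)
    X = np.vstack([Xs, Xt])
    y = np.concatenate([ys, yt])
    w = np.full(S + T, 1.0 / (S + T))
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(S) / n_iter))
    coef = None
    for _ in range(n_iter):
        p = w / w.sum()
        # Weighted least-squares linear learner as a stand-in base model
        sw = np.sqrt(p)[:, None]
        A = np.hstack([X, np.ones((S + T, 1))])
        coef, *_ = np.linalg.lstsq(A * sw, y * sw.ravel(), rcond=None)
        err = np.abs(A @ coef - y)
        err_norm = err / max(float(err.max()), 1e-12)   # scale errors to [0, 1]
        eps = float(np.sum(p[S:] * err_norm[S:]))       # target-domain error rate
        beta_t = max(eps / max(1.0 - eps, 1e-12), 1e-12)
        w[:S] *= beta_src ** err_norm[:S]               # shrink bad source samples
        w[S:] *= beta_t ** (-err_norm[S:])              # grow hard target samples
    return coef                                         # final base model

# Hypothetical domains: source slope differs slightly from the target's
rng = np.random.default_rng(3)
Xs = rng.normal(size=(200, 1)); ys = 1.2 * Xs[:, 0] + rng.normal(0, 0.1, 200)
Xt = rng.normal(size=(30, 1));  yt = 1.0 * Xt[:, 0] + rng.normal(0, 0.1, 30)
coef = tradaboost_r2(Xs, ys, Xt, yt)
```

With many source samples and few target samples, the early fit is pulled toward the source slope; the reweighting gradually shifts it toward the target-domain relationship.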
3.3. Ensemble Incremental Models
In this paper, a model buffer is designed to store all the base models corresponding to the current prediction task, and we choose the temporal convolutional network (TCN) as the base model.
The weights of the base models are the most critical factor affecting the prediction performance of the ensemble model. Hence, we design a novel weight update strategy that enables the model to adapt to the current electricity load data; we describe it in detail below.
When concept drift occurs, we collect the concept drift samples, train a new base predictor using the AWS-Tradaboost algorithm of Section 3.2, and add the newly trained base predictor to the model buffer pool. The model weights of the base predictors are then updated according to the following scheme:
(1) Take the concept drift sample from the adaptive window of Section 3.1, denoted as $D$, with $N_d$ denoting its sample size.
(2) Update and normalize the data weights of the concept drift samples. Predict $D$ using the ensemble model prior to the addition of the new base predictor, and calculate the relative error $e_i$ of the prediction for each sample $i$, where $\hat{y}_i$ represents the predicted value of the ensemble model before adding the new base predictor and $t$ represents the number of base models in the ensemble. We first calculate the weights of the concept drift samples from these relative errors; by endowing the concept drift samples with data weights, the prediction error of the ensemble model after adding the new base predictor is balanced. The data weights are then normalized.
(3) Construct the new concept drift sample $D'$ by assigning the above weights.
(4) Evaluate the prediction error of all base predictors in the model buffer pool on $D'$.
(5) Calculate the regression error rate and the regularized error rate of each base predictor from these prediction errors.
(6) Finally, obtain the weight of each base model from its regularized error rate.
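The exact regression and regularized error-rate formulas are not legible in our copy; an AdaBoost.R2-style weighting is one common instantiation of steps (4)-(6) and is sketched below as an assumption, not as the paper's definitive update rule.

```python
import numpy as np

def ensemble_weights(preds, y, sample_w):
    """Derive base-model weights from weighted errors on the drift
    sample, AdaBoost.R2-style (an assumed instantiation; the paper's
    exact regularization is not reproduced here).

    preds: (n_models, n_samples) predictions of each base model
    y: (n_samples,) targets; sample_w: normalized data weights
    """
    err = np.abs(preds - y)                   # per-model, per-sample absolute error
    err_norm = err / max(float(err.max()), 1e-12)   # regularize errors to [0, 1]
    eps = err_norm @ sample_w                 # weighted regression error rate per model
    eps = np.clip(eps, 1e-6, 0.5 - 1e-6)      # models above 0.5 get negligible weight
    alpha = np.log((1.0 - eps) / eps)         # low-error models receive high weight
    return alpha / alpha.sum()                # normalized ensemble weights

# Hypothetical drift sample: 3 base models, model 0 the most accurate
y = np.array([10.0, 12.0, 11.0, 13.0])
preds = np.array([[10.1, 12.0, 11.2, 13.0],   # small errors
                  [10.5, 12.4, 11.5, 13.6],   # medium errors
                  [11.0, 13.5, 12.0, 14.5]])  # large errors
sample_w = np.full(4, 0.25)
wts = ensemble_weights(preds, y, sample_w)
```

Because the sample weights come from step (2), base models that do well precisely on the hard drift samples are rewarded most.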
5. Conclusions
Deep learning has been widely used in short-term electricity load forecasting, but batch offline models cannot accommodate the concept drift present in electricity load data, so their prediction accuracy degrades over time. Models updated at regular intervals or in fixed quantities can adapt to concept drift to some extent, but they perform a large number of invalid updates and cannot meet the power system's real-time response needs.
The incremental ensemble short-term electricity load forecasting model based on sample domain adaptation proposed in this paper effectively solves the above problems. The model updates the ensemble incrementally only after detecting concept drift in the current electricity load data. Meanwhile, to address the problem that the base predictor cannot be adequately trained on only a few concept drift samples, this paper fully considers the contribution of historical iterations to the construction of the current base predictor and designs a Tradaboost based on cumulative weighted sampling to better construct the new base predictor. Evaluated on the electricity loads of four households from the PRECON dataset, the proposed algorithm achieved higher prediction accuracy than several classical offline, online, and incremental learning models; it effectively captures the trend of the electricity load and better meets the needs of electric power systems.
Our research considered concept drift in electricity load forecasting, but did not quantify the extent to which concept drift affects the prediction results of the model. In future work, we will deeply explore and quantify the impact of concept drift on model prediction.