1. Introduction
Accurate load forecasting provides a basis for power system construction planning, dispatching decisions, and the production planning of power generation enterprises [1]. In recent years, with the growing scale of grid-connected new energy and the increasing popularity of electric vehicles, the power load has become more volatile, nonlinear, and random, and power grid regulation has become more difficult, which places higher demands on the accuracy of load forecasting.
At present, load forecasting methods fall mainly into statistical analysis methods and artificial intelligence methods [2,3]. Statistical analysis methods mainly include multiple linear regression, the autoregressive integrated moving average (ARIMA) model, and exponential smoothing [4,5]. These methods are mainly suited to linear and stationary load data; they ignore the influence of climate, date type, and other factors on the load, so their prediction accuracy is poor.
With the rise of artificial intelligence, machine learning and deep learning have been widely used in power grid fault identification and load forecasting because of their strong nonlinear fitting ability [6,7]. In [8], an enhanced decision tree classifier (DTC) was designed to obtain better prediction accuracy, but machine learning algorithms often ignore time-series dependence, and their performance on long time series is not as good as that of deep learning algorithms. Deep learning algorithms have unique advantages in load forecasting owing to their strong ability to learn from time series. LSTM and GRU networks, which are improved variants of the recurrent neural network, can effectively handle long, high-dimensional time series [9]. Reference [10] proposed a fusion model of a convolutional neural network (CNN) and a long short-term memory (LSTM) network, which significantly improved the prediction efficiency and accuracy of individual household electric load by exploiting the powerful feature extraction ability of the CNN. Reference [11] constructed a load forecasting model based on a GRU network that improved the forecasting accuracy compared with an LSTM network. Reference [12] proposed an integrated load forecasting model based on CNN and LSTM that significantly improved the forecasting efficiency and accuracy for loads with multidimensional characteristics. To improve the prediction accuracy of power loads over different time ranges, [13] proposed a fusion model of a long short-term memory network and neural prophet (LSTM-NP). Simulation results show that, compared with traditional prediction methods, LSTM-NP improves the prediction accuracy for three different types of load forecasting. The studies above achieve good prediction accuracy on long, high-dimensional time series, but most of them assume accurate historical load data and do not fully consider the impact of abnormal data on prediction accuracy. Abnormal data distort the original distribution of the data set and leave it with insufficient data redundancy, which hinders the improvement of load forecasting accuracy. Therefore, it is necessary to process the abnormal data and restore the original distribution of the historical data set.
Abnormal data processing methods mainly include deletion and filling [14,15]. The deletion method directly deletes the abnormal data and their associated data, which is simple and easy to perform. However, when a large share of the data is abnormal, deletion may lead to the loss of important information [16]. Filling methods can be divided into two categories: statistical methods and machine learning methods [17,18,19]. Statistical methods mainly include mean filling, nearest distance filling, and regression filling. The filling results obtained by these methods are relatively stable, but they are easily affected by other types of data, and the filling accuracy is poor. Machine learning methods mainly include K-nearest neighbor filling, missing forest, and K-means clustering filling. Reference [20] introduced linear interpolation, matrix combination, and matrix transfer to improve the random forest; the simulation results show that the improved random forest algorithm fills missing power data with high accuracy. However, machine learning methods tend to ignore temporal information, which limits their filling accuracy on time series.
The generative adversarial network (GAN) is an unsupervised generative learning model that has been widely used in many areas such as image generation and data filling [21]. On the basis of GAN, reference [22] proposed a self-attention-based time-series imputation network, which improved the filling accuracy of the missing data. To improve the reliability evaluation of transmission gears with insufficient and imbalanced data, [23] proposed a conditional generative adversarial network-mean-covariance balancing labeling (CGAN-MBL) model: on the basis of a CGAN model, MBL was introduced to improve the authenticity of the CGAN-generated data. These methods can effectively reconstruct the missing data, but they do not fully consider the correlations between other features and the missing values. Moreover, they mostly use the JS divergence or the Wasserstein distance as the loss function, and vanishing gradients readily occur during network training.
Most traditional modeling methods are based on a fixed load data set. When new samples are added to the historical data set, it is often necessary to rebuild and retrain the model on the new data set. As the amount of training data increases, this not only consumes a great deal of training time but also leads to vanishing gradients and underfitting, which affects the accuracy and efficiency of load forecasting. Therefore, in order to improve the accuracy and efficiency of ultra-short-term load forecasting under abnormal data, the following two problems need to be solved: (1) how to discover the hidden nonlinear relationship between abnormal data and other characteristic data, improve the authenticity of data reconstruction, and restore the integrity of the load series; (2) when new samples are added to the historical load data set, how to train the model incrementally and improve the efficiency of ultra-short-term load forecasting.
Based on an in-depth analysis of the above literature, this paper proposes an ultra-short-term dynamic load forecasting method that is based on incremental model training and takes abnormal data reconstruction into account. First, for the abnormal data in ultra-short-term load forecasting, an abnormal load data processing method based on IF-CGAN is proposed. The isolation forest (IF) algorithm is used to identify and remove the abnormal data points, and a conditional generative adversarial network (CGAN) is constructed to fill in the removed points. The load-influencing factors are taken as the conditional constraints of the CGAN, and a weighted loss function is introduced to improve the reconstruction accuracy of the abnormal data. Second, to address the low training efficiency caused by new samples in the historical data set, an incremental model training method based on Bi-LSTM is proposed. The historical data are used to train the Bi-LSTM, and transfer learning is introduced to process the incremental data set, so that the model weights are adjusted adaptively and rapidly and the training efficiency is improved. Finally, a simulation analysis is carried out with real power grid load data from a certain area. The results show that the proposed method reconstructs the abnormal data more accurately and improves the accuracy and efficiency of ultra-short-term load forecasting.
3. Dynamic Forecasting of Ultra-Short-Term Load Based on Incremental Model Training
When the load data set changes, obtaining the latest load information often requires rebuilding and retraining the model on the new data set, which is inefficient. Building on the Bi-LSTM model, this section introduces transfer learning to process incremental data sets, so that the model weights can be adjusted rapidly and modeling efficiency is improved.
3.1. Transfer Learning
Transfer learning [27] applies the knowledge learned from an old task to a different but related new task, which avoids learning the new task from scratch and shortens its training time. Transfer learning rests on two basic concepts: domain and task. The domain with sufficient historical sample data is called the source domain, and the domain with a limited sample size is called the target domain. The mathematical model can be expressed as:
$$d_s = \{x_s, y_s\}, \qquad d_t = \{x_t, y_t\}$$
where $d_s$ and $d_t$ represent the source domain and target domain, respectively; $x_s$ and $y_s$ represent the source domain samples and their corresponding labels; and $x_t$ and $y_t$ represent the target domain samples and their corresponding labels. The task models of the source domain and target domain are expressed as:
$$t_s = \{y_s, f_s(\cdot)\}, \qquad t_t = \{y_t, f_t(\cdot)\}$$
where $t_s$ and $t_t$ represent the tasks of the source domain and target domain, respectively, and $f(\cdot)$ represents the mapping between the domain data $x$ and the target value $y$. Transfer learning migrates the source domain mapping $f_s(\cdot)$ to the target domain. When the data distributions of the source domain and the target domain are similar, the model can be fine-tuned on the target domain data to obtain the target domain mapping $f_t(\cdot)$.
3.2. Bi-LSTM
When dealing with long time series, an RNN suffers from exploding and vanishing gradients. To solve these problems, Hochreiter proposed the LSTM network. An LSTM network controls the retention and discarding of time-series information through a forget gate, an input gate, and an output gate, and thereby effectively avoids the vanishing gradients caused by long sequences. The structure of the LSTM unit is shown in Figure 4.
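The forget gate controls how much information from the previous time step is retained, so that useful information is passed on. The input gate writes the current input into the memory cell. The output gate passes the information in the memory cell on to the next time step. The calculation process is as follows:
$$\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}$$
where $W_f$, $U_f$, $b_f$, $W_i$, $U_i$, $b_i$, $W_o$, $U_o$, and $b_o$ are the parameters of the three gates; $W_c$, $U_c$, and $b_c$ are the parameters of the cell state; $x_t$ and $h_t$ are the input of the LSTM unit and the output of the hidden layer at time $t$; $f_t$, $i_t$, and $o_t$ are the activations of the forget, input, and output gates; $\tilde{c}_t$ and $c_t$ are the candidate and updated cell states; $\odot$ denotes the element-wise product; $\sigma$ is the sigmoid function; and tanh is the hyperbolic tangent function.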
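The gate computations can be illustrated with a minimal pure-Python sketch of a single LSTM time step. It assumes the standard LSTM formulation with the parameter names W, U, and b listed above; the helper names `affine` and `lstm_step` and the dictionary layout of `params` are hypothetical choices made for illustration, not the implementation used in the paper.

```python
import math


def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, U, b, x, h):
    """Return W x + U h + b for list-of-lists W, U and list vectors x, h, b."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + b_k
        for w_row, u_row, b_k in zip(W, U, b)
    ]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params maps "f", "i", "o", "c" to (W, U, b)."""
    f = [sigmoid(z) for z in affine(*params["f"], x_t, h_prev)]        # forget gate
    i = [sigmoid(z) for z in affine(*params["i"], x_t, h_prev)]        # input gate
    o = [sigmoid(z) for z in affine(*params["o"], x_t, h_prev)]        # output gate
    c_hat = [math.tanh(z) for z in affine(*params["c"], x_t, h_prev)]  # candidate state
    # New cell state: keep part of the old state, add part of the candidate.
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_hat)]
    # Hidden output: gated, squashed cell state.
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


# Toy usage with one input feature and one hidden unit (all parameters zero).
zero_block = ([[0.0]], [[0.0]], [0.0])
toy_params = {"f": zero_block, "i": zero_block, "o": zero_block, "c": zero_block}
h_1, c_1 = lstm_step([1.0], [0.0], [1.0], toy_params)
```

With all parameters set to zero, every gate evaluates to 0.5 and the candidate state to 0, so the step simply halves the previous cell state; this makes the role of the forget gate easy to check by hand.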
The Bi-LSTM model is composed of two LSTM networks with opposite directions and shared weights. The principle is shown in Figure 5. The blue circles represent the outputs of the forward LSTM network, and the green circles represent the outputs of the backward LSTM network. Bi-LSTM learns load time-series information in both the forward and backward directions, integrates past and future load information to update the weight parameters, and improves the regression accuracy on long time series.
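The bidirectional pass can be sketched in a few lines of Python. This is an illustrative sketch only: `run_direction`, `bidirectional`, and the toy `running_sum` step are hypothetical names, and the recurrent step is passed in as a function so that any cell (for example an LSTM step) could be substituted. Passing the same step function for both directions mirrors the shared-weight description above; common library implementations instead keep separate parameters for each direction.

```python
def run_direction(sequence, step, h0):
    """Apply a recurrent step over the sequence and collect every hidden state."""
    states, h = [], h0
    for x in sequence:
        h = step(x, h)
        states.append(h)
    return states


def bidirectional(sequence, step_fwd, step_bwd, h0):
    """Concatenate forward and backward hidden states at each time index."""
    forward = run_direction(sequence, step_fwd, h0)
    backward = run_direction(list(reversed(sequence)), step_bwd, h0)
    backward.reverse()  # realign so that backward[t] belongs to time t
    return [f + b for f, b in zip(forward, backward)]


def running_sum(x, h):
    """Toy recurrent step standing in for an LSTM cell."""
    return [h[0] + x]


outputs = bidirectional([1.0, 2.0, 3.0], running_sum, running_sum, [0.0])
# outputs[t] holds [sum of inputs up to t, sum of inputs from t onward]
```

Each output therefore combines information from the past (first entry) and from the future (second entry) of the same time index, which is the property the Bi-LSTM exploits.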
3.3. Incremental Model Training Method Based on Bi-LSTM
Based on the establishment of a load forecasting model, this paper introduces transfer learning to realize the rapid update of model weight when load data changes. The principle is shown in
Figure 6.
The maximum mean discrepancy (MMD) measures the distance between the distributions of two samples [28]. MMD is therefore commonly used in transfer learning to evaluate how similar the data distributions of the source domain and target domain are. When the MMD is small, the two distributions are very similar, and adjusting the model structure would reduce the fitting accuracy of the model. The source domain data studied in this paper are the collected historical load samples, and the target domain data are the new load samples. The difference in data distribution between the source domain and the target domain is small, so there is no need to adjust the structure of the source domain model.
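As an illustration of how such a similarity check might look, the sketch below computes a biased estimate of the squared MMD between two one-dimensional samples with a Gaussian kernel. The kernel choice, the bandwidth `sigma`, and the function names are assumptions made for this example; the paper does not state which kernel or estimator it uses.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two scalars; sigma is an assumed bandwidth."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def mmd_squared(source, target, sigma=1.0):
    """Biased estimate of the squared MMD between two 1-D samples."""
    def mean_kernel(p, q):
        total = sum(gaussian_kernel(a, b, sigma) for a in p for b in q)
        return total / (len(p) * len(q))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


# Toy normalized load samples: two similar samples and one shifted sample.
similar = mmd_squared([0.0, 0.1, 0.2], [0.05, 0.1, 0.15])
shifted = mmd_squared([0.0, 0.1, 0.2], [5.0, 5.1, 5.2])
```

A value near zero indicates that the two samples could plausibly come from the same distribution, which is the situation in which the source domain model structure is kept unchanged.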
The source domain model can be divided into three parts according to its functionality. The shallow network extracts the detailed characteristics of the source domain data, the deep network extracts the overall data information, and the output layer outputs the prediction results. Taking the source domain model parameters as the initial parameters of the target domain model, fixing different Bi-LSTM network layers, and using the target domain data to update the parameters of other network layers, the network parameters can be updated efficiently.
The model incremental training process based on Bi-LSTM is shown in
Figure 7, and the specific steps are as follows.
Step 1: build the source domain model. The method proposed in Section 2.2 is used to establish the source domain model on the historical load data. The number of network layers, the time step, and the number of iterations are tuned, and the source domain model with the highest prediction accuracy is saved.
Step 2: train the target domain model. Import the source domain model, migrate its parameters to the target domain, fix different Bi-LSTM network layers, train the other network layers with the target domain data, and save the target domain model with the highest prediction accuracy.
Step 3: output the prediction results. Load the target domain model, normalize the load-influencing factors and reshape them into a three-dimensional matrix, feed the matrix into the target domain model, and then inverse-normalize the output to obtain the load forecast.
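The core of Step 2, updating only the layers that are not fixed, can be sketched without any deep learning framework. In the sketch below, a model is reduced to a dictionary of named weight lists, and `fine_tune_step` applies one gradient-descent update while skipping the frozen layers. The layer names, weights, gradients, and learning rate are all hypothetical values chosen for illustration; in practice the gradients would come from backpropagation on the target domain data.

```python
def fine_tune_step(layers, grads, frozen, lr=0.001):
    """One gradient-descent update that leaves frozen layers untouched.

    layers, grads: dict mapping layer name -> list of floats
    frozen: set of layer names whose parameters stay fixed
    """
    updated = {}
    for name, weights in layers.items():
        if name in frozen:
            updated[name] = list(weights)  # copy source-domain weights as they are
        else:
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
    return updated


# Hypothetical source-domain parameters and target-domain gradients.
source_model = {"bilstm_1": [0.5, -0.2], "bilstm_2": [0.1, 0.3], "output": [0.7]}
target_grads = {"bilstm_1": [1.0, 1.0], "bilstm_2": [1.0, -1.0], "output": [2.0]}

# Fix the first Bi-LSTM layer and update the remaining layers.
target_model = fine_tune_step(source_model, target_grads, frozen={"bilstm_1"}, lr=0.1)
```

Because the frozen layers are skipped, fewer parameters are updated per step, which is one reason fine-tuning is cheaper than retraining the full network.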
4. Ultra-Short-Term Load Dynamic Forecasting Method Considering Abnormal Data Reconstruction Based on Model Incremental Training
A high-quality load data set is the prerequisite for establishing the prediction model. In this paper, the isolation forest and a conditional generative adversarial network are first used to handle outliers and improve the quality of the data set. Then, a prediction model based on the Bi-LSTM network is constructed, and transfer learning is introduced to train the prediction model incrementally. The basic framework is shown in Figure 8.
(1) Abnormal data reconstruction based on IF-CGAN.
First, the historical load data are normalized by Equation (20), the isolation forest model is constructed, and the load outliers are screened and deleted. The conditional value of the CGAN is the set of load-influencing factors, which mainly includes time factors (time, rest day), climate factors (wind speed, air temperature, dew point temperature, air pressure, cloud amount), and load factors (the load values at the same time of day on the previous two days and on the following two days). The random noise and the conditional value are concatenated horizontally and fed into the generator, where convolution kernels extract the features and output the generated samples. The generated sample, real load, and conditional value are concatenated horizontally and fed into the discriminator, which judges whether the sample is real or generated. Through the game between the generator and the discriminator, the nonlinear relationship between the load and its influencing factors is learned. After the game reaches equilibrium, the generator is saved and used to fill in the missing load data. The normalization is:
$$a_n = \frac{a - a_{\min}}{a_{\max} - a_{\min}} \tag{20}$$
where $a_n$ is the normalized value, $a_{\max}$ and $a_{\min}$ are the maximum and minimum values, respectively, and $a$ is the original value of the data.
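The normalization and its inverse, which is needed later to map the network output back to load values, can be sketched as follows. The sketch assumes min-max scaling as suggested by the symbols above and assumes that the maximum is strictly greater than the minimum; the function names and the sample load values are hypothetical.

```python
def fit_min_max(values):
    """Return (a_min, a_max) of the training data."""
    return min(values), max(values)


def normalize(values, a_min, a_max):
    """Scale values to [0, 1]; assumes a_max > a_min."""
    span = a_max - a_min
    return [(a - a_min) / span for a in values]


def inverse_normalize(scaled, a_min, a_max):
    """Map normalized values back to the original load scale."""
    span = a_max - a_min
    return [a_n * span + a_min for a_n in scaled]


load = [200.0, 350.0, 500.0]           # hypothetical load values
a_min, a_max = fit_min_max(load)
scaled = normalize(load, a_min, a_max)
restored = inverse_normalize(scaled, a_min, a_max)
```

The minimum and maximum are taken from the historical data once and reused for new samples, so that the source and target domain data share the same scale.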
(2) Construction of load forecasting model based on Bi-LSTM network.
After the complete historical load data are obtained, the data are normalized and reshaped into a three-dimensional matrix whose dimensions are the sequence length, the time step, and the number of sequence features. The load forecasting model is built on a Bi-LSTM network. Adam is selected as the optimizer, the learning rate is set to 0.001, the loss function is the MSE, and the optimal model is saved according to the prediction accuracy evaluation indexes.
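The reshaping step can be illustrated with a sliding-window sketch. It assumes that each sample consists of `time_step` consecutive feature vectors and that the label is the load immediately after the window; this alignment, the function name `make_windows`, and the toy data are assumptions made for illustration rather than details given in the paper.

```python
def make_windows(features, targets, time_step):
    """Build (n_samples, time_step, n_features) inputs and aligned labels.

    features: list of per-time-step feature lists
    targets:  list of load values, one per time step
    """
    inputs, labels = [], []
    for start in range(len(features) - time_step):
        window = [list(row) for row in features[start:start + time_step]]
        inputs.append(window)
        labels.append(targets[start + time_step])  # load right after the window
    return inputs, labels


# Toy data: 20 time steps, 2 features per step.
toy_features = [[float(t), 2.0 * t] for t in range(20)]
toy_targets = [float(t) for t in range(20)]
X, y = make_windows(toy_features, toy_targets, time_step=15)
shape = (len(X), len(X[0]), len(X[0][0]))
```

With 20 time steps and a window of 15, the sketch yields 5 samples, each of shape 15 by 2, which matches the three-dimensional layout expected by recurrent layers.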
(3) Network incremental training method based on transfer learning.
After obtaining the complete target domain data, load the source domain model, fix different network parameters, input the target domain data, train and update the remaining network layer weights, save the optimal target domain model, and output the target domain prediction results.
5. Example Analysis
To verify the effectiveness of the proposed method, load data from November 2012 to August 2013 in a region of eastern China are used as an example. The sampling interval of the load data is 15 min, the time span is 10 months, and a total of 28,277 load data points are collected. The load data of the 8 months from November 2012 to June 2013 are selected as the source domain data, the load data of July 2013 are selected as the target domain data, and the load data of August 2013 are used as the test set.
5.1. Evaluation Index Construction
The reconstruction accuracy and $R^2$ are selected as the evaluation indexes of data reconstruction. The closer the reconstruction accuracy and $R^2$ are to 1, the more faithful the model reconstruction. The root mean square error (RMSE) and mean absolute error (MAE) are used as the evaluation indexes of the prediction model. The smaller the RMSE and MAE, the closer the predicted value of the model is to the real value [13]. In the calculation formulas of the evaluation indexes, $z_g$, $z_t$, and $z_p$ are the generated value, real value, and predicted value of the load, respectively, and $z_m$ represents the average value of the missing data.
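Assuming the textbook definitions of these indexes, RMSE, MAE, and $R^2$ can be computed as in the sketch below. The function names are hypothetical, and $z_m$ is taken to be the mean of the real values at the missing positions, which is an interpretation of the description above. The reconstruction accuracy index is left out because its exact definition is specific to the paper.

```python
import math


def rmse(z_t, z_p):
    """Root mean square error between real values z_t and predictions z_p."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(z_t, z_p)) / len(z_t))


def mae(z_t, z_p):
    """Mean absolute error between real values z_t and predictions z_p."""
    return sum(abs(t - p) for t, p in zip(z_t, z_p)) / len(z_t)


def r_squared(z_t, z_g):
    """Coefficient of determination of generated values z_g against real values z_t."""
    z_m = sum(z_t) / len(z_t)  # assumed: mean of the real values being reconstructed
    ss_res = sum((t - g) ** 2 for t, g in zip(z_t, z_g))
    ss_tot = sum((t - z_m) ** 2 for t in z_t)
    return 1.0 - ss_res / ss_tot


real = [1.0, 2.0, 3.0, 4.0]       # toy real values
estimate = [1.0, 2.0, 3.0, 6.0]   # toy generated or predicted values
scores = (rmse(real, estimate), mae(real, estimate), r_squared(real, estimate))
```

For the toy data, only the last point is off by 2, which gives an RMSE of 1, an MAE of 0.5, and an $R^2$ of 0.2.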
5.2. Comparative Analysis of Abnormal Data Reconstruction Results
During power grid data acquisition, meter runaway, communication faults, and similar events occur from time to time, so the collected data sets readily contain missing and abnormal data. Missing values are randomly generated in the original load data set to simulate missing data caused by communication failures. Abnormal values, set to 1.5 to 1.8 times the real value, are injected to simulate the data anomalies caused by meter runaway, and labels are set to verify the identification of abnormal data. The number of isolation trees is set to 100, and the contamination parameter is set according to the proportion of outliers. An isolation forest model is established to detect outliers in the data set, and the detection results are compared with the real labels.
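To make the detection step concrete, the following is a minimal one-dimensional isolation forest written in plain Python. It is a simplified sketch, not the model used in the experiments: it scores a single feature (the load value), uses the subsample size of 256 and the height limit suggested in the original isolation forest algorithm, and flags the top fraction of scores given by a contamination value. All function names and the toy load series are hypothetical.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(sample, rng, depth, max_depth):
    """Recursively split a 1-D sample at uniformly random cut points."""
    if depth >= max_depth or len(sample) <= 1 or min(sample) == max(sample):
        return ("leaf", len(sample))
    split = rng.uniform(min(sample), max(sample))
    left = [v for v in sample if v < split]
    right = [v for v in sample if v >= split]
    return ("node", split,
            build_tree(left, rng, depth + 1, max_depth),
            build_tree(right, rng, depth + 1, max_depth))


def path_length(x, tree, depth=0):
    """Number of splits needed to isolate x, with the usual leaf-size correction."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, split, left, right = tree
    return path_length(x, left if x < split else right, depth + 1)


def anomaly_scores(values, n_trees=100, sample_size=256, seed=0):
    """Isolation-forest scores in (0, 1); assumes at least two values."""
    rng = random.Random(seed)
    size = min(sample_size, len(values))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(values, size), rng, 0, max_depth)
             for _ in range(n_trees)]
    norm = c_factor(size)
    return [2.0 ** (-sum(path_length(v, t) for t in trees) / (n_trees * norm))
            for v in values]


def flag_outliers(scores, contamination):
    """Flag the highest-scoring fraction of points as outliers."""
    n_flag = max(1, round(contamination * len(scores)))
    threshold = sorted(scores, reverse=True)[n_flag - 1]
    return [s >= threshold for s in scores]


# Toy series: 50 normal readings near 100 and one reading at 1.7 times that level.
toy_load = [100.0 + (k % 10) * 0.2 for k in range(50)] + [170.0]
toy_scores = anomaly_scores(toy_load)
toy_flags = flag_outliers(toy_scores, contamination=1 / 51)
```

Points that are separated from the rest after very few random splits receive scores close to 1, which is why a reading far above the normal load level is isolated quickly.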
The detection results for abnormal data are shown in Figure 9. The blue and black circles represent the normal and abnormal load data, respectively, while the red circles represent the points flagged by the isolation forest. A red circle that coincides with a black circle means that the isolation forest has identified the outlier correctly. As can be seen from Figure 9, the isolation forest accurately identifies the abnormal data caused by meter runaway.
The original load data were processed by the isolation forest to form missing data sets with missing rates of 10%, 20%, 40%, and 60%. The gradient penalty coefficient λ and the objective function weights λ1 and λ2 are set to 40, 10, and 0.6, respectively. The complete data under different missing rates are fed into the CGAN model for training. After the game between the generator and the discriminator reaches equilibrium, the generated samples are output. The filling accuracy and R square are selected as the evaluation indexes of the generated samples, and the results are compared with those of mean interpolation, KNN, and random forest. For KNN, the weight is set to “distance” and K is set to 8; for random forest, the number of decision trees is 50 and the maximum depth is 20.
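How data sets with a given missing rate and a simple baseline fill can be produced is sketched below. The sketch masks randomly chosen positions and fills them with the global mean of the observed values; the random masking, the use of the global mean as the mean interpolation baseline, and the function names are assumptions made for illustration.

```python
import random


def mask_at_rate(load, missing_rate, seed=0):
    """Replace a random fraction of the series with None to simulate missing data."""
    rng = random.Random(seed)
    n_missing = round(missing_rate * len(load))
    missing_idx = set(rng.sample(range(len(load)), n_missing))
    return [None if k in missing_idx else v for k, v in enumerate(load)]


def mean_fill(masked):
    """Baseline: fill every missing point with the mean of the observed points."""
    observed = [v for v in masked if v is not None]
    mean_value = sum(observed) / len(observed)
    return [mean_value if v is None else v for v in masked]


toy_series = [float(k) for k in range(100)]  # hypothetical load series
missing_counts = {}
for rate in (0.1, 0.2, 0.4, 0.6):
    masked = mask_at_rate(toy_series, rate)
    missing_counts[rate] = sum(v is None for v in masked)
filled = mean_fill(mask_at_rate(toy_series, 0.4))
```

A constant fill ignores both the time of day and the influencing factors, which is consistent with mean interpolation giving the weakest reconstruction among the compared methods.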
Figure 10 shows the data-filling effect of different models at a missing rate of 40%. It can be seen from the figure that the CGAN fully captures the nonlinear relationship between the load and its influencing factors and reconstructs the abnormal data more accurately. Its generated load samples are the closest to the real values. The filling effect of random forest and KNN is poor when dealing with long time series.
From the quantitative filling results of the different models in Table 3 and Table 4, it can be seen that the R square and accuracy of the data reconstructed by the CGAN model are higher than those of the other three models at every missing rate, which verifies the effectiveness of the CGAN model for data reconstruction. Across the data sets with different missing rates, the mean interpolation method gives the least satisfactory reconstruction indexes, and the reconstruction effect of the CGAN model is relatively stable. The reconstruction accuracies of KNN and RF are similar, and both fluctuate greatly at high missing rates. Under different missing rates, the reconstruction accuracy of the CGAN model constructed in this paper is improved by 8.1% and 3.5% compared with the other three models.
5.3. Analysis on the Influence of Data Processing Methods on Prediction Results
The data generated by the models in Section 5.2 are used to fill in the abnormal load data set, and the Bi-LSTM network is constructed according to the parameter settings in Section 5.4. Then, the completed data sets and the data set obtained by the deletion method are each used to train the Bi-LSTM network, in order to verify how well the proposed model reconstructs the time series.
After the data set with a missing rate of 40% is processed by the five methods above, the prediction results of the Bi-LSTM network are as shown in Figure 11. It can be seen from the figure that the data set reconstructed by the CGAN model achieves the best fit, which verifies the effectiveness of the CGAN model in reconstructing the time series. The deletion method destroys the integrity of the data sequence, and the prediction on the resulting data set is the worst. Mean interpolation, KNN, and random forest restore the integrity of the sequence to a certain extent. Their overall prediction effect is better than that of the deletion method, but there is still a gap with the CGAN model. The quantitative prediction errors of the network on the complete data sets obtained under different missing rates are shown in Table 5 and Table 6.
It can be seen from Table 5 and Table 6 that, for the load data set repaired by the CGAN, the prediction error of the Bi-LSTM network is the smallest, and the prediction accuracy is higher than with the other interpolation models. When the missing rate is 10%, the data set obtained by directly deleting the abnormal data gives results closest to those of the CGAN; as the missing rate increases, the prediction accuracy on the data sets obtained by the deletion method and by mean interpolation becomes clearly worse. On the data set reconstructed by the CGAN, the RMSE and MAE of the Bi-LSTM network are reduced by at least 3.1% and 4.8%, respectively, compared with the other four processing methods.
5.4. Comparative Analysis of Prediction Results of Source Domain Model
The Bi-LSTM network can effectively avoid vanishing gradients when dealing with long time series. The Bi-LSTM network has many parameters, and different parameter settings have a great impact on its prediction performance. In this paper, three parameters that strongly affect the network are selected for study. The ultra-short-term load forecasting experiment is carried out on the data set processed by the CGAN model. The source domain data are divided at a ratio of 4:1: 80% of the data are used as the training set, and the remaining data are used as the test set. The prediction errors on the test set for different parameter combinations are shown in Table 7.
As can be seen from Table 7, when the number of iterations, time step, and batch size of the Bi-LSTM network are set to 200, 15, and 32, respectively, the load information learned by the network is the most abundant and the prediction error is the smallest. If the number of iterations is increased or reduced, the network overfits or underfits, which reduces the prediction accuracy.
The Bi-LSTM network is constructed according to the above parameters and compared with BP network, RNN, and SVR. The number of neurons in the hidden layer of BP network is set to 32, and the RNN parameter setting is consistent with that of the Bi-LSTM network. The kernel function of SVR model is set as Gaussian kernel function, and the penalty coefficient is 1.5. The load forecasting results of a certain day in the test set are shown in
Figure 12.
As can be seen from Figure 12a, the recurrent neural networks fit the real values better than the other two models. In the details of the fitting curve and error curve, the prediction of the Bi-LSTM network is closer to the real value than that of the RNN.
Figure 12b shows the prediction error distribution curves of different models. It can be seen from the figure that the error of Bi-LSTM network is concentrated around 0, and the error fluctuation is smaller than that of other models.
Table 8 shows the comparison of source domain prediction errors of different models. It can be seen from the table that the prediction accuracy of the Bi-LSTM model is better than that of the other models in terms of RMSE and MAE. Compared with the RNN under the same parameter settings, RMSE and MAE decreased by 3.67% and 7.43%, respectively; compared with the BP network and SVR, RMSE decreased by 46.52% and 22.21%, respectively, and MAE decreased by 47.82% and 18.87%, respectively.
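The percentage reductions quoted in such comparisons are assumed here to be relative reductions with respect to the baseline model. A one-line sketch makes the arithmetic explicit; the function name and the numbers in the usage line are hypothetical and are not taken from Table 8.

```python
def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return (baseline_error - new_error) / baseline_error * 100.0


# Hypothetical example: an error that drops from 200.0 to 150.0.
reduction = relative_reduction(200.0, 150.0)
```

Under this convention a positive value means the new model has the lower error, and a negative value means it performs worse than the baseline.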
5.5. Comparative Analysis of Prediction Results of Target Domain Model
When the load data set changes, the load information of the latest data set must be extracted to maintain the prediction accuracy of the model. Doing so usually requires retraining the model on the new data set, which is time-consuming; moreover, as the amount of data increases, problems such as vanishing gradients and underfitting appear. In contrast, transfer learning can quickly update the network weights and enables efficient incremental training of the prediction model.
Load the source domain model saved in
Section 5.4, update the target domain model parameters with the target domain data, and compare the prediction results of the four models of unfixed network layer parameters (BiLSTM-TL0) and fixed network layer 1 to 3 parameters (BiLSTM-TL1~BiLSTM-TL3).
Figure 13a shows the prediction results on the test set obtained by fixing different network layers. It can be seen from the figure that all transfer learning models fit the load curve well, but in the details of the curve, the predictions of the BiLSTM-TL1 model, obtained by fixing the network parameters of the first layer, are closer to the true values. In this case, the load information extracted by the model is the most abundant. The data distribution of the target domain is similar to that of the source domain, and the amount of data in the target domain is small, so the BiLSTM-TL0 model is prone to overfitting. The BiLSTM-TL3 and BiLSTM-TL2 models fix the deep network parameters and cannot effectively extract the overall characteristics of the target domain data.
Figure 13b shows the prediction error distribution curves of different transfer learning models. It can be seen from the figure that the errors of model BiLSTM-TL1 are concentrated around 0, and the distribution frequency with deviation of 0 is higher, which has less error fluctuation compared with other models. Therefore, the prediction accuracy of BiLSTM-TL1 model is better than other transfer learning models.
Table 9 shows the comparison of the prediction error results for the test sets of different transfer learning models. It can be seen from the table that the prediction results of BiLSTM-TL1 are the best in terms of RMSE and MAE. Compared with other transfer learning models, RMSE and MAE are reduced by 2.4% and 4.8%, respectively.
Compare the BiLSTM-TL1 model with the Bi-LSTM network (take the data of source domain and target domain as the network training set), BiLSTM-S model (the source domain model saved in
Section 5.4), BP network and SVR. The parameter settings of Bi-LSTM network, BP network and SVR are consistent with
Section 5.4.
Figure 14a shows the prediction results on the test set for the different prediction models. It can be seen from the figure that the four prediction models fit the load trend well. However, the enlarged detail shows that the predicted values of the Bi-LSTM network are closer to the real values. Compared with the BiLSTM-S model, the prediction performance of the BiLSTM-TL1 model is improved and is similar to that of the Bi-LSTM network, which indicates that the BiLSTM-TL1 model does not suffer from negative transfer.
Figure 14b shows the prediction error distribution curves of different models in the target domain. It can be seen from the figure that the prediction error distribution curves of BiLSTM-TL1 model and Bi-LSTM network are very similar, which are concentrated near 0, and the error fluctuation is small compared with other models.
From the test set prediction results of the different models in Table 10, it can be seen that the BiLSTM-TL1 model gives the best overall performance across the three evaluation indexes of RMSE, MAE, and training time. Compared with a Bi-LSTM network without transfer learning, the BiLSTM-TL1 model shortens the training time by 91.1% at the cost of a 3.9% reduction in prediction accuracy, so the training efficiency of the model is improved. Since the BiLSTM-TL1 model uses the latest sample data to update the network weights, its RMSE and MAE are reduced by 8.8% and 7.3%, respectively, compared with the BiLSTM-S model. The load prediction accuracy of BiLSTM-TL1 is similar to that of the GRU network, but BiLSTM-TL1 saves 87.6% of the training time. Compared with the BP network, the training time of the BiLSTM-TL1 model is similar, while the RMSE and MAE are reduced by 16.7% and 15.5%, respectively. Based on the above analysis, the BiLSTM-TL1 model, which introduces transfer learning and fixes the first-layer network parameters, achieves incremental network training, maintains the prediction accuracy, and saves a great deal of model training time.