1. Introduction
The stable operation and economic dispatch of the power system rely heavily on accurate forecasting of future loads [1]. Load forecasting can be divided into three categories: short-term, mid-term, and long-term forecasting [2]. Short-term power load forecasting is mainly used to arrange power generation plans and to help the relevant departments establish reasonable power dispatching plans. Loads are strongly stochastic, and numerous factors affect their characteristics [3]. With social and economic development, the power grid keeps growing in scale, the number of devices keeps increasing, and smart grid systems collect load data at ever higher frequencies. This provides a large number of high-quality data sets for power load forecasting and a data basis for applying deep learning in the power grid [4].
Power load forecasting methods are mainly divided into three categories. The first category comprises forecasting methods based on traditional mathematical statistical models [5,6,7,8], such as the time series method [5,6] and multiple linear regression [7]. Power load data are non-stationary and strongly random, so it is difficult to obtain accurate forecasting results with such methods. The second category comprises forecasting methods based on machine learning, such as support vector regression [9] and the long short-term memory network (LSTM) [10,11,12]. Compared with traditional forecasting methods, machine learning methods have strong fitting ability, so they have been widely used in power load forecasting and have achieved good results. Despite these advantages, in actual load forecasting it is difficult for a machine learning model to deeply extract the features of non-stationary time series data. In addition, the authors in [13] point out that machine learning suffers from difficult hyperparameter selection and a large consumption of computing resources. The third category is combined forecasting, of which there are two kinds: (1) forecasting with multiple algorithms and then assigning weights to the different algorithms [14], and (2) first decomposing the power load data and then forecasting [15,16]. Common data decomposition methods include empirical mode decomposition (EMD) [15] and wavelet decomposition [16].
The first kind of combined forecasting method uses multiple algorithms to forecast and then obtains the final forecast through weight allocation. The authors in [17] established a Prophet model and an LSTM model to forecast the load separately, and then used the least squares method to obtain a new model that combines the two methods with different weights. This type of combination method usually determines its weight allocation from a particular set of actual data. It can obtain good forecasting results on the experimental load data, but it cannot guarantee that the forecasting accuracy will also improve on other power load data.
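As an illustration of the weight-allocation idea, the following is a minimal sketch of fitting two combination weights by ordinary least squares. It is not the procedure of [17]: the function name `least_squares_weights`, the unconstrained two-weight form, and the sample numbers are assumptions made for this example only.

```python
def least_squares_weights(pred_a, pred_b, actual):
    """Fit w_a, w_b minimizing sum((w_a*a + w_b*b - y)^2).

    Hypothetical helper: solves the 2x2 normal equations directly.
    The weights are unconstrained (they need not sum to one).
    """
    s_aa = sum(a * a for a in pred_a)
    s_bb = sum(b * b for b in pred_b)
    s_ab = sum(a * b for a, b in zip(pred_a, pred_b))
    s_ay = sum(a * y for a, y in zip(pred_a, actual))
    s_by = sum(b * y for b, y in zip(pred_b, actual))
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("forecasts are collinear; weights are not unique")
    # Cramer's rule on the normal equations.
    w_a = (s_ay * s_bb - s_ab * s_by) / det
    w_b = (s_aa * s_by - s_ab * s_ay) / det
    return w_a, w_b


# Made-up forecasts from two models and the observed load on a fitting window.
model_a = [100.0, 120.0, 90.0, 110.0]
model_b = [105.0, 115.0, 95.0, 100.0]
observed = [0.3 * a + 0.7 * b for a, b in zip(model_a, model_b)]

w_a, w_b = least_squares_weights(model_a, model_b, observed)
combined = [w_a * a + w_b * b for a, b in zip(model_a, model_b)]
```

Because the weights are fitted on one window of data, they carry over to another data set only to the extent that the relative accuracy of the two models stays the same, which is the limitation noted above.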
The second kind of combined forecasting method first decomposes the power load and then forecasts. This approach mines the characteristics of the data by decomposing the load into several components, which are then modeled and forecast separately. Finally, the forecasting results of the components are superimposed to obtain the final power load forecast. The authors in [16] used wavelet decomposition to decompose the load series, with the ADF (augmented Dickey–Fuller) test used to select the optimal number of decomposition layers; a second-order gray prediction model was then used to forecast each component, and the final forecasting results were obtained by superimposing the component forecasts. Although wavelet decomposition can decompose power load data, different basis functions and numbers of decomposition layers yield different forecasting results, so the method depends on a priori choices, which increases the difficulty of forecasting. The authors in [18] used the STL decomposition method to decompose the load time series into three parts (trend, period, and residual) so as to reduce the interaction between the parts, and then modeled and forecast the three parts separately to make the forecasting results more accurate. However, the STL method can only decompose the data into these three parts. For complex nonlinear power data, this decomposition is not thorough enough, so the final forecasting result is not ideal. The authors in [15] used ensemble empirical mode decomposition (EEMD) to decompose the raw power load data into components ranging from high frequency to low frequency, and then used multiple linear regression (MLR) and a gated recurrent unit (GRU) neural network to forecast the low-frequency and high-frequency subsequences, respectively, which improved the forecasting accuracy. The disadvantage of this method was that EEMD decomposed the load data into 11 subsequences, which greatly increased the computational load of the forecasting model. The authors in [19] used EEMD to decompose power load data into high-frequency, low-frequency, and random components; then, according to the characteristics of each component, a Least Squares Support Vector Machine (LSSVM) with different kernel functions was used to forecast each component. The final forecasting results of the power load were obtained by superimposing the component forecasts, which effectively improved the load forecasting accuracy. However, the computational scale of the decomposed forecasting model was too large, and the model needs to be simplified. Therefore, to improve forecasting speed, the model built after decomposing the power data should be simplified as much as possible.
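The decompose, forecast, and superimpose pattern shared by these studies can be sketched in a few lines. The sketch below is an assumption-laden toy: a trailing moving average stands in for the wavelet, STL, or EEMD decompositions, and persistence and seasonal-naive rules stand in for the component forecasters; none of these choices comes from the cited works, and all function names are hypothetical.

```python
def moving_average(series, window):
    """Trailing moving average; early points average the values seen so far."""
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def decompose(series, window):
    """Split a series into a smooth part and a residual that sum back to it.

    Stand-in for a real decomposition (wavelet, STL, EEMD); assumption only.
    """
    smooth = moving_average(series, window)
    residual = [x - s for x, s in zip(series, smooth)]
    return smooth, residual


def forecast_next(series, window, period):
    """Forecast each component separately, then superimpose the forecasts."""
    smooth, residual = decompose(series, window)
    smooth_forecast = smooth[-1]           # persistence for the slow component
    residual_forecast = residual[-period]  # seasonal-naive for the fast component
    return smooth_forecast + residual_forecast


# Made-up load with a period of four samples.
load = [10, 12, 15, 11] * 6
smooth, residual = decompose(load, window=4)
next_value = forecast_next(load, window=4, period=4)
```

The additive structure is what makes the final superposition valid: the components must sum back to the original series, so the component forecasts can be summed in the same way. The cost concern raised above also shows up here, since every additional component adds one more model to train and run.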
In order to decompose the power load data thoroughly and to simplify the forecasting model after the data decomposition, this paper proposes a short-term power load forecasting method based on Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN), sample entropy (SE), the Back Propagation Neural Network (BPNN), and the Transformer model. Through the CEEMDAN algorithm, this method transforms the non-stationary, strongly random time series forecasting problem into multiple relatively stationary time series forecasting problems, which fully mines the information in the original power load data. At the same time, SE is used to analyze the complexity of the stationary subsequences, which are then superimposed and recombined into new subsequences, thereby reducing computational cost and model complexity. Finally, the BPNN, which has a simple structure, and the Transformer model, which has strong nonlinear fitting ability, are used to build a combined forecasting model. The Transformer model is used to deeply mine the intrinsic information of the subsequences with high complexity.
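To make the role of SE concrete, the following is a minimal sketch of a sample entropy calculation using the common textbook definition. The embedding dimension `m=2` and tolerance `r = 0.2 * std` are widely used defaults, not values stated in this paper, and the function name and test signals are assumptions for illustration.

```python
import math
import random


def sample_entropy(series, m=2, r_factor=0.2):
    """Sample entropy -ln(A/B) with Chebyshev distance.

    B counts pairs of length-m templates within tolerance r, A counts
    pairs of length-(m+1) templates. m and r_factor are assumed defaults.
    """
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    r = r_factor * std

    def count_matches(length):
        # Use the same number of templates (n - m) for both lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # no matches: entropy is undefined, treated as maximal
    return -math.log(a / b)


# A regular (periodic) signal versus an irregular (random) one.
periodic = [math.sin(2 * math.pi * i / 24) for i in range(200)]
rng = random.Random(0)
noisy = [rng.gauss(0.0, 1.0) for _ in range(200)]

se_periodic = sample_entropy(periodic)
se_noisy = sample_entropy(noisy)
```

A lower value indicates a more regular, more predictable subsequence, which is why SE can serve as the criterion for deciding which subsequences are similar enough to be superimposed and which model each recombined subsequence is assigned to.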
The proposed model was applied to power load data from a certain area of Spain and compared with six methods. The results showed that the combined forecasting model proposed in this paper had high accuracy and low computational cost.
2. Combined Forecasting Model Based on the CEEMDAN-SE-BPNN-Transformer
This paper combined the data decomposition algorithm with the BPNN and the Transformer model, and proposed a combined forecasting model based on the CEEMDAN-SE-BPNN-Transformer. The CEEMDAN-SE was used to decompose the power load into a series of subsequences with obvious differences in complexity, and then the BPNN and the Transformer model were used to model and forecast the subsequences with low and high complexity, respectively. The forecasting results of the subsequences were superimposed to obtain the final load forecasting results. The flowchart of the proposed algorithm is shown in Figure 1.
Since the power load was affected by many factors, it was difficult for a single forecasting method to obtain accurate forecasting results. Therefore, the CEEMDAN algorithm was used to transform the nonlinear non-stationary time series forecasting problem into several stationary time series forecasting problems, decomposing the complex power load into relatively simple subsequences. At the same time, the SE was used to analyze the complexity of the decomposed stationary subsequences, which were then recombined into new subsequences, thereby reducing the amount of calculation and the complexity of the model. For subsequences with higher complexity, the Transformer model, which is based on the attention mechanism, can capture the underlying patterns more fully and obtain more accurate forecasting results. Subsequences with lower complexity are strongly periodic, so a BPNN with a simple structure was used to forecast them, thereby reducing the training time and avoiding unnecessary consumption of computing resources.
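The recombination and routing steps described above can be sketched as follows. The text does not state the grouping rule or the complexity threshold at this point, so the rule used here (merge subsequences whose SE lies within a tolerance of the first member of the group) and the names `regroup_by_entropy`, `choose_model`, `tolerance`, and `threshold` are hypothetical, as are all numeric values.

```python
def regroup_by_entropy(components, entropies, tolerance):
    """Superimpose components with similar sample entropy.

    Hypothetical rule: walk the components from highest to lowest entropy
    and start a new group whenever the entropy drops more than `tolerance`
    below the first member of the current group.
    Returns a list of (summed_series, mean_entropy) pairs.
    """
    order = sorted(range(len(components)), key=lambda i: entropies[i], reverse=True)
    groups = []
    for idx in order:
        if groups and entropies[groups[-1][0]] - entropies[idx] <= tolerance:
            groups[-1].append(idx)
        else:
            groups.append([idx])

    merged = []
    for indices in groups:
        # Point-wise sum of all components in the group.
        summed = [sum(values) for values in zip(*(components[i] for i in indices))]
        mean_entropy = sum(entropies[i] for i in indices) / len(indices)
        merged.append((summed, mean_entropy))
    return merged


def choose_model(entropy, threshold):
    """Route high-complexity subsequences to the Transformer, the rest to the BPNN."""
    return "Transformer" if entropy > threshold else "BPNN"


# Made-up components and entropy values for illustration.
components = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
entropies = [1.5, 1.45, 0.3, 0.25]

merged = regroup_by_entropy(components, entropies, tolerance=0.1)
routing = [choose_model(entropy, threshold=1.0) for _, entropy in merged]
```

In this toy case four components collapse into two recombined subsequences, so only two models need to be trained instead of four. Because the recombination is a plain sum, the recombined subsequences still add up to the original series, and the final forecast remains the sum of the model outputs.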
5. Conclusions
A short-term load forecasting model based on the CEEMDAN-SE-BPNN-Transformer is proposed in this paper. The example simulation demonstrated that the proposed method not only overcame the shortcoming of a single model, which cannot effectively extract the characteristics of load data, but also effectively improved the forecasting accuracy.
In order to improve the accuracy of short-term load forecasting, a combined forecasting method was proposed based on the CEEMDAN, SE, BPNN, and Transformer model. The CEEMDAN algorithm was used to transform the nonlinear non-stationary time series forecasting problem into several stationary time series forecasting problems, decomposing the complex power load into relatively simple subsequences. At the same time, the SE was used to analyze the complexity of the decomposed stationary subsequences, which were then recombined into new subsequences, thereby reducing the amount of calculation and the complexity of the model. For subsequences with higher complexity, the Transformer model, which is based on the attention mechanism, captured the underlying patterns more fully and obtained more accurate forecasting results. Subsequences with lower complexity were strongly periodic, so a BPNN with a simple structure was used to forecast them, thereby reducing the training time and avoiding unnecessary consumption of computing resources. The simulation results indicated that the CEEMDAN-SE-BPNN-Transformer forecasting model had a MAPE of 1.1317% and an RMSE of 304.40, with better overall forecasting performance than the comparative models.
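For reference, the two reported error measures can be computed as in the sketch below, which uses the standard definitions of the mean absolute percentage error (MAPE) and the root mean square error (RMSE). The sample values are made up and are unrelated to the experimental data; RMSE is expressed in the same unit as the load series.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Requires nonzero actual values."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the unit of the input series."""
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))


# Made-up values for illustration only.
actual_load = [100.0, 200.0]
forecast_load = [110.0, 180.0]

example_mape = mape(actual_load, forecast_load)
example_rmse = rmse(actual_load, forecast_load)
```

MAPE is scale-free, which makes it convenient for comparing models across data sets, whereas RMSE penalizes large individual errors more heavily and depends on the magnitude of the load.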