Short-Term and Medium-Term Electricity Sales Forecasting Method Based on Deep Spatio-Temporal Residual Network

Cao, Min; Wang, Jinfeng; Sun, Xiaochen; Ren, Zhengmou; Chai, Haokai; Yan, Jie; Li, Ning

doi:10.3390/en15238844


by Min Cao ¹, Jinfeng Wang ², Xiaochen Sun ², Zhengmou Ren ², Haokai Chai ³, Jie Yan ⁴ and Ning Li ³,*

¹ State Grid Shaanxi Electric Power Company Limited, Xi’an 710048, China

² State Grid Shaanxi Electric Power Company Limited Research Institute, Xi’an 710065, China

³ School of Electrical Engineering, Xi’an University of Technology, Xi’an 710048, China

⁴ State Key Laboratory of Alternate Electrical Power System with Renewable Energy Sources (NCEPU), School of Renewable Energy, North China Electric Power University, Beijing 102206, China

* Author to whom correspondence should be addressed.

Energies 2022, 15(23), 8844; https://doi.org/10.3390/en15238844

Submission received: 18 October 2022 / Revised: 15 November 2022 / Accepted: 18 November 2022 / Published: 23 November 2022

(This article belongs to the Section F5: Artificial Intelligence and Smart Energy)


Abstract:

The forecasting of electricity sales is directly related to the power generation planning of power enterprises and the progress of the generation tasks. Traditional forecasting methods cannot properly deal with the offset in the actual data caused by external factors, such as the weather, season, and spatial attributes. To address this problem, this paper proposes a method of electricity sales forecasting based on a deep spatio-temporal residual network (ST-ResNet). The method not only relies on the temporal correlation of electricity sales data but also introduces the influence of external factors and spatial correlation, which greatly enhances the fitting degree of each parameter of the model. Moreover, the residual module and the convolution module are fused to effectively reduce the damage that deep convolution does to the training effectiveness. Finally, three comparison experiments for the ultra-short term, short term, and medium term show that the MAPE of the ST-ResNet model is at least 2.69% lower than that of the RNN and other classical Deep Learning models, that its RMSE is at least 36.2% lower, and that its MAD is at least 34.2% lower; the advantage over the traditional methods is even more obvious. The effectiveness and feasibility of the ST-ResNet model in the short-term forecasting of electricity sales are verified.

Keywords:

ST-ResNet; external factors; convolutional neural network; spatio-temporal data; electricity sales forecasting; short- and medium-term forecasting

1. Introduction

With the huge power consumption and the vigorous development of the power industry, the accurate analysis of electricity sales data is of great significance to the national power grid construction and planning. The analysis of the historical data of electricity sales can directly or indirectly examine the management and operation of the power sector, the rationality of the electricity price in the past period, and the power consumption structure of the region. The forecasting of future electricity demand is also the premise and foundation to solve many decision-making problems in the electricity market. Some power companies have carried out the related work of electricity sales forecasting, but the following problems still exist: (1) a lack of power sales forecasting system support; (2) a lack of analysis methods for the influence of internal and external factors on electricity sales; and (3) a lack of support for electricity sales forecasting methods based on a big data analysis.

At present, many feasible methods have been proposed for electricity sales forecasting, which are mainly divided into three categories. The first category is traditional methods, including the moving average model (MA), exponential smoothing method (ES), autoregressive method (AR), support vector machine (SVM) [1,2], etc. The second category is a combination of two or more traditional methods, such as the autoregressive moving average model (ARMA) and autoregressive integrated moving average model (ARIMA). The third category is forecasting methods based on Deep Learning [3], mainly including the recurrent neural network (RNN), sequence to sequence (Seq2Seq), long short-term memory (LSTM), and gated recurrent unit (GRU).

The early forecasting of electricity sales was often based on traditional methods. The literature [4] established a quadratic moving average model (MA) when forecasting the short-term electricity sales in a city. In the literature [5,6], quadratic and cubic exponential smoothing (ES) were, respectively, used to forecast the amount of electricity sales in all Chinese parks and a Chinese city, and the ES was used to forecast the electricity sales of a city in the literature [7]. In the forecasting process, the ES would not abandon the past data but give the past data a weight with low impact. The literature [8] established the autoregressive method (AR) based on the residential electricity demand of the recent 10 years in Sri Lanka. The difference between the AR and MA is that the historical data in the AR indirectly affects the current forecasted value. The model of the traditional methods is simple to construct and fast to calculate, but the simple regression or sliding model has great limitations, such as the homogeneity of the variation in the regression model, the change in the season, the climate, and other factors of the electricity consumption area that may cause the change in the regression relationship, thus presenting variability.

A combined model built from two or more traditional methods can take into account the advantages of its components. Some scholars have combined methods whose resistance to perturbations in the data is complementary, to make up for the limitations of a single model. The literature [9] proposed an autoregressive moving average model (ARMA), which is a linear combination of the AR and MA, and used it to forecast the wind electricity sales in Portugal; it was also applied to the electricity market in India [10]. The most important feature of the ARMA is its autocorrelation equation and partial autocorrelation equation. It combines the advantages of the MA and AR, with a wide application range and small prediction error. The literature [11] proposed the autoregressive integrated moving average (ARIMA) model on the basis of the ARMA model and studied several linear and nonlinear models to compare and forecast the electricity consumption in Taiwan. In [12], the ARIMA was also used to forecast the electricity sales in Colombo. The ARIMA can handle a non-stationary series because the original series is differenced within the model. Such integrated traditional methods are more accurate for forecasting linear electricity sales data, but in an actual system there may be long-term and short-term dependencies and nonlinear relationships among the multiple inputs and outputs [13,14], which makes the forecasting problem more complicated. Moreover, these methods only support a fixed time dependency [15] and univariate data [16,17], which greatly reduces their forecasting range and accuracy, so they fail to achieve ideal results.

Compared with the above methods, the electricity sales forecasting method based on Deep Learning has a stronger fitting ability, so it adapts better to nonlinear data. As the dimension and length of the electricity sales data increase, the advantages of Deep Learning become more obvious. The recurrent neural network (RNN) proposed in the literature [18] forecasted the New England electricity market; it saves the output of the network in a memory unit so that the information can be reused. The literature [19,20], respectively, used a long short-term memory network (LSTM) to forecast the electricity sales in Spain’s electricity market and at seven wind power plants in Europe in the next month. This model introduced a gate mechanism to control the circulation and loss of features and solved the long-term dependency problem of the RNN. In [21], the gated recurrent unit (GRU), a kind of RNN, was proposed. Both the GRU and LSTM were used to forecast the daily sales of the cash flow, and their effects were similar. However, the GRU is easier to train, so it can greatly improve the training efficiency of the model. In order to overcome the limitation that the RNN structure places on the sequence length, some scholars proposed a wind power forecasting method based on sequence to sequence (Seq2Seq). The literature [22,23], respectively, used this method to forecast the daily power generation of the Sotavento wind farm in Spain and the power generation of the Kanto region in Japan. This kind of Deep Learning model can take complex data as input and is particularly effective at automatically extracting the features of the data. Moreover, its hierarchical structure can extract both long-term and short-term information. However, these methods essentially treat electricity sales forecasting as a time-series forecasting problem, in which future electricity sales are forecasted only from the actual time series of past electricity sales [24]. In practice, electricity sales data are often spatio-temporal data that depend on external factors, such as the weather and season, as well as on temporal and spatial attributes [25], rather than just a historical time series.

To address the above problems, this paper proposes a method of electricity sales forecasting based on a deep spatio-temporal residual network (ST-ResNet). In order to verify the validity and feasibility of the ST-ResNet model in forecasting electricity sales, this paper compares the ST-ResNet model with seven other models for different forecasting intervals and compares their root mean square errors. The main contributions of this paper are summarized as follows: (1) Because the other models are pure time-series predictors and are difficult to fit, this paper combines external factors and spatio-temporal attributes as predictive inputs to enhance the fitting degree of each parameter of the model. (2) Because deep convolutional neural networks suffer from gradient problems at depth, the residual network and the convolutional neural network are fused to avoid the gradient disappearance in the deep convolution.

The rest of this paper is organized as follows: Section 2 gives a detailed analytical description of each component of the model and the main flow of the algorithm. Section 3 compares the forecasting results of ST-ResNet with the seven other models. Section 4 is the conclusion and prospect.

2. Short-Term and Medium-Term Forecasting Model of Electricity Sales Based on ST-ResNet

The deep spatio-temporal residual network (ST-ResNet) consists of four main components: closeness, period, trend, and external components, as shown in Figure 1.

As shown in Figure 1, the electricity sales data in each time interval of the nine regions are first converted into a single-channel image matrix. The time axis is divided into three segments representing the recent time (closeness), the near time (periodicity), and the distant time (trend), and these three time properties are modeled. These three parts have the same network structure, namely a convolutional neural network followed by a sequence of residual units. This structure captures the spatial dependency between nearby and more distant regions. For the external component, some features (such as weather conditions and holiday data) are manually extracted from the external dataset and fed into a two-layer fully connected neural network. Based on the parameter matrix, the outputs of the first three components are fused, with different weights assigned to the outputs of different components in different areas, and the result is then integrated with the output of the external component. Finally, the aggregation is mapped to [−1, 1] by a tanh function, which converges faster than the standard logistic function during backpropagation learning.
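The conversion of the nine regional series into image-like matrices can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the row-major placement of regions 1–9 on the 3 × 3 grid, and the min–max scaling of the inputs to [−1, 1] (chosen here only so that targets match the range of the final tanh) are all assumptions.

```python
import numpy as np

def to_grid_frames(sales, rows=3, cols=3):
    """Reshape a (T, rows*cols) array of regional sales into (T, 1, rows, cols) frames.

    Column r of `sales` is assumed to hold region r+1, placed row by row on the grid.
    """
    sales = np.asarray(sales, dtype=float)
    if sales.ndim != 2 or sales.shape[1] != rows * cols:
        raise ValueError("expected an array of shape (T, rows * cols)")
    return sales.reshape(sales.shape[0], 1, rows, cols)

def scale_to_tanh_range(frames):
    """Min-max scale all values to [-1, 1]; also return the bounds for inversion."""
    lo, hi = float(frames.min()), float(frames.max())
    if hi == lo:
        raise ValueError("constant data cannot be min-max scaled")
    return 2.0 * (frames - lo) / (hi - lo) - 1.0, (lo, hi)

def inverse_scale(scaled, bounds):
    """Map values in [-1, 1] back to the original units (kWh)."""
    lo, hi = bounds
    return (scaled + 1.0) / 2.0 * (hi - lo) + lo
```

In such a setup the bounds would normally be computed on the training portion only and reused for the test portion, so that the forecast can be mapped back to kWh with `inverse_scale`.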

2.1. Spatio-Temporal Attributes of Electricity Sales Forecasting

Electricity sales forecasting [27,28] refers to forecasting electricity sales according to their own historical changes and the influence of economic and meteorological factors. The forecasting experiment in this paper is based on short-term forecasting [29,30]. Accurate short-term forecasting is very challenging. As power consumption patterns become more complex, more and more forecasting methods for electricity sales have been proposed. In recent years, studies have begun to publish load data together with grid structures, which indicates that spatial information cannot be ignored when forecasting electricity sales. In this paper, Texas is divided into a 3 × 3 grid representing nine regions, numbered 1–9. The specific division is shown in Figure 2. The dataset contains the hourly electricity sales, in kWh, of two-storied houses located in Texas, USA, from 1 June 2016 to August 2020. A notes column marks weekdays, weekends, COVID-19 lockdown, and vacation days. Electricity sales during the daytime differ from those at night. Another dataset contains the historical weather reports of Texas from 1 June 2016 to August 2020. We selected the data from 2018 for predictive analysis. The distribution diagram is as follows.

Figure 3 shows the electricity sales data of the nine regions in Texas. Here, the x-coordinate is time, dividing the year into hours for a total of 8760 data points, and the y-coordinate is the electricity sales within the current hour, in kWh. The nine regions are distributed in the nine directions of Texas, and the overall trends of their electricity sales are similar. According to the geographical distribution of the nine regions, the more similar two areas are, the more similar the overall trends of their electricity sales. Most traditional forecasting methods for electricity sales focus on time information and only consider the historical values and weather information of electricity sales in a single region. The spatial information between neighboring regions and target regions has not been well studied. Therefore, for spatio-temporal data, such as electricity sales data, both temporal and spatial attributes should be taken into account. Spatial attributes include spatial distance and spatial level. Spatial distance refers to the geographical distance between two regions, while spatial level is reflected in this paper as the information of the nine regions. There is power transmission between different regions. The electricity input and output of region i within the time interval t are defined as:

\begin{matrix} X_{t}^{in, i} = \sum_{T_{r} \in P} | \{k > 1 \mid g_{k - 1} \notin i \land g_{k} \in i\} | \\ X_{t}^{out, i} = \sum_{T_{r} \in P} | \{k \geq 1 \mid g_{k} \in i \land g_{k + 1} \notin i\} | \end{matrix}

(1)

where $g_{k}$ is the transmission grid, and $g_{k} \in i$ indicates that the grid is in region i.
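The counting behind Equation (1) can be illustrated with a short sketch. It assumes, beyond what the text states, that each $T_{r}$ is an ordered sequence of grid points $g_{1}, g_{2}, \dots$ and that every grid point has already been mapped to a region number; the input counts steps that enter region i and the output counts steps that leave it. The function name `region_flows` is hypothetical.

```python
def region_flows(paths, region):
    """Count entries into and exits from `region` over a collection of paths.

    paths:  iterable of sequences; each sequence lists the region number of
            consecutive grid points g_1, g_2, ... along one path (assumed format).
    region: the region number i of interest.
    Returns (inflow, outflow).
    """
    inflow = 0
    outflow = 0
    for path in paths:
        # Walk over consecutive pairs (g_{k-1}, g_k) of the path.
        for prev, cur in zip(path, path[1:]):
            if prev != region and cur == region:
                inflow += 1   # the path steps into the region
            if prev == region and cur != region:
                outflow += 1  # the path steps out of the region
    return inflow, outflow
```

For example, with the two paths `[1, 2, 2, 3]` and `[2, 1, 2]`, region 2 is entered twice and left twice, while a step that stays inside the region contributes to neither count.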

2.2. ST-ResNet Applied to Electricity Sales Forecasting

2.2.1. Convolution

Texas is a very large state, divided here into nine regions. Intuitively, the electricity sales in neighboring areas may affect each other, and the convolutional neural network (CNN) [31,32] has a strong ability to capture spatial structure information layer by layer, which allows this spatial relationship to be integrated into forecasting. In addition, the nine regions include non-adjacent and even far-apart regions that are also connected by grids, so there is a distant spatial dependency. Limited by its kernel size, a single convolution only captures near spatial dependency, so a multi-layer CNN is needed to capture the spatial dependency between any two regions. A related issue was found in a video generation study [33], which also requires the same input and output resolution: subsampling causes a loss of resolution. The convolution part of this paper therefore differs from the classical CNN in that subsampling is removed and only convolution is used [34]. The convolution component structure is as follows.

As shown in Figure 4, samples of multiple time series at the lowest level undergo multi-layer convolution, and the results of the last layer are activated by the tanh function. One node in the last layer depends on the information of neighboring nodes in the middle layer, and nodes in the middle layer depend on all nodes (that is, all inputs) in the first layer. This means that one convolution naturally captures near spatial dependency, while multiple convolutions can further capture distant or even statewide dependency. The closeness component in Figure 1 uses the two-channel matrices of the recent time intervals to model the temporal closeness dependence. Let the recent fragment be $S_{c} = [X_{t - l_{c}}, X_{t - (l_{c} - 1)}, \dots, X_{t - 1}]$, that is, the closeness-dependent sequence. These matrices are first concatenated along the first axis into a tensor $X_{c}^{(0)} \in R^{2 l_{c} \times I \times J}$; the first-layer convolution in Figure 1 is then:

\begin{matrix} X_{c}^{(1)} = f (W_{c}^{(1)} * X_{c}^{(0)} + b_{c}^{(1)}) \end{matrix}

(2)

where ∗ represents convolution; f is an activation function, such as the rectifier $f (z) = \max (0, z)$; and $W_{c}^{(1)}$ and $b_{c}^{(1)}$ are the learnable parameters of the first layer.
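How stacking convolutions without subsampling widens the area that one output node depends on can be checked with simple arithmetic. The sketch below assumes stride-1 convolutions with an odd kernel size and zero padding that preserves the grid size; the helper names are hypothetical.

```python
def same_padding_output_size(input_size, kernel_size=3):
    """Output side length of a stride-1 convolution with zero padding kernel_size // 2."""
    if kernel_size % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel size")
    pad = kernel_size // 2
    return input_size + 2 * pad - kernel_size + 1

def receptive_field(num_layers, kernel_size=3):
    """Side length of the input window that one output node depends on
    after `num_layers` stacked stride-1 convolutions."""
    if kernel_size % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel size")
    return 1 + num_layers * (kernel_size - 1)
```

With a 3 × 3 kernel the grid size is unchanged by every layer, while the window seen by a node grows from 3 × 3 after one layer to 5 × 5 after two, which is the sense in which deeper stacks capture more distant dependency.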

The lowest-level Texas maps in different colors represent the multiple layers of input, and the arrows represent the deepening of the convolution layers and the subsequent focusing of the convolution kernel. The final result is activated by the tanh activation function.

2.2.2. Residual Unit [35]

Forecasting electricity sales data requires a very deep network to capture a wide range of regional dependency. For the nine regions of Texas, more than 15 consecutive convolutional layers are required to capture statewide dependency (that is, each node at the top level depends on the inputs of all nodes). Deep convolutional networks cause gradient disappearance, which greatly damages the effectiveness of training. Even with the application of activation functions (such as ReLU) and regularization techniques (such as the batch normalization of Ioffe and Szegedy), this damage is difficult to eliminate. As an illustration, a plain network formed by directly stacking multiple layers was tested on image recognition. The comparison between the training errors and the standard validation errors is shown in Figure 5.

It can be seen that the model becomes worse and worse when the number of network layers is deeper and deeper (56 vs. 20). To solve this problem, it is necessary to use residual network learning in the model, which has been proven to be very effective for training ultra-deep neural networks with more than 1000 layers. In the residual structure, there is the following functional relationship for a residual unit:

\begin{matrix} y_{l} = h (x_{l}) + F (x_{l}, W_{l}), \quad W_{l} = \{W_{l, k} \mid 1 \leq k \leq K\} \\ x_{l + 1} = f (y_{l}) \end{matrix}

(3)

where $y_{l}$ is the output of the $l$-th residual unit and f is the activation function applied after each residual unit. Setting $h (x_{l}) = x_{l}$ and $x_{l + 1} = y_{l}$, the following is obtained recursively:

\begin{matrix} x_{L} = x_{l} + \sum_{i = l}^{L - 1} F (x_{i}, W_{i}) \end{matrix}

(4)

The input of the $L$-th residual unit can be expressed as the sum of the input of a shallower residual unit $l$ and all the residual mapping results in between. Denote the loss function as $\varepsilon$; backpropagation then gives:

\begin{matrix} \frac{\partial \varepsilon}{\partial x_{l}} = \frac{\partial \varepsilon}{\partial x_{L}} \frac{\partial x_{L}}{\partial x_{l}} = \frac{\partial \varepsilon}{\partial x_{L}} \left( 1 + \frac{\partial}{\partial x_{l}} \sum_{i = l}^{L - 1} F (x_{i}, W_{i}) \right) \end{matrix}

(5)
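The effect of the additive term in Equation (5) can be checked numerically on a deliberately simplified case. The sketch assumes scalar units with a linear residual function $F (x_{i}, w_{i}) = w_{i} x_{i}$, which is an illustrative choice and not the convolutional F used in ST-ResNet; the function names are hypothetical.

```python
def plain_gradient(weights):
    """d x_L / d x_l for a plain stack x_{i+1} = w_i * x_i (product of the weights)."""
    grad = 1.0
    for w in weights:
        grad *= w
    return grad

def residual_gradient(weights):
    """d x_L / d x_l for a residual stack x_{i+1} = x_i + w_i * x_i."""
    grad = 1.0
    for w in weights:
        grad *= 1.0 + w
    return grad

def one_plus_sum_term(weights):
    """The bracket of Eq. (5): 1 + d/dx_l of sum_i F(x_i, w_i) with F = w_i * x_i."""
    total = 0.0
    dx_i = 1.0  # d x_i / d x_l, starting at i = l
    for w in weights:
        total += w * dx_i   # contribution of F(x_i, w_i)
        dx_i *= 1.0 + w     # propagate to x_{i+1}
    return 1.0 + total
```

With twenty layers of weight 0.01, the plain stack yields a gradient factor of about 1e-40, whereas the residual stack yields about 1.22, and `one_plus_sum_term` reproduces the residual value, in line with the identity term in Equation (5).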

When gradients are computed by backpropagation in a plain network, the chain rule multiplies the gradient by one factor per hidden layer, so the gradient that reaches the shallow hidden layers can be severely attenuated; this is the origin of the vanishing gradient problem. In Equation (5), by contrast, the gradient at the deep unit L reaches the shallow unit l directly through the additive term 1, without passing through a chain of successive multiplications, so the origin of the vanishing gradient no longer exists. In ST-ResNet, shown in Figure 1, L residual units are stacked on the first convolutional layer, and the stacking formula is shown in (6):

\begin{matrix} X_{c}^{(l + 1)} = X_{c}^{(l)} + F (X_{c}^{(l)}; θ_{c}^{(l)}) \end{matrix}

(6)

F is the residual function, which consists of two successive ReLU + convolution combinations, as shown in Figure 6.
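A residual unit of the form in Equation (6) can be sketched in NumPy as follows. This is an illustrative, unoptimized version under stated assumptions: the convolution is the stride-1, zero-padded cross-correlation commonly used in Deep Learning frameworks, the channel count is kept constant so that the shortcut addition is valid, and all function and parameter names are hypothetical.

```python
import numpy as np

def conv2d_same(x, w, b):
    """Stride-1 convolution without subsampling.

    x: input of shape (C_in, I, J)
    w: kernels of shape (C_out, C_in, k, k) with odd k
    b: biases of shape (C_out,)
    Returns an array of shape (C_out, I, J).
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding keeps I x J
    _, rows, cols = x.shape
    out = np.empty((c_out, rows, cols))
    for i in range(rows):
        for j in range(cols):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) window around (i, j)
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out

def relu(z):
    return np.maximum(0.0, z)

def residual_unit(x, params):
    """X^(l+1) = X^(l) + F(X^(l)), with F = ReLU -> conv -> ReLU -> conv."""
    (w1, b1), (w2, b2) = params
    h = conv2d_same(relu(x), w1, b1)
    h = conv2d_same(relu(h), w2, b2)
    return x + h  # identity shortcut
```

When both convolutions have zero weights and biases, the unit reduces to the identity mapping, which is the behavior that makes very deep stacks trainable.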

In this paper, the $l_{p}$ electricity sales matrices $S_{p} = [X_{t - l_{p} \cdot p}, X_{t - (l_{p} - 1) \cdot p}, \dots, X_{t - p}]$ from the same time period on the days before the target time period are taken as the original period sequence, and the corresponding output is $X_{p}^{(L + 2)}$. Likewise, the $l_{q}$ electricity sales matrices $S_{q} = [X_{t - l_{q} \cdot q}, X_{t - (l_{q} - 1) \cdot q}, \dots, X_{t - q}]$ from the same time period in the weeks before the target time period are taken as the original trend sequence, and the corresponding output is $X_{q}^{(L + 2)}$.
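The three dependent sequences $S_{c}$, $S_{p}$, and $S_{q}$ differ only in which past time indices they select, which the following sketch makes explicit. The function name is hypothetical, and the example values p = 24 and q = 168 (one day and one week of hourly steps) are assumptions based on the description of days and weeks above.

```python
def dependent_indices(t, l_c, l_p, l_q, p, q):
    """Return the time indices used by the closeness, period, and trend sequences.

    t:   index of the target time interval
    l_c: number of recent intervals (closeness)
    l_p: number of period steps, spaced p intervals apart
    l_q: number of trend steps, spaced q intervals apart
    """
    closeness = [t - k for k in range(l_c, 0, -1)]    # t - l_c, ..., t - 1
    period = [t - k * p for k in range(l_p, 0, -1)]   # t - l_p * p, ..., t - p
    trend = [t - k * q for k in range(l_q, 0, -1)]    # t - l_q * q, ..., t - q
    indices = closeness + period + trend
    if indices and min(indices) < 0:
        raise ValueError("not enough history before index t")
    return closeness, period, trend
```

For instance, `dependent_indices(400, 3, 2, 2, 24, 168)` selects the three preceding hours, the same hour one and two days earlier, and the same hour one and two weeks earlier.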

2.2.3. External Component

The external component processes the second and third parts of the dataset and converts the data of external factors, such as festival information, weather, temperature, and wind speed, into metadata. Holidays and weather conditions are encoded as binary vectors by one-hot coding, and the temperature and wind speed are scaled to [0, 1] by the min–max normalization method. Formally, two fully connected layers are stacked on the resulting feature vectors. The first layer can be regarded as an embedding layer, followed by an activation function, for each subfactor. The second layer maps the output of the first layer into the same shape as the output of the previous three components for easy fusion.
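The feature encoding and the two fully connected layers of the external component can be sketched as follows. This is a minimal illustration with hypothetical function names: the 16 weather codes follow the dataset description in Section 3.2, while the ReLU activation of the embedding layer, the hidden width, and the weight values are assumptions.

```python
import numpy as np

def one_hot(codes, num_classes):
    """Encode integer codes 0..num_classes-1 as binary row vectors."""
    codes = np.asarray(codes, dtype=int)
    out = np.zeros((codes.size, num_classes))
    out[np.arange(codes.size), codes] = 1.0
    return out

def min_max(values):
    """Scale values to [0, 1]; a constant series maps to zeros."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo) if hi > lo else np.zeros_like(values)

def external_features(weather_code, is_weekend, temperature, wind_speed, num_weather=16):
    """Build one feature row per time interval."""
    return np.column_stack([
        one_hot(weather_code, num_weather),
        np.asarray(is_weekend, dtype=float),
        min_max(temperature),
        min_max(wind_speed),
    ])

def external_component(features, w1, b1, w2, b2, grid_shape=(3, 3)):
    """Two fully connected layers; the output is reshaped to the grid for fusion."""
    hidden = np.maximum(0.0, features @ w1 + b1)  # embedding layer (ReLU assumed)
    out = hidden @ w2 + b2                        # map to rows * cols values
    return out.reshape(-1, *grid_shape)
```

The second layer must output as many values as there are grid cells (nine here) so that the result can be added element-wise to the fused output of the other three components.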

2.2.4. Fusion

According to Figure 1, the first three components correspond to the output of time closeness, period, and trend components, respectively. In this subsection, a fusion method based on parameter matrix is adopted to fuse the first three components, whose fusion formula is shown in (7):

\begin{matrix} X_{Res} = W_{c} \cdot X_{c}^{(L + 2)} + W_{p} \cdot X_{p}^{(L + 2)} + W_{q} \cdot X_{q}^{(L + 2)} \end{matrix}

(7)

where $X_{c}$, $X_{p}$, $X_{q}$ represent the outputs of the three components; $W_{c}$, $W_{p}$, $W_{q}$ are learnable parameters that adjust the degree of influence of closeness, period, and trend; and $X_{Res}$ is the output of the fusion of the three components. This output is then added directly to the output of the external component and activated using the tanh function as follows:

\begin{matrix} \hat{X}_{t} = \tanh (X_{Res} + X_{Ext}) \end{matrix}

(8)

$\hat{X}_{t}$ is the target forecast, which is obtained by adding the two components and applying the activation. The tanh function is the hyperbolic tangent, which ensures that the output value lies between −1 and 1 and can take both positive and negative values. It is a symmetric function centered at 0 with fast convergence speed and is not prone to oscillation of the loss value.
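Equations (7) and (8) can be sketched in a few lines of NumPy. The sketch interprets the product in Equation (7) as an element-wise product, so that each region receives its own weight for each component; this interpretation, like the function name, is an assumption based on the description of region-specific weights.

```python
import numpy as np

def fuse(x_c, x_p, x_q, w_c, w_p, w_q, x_ext):
    """Parameter-matrix fusion followed by tanh.

    x_c, x_p, x_q: outputs of the closeness, period, and trend components
    w_c, w_p, w_q: weight matrices of the same shape (one weight per region)
    x_ext:         output of the external component, same shape
    """
    x_res = w_c * x_c + w_p * x_p + w_q * x_q  # element-wise weighting per region
    return np.tanh(x_res + x_ext)              # forecast in [-1, 1]
```

Because the forecast lies in [−1, 1], targets scaled to that range are needed during training, and the forecast has to be mapped back to kWh afterwards.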

2.3. Comprehensive Process

Considering the above, the flow chart of the proposed electricity sales forecasting method based on ST-ResNet is shown in Figure 7.

3. Simulation and Experimental Verification

3.1. Experimental Environment

The network model adopted in this paper is implemented entirely using the TensorFlow framework. The network model and data processing were developed in PyCharm 2020, which is produced by JetBrains, a company based in Prague, Czech Republic. The operating system is 64-bit Windows, running on a machine with 8.00 GB of RAM and an Intel(R) Core(TM) i5-8300H 2.30 GHz processor. The main third-party modules used are pandas, cv2, and numpy.

3.2. Data Introduction

The data used in this paper come from the electricity sales dataset of nine regions in Texas on the Kaggle website [36]; the regions are COAST, EAST, FWEST, NORTH, NCENT, SOUTH, SCENT, WEST, and ERCOT. The dataset includes the holiday data and weather data for the whole of 2018, with 8760 hourly records in each of the nine regions. The model established in this paper is mainly applied to the short- and medium-term forecast, so the follow-up measurement is the same as in the experiment. The time intervals for the ultra-short-term, short-term, and medium-term forecasts are 1 h, 1 day, and 1 week, respectively, and each data point is the residential electricity sales in the area within the previous hour. The weather and temperature conditions are the actual conditions recorded in the dataset over the year. The weather is divided into 16 types, such as sunny morning/sunny afternoon and sunny morning/light-rain afternoon, numbered 0–15. Weekdays and weekends correspond to 0 and 1, respectively. After the 8760 time-series data points are put into the neural network, they are divided into the training set and the testing set at a ratio of 8:2. The setting of the datasets is shown in Table 1.
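The data preparation described above can be sketched as follows. Two points are assumptions rather than statements from the paper: that the daily and weekly series are obtained by taking every 24th and every 168th hourly value (a reading that reproduces the 365 and 53 data points reported in Section 3.3), and that the 8:2 split is chronological. The function names are hypothetical.

```python
import numpy as np

def subsample(series, step):
    """Keep every `step`-th value of an hourly series (step=24: daily, step=168: weekly)."""
    return series[::step]

def chronological_split(series, n_train):
    """Use the first n_train values for training and the rest for testing."""
    return series[:n_train], series[n_train:]

hourly = np.arange(8760, dtype=float)      # placeholder for one region's hourly sales
daily = subsample(hourly, 24)              # 365 values
weekly = subsample(hourly, 24 * 7)         # 53 values
train, test = chronological_split(hourly, len(hourly) * 8 // 10)  # 8:2 ratio
```

A chronological split keeps all test points later in time than the training points, which avoids evaluating a forecast on values that precede its training data.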

3.3. Experimental Results

In this subsection, three comparison experiments are carried out to compare the accuracy of the forecasting results obtained by inputting the time series at intervals of hours (ultra-short term), days (short term), and weeks (medium term) into ST-ResNet and the seven comparison models. The improved X13-ARIMA-SEATS (X13) [27] and LSTM-NN [19] are selected for the ARIMA and LSTM, respectively. In this paper, the mean absolute percentage error (MAPE), mean absolute difference (MAD), and root mean square error (RMSE) are used to measure the predictive effectiveness of the model, with lower values representing better forecasting results. Let the residual at time t be $e_{t} = | Y_{t} - X_{t} |$; these metrics are then defined as follows:

\begin{matrix} MAPE & = \frac{1}{n} \sum_{t = 1}^{n} \frac{| e_{t} |}{\hat{X_{t}}} \times 100 \\ MAD & = \frac{1}{n} \sum_{t = 1}^{n} | e_{t} | \\ RMSE & = \sqrt{\frac{1}{n} \sum_{t = 1}^{n} {(e_{t})}^{2}} \end{matrix}

(9)
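The three metrics in Equation (9) can be computed as in the sketch below. Equation (9) writes the MAPE denominator as $\hat{X}_{t}$; the sketch assumes that this denotes the reference (actual) value, as in the usual MAPE definition, and that no actual value is zero. The function name is hypothetical.

```python
import numpy as np

def forecast_errors(actual, forecast):
    """Return (MAPE in percent, MAD, RMSE) for two equally long series."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    if actual.shape != forecast.shape:
        raise ValueError("actual and forecast must have the same shape")
    e = np.abs(actual - forecast)                  # absolute residual e_t
    mape = np.mean(e / np.abs(actual)) * 100.0     # assumes no zero actual values
    mad = np.mean(e)
    rmse = np.sqrt(np.mean(e ** 2))
    return mape, mad, rmse
```

For actual values 100 and 200 with forecasts 110 and 180, the absolute residuals are 10 and 20, giving a MAPE of 10%, a MAD of 15, and an RMSE of about 15.81.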

3.3.1. Comparison of Forecasting Results of Ultra-Short-Term Electricity Sales

When making ultra-short-term forecasting, the total dataset is shown in Table 1. Taking the forecasting results of the last 120 h as an example to compare the advantages and disadvantages of the models, the corresponding forecasting results of the eight models are as follows.

The specific forecasting results are shown in Table 2:

It can be seen from Figure 8 that the electricity sales data change regularly in a 24 h cycle, and the electricity sales during the day are generally higher than those at night. According to the data in Table 2, the ARIMA-based X13 model has a better forecasting performance than the MA and ES models. The MAPE of X13 is about 5% lower than those of these two models, and its RMSE and MAD are also much lower. This is because the ES and MA are more suitable for smooth data, while the electricity sales data are not smooth. The MA and ES forecasting charts show that their forecasted results are quite different from the real results. X13 adds an autoregressive module to the MA, which can exclude two kinds of perturbance at the same time and is less affected by abrupt data. However, there are still large errors in the forecasting results of X13. The accuracy of the RNN and other Deep Learning methods in forecasting non-smooth data, such as electricity sales, is much higher than that of the MA and other traditional methods. Among the classical Deep Learning methods, the RNN has the worst forecasting effect, while the LSTM-NN, GRU, and Seq2Seq differ little from one another and are slightly better than the RNN model. At the same time, the forecasting effect of ST-ResNet is significantly better than that of the other Deep Learning methods. Its RMSE is at least 37.7% lower than that of the other four Deep Learning methods, and it also has great advantages in terms of the MAPE and MAD. This is because ST-ResNet not only models the historical time series but also introduces spatial relationships and external factors, so that the parameters and variables of the model can be better fitted during training. As a result, the error obtained on the testing set is smaller and the accuracy is higher.

3.3.2. Comparison of Forecasting Results of Short-Term Electricity Sales

For short-term forecasting, a total of 365 data were selected in the dataset at an interval of one day, among which 300 were the training set and the remaining 65 were the testing set. The corresponding forecasting results of the eight models are as follows.

The specific forecasting results are shown in Table 3:

According to the analysis in Figure 9, the weekly electricity sales are mainly concentrated on working days, and the data show periodic fluctuations with a period of one week. In Table 3, the RMSE and MAD of the Deep Learning methods, such as the RNN, are still significantly lower than those of the traditional models, such as the MA, and the MAPE is even reduced by about 10%. Among the Deep Learning methods, the gap between the RNN and the other methods becomes larger. This is because the sample size is smaller than in the ultra-short-term forecasting, and the RNN has the problem of long-term dependency, which may increase the forecasting error due to the disappearance or explosion of the gradient. Compared with the LSTM-NN, GRU, and Seq2Seq, the RMSE of the ST-ResNet method is about 36% lower, its MAD is about 38% lower, and its MAPE is about 2.7% lower, so it still has a great advantage.

3.3.3. Comparison of Medium-Term Electricity Sales Forecasting Results

For the medium-term forecasting, a total of 53 data were selected from the dataset at an interval of one week, of which 30 were the training set and the remaining 23 were the testing set. The corresponding forecasting results of the eight models are as follows.

The specific forecasting results are shown in Table 4:

In Figure 10, the electricity sales data generally show a downward trend, which is affected by the seasons. According to Table 2, Table 3 and Table 4, the forecasting effect of all the models becomes worse as the dataset shrinks and the forecasting interval grows. The MAPE of the traditional models, such as the MA, increased by about 3%, and the MAPE of the Deep Learning methods, such as the RNN, increased by about 1%, while that of ST-ResNet only increased by 0.3%. This is because the proposed method divides the time series according to three attributes, namely the closeness, period, and trend, and analyzes the data over different time spans. This also demonstrates the validity and feasibility of ST-ResNet in the ultra-short-term, short-term, and medium-term forecasting of electricity sales.

4. Conclusions

This paper proposes a method for forecasting electricity sales based on the deep spatio-temporal residual network. The innovation lies in taking the temporal relationship, spatial relationship, and external factors as simultaneous inputs when training the model, so that the results obtained are closer to the actual data. Moreover, the residual module is integrated into the convolutional neural network, which effectively solves the problem of the gradient disappearance caused by deep convolution. Under the ultra-short-term, short-term, and medium-term forecasting intervals, the comparison experiments of electricity sales forecasting in Texas showed that the MAPE of ST-ResNet is at least 2.69% lower than that of the classical Deep Learning models such as the RNN, that its RMSE is at least 36.2% lower, and that its MAD is at least 34.2% lower. Compared with the traditional methods, its advantages are more obvious. Traditional time-series forecasting models require a highly stationary dataset; when they are used on a non-stationary dataset that has not been smoothed, the forecast differs greatly from the actual value, which would cause a huge loss in practical applications. The forecasting results obtained by the RNN and its extended Deep Learning models, which do not consider the actual spatial conditions and external factors, are also quite different from the actual results and cannot be used for actual electricity sales forecasting.

Although the three experiments demonstrate the superiority of the proposed method over the other methods, we have not analyzed its actual economic benefits or its potential for future application. This paper describes in detail the process and effect of applying ST-ResNet to electricity sales forecasting. However, electricity sales data alone cannot serve as a complete economic indicator, and more data directly related to the economy should be added. In addition, as a model that can handle relatively complex data, ST-ResNet could be applied to similar problems, such as predicting power generation, monitoring peak power consumption, and other power-related tasks, or to image recognition. In the future, we will consider how to apply the ST-ResNet model to long-term forecasting with better results, and how to apply it to the forecasting of other spatio-temporal data beyond electricity sales.

Author Contributions

Conceptualization, M.C. and J.W.; methodology, M.C.; software, X.S.; validation, M.C., J.Y. and N.L.; formal analysis, J.Y.; investigation, Z.R.; resources, J.W.; data curation, X.S.; writing—original draft preparation, H.C.; writing—review and editing, H.C.; supervision, J.Y.; project administration, N.L.; funding acquisition, Z.R. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (52177193); the Key Research and Development Program of Shaanxi Province (2022GY-182); and the China Scholarship Council (CSC) State Scholarship Fund International Clean Energy Talent Project (Grant Nos. [2018]5046, [2019]157).

Data Availability Statement

The dataset is available at https://www.kaggle.com/datasets/srinuti/residential-power-usage-3years-data-timeseries/code (accessed on 20 July 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Wang, Y.; Wang, D.; Tang, Y. Clustered hybrid wind power prediction model based on ARMA, PSO-SVM, and clustering methods. IEEE Access 2020, 8, 17071–17079.
2. Zhang, J.; Chu, X.; Huang, X.; Fan, W.; Chen, Y.; Wan, Q.; Zhao, J. A model for photovoltaic output prediction based on SVM modified by weighted Markov chain. Power Syst. Prot. Control 2019, 19, 63–68.
3. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
4. Jiang, H.; Fang, D.; Spicher, K.; Cheng, F.; Li, B. A new period-sequential index forecasting algorithm for time series data. Appl. Sci. 2019, 9, 4386.
5. Peng, B.; Liu, L.; Wang, Y. Monthly electricity consumption forecast of the park based on hybrid forecasting method. In Proceedings of the 2021 China International Conference on Electricity Distribution (CICED), Shanghai, China, 7–9 April 2021; pp. 789–793.
6. Jiang, W.; Wu, X.; Gong, Y.; Yu, W.; Zhong, X. Holt–Winters smoothing enhanced by fruit fly optimization algorithm to forecast monthly electricity consumption. Energy 2020, 193, 116779.
7. Sulandari, W.; Suharton, S.; Rodrigues, P. Exponential smoothing on modeling and forecasting multiple seasonal time series: An overview. Fluct. Noise Lett. 2021, 20, 2130003.
8. Chandrasena, A.U.B. Forecast Electricity Sales in Industrial Sector in Sri Lanka Using Predictive Analytics. Master’s Thesis, University of Colombo, Colombo, Sri Lanka, 2022.
9. Rogus, R.; Castro, R.; Sołtysik, M. Comparative Analysis of Wind Energy Generation Forecasts in Poland and Portugal and Their Influence on the Electricity Exchange Prices. Inventions 2020, 5, 35.
10. Mukherjee, P.; Coondoo, D.; Lahiri, P. Forecasting Hourly Spot Prices in Indian Electricity Market. Stud. Microecon. 2022, 23210222221108019.
11. Shao, Y.E.; Tsai, Y.S. Electricity sales forecasting using hybrid autoregressive integrated moving average and soft computing approaches in the absence of explanatory variables. Energies 2018, 11, 1848.
12. Herath, H.M.R.D.S.; Varathan, N. Statistical modelling of monthly electricity sales in Colombo: ARIMA approach. In Proceedings of the Research Symposium on Pure and Applied Sciences, Kelaniya, Sri Lanka, 12 December 2018.
13. Su, Y.; Zhao, Y.; Niu, C.; Liu, R.; Sun, W.; Pei, D. Robust anomaly detection for multivariate time series through stochastic recurrent neural network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2828–2837.
14. Han, Z.; Zhao, J.; Leung, H.; Ma, K.F.; Wang, W. A review of deep learning models for time series prediction. IEEE Sens. J. 2019, 21, 7833–7848.
15. Lai, G.; Chang, W.C.; Yang, Y.; Liu, H. Modeling long- and short-term temporal patterns with deep neural networks. In Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, Ann Arbor, MI, USA, 8–12 July 2018.
16. Petropoulos, F.; Spiliotis, E. The wisdom of the data: Getting the most out of univariate time series forecasting. Forecasting 2021, 3, 478–497.
17. Braei, M.; Wagner, S. Anomaly detection in univariate time-series: A survey on the state-of-the-art. arXiv 2020, arXiv:2004.00433.
18. Zhang, C.; Li, R.; Shi, H.; Li, F. Deep learning for day-ahead electricity price forecasting. IET Smart Grid 2020, 3, 462–469.
19. Memarzadeh, G.; Keynia, F. Short-term electricity load and price forecasting by a new optimal LSTM-NN based prediction algorithm. Electr. Power Syst. Res. 2021, 192, 106995.
20. Shahid, F.; Zameer, A.; Muneeb, M. A novel genetic LSTM model for wind power forecast. Energy 2021, 223, 120069.
21. Chen, S.; Lan, F.; Liu, M.; Ye, T.; Xiao, K.; Zheng, P.; Chang, Y.; Li, M.; Zhu, S.; Kong, D. Cash Flow Forecasting Model for Electricity Sale Based on Deep Recurrent Neural Network. In Proceedings of the 2019 IEEE International Conference on Power Data Science (ICPDS), Taizhou, China, 22–24 November 2019; pp. 67–70.
22. Zhang, Y.; Li, Y.; Zhang, G. Short-term wind power forecasting approach based on Seq2Seq model using NWP data. Energy 2020, 213, 118371.
23. Xie, Y.; Ueda, Y.; Sugiyama, M. A Two-Stage Short-Term Load Forecasting Method Using Long Short-Term Memory and Multilayer Perceptron. Energies 2021, 14, 5873.
24. Bu, F.; Yuan, Y.; Wang, Z.; Dehghanpour, K.; Kimber, A. A time-series distribution test system based on real utility data. In Proceedings of the 2019 North American Power Symposium (NAPS), Wichita, KS, USA, 13–15 October 2019; pp. 1–6.
25. Tian, C.; Zhu, X.; Hu, Z.; Ma, J. Deep spatial-temporal networks for crowd flows prediction by dilated convolutions and region-shifting attention mechanism. Appl. Intell. 2020, 50, 3057–3070.
26. Zhang, J.; Zheng, Y.; Qi, D. Deep spatio-temporal residual networks for citywide crowd flows prediction. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
27. Li, K.; Li, J.; Chen, S.; Tang, J.; Wu, J.; Zhang, Y.; Xiao, K. Research on electricity sales forecast model based on big data. In Proceedings of the International Symposium on Cyberspace Safety and Security, Haikou, China, 1–3 December 2020; pp. 316–328.
28. Baker, A.B.; Bunn, D.; Farmer, E. Load Forecasting for Scheduling Generation on a Large Interconnected System; Wiley: Chichester, UK, 1985.
29. Christiaanse, W.R. Short-term load forecasting using general exponential smoothing. IEEE Trans. Power Appar. Syst. 1971, PAS-90, 900–911.
30. Chen, K.; Chen, K.; Wang, Q.; He, Z.; Hu, J.; He, J. Short-term load forecasting with deep residual networks. IEEE Trans. Smart Grid 2018, 10, 3943–3952.
31. Fukushima, K.; Miyake, S. Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition. In Competition and Cooperation in Neural Nets; Springer: Berlin/Heidelberg, Germany, 1982; pp. 267–285.
32. Bhatt, D.; Patel, C.; Talsania, H.; Patel, J.; Vaghela, R.; Pandya, S.; Modi, K.; Ghayvat, H. CNN variants for computer vision: History, architecture, application, challenges and future scope. Electronics 2021, 10, 2470.
33. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
34. Rückauer, B.; Känzig, N.; Liu, S.C.; Delbruck, T.; Sandamirskaya, Y. Closing the accuracy gap in an event-based visual recognition task. arXiv 2019, arXiv:1906.08859.
35. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 13–19 June 2016; pp. 770–778.
36. Kaggle. Available online: https://www.kaggle.com (accessed on 20 July 2022).

Figure 1. The schematic diagram of the ST-ResNet [26] structure.

Figure 2. Regional distribution of Texas.

Figure 3. Electricity sales data for nine regions in Texas: (a–i) correspond to the nine regions in Texas.

Figure 4. Schematic diagram of convolution structure.

Figure 5. Comparison of CNN training results and standard calibration errors [31]: (a) training set error graph; (b) checkset error graph.

Figure 6. Schematic diagram of ResNet structure [26].

Figure 7. Comprehensive flow chart of electricity sales forecasting based on ST-ResNet.

Figure 8. Comparison of ultra-short-term forecasting.

Figure 9. Comparison of short-term forecasting: (a) MA; (b) ES; (c) X13; (d) RNN; (e) LSTM-NN; (f) GRU; (g) Seq2Seq; (h) ST-ResNet.

Figure 10. Comparison of medium-term forecasting.

Table 1. Dataset information.

Region Number	Time Span (Training Set)	Time Span (Testing Set)	Time Interval	Data Size (Training Set)	Data Size (Testing Set)
1–9	1 January to 20 October 2018	20 October to 31 December 2018	1 h	7008	1752

External factors (holidays, weather, etc.)

External Factor	Description	Data Size
Holiday	115 days	2760 in each region
Weather condition	16 types (sunny, rainy, etc.)	8760 in each region
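The sizes in Table 1 are consistent with one year of hourly records split in time order. The Python sketch below only checks that arithmetic; the 80/20 chronological split and the assumption that each holiday contributes 24 hourly records are inferred from the numbers in the table rather than stated by it.

```python
from datetime import datetime, timedelta

HOURS_PER_DAY = 24
start = datetime(2018, 1, 1)   # first hourly record (Table 1)
end = datetime(2019, 1, 1)     # one hour after the last record of 2018

total_hours = int((end - start) / timedelta(hours=1))  # hourly samples in 2018
train_size = total_hours * 4 // 5                      # assumed 80% training share
test_size = total_hours - train_size

split_time = start + timedelta(hours=train_size)       # first testing timestamp
holiday_records = 115 * HOURS_PER_DAY                  # 115 holiday days, hourly

print(total_hours, train_size, test_size)  # 8760 7008 1752
print(split_time)                          # 2018-10-20 00:00:00
print(holiday_records)                     # 2760
```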

Table 2. Comparison of ultra-short-term forecasting results.

Model	MAPE (%)	RMSE	MAD
MA	18.57	0.325684	0.321544
ES	17.43	0.324785	0.318452
X13	13.66	0.274454	0.269157
RNN	5.88	0.211036	0.207486
LSTM-NN	5.06	0.201369	0.194079
GRU	4.73	0.191927	0.182706
Seq2Seq	4.92	0.202349	0.189518
ST-ResNet	2.37	0.131596	0.121347

Table 3. Comparison of short-term forecasting results.

Model	MAPE (%)	RMSE	MAD
MA	20.47	0.356841	0.350157
ES	19.63	0.354894	0.345175
X13	15.37	0.301157	0.297979
RNN	6.74	0.238332	0.224852
LSTM-NN	5.54	0.210006	0.201157
GRU	5.32	0.209241	0.190052
Seq2Seq	5.48	0.212361	0.197651
ST-ResNet	2.87	0.133351	0.125134

Table 4. Comparison of medium-term forecasting results.

Model	MAPE (%)	RMSE	MAD
MA	23.04	0.390451	0.375464
ES	22.65	0.382464	0.377642
X13	17.44	0.331189	0.319548
RNN	7.61	0.250189	0.233588
LSTM-NN	6.02	0.220259	0.211598
GRU	5.86	0.216654	0.199641
Seq2Seq	5.93	0.221742	0.204725
ST-ResNet	3.17	0.137614	0.129157
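The degradation of each model between forecasting intervals can be read directly from the MAPE columns of Tables 2, 3 and 4. The short Python sketch below only restates that arithmetic; the dictionary layout and the helper name `mape_increase` are illustrative choices, not part of the original experiments.

```python
# MAPE (%) copied from Tables 2-4, ordered as (ultra-short, short, medium).
MAPE = {
    "MA": (18.57, 20.47, 23.04),
    "ES": (17.43, 19.63, 22.65),
    "X13": (13.66, 15.37, 17.44),
    "RNN": (5.88, 6.74, 7.61),
    "LSTM-NN": (5.06, 5.54, 6.02),
    "GRU": (4.73, 5.32, 5.86),
    "Seq2Seq": (4.92, 5.48, 5.93),
    "ST-ResNet": (2.37, 2.87, 3.17),
}

def mape_increase(model, start=1, end=2):
    """Increase in MAPE, in percentage points, between two forecasting intervals.

    Indices: 0 = ultra-short term, 1 = short term, 2 = medium term.
    """
    values = MAPE[model]
    return round(values[end] - values[start], 2)

# Short-term to medium-term increase for three representative models.
for name in ("MA", "RNN", "ST-ResNet"):
    print(f"{name}: +{mape_increase(name):.2f} percentage points")
# MA: +2.57 percentage points
# RNN: +0.87 percentage points
# ST-ResNet: +0.30 percentage points
```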

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Cao, M.; Wang, J.; Sun, X.; Ren, Z.; Chai, H.; Yan, J.; Li, N. Short-Term and Medium-Term Electricity Sales Forecasting Method Based on Deep Spatio-Temporal Residual Network. Energies 2022, 15, 8844. https://doi.org/10.3390/en15238844

