Article

Deep ResNet-Based Ensemble Model for Short-Term Load Forecasting in Protection System of Smart Grid

Wenhao Chen, Guangjie Han, Hongbo Zhu, Lyuchao Liao and Wenqing Zhao
1 School of Transportation, Fujian University of Technology, Fuzhou 350118, China
2 Department of Information and Communication System, Hohai University, Changzhou 213022, China
3 School of Information Science and Engineering, Shenyang Ligong University, Shenyang 110159, China
* Author to whom correspondence should be addressed.
Sustainability 2022, 14(24), 16894; https://doi.org/10.3390/su142416894
Submission received: 1 November 2022 / Revised: 3 December 2022 / Accepted: 9 December 2022 / Published: 16 December 2022

Abstract

Short-term load forecasting is a key digital technology supporting sustainable urban development and the efficient management of the power system. Because the electricity load is strongly volatile across different stages, existing models cannot efficiently extract the vital features that capture the change trend of the load series. This limits forecasting performance and challenges the sustainability of urban development. This paper therefore designs a novel ResNet-based model to forecast the loads of the next 24 h. Specifically, the proposed method is composed of a feature extraction module, a base network, a residual network, and an ensemble structure. We first extract multi-scale features from the raw data and feed them into a single snapshot model, which is built from a base network and a residual network; the two networks are concatenated to successively produce a preliminary forecast and a snapshot forecast for each input. The residual blocks also counteract the gradient vanishing and over-fitting that can accompany network deepening. Finally, we introduce ensemble thinking, selectively concatenating snapshots to improve model generalization. Our experiments demonstrate that the proposed model outperforms existing ones, with a maximum performance improvement of up to 4.9% in MAPE.

1. Introduction

Due to the complexity of power grid structures and the difference in power consumption habits of different users, short-term load forecasting (STLF) can offer more accurate information to develop adaptation strategies, making it an important research topic in urban sustainable development [1]. STLF focuses on the forecasted value from one hour to several days in the future. Accurate load forecasting results also provide a solid foundation for rational pricing and renewable energy [1]. However, STLF is a complex task affected by external factors such as temperature and holidays [2]. Therefore, researchers have proposed different statistical learning-based methods for STLF.
The statistical learning-based methods often use handcrafted inputs to produce the forecasting result. Mbamalu et al. [3] dealt with electricity load forecasting using an autoregressive (AR) model. Huang et al. [4] further used an adaptive autoregressive moving average (ARMA) method for STLF. Both of the above methods require a stationary input series; that is, the change trend of the input data follows a stationary pattern that is not significantly affected by external factors such as season, weekday, and temperature. This assumption clearly does not hold for real collected load signals. Thus, Contreras et al. [5] presented the autoregressive integrated moving average (ARIMA) model, which has the advantage of transforming unstable input sequences into stable ones, thereby mitigating the strict requirements on the collected data.
Another conventional approach begins with a feature extraction module, which produces handcrafted or automatically extracted features to build the mapping between the model's inputs and outputs. In the training step, the weights of the model are updated iteratively to learn from the training samples and their reference labels. The trained model is then validated on held-out samples to test its classification/regression performance. Ceperic et al. [6] developed a hybrid model consisting of twelve support vector machines (SVMs) for day-ahead load forecasting. Chen et al. [7] designed a wavelet neural network that captures load characteristics at different frequencies. Bianchi et al. [8] used Echo State Networks (ESNs) and PCA decomposition to boost STLF performance. Bashir et al. [9] used neural networks based on adaptive strategies to improve the forecasting accuracy.
Although the traditional models achieve an acceptable forecasting performance, more accurate results are still required. Due to the nonlinear nature of the electricity load, it is hard to estimate precisely [10]. Deep learning is a promising choice for STLF because of its high nonlinear approximation capacity; Table 1 compares the proposed method with existing ones in terms of input variables and percentage error. Recently, more deep learning-based models have been applied to STLF [11,12,13,14]. The Fully Connected Network (FCN) is the basis of all deep learning methods. Ding et al. [15] presented an FCN method to forecast the low-voltage electricity load and verified its accuracy. Chen et al. [16] designed a network that achieved higher forecasting accuracy than the conventional STLF methods, with multiple temporal features fed into an LSTM network to achieve accurate forecasting results. Karim et al. [17] proposed an ensemble model built around the Bidirectional Long Short-Term Memory network (Bi-LSTM) to forecast hourly loads. However, the above methods cannot efficiently extract the features that capture the trend of the electricity load from the input data, which reduces the generalization ability. As a solution, we design a multi-scale spatio-temporal sliding window to extract the load and temperature from the input data.
In addition, as the scale of the network increases, a growing number of parameters must be set and optimized across the fully connected layers [18]. The resulting models often suffer from over-fitting and gradient vanishing, which are the main challenges in applying deep learning [19]. As a solution, Zhang et al. [20] designed a deep Residual Neural Network (ResNet) with dual skip connections as the backbone of their model. Ko et al. [21] proposed a novel residual learning component by concatenating multiple residual networks and Bi-LSTM layers. We likewise adopt an improved residual network in our framework to relieve the potential accuracy degradation caused by the over-fitting of multiple stacked FCNs.
Furthermore, an ensemble network can obtain more accurate load forecasts than a single one. De Felice and Yao [22] designed a neural network ensemble to forecast the hourly loads of a building, and their tests demonstrate that ensemble thinking can promote forecasting performance. However, ensemble thinking still encounters several problems: it is difficult to combine accurate individual models, and the training period is long [16,22]. These issues prevent models from taking full advantage of ensemble thinking, which limits the forecasting performance. This paper therefore designs three-stage ensemble thinking to relieve these problems. The contributions of the paper are as follows:
  • We propose an effective multi-scale spatio-temporal sliding window to extract the multi-scale load and temperature from the input data. The method can consider the changing trend of the load in the spatio-temporal dimensions and provide the diversified characteristics for the forecast framework to learn the complicated load patterns.
  • We design a novel ResNet-based deep learning framework to extract the multi-scale features that affect the electricity load during the forecasting process. Different from the forecast framework in [16], the proposed base structure can further learn from averaged input variables, capturing the changing trends of the electricity series. Moreover, we use three-stage ensemble thinking to reduce the overall training time while taking full advantage of the network.
The remainder of this paper is organized as follows: Section 2 shows the framework of the proposed method, which is composed of a feature extraction module, a base structure, a residual network, and an ensemble structure. Section 3 compares the proposed model with the existing ones using four benchmark datasets from North America, New England, Malaysia, and Panama. The conclusions and future work are given in Section 4.

2. Method

2.1. Overall Framework

The proposed framework is presented in Figure 1. First, the feature extraction module applies multiple preprocessing operations to the raw data. Its outputs serve as the input layer of the following base structure, which produces a preliminary forecasted value. Subsequently, the deep residual network generates an output for each snapshot. Ensemble thinking is then used to further improve the forecasting performance.

2.2. Feature Extraction

Appropriate input features are crucial for improving the forecasting results [2]. However, the input variables of the existing methods cannot effectively represent the change trend of the electricity load. Thus, we design an efficient multi-scale spatio-temporal sliding window to extract the essential features from the input data. Table 2 presents the multi-scale input data, including load, temperature, and calendar features, after the preprocessing operations of the feature extraction module. The index $h$ runs hourly, since this paper studies the electricity load at an hourly resolution. More precisely, $P_h$ represents the forecasted value at the $h$-th hour, and $\mathrm{Input}_h$ consists of input data at different time steps for forecasting $P_h$, listed as follows:
$$\mathrm{Input}_h = \big[\, P_h^{\mathrm{hour}}(h-1), \ldots, P_h^{\mathrm{hour}}(h-24),\; P_h^{\mathrm{day}}(h-24), \ldots, P_h^{\mathrm{day}}(h-168),\; P_h^{\mathrm{week}}(h-168), \ldots, P_h^{\mathrm{week}}(h-672),\; P_h^{\mathrm{month}}(h-672),\, P_h^{\mathrm{month}}(h-1344),\, P_h^{\mathrm{month}}(h-2016),\; P_M^{\mathrm{day}}(h),\, P_M^{\mathrm{week}}(h),\, P_M^{\mathrm{month}}(h),\; T_h^{\mathrm{day}}(h-24), \ldots, T_h^{\mathrm{day}}(h-168),\; T_h^{\mathrm{week}}(h-168), \ldots, T_h^{\mathrm{week}}(h-672),\; T_h^{\mathrm{month}}(h-672),\, T_h^{\mathrm{month}}(h-1344),\, T_h^{\mathrm{month}}(h-2016),\; T_M^{\mathrm{day}}(h),\, T_M^{\mathrm{week}}(h),\, T_M^{\mathrm{month}}(h),\; T_h(h),\, \mathrm{Season}(h),\, \mathrm{Weekday}(h),\, \mathrm{Holiday}(h) \,\big].$$
$P_h^{hour}(h-1)$, $P_h^{hour}(h-2)$, ..., $P_h^{hour}(h-24)$ are the electricity loads of the past 24 h. When some of these values are unavailable during the forecast period, the already forecasted values replace them, which helps the model correlate the forecasts throughout the day. $P_h^{day}(h-24)$, $P_h^{day}(h-48)$, ..., $P_h^{day}(h-168)$ capture the short-run trend of the electricity load within a week. Similarly, $P_h^{week}(h-168)$, ..., $P_h^{week}(h-672)$ and $P_h^{month}(h-672)$, ..., $P_h^{month}(h-2016)$ characterize the longer-term trend of the electricity load. The average load features $P_M^{day}(h)$, $P_M^{week}(h)$, and $P_M^{month}(h)$ are added to learn the uncertain variation of long/short-term trends in the electricity series. As temperature is crucial for load forecasting, the corresponding temperature features at different time steps are also added. $T_h^{day}(h-24)$, $T_h^{day}(h-48)$, ..., $T_h^{day}(h-168)$ help the proposed method identify temperature trends within the week and their impact on the short-term load. $T_h^{week}(h-168)$, ..., $T_h^{week}(h-672)$ and $T_h^{month}(h-672)$, ..., $T_h^{month}(h-2016)$ capture the impact of the long-term temperature trend on the electricity series. $T_M^{day}(h)$, $T_M^{week}(h)$, and $T_M^{month}(h)$ likewise capture uncertain long/short-term temperature trends. Finally, the calendar data $Season(h)$, $Weekday(h)$, and $Holiday(h)$ help the model capture periodic and special patterns in the load series.
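To make the multi-scale window concrete, the following is a minimal Python sketch of how the feature vector of Table 2 could be assembled for one target hour. The function name build_input_vector, the array layout, and the exact index arithmetic are illustrative assumptions rather than the authors' released code.

```python
import numpy as np

def build_input_vector(load, temp, calendar, h):
    """Assemble the multi-scale feature vector for target hour h.

    load, temp : 1-D hourly numpy arrays indexed so position h is the target hour.
    calendar   : dict of one-hot arrays, 'season' (4), 'weekday' (2), 'holiday' (2).
    Offsets follow Table 2: 24 h = day stride, 168 h = week, 672 h = 28 days.
    """
    p_hour  = load[h - 24:h]                        # loads of the past 24 h
    p_day   = load[h - 168:h:24]                    # same hour, each of the past 7 days
    p_week  = load[h - 672:h:168]                   # same hour, 7/14/21/28 days back
    p_month = load[[h - 672, h - 1344, h - 2016]]   # same hour, 28/56/84 days back

    t_day   = temp[h - 168:h:24]
    t_week  = temp[h - 672:h:168]
    t_month = temp[[h - 672, h - 1344, h - 2016]]

    return np.concatenate([
        p_hour, p_day, p_week, p_month,
        [p_day.mean(), p_week.mean(), p_month.mean()],            # P_M^day/week/month
        t_day, t_week, t_month,
        [t_day.mean(), t_week.mean(), t_month.mean(), temp[h]],   # T_M^* and T_h
        calendar["season"], calendar["weekday"], calendar["holiday"],
    ])
```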

2.3. Base Structure

Figure 2 illustrates the base structure, which is used to obtain a preliminary forecasted value. Owing to its framework, the base structure feeds the forecast for the current hour back as an input for the following hour, automatically adjusting the forecast over the day. More precisely, the pairs $[P_M^{day}; T_M^{day}]$, $[P_M^{week}; T_M^{week}]$, and $[P_M^{month}; T_M^{month}]$ are each connected with three fully connected layers, and PreO1 is obtained by passing them through a further fully connected layer. A similar connection applied to $P_h^{day}$, $T_h^{day}$, $P_h^{week}$, $T_h^{week}$, $P_h^{month}$, and $T_h^{month}$ yields PreO2. PreO3 is the output of $[\mathrm{Season}, \mathrm{Weekday}, \mathrm{Holiday}]$ through a fully connected layer. PreO1, PreO2, and PreO3 are concatenated to obtain PreO4. Finally, the base structure concatenates PreO4, $T_h^{day}$, and $P_h^{hour}$ to produce the hourly preliminary forecast load.
The activation functions used in the base structure perform the nonlinear transformation of the input features. The scaled exponential linear unit (SeLU) [24] mainly relieves the issue of vanishing or exploding gradients in very deep networks [23]. Section 3 compares the forecasting performance of different activation functions; ultimately, SeLU [24] and Linear are combined in the proposed structure, as shown in Figure 2.
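As a rough illustration of the wiring just described, here is a minimal Keras functional-API sketch of the base structure. The layer widths, input sizes, and exact grouping are assumptions inferred from the description of Figure 2 (the 20 hidden nodes are borrowed from the residual blocks of Section 2.4), not the authors' configuration.

```python
from tensorflow import keras
from tensorflow.keras import layers

def base_structure(hidden=20):
    # Averaged (load; temperature) pairs at day/week/month scales -> PreO1.
    pairs_in = [keras.Input(shape=(2,), name=f"avg_{s}") for s in ("day", "week", "month")]
    pair_feats = [layers.Dense(hidden, activation="selu")(p) for p in pairs_in]
    pre_o1 = layers.Dense(hidden, activation="selu")(layers.concatenate(pair_feats))

    # Multi-scale load/temperature histories (P_h^day..T_h^month, 28 values) -> PreO2.
    hist_in = keras.Input(shape=(28,), name="multi_scale_history")
    pre_o2 = layers.Dense(hidden, activation="selu")(hist_in)

    # Calendar one-hots: season (4) + weekday (2) + holiday (2) -> PreO3.
    cal_in = keras.Input(shape=(8,), name="calendar")
    pre_o3 = layers.Dense(hidden, activation="selu")(cal_in)

    pre_o4 = layers.concatenate([pre_o1, pre_o2, pre_o3])

    # Recent inputs joined at the output stage: T_h^day (7) and P_h^hour (24).
    recent_in = keras.Input(shape=(31,), name="recent_load_temp")
    merged = layers.concatenate([pre_o4, recent_in])

    # "FinalLinear": SeLU on the input-side layers, Linear on the output layer.
    hidden_out = layers.Dense(hidden, activation="selu")(merged)
    out = layers.Dense(1, activation="linear")(hidden_out)
    return keras.Model([*pairs_in, hist_in, cal_in, recent_in], out)
```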

2.4. Deep Residual Network

Although stacked layers help boost the forecasting performance of deep neural networks (DNNs), other issues may occur [25]. For instance, gradient explosion/vanishing can negatively affect the convergence of the neural network. As DNNs grow deeper, degradation and over-fitting also occur.
Therefore, residual learning is used to overcome these limitations. More precisely, residual learning consists of adding shortcut connections to the network, as presented in Figure 3. In a plain DNN, the mapping learned by a stack of layers is:

$$Z(x) = F(x)$$

where $F(x)$ is the internal calculation of the network, $Z(x)$ denotes the forecasted output, and $x$ is the input. With a shortcut connection, the residual block instead computes:

$$Z(x) = F(x) + x$$

where the added $x$ denotes the shortcut that connects the input directly to the output.
When the network consists of $m$ stacked layers, the DNN learns the relationship between $x$ and $Z_m$ while keeping $Z_m$ close to $x$. Simultaneously, the $m$ stacked layers are connected by shortcuts to form a deep residual network. The weights of the $i$-th stacked layer are optimized so that $Z_i - Z_{i-1}$ tends to 0, for $i = 1, 2, \ldots, m$. Therefore, the residual network outperforms the plain DNN in terms of weight optimization. Finally, the output of the residual network is calculated as:
$$Z(x) = x_0 + \sum_{i=1}^{m} F(x_{i-1})$$
Several studies have aimed at improving the original ResNet to achieve higher forecasting performance [25,26,27]. Thus, the improved residual network (MResNet) [16] is introduced into the proposed model to generate an output for each snapshot. Specifically, the improved ResNet in this paper consists of 30 residual blocks, comprising main and side residual blocks, as presented in Figure 4; the structure of each residual block is shown in Figure 3. The hidden layer of each residual block has 20 hidden nodes. A shortcut path connects the input and output of every 6 residual blocks. The green dots in Figure 4 average the inputs from the main and side residual blocks, and the averaged values are passed to the following main residual blocks. The additional residual blocks and shortcut connections efficiently learn the nonlinear relationship between input and output, so the model can capture the crucial information in the input features, which is essential for boosting the load forecasting accuracy.
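Under the stated configuration (30 blocks, 20 hidden nodes, a shortcut every 6 blocks), the improved residual network could be sketched as follows. The internal form of $F(x)$ and the exact side-path handling are assumptions based on Figures 3 and 4, not the authors' code.

```python
from tensorflow import keras
from tensorflow.keras import layers

def residual_block(x, hidden=20):
    # F(x): two small fully connected layers (assumed form of Figure 3's block).
    f = layers.Dense(hidden, activation="selu")(x)
    f = layers.Dense(x.shape[-1], activation="linear")(f)
    return layers.add([x, f])               # Z(x) = F(x) + x

def improved_resnet(input_dim, n_blocks=30, group=6):
    inp = keras.Input(shape=(input_dim,))
    main = inp
    side = inp                              # side path re-injected every `group` blocks
    for i in range(1, n_blocks + 1):
        main = residual_block(main)
        if i % group == 0:
            # "Green dots" in Figure 4: average main and side paths, feed forward.
            main = layers.average([main, side])
            side = main
    out = layers.Dense(1, activation="linear")(main)
    return keras.Model(inp, out)
```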

2.5. Ensemble Structure

It is widely accepted in machine learning that an ensemble network can achieve higher forecasting accuracy than a single one [28]. This paper designs three-stage ensemble thinking to relieve the problems identified in Section 1. In the first stage, several snapshots are saved during the training of an individual model, which is the basis for integrating accurate models. The individual model is optimized using the Adam (adaptive moment estimation [29]) solver, so it can adapt the learning rate at different training stages. The approach is similar to the snapshot model presented in [30]: snapshots are taken during the training of the individual model once the loss has stabilized. The second stage consists of separately training several individual models, which offers the possibility of finding more accurate models; as in [16], the model parameters are reinitialized to generate each individual model.
Finally, the most accurate snapshot models are integrated, which shortens the training time. The number of individual models and of snapshots taken during training are treated as hyper-parameters. After confirming the best hyper-parameters, the final forecast is the average output over all selected snapshots. Section 3 details the process of confirming the hyper-parameters.
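A compact sketch of the three stages follows, with the snapshot epochs and Adam settings taken from Section 3; the callback mechanics and function names are illustrative assumptions.

```python
import numpy as np
from tensorflow import keras

SNAPSHOT_EPOCHS = {1200, 1250, 1300, 1350, 1400, 1450, 1500}  # candidate snapshots

class SnapshotSaver(keras.callbacks.Callback):
    """Stage 1: save snapshots late in training, once the loss is stable."""
    def __init__(self, store):
        super().__init__()
        self.store = store
    def on_epoch_end(self, epoch, logs=None):
        if (epoch + 1) in SNAPSHOT_EPOCHS:
            clone = keras.models.clone_model(self.model)
            clone.set_weights(self.model.get_weights())
            self.store.append(clone)

def train_snapshot_ensemble(build_model, x, y, n_individuals=3, epochs=1500):
    snapshots = []
    # Stage 2: train several independently re-initialized individual models.
    for _ in range(n_individuals):
        model = build_model()
        model.compile(optimizer=keras.optimizers.Adam(1e-3), loss="mse")
        model.fit(x, y, epochs=epochs, verbose=0,
                  callbacks=[SnapshotSaver(snapshots)])
    return snapshots

def ensemble_predict(snapshots, x):
    # Stage 3: the final forecast is the average over the selected snapshots.
    return np.mean([m.predict(x, verbose=0) for m in snapshots], axis=0)
```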

3. Results

3.1. Test Settings

In this paper, the performance of the model is validated on four publicly available datasets: the North American dataset, the New England dataset, the Malaysian dataset, and the Panama dataset. Each dataset comprises three parts: timestamp, temperature, and load information. The actual temperature and load are used as inputs to evaluate the generalization ability, and we further detail the impact of modified (noisy) temperature on performance. Note that the error values of the compared models are taken from the corresponding publications.
We use the Adam optimizer [29] with an initial learning rate of 0.001 to train the model. The proposed method is implemented in Python 3.6 with TensorFlow 2.10 and Keras 2.3.0, and the same environment is used throughout for a fair comparison with the existing methods. The model is trained on an Intel Core i7-4500U CPU with 16 GB of memory.

3.2. Results of the North American Dataset

The North American dataset includes hourly electricity load and temperature from 1 January 1988 to 12 October 1992. The training set spans 1 January 1988 to 31 December 1990, the validation set spans 1 January 1990 to 31 December 1990, and the testing set covers 1 January 1991 to 12 October 1992. This section first details the confirmation process of the hyper-parameters in the three-stage ensemble thinking, which helps the model generate accurate snapshot models while reducing the training time. We then confirm the best combination of activation functions, which is essential for improving the forecasting performance, and finally compare the proposed method with the existing ones.
The tuned hyper-parameters include the number of individual models and the snapshot models taken during the training stage. This paper considers training between 1 and 5 individual models, and snapshot models are saved at epochs 1200, 1250, 1300, 1350, 1400, 1450, and 1500. The optimal hyper-parameters are selected using four performance metrics: mean absolute percentage error (MAPE), mean absolute error (MAE), root mean squared error (RMSE), and mean squared error (MSE), as shown in Figure 5. As a result, four snapshots are saved between epochs 1200 and 1500 in each of three individual models, and the final forecast result is the average of the twelve snapshot outputs.
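For reference, the four selection metrics follow their standard definitions; a small numpy sketch (not the authors' evaluation script):

```python
import numpy as np

def metrics(y_true, y_pred):
    """Compute MAPE, MAE, MSE, and RMSE for hourly load forecasts."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    return {
        "MAPE": 100.0 * np.mean(np.abs(err / y_true)),  # assumes y_true != 0
        "MAE":  np.mean(np.abs(err)),
        "MSE":  np.mean(err ** 2),
        "RMSE": np.sqrt(np.mean(err ** 2)),
    }
```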
The activation functions applied to the base structure's input and output connection layers are determined by comparing the load forecast results of the conventional activation functions with FinalLinear and FinalSeLU. FinalLinear uses Linear to activate the output connection layer and SeLU for the input connection layer, and vice versa for FinalSeLU. Figure 6 presents the comparative MAE results for the different activation functions. Among the conventional activation functions, SeLU has the lowest average MAE, while Linear records the lowest single MAE in the experiment, although the deviation of its forecast results is significant. FinalLinear, used in the base structure, combines the advantages of both: its MAE is lower than that of SeLU, its best forecasting performance exceeds that of Linear, and the deviation of its results is lower than that of SeLU.
The MAPE comparison between the proposed method and the existing ones is presented in Table 3. In SVR-PSSA, the inputs are automatically selected using a feature selection approach, and the particle swarm search approach (PSSA) tunes the hyper-parameters of the SVR [6]. SVR-CLPS is an SVR model based on comprehensive learning particle swarm (CLPS) optimization, which performs parameter optimization and input feature selection simultaneously [31]. The proposed method is also compared with a series of wavelet-transform-based STLF methods. WT-ANN combines the wavelet transform (WT) and ANNs [32]: the frequency components obtained by the WT are fed to multiple ANNs to forecast the electricity load. WT-EANN is a hybrid forecast model consisting of the WT, an evolutionary algorithm (EA), and a neural network (NN) [33]: the WT decomposes the electricity series into frequency components that are forecasted by the EA-NN combination, and the inverse wavelet transform yields the hourly forecast. MABCA-WT-ELM denotes an ensemble model consisting of a modified artificial bee colony algorithm (MABCA), the WT, and an extreme learning machine (ELM) [34]. Finally, ESN represents the echo state network-based STLF model [35]. The baseline results in Table 3 are extracted from the corresponding studies. The proposed method reaches the best MAPE, which demonstrates its superiority in electricity load forecasting.
Gaussian noise with a mean of 0 and a standard deviation of 1 is added to the temperature to test the impact of noisy temperature on performance. Table 3 also presents the forecast results obtained by the different models in this setting. The comparison demonstrates that the proposed method retains a high generalization capacity under noisy temperatures. Figure 7 presents the forecast results within one day.
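The noisy-temperature test amounts to perturbing the input temperature series before feature extraction, e.g. (a sketch reusing the hourly temp array from the earlier feature-extraction example):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility (our assumption)
noisy_temp = temp + rng.normal(loc=0.0, scale=1.0, size=temp.shape)  # N(0, 1) noise
```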

3.3. Results of the New England Dataset

In this experiment, the New England dataset, collected from 1 March 2003 to 31 December 2014, is used for simulation. The proposed method is tested with the same hyper-parameters on three cases. In the first case, the training set spans 1 March 2003 to 31 December 2005 and the testing data span 1 January 2006 to 31 December 2006. In the second case, hourly data from 1 January 2004 to 31 December 2007 form the training set, and the test set spans 1 January 2008 to 31 December 2009. In the third case, forecasts from 1 January 2010 to 31 December 2011 are tested, with a training set spanning 1 January 2004 to 31 December 2009. Note that the hyper-parameters are tuned using the validation set from 1 March 2003 to 31 December 2003, as shown in Figure 8. More precisely, four snapshots are saved between epochs 1300 and 1600 in each of two separate models. The network still uses FinalLinear as the activation combination.
The proposed method is compared with several modified residual networks in the first case: WRN [36], CDRN [37], and MResNet [16]. WRN is a ResNet-based wide residual network with better efficiency and accuracy than the conventional deep residual network [36]. CDRN is a convolution-based deep residual network whose optimal hyper-parameters and base structure were confirmed by multiple experiments, reaching accurate forecast results [37]. MResNet is a modified residual network whose added shortcuts boost the generalization ability [16]. The comparison in Table 4 demonstrates that the proposed method obtains the best forecast results. Concretely, the ensemble model improves the forecast results by 8.6%, 11.7%, and 20.5% compared with the previous models.
We further compare the generalization ability of the proposed model with existing ones such as SIWNN [7], MABCA-WT-ELM [34], and ELM-PLSR-WT [38]. SIWNN first selects the electricity loads of similar days as the input; wavelet decomposition then splits the input data into various frequency components, which are fed to the individual models to produce the forecast result [7]. ELM-PLSR-WT is an ensemble consisting of ELM, partial least squares regression (PLSR), and the WT [38]. The monthly MAPE results are presented in Figure 9. The proposed model produces lower forecast errors in most months, indicating a high generalization capability. Figure 10 illustrates the one-day forecast results against the actual load.
Table 5 presents the forecast results of the second case: a comparison of the MAPE, MAE, and RMSE obtained by the existing models and the proposed model. MOFTLA, a multi-objective algorithm based on the Follow The Leader algorithm, decreases the electricity forecast error [39]. BooNN is an ensemble model consisting of multiple boosted ANNs [40]. BNNS is an STLF method based on bagged neural networks (BNNs), in which the final forecast result is obtained by averaging the outputs of multiple neural networks trained on resampled datasets, which decreases the forecasting errors [41]. NN-EA is an integrated model combining a neural network (NN) and an evolutionary algorithm (EA) [42]. The comparison demonstrates that the proposed method obtains the best MAPE and RMSE during the test period from 2008 to 2009.
Table 6 presents a comparison between ErrCor-RBF [43], MRBF [44], ELM-PLSR-WT [38], ERN [45], and the proposed method in the third case. ErrCor-RBF is a radial basis function (RBF)-based offline algorithm [43]: one RBF is added to fit the load in each training epoch of the error correction (ErrCor) algorithm, which eliminates the peak error. In MRBF [44], by contrast, the RBF is trained by machine learning methods such as ELM, SVR, and ErrCor. ERN denotes an integrated network based on ResNet [45]. The MAPE results show that the proposed ensemble model achieves higher accuracy than the existing models.

3.4. Results of the Malaysian Dataset

The Malaysian dataset is applied to evaluate the forecasting efficiency. The training set spans 1 March 2003 to 31 December 2005, and the testing set spans 1 January 2006 to 31 December 2006. The proposed model comprises twelve snapshots taken between epochs 1200 and 1500, and SeLU and Linear are the activation functions of the base structure.
Table 7 compares existing deep learning methods, traditional methods, and the proposed model. SVM is included as a reasonably simple baseline to gauge the complexity of STLF. The proposed method is also compared with several existing deep residual networks: WRN [36], CDRN [37], and MResNet [16].
The results show that the proposed model produces the best MAPE, namely 4.19%. The existing convolutional approaches perform significantly worse: as shown in rows 2 and 5 of Table 7, WRN [36] and CDRN [37] reach MAPE values of 5.25% and 4.41%, respectively. Furthermore, the proposed model outperforms MResNet [16] and FCN [16] because its input comprises multi-scale features that can learn the short-term changes of the load series. Moreover, the comparison between MResNet [16] and FCN demonstrates that the residual network can overcome over-fitting. Figure 11 presents the 24-h forecast and the actual load on the Malaysian dataset.

3.5. Results of the Panama Dataset

The Panama dataset is applied to further evaluate the forecasting accuracy. Specifically, the dataset spans 3 January 2015 to 27 June 2020 and includes the hourly electricity load and temperature. In addition, the model is trained using data from January 2015 to May 2019, and its testing set includes hourly data from June 2019 to June 2020. The model consists of twelve snapshots for thoroughly evaluating the forecasting accuracy. Its activation functions are also SeLU and Linear.
The forecasting performance is compared with deep learning methods such as CNN, LSTM, and Bi-GRU, which are standard components in STLF models and therefore serve as basic comparisons. Note that the electricity loads and temperatures of the past 24 h are used as input to these models to forecast the load for the following day. Their parameters are as follows: (1) CNN: one convolutional layer (8 filters, kernel size 1) followed by four fully connected layers with 32/16/8/1 neurons. (2) LSTM: two hidden layers with 16/16 hidden units. (3) Bi-GRU: two hidden layers with 16/16 hidden nodes. However, a CNN-based STLF method assumes spatial invariance of the data, which is incompatible with the collected load data. DCN [46] is an unshared convolution-based densely connected network that efficiently relieves this issue, and is therefore suitable for comparison with the proposed model.
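The parameter lists above translate almost directly into Keras. In the sketch below, the 24-step (load, temperature) input window, the ReLU activations, and the single-output head are our assumptions; only the layer counts and sizes come from the text.

```python
from tensorflow import keras
from tensorflow.keras import layers

def cnn_baseline():
    # One conv layer (8 filters, kernel size 1) + fully connected 32/16/8/1.
    return keras.Sequential([
        keras.Input(shape=(24, 2)),   # past 24 h of (load, temperature)
        layers.Conv1D(8, kernel_size=1, activation="relu"),
        layers.Flatten(),
        layers.Dense(32, activation="relu"),
        layers.Dense(16, activation="relu"),
        layers.Dense(8, activation="relu"),
        layers.Dense(1),              # hour-by-hour forecast head (assumed)
    ])

def lstm_baseline():
    # Two hidden LSTM layers with 16 units each.
    return keras.Sequential([
        keras.Input(shape=(24, 2)),
        layers.LSTM(16, return_sequences=True),
        layers.LSTM(16),
        layers.Dense(1),
    ])

def bigru_baseline():
    # Two bidirectional GRU layers with 16 hidden nodes each.
    return keras.Sequential([
        keras.Input(shape=(24, 2)),
        layers.Bidirectional(layers.GRU(16, return_sequences=True)),
        layers.Bidirectional(layers.GRU(16)),
        layers.Dense(1),
    ])
```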
Table 8 shows that the proposed ensemble model achieves higher accuracy than the existing models, with an MAE of 1.73 (×10²). The traditional deep learning methods produce inferior forecasts: rows 1–4 of Table 8 show MAEs of 4.23, 4.15, 3.98, and 1.89 for CNN, LSTM, Bi-GRU, and DCN [46], respectively. The reason is that the ensemble thinking of the proposed model selectively integrates the accurate snapshots, which is essential for improving forecasting performance. Moreover, the proposed method outperforms the STLF model based on a non-shared convolutional network [46] because the adopted residual network efficiently relieves over-fitting. Consequently, the proposed model performs strongly in deterministic electricity forecasting. Figure 12 presents the forecast results within one day.

4. Conclusions

This paper designs an innovative method with multiple snapshots to forecast the loads of the next 24 h. Each snapshot is the output of a network with fully connected layers and residual blocks. We also give an optimal combination of activation functions to further boost the forecasting performance. We present results on four benchmark datasets and compare them with other mainstream schemes; the experiments show that the proposed method improves forecasting accuracy over the existing ones by roughly 0.6–4.9%.
In future work, we aim to introduce LSTM or Bi-LSTM into the STLF model to relieve the forecasting performance degradation caused by distribution gaps in the electricity load across different periods. Given their robust feature extraction, CNNs or TCNs are also candidate backbones for our future work.

Author Contributions

Methodology, W.C. and W.Z.; Formal analysis, W.C.; writing—original draft preparation, W.C.; writing—review, H.Z. and L.L.; supervision, G.H. All authors have read and agreed to the published version of the manuscript.

Funding

Fujian Key Lab for Automotive Electronics and Electric Drive, Fujian University of Technology, 350118, China. This work is supported by the project of the Fujian University of Technology, No. GY-Z19066.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Public datasets were adopted in this study. In addition, the data can be found here: https://www.iso-ne.com/ (accessed on 30 October 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Nassif, A.B.; Soudan, B.; Azzeh, M.; Attilli, I.; Almulla, O. Artificial intelligence and statistical techniques in short-term load forecasting: A review. arXiv 2021, arXiv:2201.00437.
  2. Almeshaiei, E.; Soltan, H. A methodology for electric power load forecasting. Alex. Eng. J. 2011, 50, 137–144.
  3. Mbamalu, G.; El-Hawary, M. Load forecasting via suboptimal seasonal autoregressive models and iteratively reweighted least squares estimation. IEEE Trans. Power Syst. 1993, 8, 343–348.
  4. Shyh-Jier, H.; Kuang Rong, S. Short-term load forecasting via ARMA model identification including non-Gaussian process considerations. IEEE Trans. Power Syst. 2003, 18, 673–679.
  5. Contreras, J.; Espinola, R.; Nogales, F.J. ARIMA models to predict next-day electricity prices. IEEE Trans. Power Syst. 2003, 18, 1014–1020.
  6. Ceperic, E.; Ceperic, V.; Baric, A. A strategy for short-term load forecasting by support vector regression machines. IEEE Trans. Power Syst. 2013, 28, 4356–4364.
  7. Chen, Y.; Luh, P.B.; Guan, C.; Zhao, Y. Short-term load forecasting: Similar day-based wavelet neural networks. IEEE Trans. Power Syst. 2010, 25, 322–330.
  8. Bianchi, F.M.; De Santis, E.; Rizzi, A.; Sadeghian, A. Short term electric load forecasting using echo state networks and PCA decomposition. IEEE Access 2015, 3, 1931–1943.
  9. Bashir, Z.A.; El-Hawary, M.E. Applying wavelets to short-term load forecasting using PSO-based neural networks. IEEE Trans. Power Syst. 2009, 24, 20–27.
  10. Arif, A.; Wang, Z.; Wang, J. Load modeling—A review. IEEE Trans. Smart Grid 2018, 9, 5986–5999.
  11. Xia, Z.; Ma, H.; Saha, T.K.; Zhang, R. Consumption scenario-based probabilistic load forecasting of single household. IEEE Trans. Smart Grid 2022, 13, 1075–1087.
  12. Dudek, G.; Pełka, P.; Smyl, S. A hybrid residual dilated LSTM and exponential smoothing model for midterm electric load forecasting. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 2879–2891.
  13. Ziel, F. Smoothed Bernstein online aggregation for short-term load forecasting in IEEE DataPort competition on day-ahead electricity demand forecasting: Post-COVID paradigm. IEEE Open Access J. Power Energy 2022, 9, 202–212.
  14. Jiao, R.; Zhang, T.; Jiang, Y.; He, H. Short-term non-residential load forecasting based on multiple sequences LSTM recurrent neural network. IEEE Access 2018, 6, 59438–59448.
  15. Ding, D.; Benoit, C.; Foggia, G.; Bésanger, Y.; Wurtz, F. Neural network-based model design for short-term load forecast in distribution systems. IEEE Trans. Power Syst. 2016, 31, 72–81.
  16. Chen, K.; Wang, Q.; He, Z.; Hu, J.; He, J. Short-term load forecasting with deep residual networks. IEEE Trans. Smart Grid 2019, 10, 3943–3952.
  17. Karim, M.E.; Maswood, M.M.S.; Das, S.; Alharbi, A.G. BHyPreC: A novel Bi-LSTM based hybrid recurrent neural network model to predict the CPU workload of cloud virtual machine. IEEE Access 2021, 9, 131476–131495.
  18. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
  19. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
  20. Zhang, Z.; Zhao, P.; Wang, P.; Lee, W.-J. Transfer learning featured combining short-term load forecast with small-sample conditions. In Proceedings of the IEEE Industry Applications Society Annual Meeting (IAS), Vancouver, BC, Canada, 10–14 October 2021; pp. 1–8.
  21. Ko, M.S.; Lee, K.; Kim, J.K. Deep concatenated residual network with bidirectional LSTM for one-hour-ahead wind power forecasting. IEEE Trans. Sustain. Energy 2021, 12, 1321–1335.
  22. De Felice, M.; Yao, X. Short-term load forecasting with neural network ensembles: A comparative study [application notes]. IEEE Comput. Intell. Mag. 2011, 6, 47–56.
  23. Duvenaud, D.; Rippel, O.; Adams, R.; Ghahramani, Z. Avoiding pathologies in very deep networks. arXiv 2014, arXiv:1402.5836.
  24. Klambauer, G.; Unterthiner, T.; Mayr, A.; Hochreiter, S. Self-normalizing neural networks. arXiv 2017, arXiv:1706.02515.
  25. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 13–15 May 2010; pp. 249–256.
  26. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
  27. Zhang, K.; Sun, M.; Han, T.X. Residual networks of residual networks: Multilevel residual networks. IEEE Trans. Circuits Syst. Video Technol. 2017, 28, 1303–1314.
  28. Cao, Z.; Wan, C.; Zhang, Z.; Li, F.; Song, Y. Hybrid ensemble deep learning for deterministic and probabilistic low-voltage load forecasting. IEEE Trans. Power Syst. 2020, 35, 1881–1897.
  29. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
  30. Huang, G.; Li, Y.; Pleiss, G. Snapshot ensembles: Train 1, get M for free. arXiv 2017, arXiv:1704.00109.
  31. Hu, Z.; Bao, Y.; Xiong, T. Comprehensive learning particle swarm optimization based memetic algorithm for model selection in short term load forecasting using support vector regression. Appl. Soft Comput. 2014, 25, 15–25.
  32. Reis, A.R.; Silva, A.A.D. Feature extraction via multiresolution analysis for short-term load forecasting. IEEE Trans. Power Syst. 2005, 20, 189–198.
  33. Amjady, N.; Keynia, F. Short-term load forecasting of power systems by combination of wavelet transform and neuro-evolutionary algorithm. Energy 2009, 34, 46–57.
  34. Li, S.; Wang, P.; Goel, L. Short-term load forecasting by wavelet transform and evolutionary extreme learning machine. Electr. Power Syst. Res. 2015, 122, 96–103.
  35. Deihimi, A.; Showkati, H. Application of echo state networks in short-term electric load forecasting. Energy 2012, 39, 327–340.
  36. Zagoruyko, S.; Komodakis, N. Wide residual networks. arXiv 2016, arXiv:1605.07146.
  37. Sheng, Z.; Wang, H.; Chen, G.; Zhou, B.; Sun, J. Convolutional residual network to short-term load forecasting. Appl. Intell. 2021, 51, 2485–2499.
  38. Li, S.; Goel, L.; Wang, P. An ensemble approach for short-term load forecasting by extreme learning machine. Appl. Energy 2016, 170, 22–29.
  39. Singh, P.; Dwivedi, P. A novel hybrid model based on neural network and multi-objective optimization for effective load forecast. Energy 2019, 182, 606–622.
  40. Khwaja, A.S.; Zhang, X.; Anpalagan, A.; Venkatesh, B. Boosted neural networks for improved short-term electric load forecasting. Electr. Power Syst. Res. 2017, 143, 431–437.
  41. Khwaja, A.S.; Naeem, M.; Anpalagan, A.; Venetsanopoulos, A.; Venkatesh, B. Improved short-term load forecasting using bagged neural networks. Electr. Power Syst. Res. 2015, 125, 109–115.
  42. Singh, P.; Dwivedi, P. Integration of new evolutionary approach with artificial neural network for solving short term load forecast problem. Appl. Energy 2018, 217, 537–549.
  43. Yu, H.; Reiner, P.D.; Xie, T.; Bartczak, T.; Wilamowski, B.M. An incremental design of radial basis function networks. IEEE Trans. Neural Netw. Learn. Syst. 2014, 25, 1793–1803.
  44. Cecati, C.; Kolbusz, J.; Siano, P.; Wilamowski, B.M. A novel RBF training algorithm for short-term electric load forecasting and comparative studies. IEEE Trans. Ind. Electron. 2015, 62, 6519–6529.
  45. Xu, Q.; Yang, X.; Huang, X. Ensemble residual networks for short term load forecasting. IEEE Access 2020, 8, 64750–64759.
  46. Li, Z.; Li, Y.; Liu, Y. Deep learning based densely connected network for load forecasting. IEEE Trans. Power Syst. 2021, 36, 2829–2840.
Figure 1. The proposed load forecast model consisting of feature extraction, base structure, residual network, and ensemble structure.
Figure 2. The base structure in the proposed model.
Figure 3. The residual block of the deep residual network.
Figure 4. The illustration of the improved deep residual network.
Figure 5. The variations of MAPE (a), MAE (b), RMSE (c), and MSE (d) with different hyper-parameters on the 1988 North American dataset.
Figure 6. Comparison of MAEs using different activation functions.
Figure 7. Comparison of forecast results and actual electricity data within 24 h on the North American dataset.
Figure 8. (a) MAPE, (b) MAE, (c) RMSE, and (d) MSE corresponding to different hyper-parameters on the 2003 New England dataset.
Figure 9. Comparison of monthly forecast results in 2006 on the New England dataset.
Figure 10. Comparison of forecast results and actual load within 24 h on the New England dataset.
Figure 11. Comparison of forecast results and actual load within 24 h on the Malaysian dataset.
Figure 12. Comparison of forecast results and actual load within 24 h on the Panama dataset.
Table 1. Comparison of the proposed STLF method with existing ones in the literature.

Reference | Method | Input Data | Percentage Error
[3] | AR | Time, load | 23.15%
[4] | ARMA | Temperature, load | 9.89%
[6] | SVMs | Wind speed, load | 5.56%
[8] | ESN, PCA | Load | 11.64%
[14] | LSTM | Time, load | 7.04%
[15] | Bi-LSTM | Temperature, load | 10.3%
Proposed model | FCN, residual network, and ensemble thinking | Season, time, multi-scale load, and temperature | 4.19%
Table 2. Input variables for the load forecast of the h-th hour.

Symbol | Size | Description of the Inputs
P_h^hour | 24 | Loads within 24 h before the h-th hour
P_h^day | 7 | Loads of the h-th hour of every day in the past 7 days
P_h^week | 4 | Loads of the h-th hour of the 7, 14, 21, and 28 days before the forecasted day
P_h^month | 3 | Loads of the h-th hour of 28, 56, and 84 days before the forecasted day
P_M^day | 1 | The average load of the forecasted hour within a week
P_M^week | 1 | The load obtained by averaging P_h^week
P_M^month | 1 | The load obtained by averaging P_h^month
T_h | 1 | The temperature of the forecasted hour
T_h^day | 7 | Temperatures of the h-th hour of every day in the past 7 days
T_h^week | 4 | Temperatures of the h-th hour of the 7, 14, 21, and 28 days before the forecasted day
T_h^month | 3 | Temperatures of the h-th hour of 28, 56, and 84 days before the forecasted day
T_M^day | 1 | The average temperature of the forecasted hour within a week
T_M^week | 1 | The temperature obtained by averaging T_h^week
T_M^month | 1 | The temperature obtained by averaging T_h^month
Season | 4 | One-hot encoding for season
Weekday | 2 | One-hot encoding for weekday/weekend
Holiday | 2 | One-hot encoding for holiday
Table 3. Performance comparison of MAPEs (%) on the North American dataset during 1991–1992.

Method | Actual Temperature | Noisy Temperature
WT-ANN [32] | 2.64 | 2.84
WT-EANN [33] | 2.04 | -
ESN [35] | 2.37 | 2.53
SVR-PSSA [6] | 1.99 | 2.03
MABCA-WT-ELM [34] | 1.87 | 1.95
SVR-CLPS [31] | 1.80 | 1.85
Proposed model | 1.77 | 1.82
Table 4. Performance comparison of MAPEs on the New England dataset in 2006.

Method | MAPE (%)
WRN [36] | 2.64
FCN [16] | 2.04
MResNet [16] | 2.37
CDRN [37] | 1.99
Proposed model | 1.62
Table 5. Performance comparison of MAPEs, RMSEs, and MAEs on the New England dataset during 2008 and 2009.

Method | MAPE (%) | RMSE (MWh) | MAE (MWh)
NN-EA [42] | - | 651.8 | 458.4
MOFTLA [39] | 3.07 | 594.3 | 458.16
BooNN [40] | 1.79 | - | -
BNNS [41] | 1.75 | - | -
Proposed model | 1.74 | 396.5 | 261.6
Table 6. Performance comparison of MAPEs on the New England dataset during 2010 and 2011.

Method | 2010 | 2011
ErrCor-RBF [43] | 1.80 | 2.02
MRBF [44] | 1.75 | 1.98
ELM-PLSR-WT [38] | 1.50 | 1.80
MResNet [16] | 1.50 | 1.64
ERN [45] | 1.46 | 1.54
Proposed model | 1.45 | 1.51
Table 7. Performance comparison of MAPEs on the Malaysian dataset in 2006.

Method | MAPE (%)
SVM | 11.31
WRN [36] | 5.25
FCN [16] | 4.69
MResNet [16] | 4.59
CDRN [37] | 4.41
Proposed model | 4.19
Table 8. Performance comparison of MAEs on the Panama dataset.

Method | MAE (×10²)
CNN | 4.23
LSTM | 4.15
Bi-GRU | 3.98
DCN [46] | 1.89
Proposed model | 1.73
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
