1. Introduction
As electricity demand grows globally, load demand forecasting has become an important factor in many aspects of energy production and delivery. The time horizons for forecasting are classified as short-, medium-, or long-term. Short-term forecasting (STLF) refers to hourly forecasts, medium-term forecasting (MTLF) to forecasts of a week to a month, and long-term forecasting (LTLF) to forecasts of over a year [1]. STLF is mainly used in the operational phase, while LTLF is used in the planning phase. Before information and communication technologies (ICT) and smart grids were developed, forecasting was based primarily on supply-side aggregated data, in top-down formats at large governmental levels. However, owing to the recent development of smart-grid technology, it has become possible to consider end-user demand through a bottom-up approach [2], which can now be applied to STLF. Thus, these technologies have extended the responsibility for forecasting load demand from energy suppliers to consumers.
Summer and winter temperatures are becoming more extreme with rapid climate change, and demand is increasing because of the operation of energy-intensive devices such as air conditioners and heating appliances. In addition, load demand is increasing in buildings and parking lots because of the surge in electric vehicle (EV) sales [3]. Furthermore, Internet traffic is continuously increasing because of the growing global popularity of smartphones and other Internet communication devices. The Internet makes it possible to find information, send emails, share photos and videos, manage bank accounts, and access home network devices remotely. High electricity demand can therefore also be attributed to the process of traffic delivery and data storage [4].
From a supplier’s point of view, as renewable energy (RE) replaces energy produced from nuclear power, it has become more important to control supply and demand accurately [5]. However, energy supply uncertainty has become an issue, because RE increases the variability of energy supply according to factors such as season, temperature, precipitation, cloud cover, and wind speed. The changing pattern of supply and demand has a direct impact on power production, as well as on relative energy prices, power rate settings, and government policies. Accurate STLF is therefore an important foundation for the economic, administrative, and policy sectors.
Thus, the combination of technology developments, environmental issues, and energy policies for EV, RE, and ICT has made STLF a critical issue in energy markets. Poor STLF can cause energy loss when demand is overestimated and blackouts when it is underestimated, both of which have direct economic consequences. Therefore, various STLF methods have been studied in recent decades.
Forecasting methods are classified into statistical and non-statistical methods, according to the underlying technique. Statistical methods generate mathematical equations from existing historical data, to estimate model parameters and produce predictions. These methods include autoregressive integrated moving average (ARIMA) models [6,7], Reg-SARIMA-generalized autoregressive conditional heteroscedasticity (GARCH) models [8], exponential smoothing methods [9], time series models for series exhibiting multiple complex seasonality (TBATS) [10], regression models [11], support vector machine (SVM) models [12,13], fuzzy models [14,15], and Kalman filters [16].
On the other hand, AI-based techniques are known to have high predictive power. They are mainly suitable for nonlinear data because of their nonlinear and nonparametric function characteristics. Many studies using neural network models have been published [17,18]. Recent studies are briefly reviewed below.
For example, Elamin and Fukushige [19] described the SARIMA model with multiple exogenous variables such as temperature, humidity, and monthly, weekly, and hourly dummies. To explain the cross-effects between weather and seasonal factors, combinations of the main effects are considered as interaction variables. Models with interaction terms improved the accuracy of the forecasts.
Sadaei et al. [20] presented a combined method based on fuzzy time series (FTS) and convolutional neural networks (CNNs) for STLF. The multivariate time series of load demands and temperatures were converted into multi-channel images, and the accuracy of the FTS-CNN model was higher than that of the other models considered.
Al-Musaylh et al. [21] compared multiple data-driven models, such as multivariate adaptive regression spline (MARS), support vector regression (SVR), and ARIMA models, for STLF over several forecast horizons. The MARS model showed greater accuracy for 0.5 h and 1.0 h forecasting. However, the SVR model performed better in 24 h forecasting.
Yang and Yang [22] suggested STLF methods that focus on selecting optimal input features (i.e., feature selection, FS) rather than on establishing models. Given that the least squares SVM (LSSVM) can solve complex nonlinear problems, a hybrid model combining the autocorrelation function FS model and LSSVM regression was applied to STLF.
Singh and Dwivedi [23] implemented a follow-the-leader scheme with a neural network model for STLF, to overcome the problem of overfitting in traditional neural network models. The proposed algorithm was found to outperform the artificial neural network (ANN) and genetic algorithm (ANN-GA), ANN and Jaya algorithm (ANN-Jaya), ANN and PSO algorithm (ANN-PSO), and back propagation neural network (BPNN) models.
Li et al. [24] presented a subsampling strategy for the SVR ensemble forecast method, to improve accuracy and computational efficiency. Point estimates were computed, along with confidence levels, to account for the uncertainty of the forecasts.
Shah et al. [25] decomposed the log demand into deterministic (trend, multiple periodicities) and stochastic parts. To estimate each component of the log-transformed data, the autoregressive (AR), non-parametric AR, autoregressive moving average (ARMA), and vector AR (VAR) models were compared. The results showed that multivariate time series forecasting was superior in accuracy.
Kim et al. [26] comprehensively compared multiple time series (i.e., SARIMA, ARIMA-GARCH, and exponential smoothing) and AI-based (i.e., ANN) methods for STLF over 1 h to 1 day forecasting horizons. The optimal model across these horizons was the ANN model with external variables for weather and holiday effects.
Muzaffar and Afshari [27] studied long short-term memory (LSTM) networks, which are a special type of recurrent neural network, and applied them to learning the long-term dependencies in STLF. Global horizontal, direct normal, and diffused horizontal irradiance, as well as temperature, humidity, and wind speed, were considered as potential exogenous variables. Only temperature was retained as an input variable, to reduce computational costs. It was shown that LSTM outperforms other methods, such as ARMA, SARIMA, and ARMA with exogenous variables.
Zhu et al. [28] proposed a new weather forecasting technique generated with the dry-bulb temperature profile, relative humidity, and global solar radiation. Then, some of the ranked influential factors were filtered. The final input variables were grouped and applied in an ANN model with back-propagation.
Reddy [29] proposed a Bat algorithm-based back-propagation approach for STLF, with weather factors such as temperature, humidity, and dew point; the best results were obtained in a case study considering temperature and humidity.
Morley et al. [30] suggested that understanding Internet traffic usage patterns may help in modeling electricity load demand, because Internet-connected equipment such as mobile networks, ICT-related devices, and PCs consumes electricity. This relationship has become more important as network-based infrastructures grow.
Kim [31] proposed Internet traffic forecasting models using an AR-GARCH error model with seasonal ARIMA models. This motivated our study to build various forecasting models considering Internet traffic data.
As outlined above, the external variables commonly used in these studies include weather and socio-economic variables. As smart grid technology advances quickly, electronic device usage data, as well as non-electronic data such as meteorological or economic variables, can be easily accessed by region. Many attempts have been made to keep up with these technologies; however, at the time of writing, no study has, to our knowledge, clearly considered Internet traffic data to forecast load demand. In this study, we have adopted Internet traffic data as an external variable in an ARIMA-based model, and as a dependent variable in a vector AR with exogenous variables (VARX) model. Although AI-based models are widely used for producing accurate forecasts, it is difficult to draw inferences about the variables from them. Therefore, we demonstrate several representative statistical forecasting methods, and adopt them in a smart grid environment.
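To make the two roles of Internet traffic concrete, the following is a generic sketch of each formulation. The notation is assumed here for illustration only ($y_t$ for load demand, $I_t$ for Internet traffic, $T_t$ for temperature, $D_t$ for a special-day indicator) and is not necessarily that of the models specified in Section 2. With Internet traffic as an external variable, a univariate model can take a regression form with time series errors:

$$y_t = \beta_0 + \beta_1 T_t + \beta_2 D_t + \beta_3 I_t + \eta_t,$$

where $\eta_t$ follows an ARIMA process, possibly with GARCH innovations. With Internet traffic as a second dependent variable, a VARX model of order $p$ can be written as

$$\begin{pmatrix} y_t \\ I_t \end{pmatrix} = c + \sum_{i=1}^{p} A_i \begin{pmatrix} y_{t-i} \\ I_{t-i} \end{pmatrix} + B \begin{pmatrix} T_t \\ D_t \end{pmatrix} + \varepsilon_t,$$

where $c$ is a $2 \times 1$ intercept, $A_i$ and $B$ are $2 \times 2$ coefficient matrices, and $\varepsilon_t$ is a two-dimensional error term. In the second form, future traffic values are forecast jointly with the load rather than supplied as inputs.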
The contributions of this paper are presented as follows.
Existing STLF methods for load demand are limited to predictor variables such as weather, holidays, and weekends. Thus, we demonstrate the effectiveness of considering Internet traffic data as a dependent variable in a multivariate time series forecasting method, and also as an external variable in univariate methods.
Moving-window prediction techniques were used in STLF to determine which models are superior at each k-step-ahead horizon, from the basic 15 min interval (k = 1) to 2 h (k = 8), and whether the superior models remain robust across these time horizons.
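As a minimal sketch of how a moving-window, k-step-ahead evaluation can be organized, the following Python example slides a training window along a series and pairs each forecast with its target. The fixed window length, the persistence forecaster, and the sample values are assumptions made for illustration; they stand in for the fitted models and the campus data used in this study.

```python
def persistence(history, k):
    """Placeholder forecaster: repeat the last observed value for any horizon k."""
    return history[-1]


def moving_window_pairs(series, window, k, forecaster):
    """Slide a fixed-length training window along the series and collect
    (actual, forecast) pairs for the k-step-ahead target of each window."""
    if window < 1 or k < 1:
        raise ValueError("window and k must be positive")
    pairs = []
    # `end` is the index just past the last observation in the training window.
    for end in range(window, len(series) - k + 1):
        history = series[end - window:end]
        actual = series[end + k - 1]
        pairs.append((actual, forecaster(history, k)))
    return pairs


def mean_abs_error(pairs):
    """Average absolute difference between actual and forecast values."""
    return sum(abs(a - f) for a, f in pairs) / len(pairs)


# Hypothetical 15 min load readings (kW) with a steady upward ramp.
load = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

# One 15 min step ahead (k = 1) versus two steps ahead (k = 2).
errors_by_k = {
    k: mean_abs_error(moving_window_pairs(load, 3, k, persistence))
    for k in (1, 2)
}
```

Replacing `persistence` with a function that refits a model on `history` and returns its k-step-ahead prediction gives the same evaluation loop for any of the candidate models, so that errors can be compared horizon by horizon.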
The remainder of this paper is organized as follows.
Section 2 introduces the models used in this study.
Section 3 describes the data and analysis.
Section 4 presents the performance evaluations.
Section 5 concludes the paper.
4. Performance Evaluations
This section discusses comparisons of the various models performed using mean-absolute-percentage-error (MAPE) and root-mean-square error (RMSE). These evaluation methods are widely used to evaluate model performance, especially for STLF.
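MAPE is defined as

$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|,$$

where $y_t$ is the actual value and $\hat{y}_t$ is the forecasted demand at time $t$, and $n$ is the number of forecasts. The equation of RMSE is given by

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2}.$$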
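For readers who wish to reproduce the two measures, a minimal Python sketch follows. It assumes the standard definitions (MAPE as the mean of the absolute errors relative to the actual values, expressed in percent, and RMSE as the square root of the mean squared error). The function names and sample values are hypothetical.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def rmse(actual, forecast):
    """Root-mean-square error, in the units of the series."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    return math.sqrt(total / len(actual))


# Hypothetical load values (kW) and forecasts.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]

mape_value = mape(actual, forecast)   # relative errors 0.10, 0.05, 0.00
rmse_value = rmse(actual, forecast)   # squared errors 100, 100, 0
```

MAPE is scale-free, which makes it convenient for comparing horizons, whereas RMSE is expressed in the units of the load and penalizes large errors more heavily.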
Here we also obtained the accuracy results of the Internet traffic from the VAR model, but given that the main purpose of our study is to forecast electricity load demand, we only discuss the results of the power demand.
Table 7 presents the MAPE results in the validation set at k steps ahead. It shows that the VARX model is superior to other models, through all steps. The second-best model was the ARIMA-GARCH model (3), with temperature, special-day, and Internet traffic variables; it showed higher accuracy than the other ARIMA-GARCH models that did not consider Internet traffic values as an input.
Table 8 shows the validation RMSE values; the performance of the VARX and GARCH-based models showed the same patterns as those for the MAPE. However, when the exponential smoothing method is compared with ARIMA model (1), which has no predictor variables, the ARIMA model performed better than Taylor’s model. That is, ARIMA models are preferred for univariate datasets.
Figure 3, Figure 4, Figure 5 and Figure 6 show graphical model performances stratified by day type (Figure 3 and Figure 4) and quarter-hour (Figure 5 and Figure 6), in terms of MAPE and RMSE, for the 15 min (k = 1) and 2 h (k = 8) forecasts, respectively. Here, we only compare three representative models: Taylor’s exponential smoothing method, ARIMA-GARCH 3, and VARX models; and we assume four variables were available: temperature, special day, Internet traffic, and electricity load demand.
Figure 3 represents accuracy plots categorized by day type for 15 min forecasting. Special days were excluded from the day type stratification because there were no holiday seasons in the test set period. The VAR model shows the lowest error regardless of the day type, in terms of MAPE and RMSE. However, forecasts on weekdays are less accurate in the ARIMA and VAR models, while the GARCH model shows the opposite.
Figure 4 shows the accuracy plots by day type for 2 h forecasting. It shows patterns similar to those of the 15 min forecasting, but the VAR model shows less accuracy in the weekday results. If the forecasting horizon is very short (k = 1), the VAR model is recommended. However, if the horizon is short (k = 8), the ARIMA-GARCH model is worth consideration.
Figure 5 shows accuracy plots categorized by quarter-hour for 15 min forecasting. Notably, the x-axis sequence of 1 to 96 corresponds to 00:15 to midnight. Forecasts prove less accurate between 9:00 a.m. and 10:45 a.m. (x = 32–39), when the morning classes begin. Although the GARCH and VAR models show better performance in the afternoon, the ARIMA model shows continuously poor results until night. The VAR model generally outperforms the others, but the accuracy of the GARCH model diminishes again between 07:00 p.m. and 10:30 p.m. (x = 72–86).
Figure 6 shows the accuracy plots by quarter-hour for 2 h forecasting. The ARIMA and GARCH models show patterns similar to those of the 15 min forecasting. However, the performance of the VAR model is poor, as it is best suited to very short-term forecasting. As seen in Figure 4, the ARIMA-GARCH model provides higher accuracy than the VAR model.
Figure 7 plots the actual values for the day after the national holiday in the validation set against the predicted values from each model. The 15 min (k = 1) forecasts do not differ much in general, but Taylor’s model underestimated the level. In the 2 h (k = 8) forecasts, Taylor’s model also underestimates the demand significantly more than the other models. We assume the main reason for this is that Taylor’s model cannot incorporate exogenous variables such as the special-day indicator.
5. Concluding Remarks
Accurate STLF is a critical issue for decision makers and power generation companies in terms of policy making and development planning. Thus, many attempts have been made to improve the performance of electricity load prediction. This study examined the relevant time series methods for short-term forecasting of electricity load demand over 15 min to 2 h time horizons, on an institutional campus in Seoul. Taylor’s double seasonal exponential smoothing methods, ARIMA-GARCH models, and the VARX model were used for optimization. In this study, these models provided the lowest MAPEs and RMSEs from 15 min (k = 1) to 2 h (k = 8) forecasting.
The results show that the VAR model is superior to the univariate models at all steps. Taking the indirect variable as another dependent variable in a multivariate model, rather than applying it as an input, provided high accuracy as well as the advantage of time efficiency. However, caution must be applied when using the VAR model: the series should be checked for stationarity and, if they are not stationary, a further cointegration test is required. Sometimes a cointegrated relationship appears in the same variables when longer data sets with lower frequency are used. If this is the case, the vector error correction model is considered the appropriate method. The strength of the evidence for a relationship between the variables is known to depend on the length or time unit of the datasets.
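For reference, the vector error correction model mentioned above has the following general form; the notation is assumed here and is not taken from the model specifications of this study. For a vector $y_t$ of non-stationary but cointegrated series,

$$\Delta y_t = \Pi y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + B x_t + \varepsilon_t, \qquad \Pi = \alpha \beta^{\top},$$

where $\Delta y_t = y_t - y_{t-1}$, $x_t$ collects the exogenous variables, the columns of $\beta$ are the cointegrating vectors, and $\alpha$ contains the adjustment coefficients. When $\Pi = 0$, there is no cointegration and the model reduces to a VAR in first differences.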
The second-best model was the ARIMA-GARCH with Internet traffic, temperature, and special-day predictors. It demonstrated that Internet traffic data are useful as input values, even in univariate models. Fitting volatilities with a GARCH term in the ARIMA models did not always improve the results at every step, even though the ARCH effects tests indicated heteroscedasticity in the data. However, for the data in this study, GARCH models including the Internet traffic usage data were appropriate for STLF.
In buildings that do not offer Internet traffic data, it is worth searching for another potential dependent variable for a multivariate model such as VARX.
The results demonstrated that weather and holiday characteristics have an impact on demand forecasting. However, even if the external variables were appropriate, the accuracy varies, depending on whether the model fits the volatilities in the data. Although the best-fitted model was the VARX model using electricity load demand and Internet traffic data as multiple dependent variables, the other models still offer useful insights into the explanatory factors. In addition, the VARX model is fast to fit and time-efficient.
Further, we discussed the model performances in depth by stratifying by day type and by quarter-hour of the day, to compare ARIMA, ARIMA-GARCH, and VAR models with exogenous variables. We showed that the forecasts degrade over the time horizon, and that the VARX model is not universally superior to the other models.
In this study, we mainly aimed to compare the performance of the exponential smoothing methods, ARIMA-GARCH models, and VARX models. Other models, such as SVM models, fuzzy models, and Kalman filters, will be examined in future studies.
Other future studies may set the goal of building an optimal and customized forecasting model for each single unit/building, according to building size, age, and type of external wall (for smaller units).