Regression-Based Methods for Daily Peak Load Forecasting in South Korea

Geun-Cheol Lee

doi:10.3390/su14073984

College of Business Administration, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05029, Korea

Sustainability 2022, 14(7), 3984; https://doi.org/10.3390/su14073984

Abstract

This study examines the daily peak load forecasting problem in South Korea. This problem has become increasingly important due to the continually changing energy environment. As such, it has been studied by many researchers over the decades. South Korea is geographically located such that it experiences four distinct seasons. Seasonal changes are among the main factors affecting electricity demand. In addition, much of the electricity consumption in a strong manufacturing country like South Korea is driven by industry rather than by residential customers. To forecast daily peak loads of South Korea, in this study we proposed multiple linear regression-based methods where several season-specific regression models (i.e., summer, winter, and all-season models) were included. The most appropriate model among the three models was selected considering the characteristics of the electricity demand, and was then applied to daily forecasting. The performance of the proposed methods was evaluated through computational experiments. Forecasts obtained by the proposed methods were compared with those obtained by existing forecasting methods, including a machine learning method. The results showed that the proposed methods had mean absolute percentage errors around 1.95% and outperformed all benchmarks.

Keywords:

daily peak load forecasting; regression; interaction effects; machine learning

1. Introduction

South Korea is the first former aid recipient to become a member of the Development Assistance Committee (DAC) of the Organization for Economic Cooperation and Development (OECD) [1]. The nation’s economic growth has primarily been achieved through exports based on strong manufacturing industries in areas including electronics, semiconductors, and petrochemicals [2]. Consistent generation of reliable and quality electric power has played an important role in developing and maintaining the country’s strong manufacturing capabilities. Figure 1 shows the 40-year trends of GDP (Gross Domestic Product) at purchasing power parities (unit: billion USD) and electricity generation (unit: billion kWh) in South Korea [3]. As can be seen from the figure, the increase in GDP is almost synchronized with the expansion of electricity generation. Furthermore, the World Bank has recently ranked South Korea in fifth place for Ease of Doing Business rankings [4]. Among the ten sub-categories in the ranking, “Getting Electricity” was the area in which the country performed the best. These results show that an appropriate supply of electricity is one of the key factors affecting the success of a nation’s economy.

Figure 1. Forty years of GDP and electricity generation trends in South Korea [3].

As power generation and consumption occur simultaneously, accurate electric power demand forecasting (known as load forecasting) is considered an essential activity in power system operations. Based on load forecasting results, electricity can be generated to match the load amount to be consumed. It is very important to accurately forecast electrical loads because underestimations can lead to blackouts that either partially or fully paralyze the power system, while overestimations can lead to economic losses associated with the generation of excess power that is wasted. Due to its importance, many studies have examined load forecasting [5,6,7].

In this study, we consider the daily peak load forecasting problem in South Korea. The nation’s overall electricity demand needs to be predicted every day. In 2019, peak loads of South Korea ranged from about 50,000 MW to 90,000 MW, meaning that a certain mix of the nation’s power generators produced that amount of electricity for the corresponding time. Considering that peak load generating units are generally natural gas or petroleum-fueled generators, the last part of the power generation would be expensive. In South Korea, the average unit generation cost using natural gas was about USD 100/MWh in 2019 (http://epsis.kpx.or.kr/, accessed on 4 March 2022). If we overestimate the peak load and over-generate the power by about 1% more than the actual electricity demand, it will additionally cost up to about USD 2 million per day. Moreover, costs due to forecasting errors will continue to increase, as the amount of power generation and the energy cost will continue to increase in the future. Therefore, a large economic benefit can be obtained using a forecasting method that can reduce deviations between actual electricity demands and the forecasts.
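The order of magnitude of this cost figure can be checked with a short calculation. The sketch below is not the author's computation. It assumes that the 1% surplus is sustained for all 24 hours of a day at the upper end of the 2019 peak range (90,000 MW) and is priced at the quoted USD 100/MWh; the function name and the 24-hour assumption are illustrative only.

```python
def overgeneration_cost_usd(peak_mw, surplus_ratio, unit_cost_usd_per_mwh, hours=24.0):
    """Cost of generating `surplus_ratio` more power than needed for `hours` hours."""
    surplus_mw = peak_mw * surplus_ratio          # excess power kept running (MW)
    surplus_mwh = surplus_mw * hours              # excess energy produced (MWh)
    return surplus_mwh * unit_cost_usd_per_mwh    # cost in USD


# Assumed inputs: upper end of the 2019 peak range, 1% surplus, USD 100/MWh.
cost = overgeneration_cost_usd(90_000, 0.01, 100.0)
print(f"{cost / 1e6:.2f} million USD per day")
```

Under these assumptions the result is about USD 2.16 million per day, which is in line with the "up to about USD 2 million" figure quoted above.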

To enhance the accuracy of forecasting, we propose regression-based forecasting methods in this study. Although regression is a rather old-fashioned forecasting method, it can be a powerful tool if appropriate and sufficient independent variables are used. In this study, a comprehensive analysis of short-term electricity demand in South Korea is conducted to obtain such independent variables. To identify proper mathematical expressions of regression models with the obtained independent variables, symbolic regression algorithms can be used in general [8,9]. In this study, we reveal several interaction effects between the independent variables and use them when constructing the regression models instead of using symbolic regression algorithms. Furthermore, the proposed methods have procedures where different forecasting models are applied depending on the season of the forecast day. Due to this seasonally adaptive characteristic, the proposed methods are expected to show excellent forecasting performance. To evaluate the performance of the proposed methods, computational experiments were carried out.

The rest of this paper is organized as follows. In the next section, Section 2, a literature review regarding daily peak load forecasting in South Korea is summarized. Section 3 explores various factors that can affect the daily electricity demand in South Korea. Those factors are the independent variables used in forecasting models. Section 4 introduces the proposed forecasting methods and previous regression models used for daily peak load forecasting. Section 5 introduces benchmarks, including a machine learning method, and presents the results of a comparison between the proposed methods and the benchmarks. The last section concludes the paper with a discussion and future research suggestions.

2. Literature Review

Load forecasting is commonly classified into four categories based on the timeframe of the prediction [6,10,11]: (1) Very Short-Term (less than an hour), (2) Short-Term (one hour to several days), (3) Mid-Term (one month to a season), and (4) Long-Term (a year or more). However, most previous studies have considered predictions for one hour, one day, and one year, that is, short-term and long-term load forecasting have been the main requirements of the industry [6]. Among these time frames, this study considers the Short-Term Load Forecasting (STLF) problem of South Korea, which is used to obtain the electricity demand for operating power systems. Accurate STLF can enable proper allocation of generation resources, with secure and reliable use of power plants and economic dispatch of the power system [10]. In this study, the nationwide daily peak load is the specific target to be predicted. The peak load of a certain day is determined as the maximum hourly load among the 24-h loads of that day. The daily peak load is known to be a nonlinear, nonstationary, and volatile time series, meaning that it is not easy to accurately forecast [12]. As it is critical to maintaining the peak load under the country’s power capacity, Korea Power Exchange (KPX), a public company in South Korea, continually monitors nationwide peak load in real-time and records daily peak loads, which are published on its website [13].

Load forecasting studies can also be classified by the technique used for forecasting, into those based on statistical techniques and those based on artificial intelligence (AI) techniques [7,14]. Due to the importance of the daily load forecasting problem, there have been many studies examining daily load forecasting in South Korea. Several recent studies are summarized here. First, several studies have used statistical techniques. ARIMA-based time series models have been used to tackle the forecasting problem. Basic time series models have been modified and enhanced into models such as Reg-ARIMA, TBATS, and AR-GARCH for daily load forecasting [15,16,17]. Regression-based statistical methods have recently been proposed by researchers for tackling forecasting problems [18,19,20,21,22]. These traditional methods have robust and explainable performance. Although statistical technique-based forecasting methods continue to be investigated, AI-based techniques are dominant. For example, Artificial Neural Networks (ANNs) have been used to solve the daily peak load forecasting problem in South Korea [15,23,24]. Enhanced neural network techniques such as those based on Radial Basis Function (RBF) Networks, Deep Neural Networks (DNNs), LSTM (Long Short-Term Memory), and Polynomial Neural Networks (PNNs) have recently come into the spotlight [14,24,25,26,27,28]. Table 1 summarizes existing studies on the daily load forecasting problem according to the techniques used.

Table 1. Summary of forecasting studies on South Korea’s daily electric load.

3. Components Affecting Daily Peak Load

Various components, including trend, temperature, and calendar variables, have been used in previous STLF models [6,7,29]. In the present study, we explore the characteristics of South Korea’s daily peak load while focusing on well-known components of electricity demand used in many existing STLF studies as well as specific features of the considered electricity demand. In this section, trend, seasonal, causal, and interaction effects are considered for the problem. Detailed features of each effect are presented in the following sections. To analyze the characteristics of electricity demand, we used nine years (2010–2018) of data on South Korea’s daily peak loads available from the Electric Power Statistics Information System (http://epsis.kpx.or.kr/).

3.1. Trend Effect

As shown in Figure 1, electricity demand has increased substantially for decades as the country’s economy has grown. Is this trend of increasing demand still valid for STLF? We examine the demand trend in detail by zooming in on the electrical load. Figure 2 shows nine years of daily peak load trends in South Korea. Although there were fluctuations due to seasonal factors within a given year, it can be seen that the demand has steadily increased over the nine years. To more clearly show the trend effect of the demand, we included a dotted red straight line in the figure determined by a simple linear regression equation, y_t = 3.69i_t − 88,647, where y_t is the peak load of day t and i_t is the index number of day t. The slope value of the equation reveals that the daily peak load increased by an average of 3.69 MW per day, which amounted to about 1347 MW per year. Even if the daily peak load forecasting is a kind of STLF, we need to use demand data from one year before the day to be forecasted. In such a case, the gap of over 1000 MW between the peak load of the forecast day and that of a year before the forecast day needs to be corrected by adding the trend effect into forecasting models. Note that peak loads of days that were two or three years before the forecast day could be used in the sample for forecasting, which can make the trend effect more essential. As shown in Figure 2, the trend is reasonably linear. In this case, the linear trend can be implemented in the model by entering integer values into day index variables. For the sample used in Figure 2, i₁ (the index number of 1 January 2010) was 1, i₂ (the index number of 2 January 2010) was 2, etc. The index number of the last day (31 December 2018) of the sample was 3287 (=365 × 7 + 366 × 2).

Figure 2. Daily peak load (blue line) and linear trend (dotted red line) in South Korea from 2010 to 2018.
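As an illustration of how such a trend line is obtained, the following minimal sketch fits a least-squares line to a daily series indexed 1, 2, …, as described above. The series is synthetic: the slope of 3.69 MW per day is taken from the text, whereas the starting level of 60,000 MW and the function name are assumptions made only for this example, so the fitted intercept is not comparable to the one reported for Figure 2.

```python
def fit_linear_trend(loads):
    """Least-squares line loads[k] ~ slope * (k + 1) + intercept (day index starts at 1)."""
    n = len(loads)
    index = range(1, n + 1)
    mean_i = sum(index) / n
    mean_y = sum(loads) / n
    sxy = sum((i - mean_i) * (y - mean_y) for i, y in zip(index, loads))
    sxx = sum((i - mean_i) ** 2 for i in index)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_i
    return slope, intercept


# Synthetic stand-in for the 2010-2018 sample: 3287 days with a known slope.
n_days = 365 * 7 + 366 * 2
synthetic = [60_000 + 3.69 * i for i in range(1, n_days + 1)]
slope, intercept = fit_linear_trend(synthetic)
print(round(slope, 2), round(slope * 365))   # MW per day, MW per year
```

Multiplying the daily slope by 365 reproduces the yearly increase of about 1347 MW mentioned in the text.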

3.2. Cyclic Effect

Repetitive events affecting the electricity demand within certain periods are considered to represent a cyclic effect. Most repetitive events are related to the season, meaning that such events are already known when forecasting. For example, for each day to be forecasted, the calendar tells us which day of the week it is, which month or season it is in, and whether or not it is a holiday. In this paper, we categorized the cyclic effect into three types: seasonal effect, day of the week effect, and special day effect. We will present each type in the following subsections.

3.2.1. Seasonal Effect

South Korea has four distinct seasons: Spring, Summer, Fall, and Winter. Daily peak loads differ according to the season. Figure 3 shows average daily peak loads according to each season for the nine years studied here. As can be seen from the figure, each year has the same pattern wherein the average daily peak loads of Summer and Winter are apparently larger than those of Spring and Fall. This is a common phenomenon in any country having Summer and Winter seasons, as cooling and heating demands respectively surge in these seasons. This common phenomenon is related to the occurrence of the highest daily peak load in a year. Therefore, in practice, daily peak loads tend to be managed in the Summer and Winter seasons only. Many previous STLF studies have only focused on the Summer or Winter season [19,20,24,27,30,31,32,33].

Figure 3. Average daily peak loads during four seasons in South Korea from 2010 to 2018.

The seasonal effect can appear in different forms other than the four seasons. If daily peak loads vary across shorter periods, years may need to be divided into more than four seasons. For example, we could divide one year into twelve periods, with each period representing one month. Further, one year might be divided into six or twenty-four periods, depending on the degree of variation of the seasonal effect. To indicate the season to which the forecast day belongs, forecasting models can use a categorical (i.e., qualitative) variable. The number of classes in each categorical variable depends on how many seasons the categorical variable represents. For example, with the traditional four seasons, a categorical variable with four classes (i.e., Spring, Summer, Fall, Winter) can be used. Such variables are called ‘calendar variables’ [29].
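A categorical season variable of this kind can be sketched as follows. The month-to-season boundaries (Spring: March to May, Summer: June to August, Fall: September to November, Winter: December to February) are an assumption for illustration, since the text does not state them, and the helper names are hypothetical.

```python
SEASONS = ("Spring", "Summer", "Fall", "Winter")


def season_of(month):
    """Map a month number (1-12) to a season label (assumed boundaries)."""
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    if month in (9, 10, 11):
        return "Fall"
    if month in (12, 1, 2):
        return "Winter"
    raise ValueError(f"month must be in 1..12, got {month}")


def one_hot(label, classes):
    """Encode a class label as 0/1 indicator values, one per class."""
    return [1 if label == c else 0 for c in classes]


print(season_of(7), one_hot(season_of(7), SEASONS))
```

The same `one_hot` helper works for a twelve-class month variable by passing the month labels as `classes`.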

3.2.2. Day of Week

Another key cyclic effect of daily electricity demand is the day of the week, that is, the same day of the week repeats every seven days. Figure 4 shows the average daily peak loads according to the day of the week over the nine years considered. It can be seen that electricity demands on weekends are lower than those on weekdays. Because the majority (about 56%) of the electric power of South Korea is consumed by industrial customers [11], the electricity demand differs between days on which most workers work and days on which they do not. Furthermore, even within the same weekend, Sunday has a lower electricity demand than Saturday. On the other hand, there are no significant differences between electricity demands on different weekdays. None of the five weekdays showed consistent maximum or minimum electric loads across years. Therefore, this cyclic effect can be represented using three types of categorical variables with different numbers of classes: two classes (Weekdays, Weekends), three classes (Weekdays, Saturday, Sunday), and seven classes (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday). Note that we forecast nationwide electricity demand, not just industrial electricity demand, even if the effect of the day of the week mainly comes from the nation’s industrial characteristics.

Figure 4. Average daily peak loads by the day of the week in South Korea from 2010 to 2018.
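The three day-of-week encodings described above (two, three, and seven classes) can be derived from a calendar date as in the following sketch. The function name `day_class` is hypothetical; the only library call used is the standard `datetime.date.weekday()`, which returns 0 for Monday and 6 for Sunday.

```python
from datetime import date

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def day_class(day, n_classes):
    """Return the day-of-week class of `day` under a 2-, 3-, or 7-class scheme."""
    name = DAY_NAMES[day.weekday()]          # date.weekday(): Monday is 0, Sunday is 6
    if n_classes == 7:
        return name
    if n_classes == 3:
        return name if name in ("Saturday", "Sunday") else "Weekdays"
    if n_classes == 2:
        return "Weekends" if name in ("Saturday", "Sunday") else "Weekdays"
    raise ValueError("n_classes must be 2, 3, or 7")


d = date(2018, 12, 30)
print(day_class(d, 2), day_class(d, 3), day_class(d, 7))
```

Which of the three schemes is appropriate is a modeling choice; the three-class scheme reflects the observation that Saturday and Sunday differ from each other while the five weekdays do not differ consistently.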

3.2.3. Special Days

Days when most workplaces are closed other than weekends are often considered to be special days. Therefore, as would be expected from the description above, electricity demands on special days tend to be lower than on other days [14,15,19,20,34,35]. A typical kind of special day is public holidays, which include New Year’s Day, Memorial Day, and Liberation Day. Among them, Lunar New Year’s Day (Seollal) and Mid-Autumn Festival Day (Chuseok) are the two most celebrated national holidays in South Korea. The days before and after Seollal and Chuseok are included as holidays as well. In South Korea, public holidays are set on dates such that they recur every year. Irregularly, a couple of days are designated as temporary or substitute holidays by the Korean government for certain purposes. Even if such holidays are irregular, it is announced in advance whether the forecast day will be a holiday, meaning that it is known information for daily forecasting.

The second type of special day is election days. In South Korea, major election days such as presidential election day and parliamentary election day are designated as legal holidays, which can lead to a reduction in daily electric load. Election cycles are typically four or five years. Figure 5 shows the average daily peak loads of three types of weekdays over nine years. It can be seen from the figure that the daily peak loads of the special days, i.e., holidays and election days, are lower than those of regular weekdays. For a fair comparison, only weekdays that are not special days are considered as regular weekdays. Furthermore, only special days on weekdays are considered in this comparison. As shown in the figure, the special day effect can be easily recognized, as the average electric load on special days is about 80% of that on regular weekdays. Weekends and special days showed similar levels of electric load. To implement this special day effect in the forecasting model, we can use a categorical variable with two classes (i.e., a binary variable).

Figure 5. Average daily peak loads on regular weekdays (green bar) and special days (blue bars) in South Korea from 2010 to 2018.

3.3. Quantitative Causal Effect

In a forecasting model, independent variables other than time can be considered as causal effects of demand. Strictly speaking, all cyclic effects mentioned above are qualitative causal effects. In this section, we will introduce two types of causal effect: weather information and autocorrelation. They are presented as continuous variables rather than categorical variables.

3.3.1. Weather

Weather change in a country is a well-known component that affects electric load. The weather effect may seem to be already reflected in the seasonal effect; however, while a seasonal effect is implemented using categorical variables, a weather effect is implemented with quantitative, continuous variables. Among the various types of weather information, the most essential for load forecasting is the temperature of the target area. Figure 6 shows a scatter plot of the relationship between the daily peak load and mean temperature in 2018; the figure uses the mean temperature of the Seoul area, and other areas might have temperatures different from those of Seoul. About half of the nation’s population lives around the Seoul Metropolitan area. Temperature differences are not very large considering South Korea’s geographic size. All weather information used in this paper can be obtained from the website (https://data.kma.go.kr/, accessed on 5 February 2022) managed by the Korea Meteorological Administration. As shown in the figure, there are two V-shaped relationships between peak load and temperature. If the temperature is greater than about 15 °C, peak load increases as the temperature increases (red and green shaded areas). Meanwhile, if the temperature is less than about 15 °C, peak load increases as the temperature decreases (blue and yellow areas). Within each of the direct and inverse relationships, the red and blue areas are mostly plotted with data from regular weekdays, while the green and yellow areas are mostly plotted with data from weekends or special days. Although we only used temperature data for the Seoul area, their impact on the electric load of the country could be clearly identified.

Figure 6. Relationship between mean temperature and daily peak load of South Korea in 2018. The red and green areas show direct relationship between temperature and peak load while the blue and yellow areas show inverse relationship between them. The plus signs plotted on the red and blue areas are mostly from regular weekdays while the other plus signs plotted on the green and yellow areas are mostly from weekends or special days.

Previous load forecasting studies have used weather information other than the mean temperature of the forecast day, including minimum and maximum temperatures, precipitation accumulation, wind speed, humidity, and vapor pressure. However, the temperature is the weather factor that primarily affects daily electric load. Moreover, for practical application, when any weather information for the forecast day is used, it is necessary to forecast the weather information of the forecast day. In this study, we assume that the weather information of the forecast day is known. Although this is a future datum at the time of the prediction, forecasting tomorrow’s weather information is not regarded as difficult. In particular, temperature forecasts are known to be quite reliable in the short term [7]. Thus, only considering temperature is often sufficient for daily load forecasting.

3.3.2. Autocorrelation

Today’s peak load is not much different from yesterday’s peak load, because the domestic population and industry size do not change overnight. This dependence of a series on its own past values is known as autocorrelation. Daily peak load is a typical demand for which the autocorrelation effect needs to be considered in any forecasting. Thus, previous research using ARIMA-based approaches such as SARIMA, Reg-ARIMA, and AR-GARCH has focused on autocorrelation effects when forecasting the daily peak loads [15,16,17]. When the peak load of a day is predicted, it is assumed that the decision is made at midnight on the corresponding day. Thus, past data such as peak loads one day and seven days before the day to be forecasted can be known and prepared before forecasting. Figure 7 shows the autocorrelation values of daily peak loads according to lag values over nine years. As shown in the figure, the daily peak load of a day is highly correlated with that of the prior day. The autocorrelation value with lag 1 (r₁) is slightly over 0.7, meaning that the correlation coefficient value between today’s peak load and yesterday’s peak load is slightly over 0.7. Today’s peak load is even more highly correlated with peak loads from one week and two weeks prior. As shown in the figure, autocorrelation values with lag 7 and lag 14 (r₇ and r₁₄) are around 0.8, which can serve as further evidence of the day of the week effect on the daily peak load. Considering these autocorrelation effects, forecasting models must include several previous peak loads in the form of quantitative variables. Specifically, it is necessary to include yesterday’s peak load and the peak load of the week before the forecast day. It may be necessary to include the peak loads for two and even three weeks before the forecast day as input data.

Figure 7. Autocorrelation values of daily peak load data from 2010 to 2018.
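For readers who wish to reproduce this kind of analysis, the sketch below computes the sample autocorrelation at a given lag with the standard estimator. The weekly load pattern is synthetic and is not the KPX data, so the printed values only illustrate why a weekly cycle produces larger autocorrelations at lags 7 and 14 than at lag 1; they do not reproduce the values in Figure 7.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation r_lag of a sequence (standard ACF estimator)."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    dev = [y - mean for y in series]
    denominator = sum(d * d for d in dev)
    numerator = sum(dev[t] * dev[t - lag] for t in range(lag, n))
    return numerator / denominator


# Synthetic loads with a weekly pattern (five weekdays, Saturday, Sunday), 20 weeks.
week = [70_000] * 5 + [60_000, 55_000]
loads = week * 20
for k in (1, 7, 14):
    print(k, round(autocorrelation(loads, k), 3))
```

In practice the same function would be applied to the recorded daily peak loads, with the lags of interest (1, 7, 14, and possibly 21) chosen as candidate lagged regressors.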

3.4. Interaction Effect

Thus far, we have presented several main components affecting daily peak loads in South Korea. Each of these components can be implemented as a single independent variable in the forecasting model as either a categorical or continuous variable. Completely new independent variables are not introduced in this section. Instead, combined effects are shown as pairs of previously introduced variables. Interaction effects, that is, cases where the effect of one independent variable on the dependent variable depends on the value of another independent variable, need to be included in forecasting models [29]. Such effects can be seen in the characteristics of daily peak loads. Figure 8 shows changes in the daily peak loads as mean temperature changes (data for weekdays from 2016 to 2018 are used). As shown in the figure, the relationship between the peak load and the temperature differs according to the season. For example, the pattern of the summer mark (+) shows that the peak load increased as the temperature increased in summer, while the pattern of the winter mark (×) shows that the peak load increased as the temperature decreased in winter. The effect of one independent variable (i.e., temperature) clearly depended on another independent variable (i.e., season), which is one instance of interaction effects on daily load forecasting. By multiplying these two variables, the interaction effect can be implemented in the forecasting model.

Figure 8. Relationship between peak load and mean temperature across the four seasons from 2016 to 2018.
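The role of the season × temperature interaction can be illustrated with a small synthetic example. A regression that contains a dummy variable for every season together with every season × temperature product yields the same coefficient estimates as fitting a separate line within each season, so the sketch below simply fits one line per season and compares the slopes with a single pooled slope. All numbers are made up for illustration and the helper name `ols_line` is hypothetical.

```python
def ols_line(x, y):
    """Closed-form simple linear regression; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic (season, mean temperature in deg C, peak load in MW) observations.
data = [("Summer", t, 60_000 + 1_000 * t) for t in range(20, 31)]
data += [("Winter", t, 75_000 - 800 * t) for t in range(-8, 6)]

slopes = {}
for season in ("Summer", "Winter"):
    temps = [t for s, t, _ in data if s == season]
    loads = [y for s, _, y in data if s == season]
    slopes[season] = ols_line(temps, loads)[0]

pooled_slope = ols_line([t for _, t, _ in data], [y for _, _, y in data])[0]
print(slopes, round(pooled_slope, 1))
```

The summer slope is positive and the winter slope is negative, while the pooled slope matches neither, which is why a single temperature coefficient without an interaction term cannot describe both seasons.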

Another interaction effect between autocorrelation and day of the week effects can be found in Figure 9. The figure presents three scatter plots showing the relationships between peak loads of a certain day t and the day before day t (i.e., day t−1) when the day of the week of day t is Monday, Friday, or Sunday, respectively (for the graph, we used peak load data from 2010 to 2018). As one can see from the figure, patterns of autocorrelation with lag 1 differed according to the day of the week of day t. We omitted Tuesday, Wednesday, and Thursday patterns because they were similar to the Monday pattern. The Saturday pattern was omitted for the same reason, as it was similar to the Friday pattern. From this analysis, when the autocorrelation with lag 1 is considered it can be seen that forecasting models need to include an interaction term implemented by multiplying autocorrelation and day of week variables.

Figure 9. Relationship between the peak loads of two adjacent days by different days of the week.

While several more interaction effects can be considered, the third instance of the interaction effect will be the last one introduced in this section. As mentioned previously, the special day effect typically decreases the electric load. Again, the day of the week is another factor that can influence the degree of the special day effect. Figure 10 shows differences in the amount of decrease in the peak load depending on whether special days are on weekdays or weekends (for the graph, we used peak load data from 2010 to 2018). The average peak load of special days falling on weekdays was about 83% of that of regular days, whereas the average peak load of special days falling on weekends was about 93% of that of regular days. Based on this figure, it is reasonable to include a nonadditive interaction term in the forecasting model, formed by multiplying the special day variable by either the day of the week variable or a binary variable representing a weekday or a weekend.

Figure 10. Average peak loads of regular days and special days when the special days fall on weekdays and weekends.

4. Regression-Based Forecasting Methods

Based on earlier analysis of daily peak loads in South Korea, we propose forecasting methods that could accommodate all effects presented for better prediction. This study introduces a regression-based method, which is a rather traditional way of forecasting with robust and superior performance if features (i.e., independent variables) of daily peak loads can be adequately included in the model. The proposed methods are devised with combinations of existing regression-based forecasting methods, which are classified according to their seasonal specialties. In the next subsection, these existing methods are introduced, after which the proposed methods are presented in Section 4.2.

4.1. Existing Methods

In this section, three existing regression-based forecasting methods are presented. Multiple linear regression models are used in existing methods. We named these regression models all-season, summer, and winter models. While the all-season model can be used regardless of the season, summer and winter models are specialized for summer and winter season forecasting, respectively. Each method is introduced in the following sections.

4.1.1. All-Season Model

The regression model for all seasons includes various effects such as trend, day of the week, autocorrelation, interaction, etc., presented in Section 3. In the model, y_t is the target variable to be forecasted (i.e., the peak load of day t). A total of twelve independent variables are included in the model. The exact equation of the model is as follows:

y_{t} = \beta_{1} y_{t-1} + \beta_{2} y_{t-7} + \beta_{3} y_{t-14} + \beta_{4} w_{t} + \beta_{5} x_{t} + \beta_{6} i_{t} + \beta_{7} m_{t} + \beta_{8} s_{t} + \beta_{9} y_{t-1} x_{t} + \beta_{10} m_{t} w_{t} + \beta_{11} s_{t} x_{t} + \beta_{12} w_{t} x_{t} + \beta_{0} + \varepsilon

(1)

where y_{t−k} is the peak load of day t−k, w_t is the mean temperature of day t, x_t is the day of the week of day t, i_t is the index number of day t, m_t is the month to which day t belongs, s_t is 1 if day t is a special day and 0 otherwise, β₀, …, β₁₂ are parameters, and ε is the error term of the model.

The autocorrelation effect is considered in the model by including the terms y_t₋₁, y_t₋₇, and y_t₋₁₄. Among weather information, the mean temperature of the forecast day is considered (w_t). The day of the week effect is covered by the terms x_t as well as y_t₋₇ and y_t₋₁₄. Note that x_t is a categorical variable whose elements are Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday. To consider the trend effect, an index variable, i_t, is used. If the number of samples used to estimate parameters is N, the values of i_{t−N}, i_{t−N+1}, …, i_{t−1} are set to be 1, 2, …, N, respectively. The month to which the forecast day belongs is identified by m_t, which is another categorical variable with twelve elements: January, February, …, December. The final additive variable is s_t, which is a binary variable for whether or not the forecast day is a special day.

The rest of the independent terms in the model are related to the interaction effect. The model includes four types of interaction effect. The first (y_t₋₁∙x_t) explains that the autocorrelation with lag 1 varies depending on the day of the week, which is illustrated in Figure 9. The second (m_t∙w_t) shows that the temperature effect can be different across the month, which is similar to the interaction effect between the four seasons and temperature (Figure 8) except that it considers a narrower time span. The third (s_t∙x_t) considers differential impacts of special days according to the day of the week, which is similar to the interaction effect between special days and weekends (Figure 10). Lastly, the model considers that the temperature effect can vary according to the day of the week (w_t∙x_t). For more details about the model, including ANOVA results for the regression model, please see Lee [18].

Lee [18] proposed the all-season model and argued that these interaction terms can significantly improve the forecasting performance. Among the four interaction effects considered, the interaction term m_t∙w_t contributed the most to the performance improvement. For a country like South Korea, which has four distinct seasons, the impact of temperature on electric load needs to be modeled according to the month for which the load is predicted.
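To make the structure of model (1) concrete, the sketch below assembles one row of regressors for a single forecast day. It is an illustration, not the implementation used in [18]. In particular, it assumes that the categorical variables x_t and m_t are dummy-coded with one reference level dropped (Monday and January), and that each interaction term is expanded as the product of the corresponding columns; the function names are hypothetical.

```python
from datetime import date


def dummies(level, n_levels):
    """Indicator columns for a 0-based level, dropping level 0 as the reference."""
    return [1.0 if level == k else 0.0 for k in range(1, n_levels)]


def all_season_row(day, y_lag1, y_lag7, y_lag14, mean_temp, day_index, is_special):
    """Regressors of model (1) for one day, without the intercept column."""
    x = dummies(day.weekday(), 7)       # x_t: day of the week (Monday = reference)
    m = dummies(day.month - 1, 12)      # m_t: month (January = reference)
    s = 1.0 if is_special else 0.0      # s_t: special-day indicator
    w = float(mean_temp)                # w_t: mean temperature
    row = [float(y_lag1), float(y_lag7), float(y_lag14), w]
    row += x
    row += [float(day_index)]           # i_t: trend index
    row += m
    row += [s]
    row += [y_lag1 * d for d in x]      # y_{t-1} * x_t
    row += [d * w for d in m]           # m_t * w_t
    row += [s * d for d in x]           # s_t * x_t
    row += [w * d for d in x]           # w_t * x_t
    return row


# Made-up input values for a single day.
row = all_season_row(date(2018, 7, 17), 80_000, 78_000, 77_000, 29.5, 1_000, False)
print(len(row))
```

Under this coding, the twelve terms of model (1) expand into 52 numeric columns, which shows why the number of estimated parameters is considerably larger than the number of terms written in the equation.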

4.1.2. Summer Model

Han and Lee [19] proposed a regression-based method devised specifically for daily load forecasting in the summer season in South Korea. The summer model shares many independent variables with the all-season model and adds several of its own, such as weather information other than mean temperature and different seasonal and interaction effects. In the summer model, fourteen independent variables are used for the dependent variable y_t. The equation of the model is as follows:

\begin{aligned} y_{t} = {} & \beta_{1} y_{t-1} + \beta_{2} y_{t-7} + \beta_{3} w_{t} + \beta_{4} w_{t}^{C} + \beta_{5} p_{t} + \beta_{6} x_{t} + \beta_{7} i_{t} + \beta_{8} o_{t} + \beta_{9} s_{t} + \beta_{10} y_{t-1} x_{t} + \beta_{11} o_{t} w_{t} + \beta_{12} s_{t} x_{t} \\ & + \beta_{13} w_{t} x_{t} + \beta_{14} w_{t} i_{t} + \beta_{0} + \varepsilon \end{aligned}

(2)

where w_t^C = Σ_{i=1}^{6} w_{t−i} is the cumulative temperature, p_t is the vapor pressure of day t, o_t is the season to which day t belongs, and the rest of the notations are the same as those in the all-season model.

The explanation here focuses on the terms that do not appear in model (1). The summer model includes both the mean temperature and the cumulative temperature (w_t^C), obtained by summing the temperatures of the six consecutive days before the forecast day to capture the recency effect presented by Wang et al. [36]. The vapor pressure of the forecast day is considered in addition to the temperature. To cover the seasonal effect, a categorical variable, o_t, is included in the model. Its elements are Spring, Summer, Fall, and Winter.
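The cumulative temperature can be sketched in a few lines of Python. The function name `cumulative_temperature`, the list-based indexing of days, and the sample temperatures are illustrative assumptions; only the six-day window before the forecast day comes from the definition above.

```python
def cumulative_temperature(mean_temps, t, window=6):
    """Sum of the mean temperatures on days t-1, ..., t-window (w_t^C).

    `mean_temps` is a list indexed by day and `t` is the forecast-day index.
    """
    if t < window:
        raise ValueError("not enough history before day t")
    return sum(mean_temps[t - i] for i in range(1, window + 1))


# Hypothetical mean temperatures for days 0..6; day 6 is the forecast day.
temps = [20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0]
w_c = cumulative_temperature(temps, t=6)  # sums days 0..5, excludes day 6
```

Note that the forecast day's own temperature is excluded from the sum, because it enters the model separately as w_t.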

Among interaction effects, o_t∙w_t is the same as m_t∙w_t in the all-season model except for the fact that the seasonal time span becomes wider. The interaction effect (w_t∙i_t) between the trend and the temperature is the only interaction effect that is completely new in the summer model. This interaction explains the way that the impact of temperature changes over time. Specifically, economic development in South Korea has led to a nationwide spread of air conditioners and cooling systems, which has led to a recent increase in electricity demand in summer [37]. For more details about the model, including ANOVA results for the regression model, please see Han and Lee [19].

4.1.3. Winter Model

The third and last existing regression-based forecasting method proposed by Lee and Han [20] is a regression model for the winter season, which is called the winter model. The winter model has similar independent variables to the all-season and summer models. While the winter model has the most independent variables, there are only a few completely new features. The equation of the winter model is as follows:

\begin{aligned} y_{t} = {} & \beta_{1} y_{t-1} + \beta_{2} y_{t}^{C} + \beta_{3} w_{t} + \beta_{4} c_{t} + \beta_{5} x_{t} + \beta_{6} i_{t} + \beta_{7} o_{t} + \beta_{8} u_{t} + \beta_{9} s_{t} + \beta_{10} y_{t-1} x_{t} + \beta_{11} y_{t}^{C} o_{t} + \beta_{12} s_{t} x_{t} \\ & + \beta_{13} s_{t} u_{t} + \beta_{14} s_{t} y_{t-1} + \beta_{15} w_{t} x_{t} + \beta_{16} w_{t} o_{t} + \beta_{0} + \varepsilon \end{aligned}

(3)

where y_t^C = y_t₋₇ + y_t₋₁₄ + y_t₋₂₁, c_t is the highest temperature of day t, u_t is the two-month period to which day t belongs, and the rest of the notations are the same as in the all-season and summer models.

In the winter model, autocorrelation with lags 7, 14, and 21 is reflected by the term y_t^C, which also explains the day of the week effect. Aside from the mean temperature, the highest temperature of the forecast day (c_t) is considered as an additional piece of weather information in the model. Lastly, this model considers the seasonal effect using the four-season variable (o_t), as in the summer model, as well as a six-season variable (u_t). Here, u_t is a categorical variable whose elements are January/February, March/April, …, and November/December, i.e., each element represents a separate and consecutive two-month period. Note that in the existing regression models seasonal effects are implemented with several categorical variables, such as m_t, o_t, and u_t, which differ in how many seasons a year is divided into.
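As a rough illustration, the two winter-specific features y_t^C and u_t could be derived as follows. The function names and sample data are hypothetical, and the intermediate period labels (May/June through September/October) are filled in by following the consecutive two-month pattern described above.

```python
from datetime import date


def cumulative_weekly_lags(loads, t):
    """y_t^C = y_{t-7} + y_{t-14} + y_{t-21} for a list of daily peak loads."""
    if t < 21:
        raise ValueError("need at least 21 days of history before day t")
    return loads[t - 7] + loads[t - 14] + loads[t - 21]


def two_month_period(day):
    """Label of the two-month period u_t that contains `day`."""
    labels = ["January/February", "March/April", "May/June",
              "July/August", "September/October", "November/December"]
    return labels[(day.month - 1) // 2]


# Hypothetical loads in which day k simply has load k.
loads = [float(k) for k in range(30)]
y_c = cumulative_weekly_lags(loads, t=28)  # loads[21] + loads[14] + loads[7]
u = two_month_period(date(2019, 2, 4))
```

Because all three lags fall on the same weekday as the forecast day, y_t^C carries the day of the week effect as well as the autocorrelation.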

Although the winter model includes seven interaction terms, most are the same as or similar to those in the all-season and summer models. The characteristic interaction terms in the winter model are related to the special day effect; the terms are s_t∙u_t and s_t∙y_t₋₁. One of the biggest holidays in South Korea is Lunar New Year Day, which is called Seollal. Because the days before and after Seollal are designated as public holidays, Seollal typically has three to five days of holiday. Such a long holiday period significantly affects the electricity demand, which results in many terms regarding special days in the winter model. With s_t∙u_t, the impact of the special day effect is counted differently according to the season in which the corresponding special day occurs. Further, the autocorrelation effect is differentially covered by the term s_t∙y_t₋₁ in terms of whether or not a day is a special day. For more details about the model, including ANOVA results for the regression model, please see Lee and Han [20].

4.2. Proposed Methods

Three different regression models were presented in the previous subsection. In the proposed forecasting methods, the most appropriate of these regression models is selected for a given season, so the proposed methods can be expected to outperform the existing methods described above. What remains is to determine which model should be selected for a given day. In this study, we suggest two approaches, static and dynamic, for selecting the appropriate model at a given time. In the static approach, the model is pre-selected for each period of the given time set, whereas in the dynamic approach the proper model is selected immediately before each daily forecast. The details of each approach are presented in the following sections.

4.2.1. Static Approach

In the proposed forecasting methods, one of the three existing regression models is chosen if the regression model is found to be the most suitable one for the corresponding time. Because different models are applied depending on seasons, the proposed methods can be considered to be seasonally adaptive methods. To be seasonally adaptive, we need to determine which model should be used for each season. To this end, the period of each season needs to be set. In this study, a month was selected as the period for each season, that is, different regression models can be applied every month, and just one regression model is used each month. Now, we need to select the model to be used at each month. Such decisions are made at the beginning of every year in the static approach.

Let π(M) be a parameter indicating which regression model is used in month M, where M is an element of the set of all months of the year, i.e., M ∈ {January, February, …, December}. π(M) can take one of three values, A, S, and W, representing the all-season, summer, and winter regression models, respectively. For example, π(December) = W means that the winter model is used for daily peak load forecasting on every day in December. In the static way of selecting a model, the twelve values of the parameter π(M) must be determined at the beginning of every year. Daily forecasting is then executed according to the pre-determined π(M) during the year.

Consequently, the performance of the proposed methods largely depends on the values of π(M). That is, we need to determine which of the three regression models would perform the best for each month. In this study, we introduce two criteria to determine the values of π(M) based on past performances of the regression models. The first criterion is the minimum mean forecast error. For example, to obtain the value of π(January), we retroactively forecast all days in a selection of past January months using the three existing regression models. We then select the model that provides the smallest mean forecast error. A specific measure of the forecast error will be described in the next section. The second criterion is the largest number of best forecasts. Similar to the first criterion, we retroactively forecast all days in a selection of past months with the regression models and count the number of days for which each model performs the best in terms of a specific forecast error measure. The model with the largest number of best forecasts is then set as the value of π(M) for month M. Once the values of π(M) for all months are determined using one of the above two criteria, a forecast of day t can be obtained using the following procedure:

\hat{y}_{t} \leftarrow \mathrm{forecast}(\pi(m_{t}), t)

where forecast(R, t) is the function that returns the peak load forecast value of day t using regression model R.
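The two selection criteria of the static approach can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the dictionary layout of past errors, the tie-breaking rule, and the sample error values are all hypothetical, and the error measure is taken to be a daily absolute percentage error.

```python
from collections import Counter


def select_min_mean_error(apes_by_model):
    """Criterion 1: the model with the smallest mean error over past days."""
    return min(apes_by_model,
               key=lambda m: sum(apes_by_model[m]) / len(apes_by_model[m]))


def select_most_best_forecasts(apes_by_model):
    """Criterion 2: the model that has the smallest error on the most days."""
    models = list(apes_by_model)
    n_days = len(apes_by_model[models[0]])
    wins = Counter(min(models, key=lambda m: apes_by_model[m][d])
                   for d in range(n_days))
    # Ties are broken by the order of `models`; the source does not say how.
    return max(models, key=lambda m: wins[m])


# Hypothetical errors (%) on four past January days for models A, S, and W.
january = {"A": [2.0, 2.0, 2.0, 2.0],
           "S": [1.0, 1.0, 1.0, 6.0],
           "W": [1.5, 1.5, 1.5, 1.5]}
pi_criterion_1 = select_min_mean_error(january)       # W: mean error 1.5
pi_criterion_2 = select_most_best_forecasts(january)  # S: best on three days
```

In this constructed example the two criteria disagree, because one large error penalizes model S under the mean-based criterion but costs it only a single day under the count-based criterion.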

4.2.2. Dynamic Approach

In the static approach, the regression model to be used on a given day of each month is pre-determined. In the dynamic approach, by contrast, the model to be used is not known in advance; rather, a decision is made daily, immediately prior to the actual forecasting. To select the best model for the forecast day, we look at the most recent performance of the three models. In this study, for each model we retroactively forecast the latest seven days and calculate the sum of the absolute forecast error rates of the seven days. Then, the model with the smallest sum is selected and used for the actual forecasting. With the dynamic approach, the forecast of day t, ŷ_t, can be obtained using the following procedure:

\Pi \leftarrow \underset{R \in \{A, S, W\}}{\mathrm{arg\,min}} \left\{ \sum_{i=1}^{7} \frac{\left| y_{t-i} - \mathrm{forecast}(R, t-i) \right|}{y_{t-i}} \right\}

\hat{y}_{t} \leftarrow \mathrm{forecast}(\Pi, t)

where y_t is the actual value of the peak load of day t and forecast(R, t) is the function that returns the peak load forecast value of day t using regression model R.
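A minimal Python sketch of the dynamic selection rule is given below. The function name `select_dynamic`, the list-based data layout, and the sample values are assumptions made for illustration; the retroactive forecasts are taken as given rather than produced by fitted regression models.

```python
def select_dynamic(actual, forecasts_by_model, t, window=7):
    """Return the model with the smallest summed absolute error rate
    over days t-1, ..., t-window.

    `actual[d]` is the observed peak load of day d and
    `forecasts_by_model[R][d]` is the retroactive forecast of model R.
    """
    if t < window:
        raise ValueError("not enough history before day t")

    def recent_error(model):
        return sum(abs(actual[t - i] - forecasts_by_model[model][t - i])
                   / actual[t - i]
                   for i in range(1, window + 1))

    return min(forecasts_by_model, key=recent_error)


# Hypothetical data for eight days, where day 7 is the forecast day.
actual = [100.0] * 8
forecasts = {"A": [103.0] * 8, "S": [101.0] * 8, "W": [98.0] * 8}
chosen = select_dynamic(actual, forecasts, t=7)
```

The selected model would then be passed to forecast(Π, t) to produce the forecast of day t.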

Compared to the static approach, when deciding which model to use the dynamic approach investigates a relatively small amount of data, i.e., only information for a week. The dynamic approach investigates very recent data (at most seven days away from the forecast day), while the static approach investigates relatively older information (at least a year away from the forecast day). For example, when we forecast the peak load of 1 January 2019, the dynamic approach investigates performance on the seven days, i.e., from 25 December 2018 to 31 December 2018, while the static approach investigates the performance of a selection of past January months, e.g., from 1 January 2018 to 31 January 2018 and from 1 January 2017 to 31 January 2017. There is a trade-off between the two approaches. Which one performs better will be shown using the computational experiments introduced in the next section.

5. Computational Experiments

This section describes the computational experiments conducted to validate the performance of the proposed methods. The benchmarks, which consist of existing forecasting methods, are introduced in the following subsection; the test results are then presented in Section 5.2. Before moving on, the performance measure used in this study is introduced. Mean absolute percentage error (MAPE) has been widely used in many forecasting studies; thus, we used MAPE as the performance measure. MAPE is the mean of the absolute percentage errors (APEs). An APE value is the relative absolute deviation between the actual demand and the forecasted value. For example, the APE of day t is obtained as

\frac{\left| y_{t} - \hat{y}_{t} \right|}{y_{t}} \times 100\%

where y_t and ŷ_t are the actual value and the forecast of the peak load of day t, respectively. All methods were implemented in the R language and tested on a PC with an Intel Core i7 CPU and 16 GB of RAM.
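The APE and MAPE definitions translate directly into code. The sketch below uses Python rather than the R implementation mentioned above, and the function names and sample values are illustrative assumptions.

```python
def ape(actual, forecast):
    """Absolute percentage error of one day, in percent."""
    return abs(actual - forecast) / actual * 100.0


def mape(actuals, forecasts):
    """Mean of the daily APEs, in percent."""
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(ape(y, f) for y, f in zip(actuals, forecasts)) / len(actuals)


# Hypothetical peak loads and forecasts for two days.
y = [80.0, 100.0]
y_hat = [76.0, 102.0]
score = mape(y, y_hat)  # the two daily APEs are 5% and 2%
```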

5.1. Benchmarks

In this study, we used several existing forecasting methods as benchmarks to evaluate the relative performance of the proposed methods. The main benchmarks are the three regression-based methods introduced above, i.e., the all-season, summer, and winter models. Because each of these three models was shown to outperform various earlier methods in its original study [18,19,20], outperforming them would be strong evidence for the competitiveness of the proposed methods. In addition, a very naïve regression model was tested as a baseline benchmark. This model has independent variables representing only very basic features, i.e., trend, autocorrelation, day of the week, and temperature. The equation of the naïve model is as follows; the notations are the same as those in the above three models:

y_{t} = \beta_{1} y_{t-1} + \beta_{2} w_{t} + \beta_{3} x_{t} + \beta_{4} i_{t} + \beta_{0} + \varepsilon

(4)

In addition to the regression-based models described above, we added a machine learning (ML) method, XGBoost [38], as another benchmark. XGBoost is known as one of the best-performing ML methods today. For this reason, many recent STLF studies have used XGBoost [39,40,41,42]. XGBoost is a kind of Gradient Boosting Model (GBM), which is a machine learning model that applies the boosting technique to a tree-based model. Specifically, XGBoost has both excellent performance and fast speed due to its parallel computing capability. In summary, this study used a total of five benchmarks: four regression-based methods (naïve, all-season, summer, and winter models) and XGBoost.

5.2. Test Results

In this study, daily forecasting was performed for every day of the year 2019 (365 days) with the benchmarks and the proposed methods. Before this main test, we carried out a preliminary test of the proposed methods using the static approach to determine which regression model should be used for each month, i.e., π(M). For the preliminary test, every day of the three most recent years before 2019 was forecast using each of the all-season, summer, and winter models. The best model for each month was then selected according to the two criteria mentioned earlier, the minimum mean forecast error and the largest number of best forecasts. The results of the preliminary test are summarized in Table 2. Most results were as expected, i.e., the summer model performed the best in the summer season, the winter model performed the best in the winter season, and the all-season model performed the best during the rest of the year. Only four outcomes out of twenty-four did not match expectations. Regardless of the expectations, the results of the preliminary test were used in the main test of the proposed methods using the static approach.

Table 2. Preliminary test results of determining π(M) for the proposed methods using the static approach.

The results of the main test are summarized in Table 3, which reports four APE statistics for the five benchmarks and three proposed methods. The values of MAPE, the primary performance measure used in this study, are listed in the third column of the table. Once the 365 daily forecasts for 2019 were obtained using each method, 365 APEs were calculated; the average of these APEs is the MAPE of the method. As shown in Table 3, the benchmarks aside from the naïve model showed MAPEs slightly over 2%, while all proposed methods had MAPEs under 2%. Among the proposed methods, the static approach was slightly better than the dynamic approach in terms of MAPE. To show the robustness of the proposed methods, Table 3 presents the standard deviation, 75th percentile, and maximum of the APEs for each method in addition to the MAPE. In terms of the standard deviation of APEs, the proposed methods showed smaller variations than the benchmarks. Among the proposed methods, the static approach again performed better than the dynamic approach from the point of view of robustness. In addition, the 75th percentiles of the APEs from the proposed methods were no larger than those of the benchmarks. Gaps in the 75th percentiles of APEs between the benchmarks and the proposed methods were wider than those of the MAPEs between the two groups. In terms of the maximum APE, the proposed methods were not superior; however, their result (17.69%) was not far from the best result (16.99%).
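The four statistics reported in Table 3 can be computed from a list of daily APEs as sketched below. The source does not state which definitions of the standard deviation and percentile were used, so the sample standard deviation and linear interpolation between order statistics are assumptions here, as are the function name and the sample values.

```python
import statistics


def ape_summary(apes):
    """Mean, standard deviation, 75th percentile, and maximum of APEs."""
    # method="inclusive" interpolates linearly between order statistics.
    quartiles = statistics.quantiles(apes, n=4, method="inclusive")
    return {"mean": statistics.fmean(apes),
            "sd": statistics.stdev(apes),  # sample standard deviation
            "p75": quartiles[2],
            "max": max(apes)}


# Hypothetical daily APEs (%).
summary = ape_summary([1.0, 2.0, 3.0, 4.0, 5.0])
```

Reporting the 75th percentile and the maximum alongside the mean shows whether a low MAPE is achieved consistently or is offset by a few very poor daily forecasts.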

Table 3. Several statistics of APEs of daily forecasts in 2019 by eight methods.

The results of the main test listed above show the overall superiority of the proposed methods for forecasting daily peak loads in 2019. To investigate the performance of the proposed methods in further detail, Table 4 summarizes the MAPEs of all tested methods across all months of 2019. In the table, the smallest MAPE in each month is presented as a bold number. Although the proposed methods showed the smallest MAPEs among the tested methods in terms of overall MAPE, they did not show dominant performance month by month. Rather, the winter model and XGBoost had a greater number of months in which they performed better than the proposed methods. However, they showed more variations, such that certain months had MAPEs of more than 3%. Note that the maximum of the monthly MAPEs by the proposed methods was less than 3%.

Table 4. Monthly MAPEs of the tested methods in 2019 (The smallest MAPE in each month is presented as a bold number).

Table 4 also indicates the forecasting difficulty of each month. Compared to the distinct summer and winter seasons, it appears to be more difficult to forecast the daily peak loads of in-between months such as May and September. In May, in particular, all tested methods showed MAPEs of more than 2.50%. The proposed methods draw on models specialized for the summer and winter seasons, but they have no comparable specialized model for in-between months such as May or September. To enhance the performance of the proposed methods, modules or models for in-between seasons should be embedded.
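The Overall column of Table 4 can be related to the monthly columns through a day-weighted average, since each monthly MAPE is a mean over the days of that month. The sketch below checks this for the Static 1 row; the function name is hypothetical, and because the monthly values in the table are rounded, the agreement is only expected to hold to about two decimal places.

```python
# Number of days in each month of 2019 (not a leap year).
DAYS_2019 = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

# Monthly MAPEs (%) of the "Static 1" row in Table 4, January to December.
STATIC_1 = [1.27, 2.15, 1.83, 2.19, 2.72, 1.58,
            1.61, 2.20, 2.40, 2.07, 1.42, 1.82]


def overall_mape(monthly_mapes, days_per_month):
    """Day-weighted mean of the monthly MAPEs."""
    total_days = sum(days_per_month)
    weighted = sum(m * d for m, d in zip(monthly_mapes, days_per_month))
    return weighted / total_days


overall = overall_mape(STATIC_1, DAYS_2019)
print(round(overall, 2))  # 1.94, matching the Overall column for Static 1
```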

6. Discussion

South Korea has achieved remarkable economic growth over the past several decades. One of the foundations of this growth has been adequate and stable electricity generation. Forecasting the load, or electricity demand, is one of the crucial tasks in securely operating the nation’s electric power system. In this study, we proposed methods for the STLF problem of South Korea, with daily peak load being the specific target for prediction. The proposed methods include procedures for properly combining several existing regression-based methods. Considering the seasonal and recent performances of the existing methods, the proposed methods adaptively choose the method determined to be the most appropriate for each daily forecast. To validate the performance of the proposed methods, computational experiments were conducted; all 365 days in 2019 were forecast using both the proposed methods and several benchmarks consisting of existing regression-based methods and XGBoost, a well-known machine learning method. The results of the comparison test showed that the proposed methods outperformed the benchmarks for daily forecasting in terms of MAPE. Although the proposed methods did not show consistently dominant performance in monthly MAPEs, the variations in their errors were smaller than those of the benchmarks. In this study, the characteristics of electricity demand were analyzed in depth from various perspectives. The proposed methods were devised to reflect the results of that analysis according to the season, which can be regarded as the basis for their good performance.

This study has several limitations which provide opportunities for future research. First, the computational experiments revealed a weakness of the proposed methods in that they did not perform the best in certain months, such as May or September. In this study, the analysis of electricity demand was rather biased toward the summer and winter seasons. It is necessary to devise forecasting methods that can address this weakness by analyzing the characteristics of each month or each season. Second, the proposed methods were based on the three existing regression models, which limited their performance. Because the proposed methods only select among the three existing models, their error on any given day cannot be smaller than the smallest error among those models. To overcome this limitation, new hybrid methods could be developed which combine the best-performing regression-based methods with other contending methods such as XGBoost. Finally, it is necessary to consider recent characteristics of the electricity demand environment. In the computational experiments, when forecasting daily peak loads in 2019 the proposed methods utilized the existing methods that performed the best over the past three years (2016–2018). The test results showed that the previously best-performing methods do not always remain the best for current forecasting. This suggests that the characteristics of electricity demand changed to some extent in the meantime. The environment of electricity demand continues to change due to various causes, and both the speed and the magnitude of change are increasing. To reflect such changes, it is necessary to develop forecasting methods that are suitable for recent daily peak loads.

Funding

This research received no external funding.

Data Availability Statement

Publicly available datasets were analyzed in this study. The dataset of the daily peak load used in this study is available from the Electric Power Statistics Information System (http://epsis.kpx.or.kr, accessed on 10 February 2022). The dataset of the weather information used in this study is available from the Weather Data Service of the Korea Meteorological Administration (https://data.kma.go.kr/, accessed on 10 February 2022).

Conflicts of Interest

The author declares no conflict of interest.

References

1. The World Bank. Available online: https://www.worldbank.org/en/country/korea/overview#1 (accessed on 13 December 2021).
2. U.S. Energy Information Administration (EIA). Available online: https://www.eia.gov/international/analysis/country/KOR (accessed on 13 December 2021).
3. U.S. Energy Information Administration (EIA). Available online: https://www.eia.gov/international/data/country/KOR (accessed on 13 December 2021).
4. The World Bank. Available online: https://www.doingbusiness.org/en/rankings (accessed on 16 December 2021).
5. Upadhaya, D.; Thakur, R.; Singh, N.K. A Systematic Review on the Methods of Short Term Load Forecasting. In Proceedings of the 2019 2nd International Conference on Power Energy, Environment and Intelligent Control (PEEIC), Greater Noida, India, 18–19 October 2019; pp. 6–11.
6. Kuster, C.; Rezgui, Y.; Mourshed, M. Electrical Load Forecasting Models: A Critical Systematic Review. Sustain. Cities Soc. 2017, 35, 257–270.
7. Hong, T.; Fan, S. Probabilistic Electric Load Forecasting: A Tutorial Review. Int. J. Forecast. 2016, 32, 914–938.
8. El Hasadi, Y.M.F.; Padding, J.T. Solving Fluid Flow Problems Using Semi-Supervised Symbolic Regression on Sparse Data. AIP Adv. 2019, 9, 115218.
9. Koza, J.R. Genetic Programming II; MIT Press: Cambridge, MA, USA, 1994; Volume 17.
10. Raza, M.Q.; Khosravi, A. A Review on Artificial Intelligence Based Load Demand Forecasting Techniques for Smart Grid and Buildings. Renew. Sustain. Energy Rev. 2015, 50, 1352–1372.
11. Son, N. Comparison of the Deep Learning Performance for Short-Term Power Load Forecasting. Sustainability 2021, 13, 12493.
12. Yu, Z.; Niu, Z.; Tang, W.; Wu, Q. Deep Learning for Daily Peak Load Forecasting–A Novel Gated Recurrent Neural Network Combining Dynamic Time Warping. IEEE Access 2019, 7, 17184–17194.
13. Korea Power Exchange (KPX), Electric Power Statistics Information System (EPSIS). Available online: http://epsis.kpx.or.kr/epsisnew/selectEkgeEpsMepChart.do?menuId=030100&locale=eng (accessed on 27 December 2021).
14. Park, J.H.; Shin, D.H.; Kim, C.B. Deep Learning Model for Electric Power Demand Prediction Using Special Day Separation and Prediction Elements Extention. J. Adv. Navig. Technol. 2017, 21, 365–370.
15. Lee, J.Y.; Kim, S. Forecasting daily peak load by time series model with temperature and special days effect. Korean J. Appl. Stat. 2019, 32, 161–171.
16. Jung, S.-W.; Kim, S. Electricity Demand Forecasting for Daily Peak Load with Seasonality and Temperature Effects. Korean J. Appl. Stat. 2014, 27, 843–853.
17. Lee, J.-S.; Sohn, H.G.; Kim, S. Daily Peak Load Forecasting for Electricity Demand by Time series Models. Korean J. Appl. Stat. 2013, 26, 349–360.
18. Lee, G.-C. Regression Based Methods with Interaction Effects for Daily Peak Load Forecasting. J. Manag. Econ. 2020, 42, 77–97.
19. Han, J.; Lee, G.-C. Forecasting Daily Peak Load in Summer Season. J. Soc. Korea Ind. Syst. Eng. 2020, 46, 25–33.
20. Lee, G.C.; Han, J. Forecasting the Daily Peak Load of South Korea During the Winter Season: A Case Study on Open Public Data Usage. Korean Oper. Res. Manag. Sci. Soc. 2019, 44, 49–58.
21. Ryu, J.-Y.; Cha, J.-M.; Lee, B.-R. Evaluation of Weather Information in Forecasting Daily Peak Load of Electricity Demand. J. Korean Inst. Illum. Electr. Install. Eng. 2018, 32, 73–81.
22. Lee, G.-C.; Han, J. Forecasting Daily Peak Load of Domestic Electricity Demand. J. Ind. Econ. Bus. 2017, 30, 1205–1218.
23. Bang, Y.-K.; Kim, J.-H.; Lee, C.-H. Daily Peak Electric Load Forecasting Using Neural Network and Fuzzy System. Trans. Korean Inst. Electr. Eng. 2018, 67, 96–102.
24. Shin, D.-H.; Kim, C.-B. A Study on Deep Learning Input Pattern for Summer Power Demand Prediction. J. Korean Inst. Inf. Technol. 2016, 14, 127.
25. Jeong, H.-M.; Kim, K.-H.; Park, J.H. Error Correction Algorithm based Radial Basis Function Network for Daily Peak Electric Load Forecasting. Trans. Korean Inst. Electr. Eng. 2019, 68, 221–227.
26. Hwang, H. Deep Neural Network Model for Short-term Electric Peak Load Forecasting. J. Korea Converg. Soc. 2018, 9, 1–6.
27. Ahn, J.-Y.; Park, S.-M.; Kim, C.-B. A Study on Neural Network Model for Winter Electric Power Demand Prediction. J. Korean Inst. Inf. Technol. 2017, 15, 1–9.
28. Yu, J.; Kim, S. Locally-Weighted Polynomial Neural Network for Daily Short-Term Peak Load Forecasting. Int. J. Fuzzy Log. Intell. Syst. 2016, 16, 163–172.
29. Hong, T.; Wang, P.; Willis, H.L. A Naïve Multiple Linear Regression Benchmark for Short Term Load Forecasting. In Proceedings of the 2011 IEEE Power and Energy Society General Meeting, Detroit, MI, USA, 24–28 July 2011; pp. 1–6.
30. Jeong, H.C.; Jung, J.; Kang, B.O. Development of ARIMA-based Forecasting Algorithms using Meteorological Indices for Seasonal Peak Load. Trans. Korean Inst. Electr. Eng. 2018, 67, 1257–1264.
31. Cha, J.-M.; Ku, B.-H. A Study on the Summer and Winter Load Forecasting by Using the Characteristics of Temperature Changes in Korean Power System. J. Int. Counc. Electr. Eng. 2014, 4, 293–296.
32. Kim, K.-H.; Park, R.-J.; Jo, S.-W.; Song, K.-B. 24-Hour Load Forecasting Algorithm Using Artificial Neural Network in Summer Weekdays. J. Korean Inst. Illum. Electr. Install. Eng. 2017, 31, 113–119.
33. Koo, B.; Kim, H.; Lee, H.; Park, J. Short-term Electric Load Forecasting for Summer Season using Temperature Data. Trans. Korean Inst. Electr. Eng. 2015, 64, 1137–1144.
34. Kim, I.-M.; Ju, L.Y.; Lee, S.; Kim, D. Holiday Effects of Disaggregated Sectoral Demand for Electricity. Korea Energy Econ. Rev. 2016, 15, 99–137.
35. Oh, S.; Park, S. Development of a Daily Electricity Business Index by using the Electricity Daily Data of the Manufacturing Sector. J. Korean Oper. Res. Manag. Sci. Soc. 2016, 41, 59–74.
36. Wang, P.; Liu, B.; Hong, T. Electric Load Forecasting with Recency Effect: A Big Data Approach. Int. J. Forecast. 2016, 32, 585–597.
37. Jeong, J.H.; Heo, I. Estimating the Impact of Temperature Change on Electricity Consumption in Seoul. J. Clim. Res. 2015, 10, 193–207.
38. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
39. Aguilar Madrid, E.; Antonio, N. Short-Term Electricity Load Forecasting with Machine Learning. Information 2021, 12, 50.
40. Oh, J.-Y.; Ham, D.-H.; Lee, Y.-G.; Kim, G. Short-term Load Forecasting Using XGBoost and the Analysis of Hyperparameters. Trans. Korean Inst. Electr. Eng. 2019, 68, 1073–1078.
41. Wang, Y.; Sun, S.; Chen, X.; Zeng, X.; Kong, Y.; Chen, J.; Guo, Y.; Wang, T. Short-Term Load Forecasting of Industrial Customers Based on SVMD and XGBoost. Int. J. Electr. Power Energy Syst. 2021, 129, 106830.
42. Zheng, H.; Yuan, J.; Chen, L. Short-Term Load Forecasting Using EMD-LSTM Neural Networks with a Xgboost Algorithm for Feature Importance Evaluation. Energies 2017, 10, 1168.

Figure 1. Forty years of GDP and electricity generation trends in South Korea [3].

Figure 2. Daily peak load (blue line) and linear trend (dotted red line) in South Korea from 2010 to 2018.

Figure 3. Average daily peak loads during four seasons in South Korea from 2010 to 2018.

Figure 4. Average daily peak loads by the day of the week in South Korea from 2010 to 2018.

Figure 5. Average daily peak loads on regular weekdays (green bar) and special days (blue bars) in South Korea from 2010 to 2018.

Figure 6. Relationship between mean temperature and daily peak load of South Korea in 2018. The red and green areas show direct relationship between temperature and peak load while the blue and yellow areas show inverse relationship between them. The plus signs plotted on the red and blue areas are mostly from regular weekdays while the other plus signs plotted on the green and yellow areas are mostly from weekends or special days.

Figure 7. Autocorrelation values of daily peak load data from 2010 to 2018.

Figure 8. Relationship between peak load and mean temperature across the four seasons from 2016 to 2018.

Figure 9. Relationship between the peak loads of two adjacent days by different days of the week.

Figure 10. Average peak loads of regular days and special days when the special days fall on weekdays and weekends.

Table 1. Summary of forecasting studies on South Korea’s daily electric load.

Classification	Authors	Forecasting Techniques
Statistical	Lee and Kim [15]	SARIMA, Reg-ARIMA, TBATS
	Jung and Kim [16]	SARIMA, Reg-ARIMA, SGARCH
	Lee et al. [17]	AR-GARCH, Holt-Winters, Reg-ARIMA
	Lee [18]	Regression
	Han and Lee [19]	Regression
	Lee and Han [20]	Regression
	Ryu et al. [21]	Regression
AI	Park et al. [14]	LSTM, DNN
	Lee and Han [22]	ANN
	Bang et al. [23]	Neural Network + Fuzzy
	Shin and Kim [24]	ANN, DNN
	Jeong et al. [25]	RBF Network
	Hwang [26]	DNN
	Ahn et al. [27]	DNN
	Yu and Kim [28]	PNN

Table 2. Preliminary test results of determining π(M) for the proposed methods using the static approach.

Criterion	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec
1st Criterion ¹	S	W	W	W	A	S	S	S	A	A	A	W
2nd Criterion ²	W	W	A	W	A	S	S	S	A	A	A	W

¹ Minimum MAPE. ² Largest number of best forecasts.

Table 3. Summary statistics of the APEs of daily forecasts in 2019 for the eight methods.

Forecasting Methods		APE (Absolute Percentage Error)
		Mean (MAPE)	Standard Deviation	75th Percentile	Max (Max APE)
Benchmarks	Naïve model	2.84%	3.80%	3.32%	34.07%
	All-season model	2.11%	2.25%	2.66%	17.69%
	Summer model	2.12%	2.21%	2.80%	19.31%
	Winter model	2.07%	2.06%	2.85%	16.99%
	XGBoost	2.06%	2.32%	2.76%	19.81%
Proposed methods	Static with Criteria 1	1.94%	2.00%	2.51%	17.69%
	Static with Criteria 2	1.93%	1.96%	2.47%	17.69%
	Dynamic	1.96%	2.05%	2.66%	17.69%
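
The columns of Table 3 summarize the distribution of daily absolute percentage errors (APEs). The following is a minimal sketch of how such a summary could be computed. The function name `ape_summary`, the use of the sample standard deviation, and the "inclusive" percentile method are assumptions made for illustration; the paper does not state which conventions were used, and the sample numbers are hypothetical rather than taken from the study's data.

```python
from statistics import mean, stdev, quantiles


def ape_summary(actual, forecast):
    """Summarize absolute percentage errors (in %) of paired forecasts.

    Assumptions (not stated in the paper): sample standard deviation
    and the 'inclusive' quantile method for the 75th percentile.
    """
    apes = [abs(a - f) / abs(a) * 100.0 for a, f in zip(actual, forecast)]
    # quantiles() with n=4 returns the three quartile cut points.
    _, _, p75 = quantiles(apes, n=4, method="inclusive")
    return {
        "mape": mean(apes),
        "std": stdev(apes),
        "p75": p75,
        "max_ape": max(apes),
    }


# Hypothetical daily peak loads and forecasts for five days.
actual = [100.0, 100.0, 100.0, 100.0, 100.0]
forecast = [99.0, 102.0, 97.0, 104.0, 105.0]
summary = ape_summary(actual, forecast)
print(summary)
```

Reporting the 75th percentile and the maximum alongside the mean shows whether a method's average error is driven by a few badly forecast days, which is how the naïve model's 34.07% maximum APE stands out in Table 3.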

Table 4. Monthly MAPEs of the tested methods in 2019 (The smallest MAPE in each month is presented as a bold number).

Methods	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec	Overall
Naïve model	3.40%	4.25%	1.89%	2.68%	3.27%	2.00%	1.83%	3.23%	3.90%	2.57%	1.78%	3.35%	2.84%
All-season model	1.62%	2.63%	1.83%	2.16%	2.72%	1.90%	2.04%	2.41%	2.40%	2.07%	1.42%	2.13%	2.11%
Summer model	1.62%	2.71%	1.38%	2.42%	2.91%	1.58%	1.61%	2.20%	2.57%	2.63%	1.82%	2.08%	2.12%
Winter model	1.27%	2.15%	1.38%	2.19%	3.02%	1.38%	1.47%	2.43%	3.10%	2.66%	1.97%	1.82%	2.07%
XGBoost	1.92%	2.73%	1.44%	2.05%	2.66%	1.14%	1.69%	3.34%	3.21%	1.35%	1.27%	1.93%	2.06%
Static 1	1.27%	2.15%	1.83%	2.19%	2.72%	1.58%	1.61%	2.20%	2.40%	2.07%	1.42%	1.82%	1.94%
Static 2	1.62%	2.15%	1.38%	2.19%	2.72%	1.58%	1.61%	2.20%	2.40%	2.07%	1.42%	1.82%	1.93%
Dynamic	1.37%	2.39%	1.43%	2.32%	2.95%	1.38%	1.56%	2.15%	2.50%	2.24%	1.40%	1.87%	1.96%
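
If the Overall column of Table 4 is the MAPE over all daily forecasts in 2019, it should equal the average of the monthly MAPEs weighted by the number of days in each month. The sketch below checks this for the naïve model row. The helper name `day_weighted_mape` is hypothetical, and the check assumes that every calendar day of 2019 has a forecast, which the table does not state explicitly.

```python
import calendar

# Monthly MAPEs (%) of the naïve model, Jan to Dec, copied from Table 4.
monthly_mape_naive = [3.40, 4.25, 1.89, 2.68, 3.27, 2.00,
                      1.83, 3.23, 3.90, 2.57, 1.78, 3.35]


def day_weighted_mape(monthly_mape, year):
    """Average monthly MAPEs weighted by the number of days in each month."""
    days = [calendar.monthrange(year, month)[1] for month in range(1, 13)]
    weighted_sum = sum(value * d for value, d in zip(monthly_mape, days))
    return weighted_sum / sum(days)


overall_naive = day_weighted_mape(monthly_mape_naive, 2019)
print(f"{overall_naive:.2f}%")
```

Under these assumptions the result rounds to the 2.84% reported in the Overall column. A simple unweighted mean of the twelve monthly values would differ slightly because February has fewer days.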

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
